Istio Ingress resulting in "no healthy upstream"

This error is fairly generic: it usually points to a routing problem in a misconfigured Istio setup. Still, here is some general advice for anyone who runs into the same issue.

In my case, the issue was an incorrect route rule configuration. The native Kubernetes services were working, but the Istio routing rules were misconfigured, so Istio could not route traffic from the ingress to the service.


Just in case you get curious, like I did, even though in my scenario the cause of the error was clear...

Error cause: I had two versions of the same service (v1 and v2) and an Istio VirtualService that splits traffic between the route destinations by weight: 95% goes to v1 and 5% goes to v2. Since I hadn't deployed v1 yet, the error "503 - no healthy upstream" of course showed up for 95% of the requests.
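
For illustration, a weighted route like the one described could look roughly like the manifest below. This is a sketch, not my exact configuration: the resource name, file name and apiVersion are assumptions, the host, namespace and subset names are inferred from the cluster names in the istioctl output further down, and the v1/v2 subsets are assumed to be defined in a matching DestinationRule.

```shell
# Write a sketch of the weighted VirtualService to a local file for review.
# Assumptions: the resource name, file name and apiVersion are illustrative;
# the v1 and v2 subsets must be declared in a DestinationRule for the same host.
cat > teachstore-course-vs.yaml <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: teachstore-course
  namespace: development
spec:
  hosts:
  - teachstore-course.development.svc.cluster.local
  http:
  - route:
    - destination:
        host: teachstore-course.development.svc.cluster.local
        subset: v1
      weight: 95
    - destination:
        host: teachstore-course.development.svc.cluster.local
        subset: v2
      weight: 5
EOF
```

With a split like this, any request routed to a subset that has no running Pods gets the 503, which matches the 95% failure rate I saw.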

Even though I knew the problem and how to fix it (just deploy v1), I wondered: how can I get more information about this error? How could I dig deeper to find out what was actually happening?

Here is a way to investigate using istioctl, Istio's configuration command-line utility:

# 1) Check the proxies status -->
  $ istioctl proxy-status
# Result -->
  NAME                                                   CDS        LDS        EDS        RDS          PILOT                       VERSION
  ...
  teachstore-course-v1-74f965bd84-8lmnf.development      SYNCED     SYNCED     SYNCED     SYNCED       istiod-86798869b8-bqw7c     1.5.0
  ...
  ...

# 2) Get the outbound cluster names from the JSON result, using the proxy of the service with the problem -->
  $ istioctl proxy-config cluster teachstore-course-v1-74f965bd84-8lmnf.development --fqdn teachstore-course.development.svc.cluster.local -o json
# 2) Or, if you have jq installed locally, extract only the names we need -->
  $ istioctl proxy-config cluster teachstore-course-v1-74f965bd84-8lmnf.development --fqdn teachstore-course.development.svc.cluster.local -o json | jq -r '.[].name'
# Result -->
  outbound|80||teachstore-course.development.svc.cluster.local
  inbound|80|9180-tcp|teachstore-course.development.svc.cluster.local
  outbound|80|v1|teachstore-course.development.svc.cluster.local
  outbound|80|v2|teachstore-course.development.svc.cluster.local

# 3) Check the endpoints of "outbound|80|v2|teachstore-course..." using the v1 proxy -->
  $ istioctl proxy-config endpoints teachstore-course-v1-74f965bd84-8lmnf.development --cluster "outbound|80|v2|teachstore-course.development.svc.cluster.local"
# Result (v2, which receives 5% of the traffic, is fine: it has healthy targets) -->
  ENDPOINT             STATUS      OUTLIER CHECK     CLUSTER
  172.17.0.28:9180     HEALTHY     OK                outbound|80|v2|teachstore-course.development.svc.cluster.local
  172.17.0.29:9180     HEALTHY     OK                outbound|80|v2|teachstore-course.development.svc.cluster.local

# 4) However, for the v1 version "outbound|80|v1|teachstore-course..." -->
  $ istioctl proxy-config endpoints teachstore-course-v1-74f965bd84-8lmnf.development --cluster "outbound|80|v1|teachstore-course.development.svc.cluster.local"
  ENDPOINT             STATUS      OUTLIER CHECK     CLUSTER
# Nothing! Empty, no Pods. That explains the "no healthy upstream" 95% of the time.
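
To repeat the check from steps 3 and 4 across several subsets, the endpoint table can be reduced to a count of healthy rows. Below is a minimal sketch, assuming the table layout shown above (status in the second column). The function name count_healthy_endpoints is my own helper, not part of istioctl.

```shell
# Count rows whose STATUS column (2nd field) is HEALTHY in the table printed by
# `istioctl proxy-config endpoints`. Reads the table from stdin and prints 0
# when only the header (or nothing at all) is present.
# NOTE: count_healthy_endpoints is a hypothetical helper, not an istioctl command.
count_healthy_endpoints() {
  awk 'NR > 1 && $2 == "HEALTHY" { n++ } END { print n + 0 }'
}

# Sample input copied from the v2 result in step 3.
v2_table='ENDPOINT             STATUS      OUTLIER CHECK     CLUSTER
172.17.0.28:9180     HEALTHY     OK                outbound|80|v2|teachstore-course.development.svc.cluster.local
172.17.0.29:9180     HEALTHY     OK                outbound|80|v2|teachstore-course.development.svc.cluster.local'

# Header-only table, as returned for v1 in step 4.
v1_table='ENDPOINT             STATUS      OUTLIER CHECK     CLUSTER'

printf '%s\n' "$v2_table" | count_healthy_endpoints   # prints 2
printf '%s\n' "$v1_table" | count_healthy_endpoints   # prints 0
```

Against a live mesh, the same function can read the real output, for example by piping the step 4 command into count_healthy_endpoints. A count of 0 for a subset that still receives traffic is the situation that produces "no healthy upstream".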