How to read client IP addresses from HTTP requests behind Kubernetes services?

You can get kube-proxy out of the loop entirely in 2 ways:

  1. Use an Ingress to configure your nginx to balance based on source IP and send traffic straight to your endpoints (https://github.com/kubernetes/contrib/tree/master/ingress/controllers#ingress-controllers)

  2. Deploy the haproxy service-loadbalancer (https://github.com/kubernetes/contrib/blob/master/service-loadbalancer/service_loadbalancer.go#L51) and set the balance annotation on the service so it uses "source" (see the sketch below).
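For option 2, a rough sketch of what the annotated Service could look like. The exact annotation key comes from the service-loadbalancer code linked above and may differ by version, so treat serviceloadbalancer/lb.algorithm below as an assumption; the Service name, selector and ports are placeholders.

apiVersion: v1
kind: Service
metadata:
  name: my-app                                # placeholder name
  annotations:
    # assumed annotation key; check the service-loadbalancer README/code for your version
    serviceloadbalancer/lb.algorithm: source
spec:
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080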


Right now, no.

Services use kube-proxy to distribute traffic to their backends. Kube-proxy uses iptables to route the service IP to a local port where it is listening, and then opens a new connection to one of the backends. The internal IP you are seeing is the IP:port of kube-proxy running on one of your nodes.

An iptables-only kube-proxy is in the works; that would preserve the original source IP.


As of 1.5, if you are running on GCE (and by extension GKE) or AWS, you simply need to add an annotation to your Service to make HTTP source IP preservation work.

...
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/external-traffic: OnlyLocal
...

It basically exposes the service directly via NodePorts instead of providing a proxy: by exposing a health probe on each node, the load balancer can determine which nodes to route traffic to.

In 1.7, this config has become GA, so you can set "externalTrafficPolicy": "Local" on your Service spec.
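For example, a minimal Service using the GA field might look like this (name, selector and ports are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: my-app                   # placeholder name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # GA equivalent of the beta annotation above
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080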



externalTrafficPolicy: Local is a setting you can specify in the YAML of Kubernetes Services of type LoadBalancer or type NodePort. (Ingress controllers usually include YAML to provision LB Services.)

externalTrafficPolicy: Local does 3 things:
1. Disables SNAT, so that instead of the ingress controller pod seeing the source IP as the IP of a Kubernetes node, it sees the real source IP.
2. Gets rid of an extra network hop by adding 2 rules:
- if traffic lands on the NodePort of a node with no ingress pods, it's dropped.
- if traffic lands on the NodePort of a node with ingress pods, it's forwarded to a pod on the same node.
3. Updates the Cloud Load Balancer's health check with a /healthz endpoint that's supposed to make the LB forward only to nodes with ingress pods, rather than to nodes where the traffic would have been dropped.
(Rephrasing for clarity's sake: by default, aka "externalTrafficPolicy: Cluster", traffic gets load balanced between the NodePorts of every worker node. "externalTrafficPolicy: Local" sends traffic only to the subset of nodes that have Ingress Controller pods running on them. So if you have a 100-node cluster, instead of the cloud load balancer sending traffic to 97 nodes, it'll only send it to the ~3-5 nodes that are running Ingress Controller pods.)
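As a concrete illustration of point 3, here's roughly what a provisioned Service looks like with this setting (a sketch with made-up names and port values; healthCheckNodePort is allocated automatically by Kubernetes when externalTrafficPolicy is Local on a LoadBalancer Service):

apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx                # hypothetical ingress controller Service
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  # allocated automatically when externalTrafficPolicy is Local; the cloud LB
  # probes http://<nodeIP>:<healthCheckNodePort>/healthz and only keeps nodes
  # that answer healthy (i.e. nodes running an ingress pod) in rotation
  healthCheckNodePort: 32456         # example value
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
  - port: 80
    targetPort: 80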


Important Note!:
"externalTrafficPolicy: Local" is not supported on AWS.
(It supposedly works fine on GCP and Azure. That said, I also recall reading that there was a regression that broke it in a minor version of Kubernetes 1.14, plus some versions of the Cilium CNI where it breaks as well, so be aware that the default externalTrafficPolicy: Cluster is rock-solid stable and should usually be preferred if you don't need the functionality. Also be aware that if you have a WAF as a Service in front of it anyway, you may be able to leverage that to see where client traffic is coming from.)

(It causes issues with kops and EKS; other distros running on AWS might actually be unaffected, more on that below.)

"externalTrafficPolicy: Local" not being supported on AWS is an issue that's known by the Kubernetes maintainers, but not well documented. Also, an annoying thing is that if you try it you'll have some luck with it/it'll appear to be working and this tricks enough people into thinking it works.

externalTrafficPolicy: Local is broken in 2 ways on AWS, and both breaks have workarounds to force it to work:
1st break + workaround: the /healthz endpoint's initial creation is flaky and the reconciliation loop logic is broken.
Upon the initial apply it'll work for some nodes but not others, and then never gets updated.
https://github.com/kubernetes/kubernetes/issues/80579
^describes the problem in more detail.
https://github.com/kubernetes/kubernetes/issues/61486
^describes a workaround to force it to work using a kops hook
(When you solve the /healthz endpoint reconciliation loop issue you unlock benefits #2 and #3: one less hop and the LB only sending traffic to the subset of worker nodes. But benefit #1, the real source IP, still won't be right.)

2nd break + 2 workaround options:
The desired end result is the ingress pod seeing the true client IP.
But what really happens is that the ingress pod shifts from seeing the source IP of a Kubernetes node to seeing the source IP of the Classic ELB.

workaround option 1.)

Switch to an AWS Network Load Balancer (an L4 LB), which works more like the Azure LB. This comes at the cost of not being able to use ACM (AWS Certificate Manager) to terminate TLS at the AWS LB / handle TLS cert provisioning and rotation for you.
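With the in-tree AWS cloud provider this is typically just an annotation on the LB Service. A rough sketch (names, selector and ports are placeholders, and the exact annotation set depends on your Kubernetes version):

apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx                          # placeholder
  annotations:
    # ask the AWS cloud provider for an NLB instead of a Classic ELB
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
  - port: 443
    targetPort: 443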

workaround option 2.)
Keep using the AWS Classic ELB (and you get to keep using ACM); you'll just need to add configuration to both the Classic ELB (in the form of an annotation on the LB Service) and to the ingress controller, so that both use Proxy Protocol or X-Forwarded-For headers. I recall another Stack Overflow post covering this, so I won't repeat the details here; a rough sketch of the Proxy Protocol variant is below.
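Roughly, the Proxy Protocol variant looks like this (shown for ingress-nginx as an example; the Service/ConfigMap names and namespace depend on how you installed your controller):

# On the LB Service: tell the Classic ELB to speak Proxy Protocol to the backends
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx                          # placeholder
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
spec:
  type: LoadBalancer
  ...
---
# On the ingress controller: tell nginx to parse Proxy Protocol so it
# recovers the real client IP (ingress-nginx ConfigMap shown as an example)
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  use-proxy-protocol: "true"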