Why does iptables NAT not happen in a transparent proxy setup separated by network namespaces?

I have to assume that the transparent proxy is acting as a router, at least for ICMP, so it will route the ICMP echo back where it came from (veth0).

Finding the problem

While reproducing your setup and observing the problem, I added a TRACE rule on the host using iptables (the legacy variant, which might differ slightly from iptables-nft's version). I also forced the creation of the filter table (iptables -S) so that it appears in the traces:

iptables -t raw -A PREROUTING -j TRACE

A single ping then shows the following in the kernel logs (hint: if the "host" is itself a network namespace and not the initial one, run sysctl -w net.netfilter.nf_log_all_netns=1):

TRACE: raw:PREROUTING:policy:2 IN=client-veth0 OUT= MAC=66:f2:08:79:d0:df:be:1e:05:c1:c1:4b:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=7200 DF PROTO=ICMP TYPE=8 CODE=0 ID=3508 SEQ=1 
TRACE: nat:PREROUTING:policy:1 IN=client-veth0 OUT= MAC=66:f2:08:79:d0:df:be:1e:05:c1:c1:4b:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=7200 DF PROTO=ICMP TYPE=8 CODE=0 ID=3508 SEQ=1 
TRACE: filter:FORWARD:policy:1 IN=client-veth0 OUT=proxy-veth0 MAC=66:f2:08:79:d0:df:be:1e:05:c1:c1:4b:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=7200 DF PROTO=ICMP TYPE=8 CODE=0 ID=3508 SEQ=1 
TRACE: nat:POSTROUTING:policy:2 IN=client-veth0 OUT=proxy-veth0 MAC=66:f2:08:79:d0:df:be:1e:05:c1:c1:4b:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=7200 DF PROTO=ICMP TYPE=8 CODE=0 ID=3508 SEQ=1 
TRACE: raw:PREROUTING:policy:2 IN=proxy-veth0 OUT= MAC=16:c9:3c:d4:ad:8c:8a:84:06:5d:88:e2:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=7200 DF PROTO=ICMP TYPE=8 CODE=0 ID=3508 SEQ=1 
TRACE: filter:FORWARD:policy:1 IN=proxy-veth0 OUT=enp4s0 MAC=16:c9:3c:d4:ad:8c:8a:84:06:5d:88:e2:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=61 ID=7200 DF PROTO=ICMP TYPE=8 CODE=0 ID=3508 SEQ=1 
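
The traces are easier to follow once reduced to the hook, the interfaces and the addresses. Here is a minimal sketch, assuming awk is available; summarize_trace is a hypothetical helper name, not something shipped with iptables:

```shell
#!/bin/sh
# Reduce each TRACE line to "table:chain:type:num in->out src->dst".
# An empty IN= or OUT= field is shown as "-".
summarize_trace() {
    awk '{
        hook = ""; in_if = "-"; out_if = "-"; src = ""; dst = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "TRACE:") hook = $(i + 1)
            else if ($i ~ /^IN=./)  in_if  = substr($i, 4)
            else if ($i ~ /^OUT=./) out_if = substr($i, 5)
            # keep only the first SRC=/DST= pair found on the line
            else if ($i ~ /^SRC=/ && src == "") src = substr($i, 5)
            else if ($i ~ /^DST=/ && dst == "") dst = substr($i, 5)
        }
        if (hook != "") print hook " " in_if "->" out_if " " src "->" dst
    }'
}
```

It can be fed from wherever the TRACE lines end up on your system, for example dmesg | summarize_trace if they land in the kernel ring buffer.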

At the same time, conntrack -E running on the host shows the matching event:

# conntrack -E
    [NEW] icmp     1 30 src=10.10.1.1 dst=8.8.8.8 type=8 code=0 id=3508 [UNREPLIED] src=8.8.8.8 dst=10.10.1.1 type=0 code=0 id=3508

What happened:

  • conntrack (which handles NAT) doesn't care about routes (e.g. there is no interface in the conntrack database), only about addresses,
  • the nat table only sees packets in the NEW state,
  • conntrack added a NEW entry to its database when the packet was routed from client-veth0 to proxy-veth0, and at that point the packet did not match the POSTROUTING rule,
  • on the second round, when routed from proxy-veth0 to enp4s0, the packet matched the existing conntrack entry, so the nat table was not consulted again,
  • the packet leaves to the Internet non-NATed.
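
The sequence above can be mimicked with a toy model. This is only an illustration of the lookup logic, not kernel code: the forward function and the ct_entries variable are made up for the example, and the model assumes a flow is identified by its source and destination addresses alone.

```shell
#!/bin/sh
# Toy model: conntrack keyed by addresses only, with no notion of interfaces.
ct_entries=""                      # known flows, stored as " src>dst " tokens

forward() {                        # forward <src> <dst> <in-if> <out-if>
    key="$1>$2"
    case "$ct_entries" in
        *" $key "*)
            # the flow is already tracked: the nat table is not traversed
            echo "$3->$4: known flow, nat table skipped" ;;
        *)
            ct_entries="$ct_entries $key "
            echo "$3->$4: NEW flow, nat table consulted" ;;
    esac
}

# The same packet crosses the host twice:
forward 10.10.1.1 8.8.8.8 client-veth0 proxy-veth0
forward 10.10.1.1 8.8.8.8 proxy-veth0 enp4s0
```

The second call reports a known flow, which is the moment the rule matching the outgoing interface enp4s0 would have been needed.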

Since this conntrack limitation hindered some use cases in the past, like yours, an additional feature was added:

conntrack zones

A zone is simply a numerical identifier associated with a network device that is incorporated into the various hashes and used to distinguish entries in addition to the connection tuples.

[...]

This is mainly useful when connecting multiple private networks using the same addresses (which unfortunately happens occasionally) to pass the packets through a set of veth devices and SNAT each network to a unique address, after which they can pass through the "main" zone and be handled like regular non-clashing packets and/or have NAT applied a second time based f.i. on the outgoing interface.

It allows the conntrack facility, including NAT handling, to be more or less duplicated, but this has to be set up manually and must fit the problem at hand: here, the routing topology.

So here the client <-> proxy traffic must, from conntrack's point of view, be split from other traffic.

I would have preferred to also split the proxy <-> Internet traffic from the generic host traffic, but this is too difficult: the raw table, where zones must be assigned to a packet, sees only traffic that has not been de-NATed yet, so Internet replies all arrive with destination 172.16.202.30. Anyway, there is no duplicated flow between these two, unlike with the client <-> proxy flow, so that's not really needed.

  • zone 0 (0 means no special zone): generic host traffic along with proxy <-> Internet traffic.

    Nothing special to do, this is the default.

  • zone 1: client <-> proxy traffic. The CT --zone target is used. The value here is chosen arbitrarily and not needed anywhere else for this case.

    iptables -t raw -A PREROUTING -i client-veth0 -j CT --zone 1
    iptables -t raw -A PREROUTING -i proxy-veth0 -d 10.10.1.0/24 -j CT --zone 1
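
While experimenting, it can help to generate these two rules rather than type them, so that they can be removed just as easily. A sketch only: zone_rules is a hypothetical helper that merely prints the commands, with the interface names, subnet and zone taken from the rules above.

```shell
#!/bin/sh
# Print the two raw-table rules; the first argument is A (append) or D (delete).
zone_rules() {   # zone_rules <A|D> <zone> <client-if> <proxy-if> <client-net>
    printf 'iptables -t raw -%s PREROUTING -i %s -j CT --zone %s\n' "$1" "$3" "$2"
    printf 'iptables -t raw -%s PREROUTING -i %s -d %s -j CT --zone %s\n' "$1" "$4" "$5" "$2"
}

# Show what would be run to add the rules:
zone_rules A 1 client-veth0 proxy-veth0 10.10.1.0/24
```

Piping the output to sh as root applies the rules, and the same call with D instead of A prints the matching removal commands.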
    

The correct results (I merged both tools' outputs) are now:

TRACE: raw:PREROUTING:rule:2 IN=client-veth0 OUT= MAC=4e:e7:2f:3f:a3:6c:4a:b9:40:66:60:32:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1 
TRACE: raw:PREROUTING:policy:4 IN=client-veth0 OUT= MAC=4e:e7:2f:3f:a3:6c:4a:b9:40:66:60:32:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1 
TRACE: nat:PREROUTING:policy:1 IN=client-veth0 OUT= MAC=4e:e7:2f:3f:a3:6c:4a:b9:40:66:60:32:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1 
TRACE: filter:FORWARD:policy:1 IN=client-veth0 OUT=proxy-veth0 MAC=4e:e7:2f:3f:a3:6c:4a:b9:40:66:60:32:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1 
TRACE: nat:POSTROUTING:policy:2 IN=client-veth0 OUT=proxy-veth0 MAC=4e:e7:2f:3f:a3:6c:4a:b9:40:66:60:32:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1 
    [NEW] icmp     1 30 src=10.10.1.1 dst=8.8.8.8 type=8 code=0 id=4079 [UNREPLIED] src=8.8.8.8 dst=10.10.1.1 type=0 code=0 id=4079 zone=1
TRACE: raw:PREROUTING:policy:4 IN=proxy-veth0 OUT= MAC=86:c8:4b:5f:16:fc:ba:76:80:0f:20:7d:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1 
TRACE: nat:PREROUTING:policy:1 IN=proxy-veth0 OUT= MAC=86:c8:4b:5f:16:fc:ba:76:80:0f:20:7d:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1 
TRACE: filter:FORWARD:policy:1 IN=proxy-veth0 OUT=enp4s0 MAC=86:c8:4b:5f:16:fc:ba:76:80:0f:20:7d:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=61 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1 
TRACE: nat:POSTROUTING:rule:1 IN=proxy-veth0 OUT=enp4s0 MAC=86:c8:4b:5f:16:fc:ba:76:80:0f:20:7d:08:00 SRC=10.10.1.1 DST=8.8.8.8 LEN=84 TOS=0x00 PREC=0x00 TTL=61 ID=58185 DF PROTO=ICMP TYPE=8 CODE=0 ID=4079 SEQ=1 
    [NEW] icmp     1 30 src=10.10.1.1 dst=8.8.8.8 type=8 code=0 id=4079 [UNREPLIED] src=8.8.8.8 dst=172.16.202.30 type=0 code=0 id=4079
TRACE: raw:PREROUTING:policy:4 IN=enp4s0 OUT= MAC=5e:e8:0c:bf:96:d9:b2:e7:bc:df:1f:8e:08:00 SRC=8.8.8.8 DST=172.16.202.30 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=12099 PROTO=ICMP TYPE=0 CODE=0 ID=4079 SEQ=1 
TRACE: filter:FORWARD:policy:1 IN=enp4s0 OUT=proxy-veth0 MAC=5e:e8:0c:bf:96:d9:b2:e7:bc:df:1f:8e:08:00 SRC=8.8.8.8 DST=10.10.1.1 LEN=84 TOS=0x00 PREC=0x00 TTL=61 ID=12099 PROTO=ICMP TYPE=0 CODE=0 ID=4079 SEQ=1 
 [UPDATE] icmp     1 30 src=10.10.1.1 dst=8.8.8.8 type=8 code=0 id=4079 src=8.8.8.8 dst=172.16.202.30 type=0 code=0 id=4079
TRACE: raw:PREROUTING:rule:3 IN=proxy-veth0 OUT= MAC=86:c8:4b:5f:16:fc:ba:76:80:0f:20:7d:08:00 SRC=8.8.8.8 DST=10.10.1.1 LEN=84 TOS=0x00 PREC=0x00 TTL=60 ID=12099 PROTO=ICMP TYPE=0 CODE=0 ID=4079 SEQ=1 
TRACE: raw:PREROUTING:policy:4 IN=proxy-veth0 OUT= MAC=86:c8:4b:5f:16:fc:ba:76:80:0f:20:7d:08:00 SRC=8.8.8.8 DST=10.10.1.1 LEN=84 TOS=0x00 PREC=0x00 TTL=60 ID=12099 PROTO=ICMP TYPE=0 CODE=0 ID=4079 SEQ=1 
TRACE: filter:FORWARD:policy:1 IN=proxy-veth0 OUT=client-veth0 MAC=86:c8:4b:5f:16:fc:ba:76:80:0f:20:7d:08:00 SRC=8.8.8.8 DST=10.10.1.1 LEN=84 TOS=0x00 PREC=0x00 TTL=59 ID=12099 PROTO=ICMP TYPE=0 CODE=0 ID=4079 SEQ=1 
 [UPDATE] icmp     1 30 src=10.10.1.1 dst=8.8.8.8 type=8 code=0 id=4079 src=8.8.8.8 dst=10.10.1.1 type=0 code=0 id=4079 zone=1

Here, the first packet of a new flow traverses iptables' nat table twice, the first time with no effect. In fact, conntrack considers that there are two flows, because the first one has the additional attribute zone=1.
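
To check this on a live system, the [NEW] events can be listed together with their zone. A minimal sketch, assuming the conntrack -E line format shown above; new_flows_by_zone is a hypothetical helper name, and an event without a zone= attribute is reported as zone 0:

```shell
#!/bin/sh
# Print "zone N: src -> dst" for every [NEW] event read on standard input.
new_flows_by_zone() {
    awk '$1 == "[NEW]" {
        zone = 0; src = ""; dst = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^zone=/) zone = substr($i, 6)
            # the first src=/dst= pair is the original direction
            else if ($i ~ /^src=/ && src == "") src = substr($i, 5)
            else if ($i ~ /^dst=/ && dst == "") dst = substr($i, 5)
        }
        print "zone " zone ": " src " -> " dst
    }'
}
```

Running conntrack -E | new_flows_by_zone during a single ping should then list the same address pair once per zone.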