How to redirect requests on port 80 to localhost:3000 using nftables?

I'll try to address and complete OP's own working answer and further comments, which include some remaining questions:

  • Why is net.ipv4.conf.eth0.route_localnet=1 needed?
  • Why does port 3000 need to be allowed on eth0 rather than lo?

and will also address a minor security concern while at it.

First, here's the obligatory schematic of Packet flow in Netfilter and General Networking:

[Schematic: Packet flow in Netfilter and General Networking]

This schematic was made with iptables in mind, but nftables can (and does in most default rulesets) use the same hooks in the same places.

When a packet arrives at the network layer (IP, layer 3), it is handled by various subsystems. Normally there would be only the routing stack, but here Netfilter provides hooks for itself (conntrack, including NAT handling after the initial packet) and for nftables.

Netfilter (conntrack) and nftables don't care about routing (unless, for example, nftables uses specialized expressions related to routing); they leave this to the routing stack. They manipulate addresses and ports, and nftables then checks available properties such as interfaces, addresses and ports.

So:

  • a packet in a new connection (thus also traversing ip nat prerouting) arrives from eth0 with (for example) source address 192.0.2.2 and port 45678, destined to address 192.168.0.1 and port 80 (or 443).

  • the ip nat prerouting dnat rule matches and tells netfilter (its conntrack subsystem) to change the destination address to 127.0.0.1 and the destination port to 3000. This doesn't change any other property of the packet. In particular the packet still arrived from eth0.

  • the routing stack (routing decision in the schematic) doesn't depend on Netfilter, so it is logically kept independent of it and is not aware of the previous alteration. It now has to handle a packet with source 192.0.2.2 and destination 127.0.0.1.

    This is an anomaly: it would allow an address range reserved for loopback to be seen "on the Internet", which RFC 1122 forbids:

    (g) { 127, <any> }

    Internal host loopback address. Addresses of this form MUST NOT appear outside a host.

    This is explicitly handled in the Linux kernel's routing stack: it treats such a packet as a martian destination (i.e. drops it), unless relaxed by setting route_localnet=1 on the related interface. That's why, for this specific case, net.ipv4.conf.eth0.route_localnet=1 must be set.

  • likewise, the next nftables rule, this time from the filter input hook, sees a packet whose input interface is still eth0 but whose destination port is now 3000. It must thus allow destination port 3000, and no longer has to allow 80 (or 443). So the rule should be shortened to:

    iifname "eth0" tcp dport {4489, 3000} counter accept
    

    because it will never see packets from eth0 with destination TCP port 80 or 443: they were all changed to port 3000 in the previous nat prerouting hook. Moreover, for the sake of explanation, supposing such packets were seen, they would be accepted, but as there would be no process listening on ports 80 or 443 (it's listening on port 3000), the TCP stack would send back a TCP reset to reject the connection.

    Also, while the routing stack enforces some relations between 127.0.0.0/8 and the lo interface (further relaxed with route_localnet=1), as said before this doesn't concern netfilter or nftables, which don't care about routing. Besides, even if it did, the address related to the input interface would be the source address, which didn't change, not the destination address: that one would relate to the output interface, which doesn't even have a real meaning in the input path (oif or oifname can't be used here). The mere fact of being in the filter input hook already means the evaluated packet is arriving on the host for a local process, as seen in the schematic.

    UPDATE: Actually the previously given rule should be further changed for security reasons: port 3000 gets allowed, but not just for destination 127.0.0.1. A connection to 192.168.0.1:3000 can thus receive a TCP RST which hints there's something special here, rather than not getting any reply. To address this case:

    • either use this (which includes a very strange looking 2nd rule):

      iifname "eth0" tcp dport 4489 counter accept
      iifname "eth0" ip daddr 127.0.0.1 tcp dport 3000 counter accept
      

      which, because of route_localnet=1, still allows a tweaked system in the same 192.168.0.0/24 LAN to access the service without using NAT at all, by sending packets with destination 127.0.0.1 on the wire, even if there's probably no gain in doing this. For example, from another Linux system, with these 4 commands:

      sysctl -w net.ipv4.conf.eth0.route_localnet=1
      ip address delete 127.0.0.1/8 dev lo # can't have 127.0.0.1 also local
      ip route add 127.0.0.1/32 via 192.168.0.1 # via, that way no suspicious ARP *broadcast* for 127.0.0.1 will be seen elsewhere.
      socat tcp4:127.0.0.1:3000 -
      
    • or instead use this, which also protects against the case above, is far more generic and is to be preferred:

      iifname "eth0" tcp dport 4489 counter accept
      ct status dnat counter accept
      
      • it keeps the unrelated port 4489/tcp allowed as before
      • ct status dnat matches if the packet was previously DNATed by the host: it thus accepts any prior alteration without having to restate explicitly which port was involved (it's still possible to also state it, or anything else, to further narrow the scope of what is accepted). The port value 3000 no longer has to be stated explicitly.
      • it thus also won't allow direct connections to port 3000 since this case wouldn't have been DNATed.
  • just to be complete: the same thing happens in (not quite) reverse order for output and replies. net.ipv4.conf.eth0.route_localnet=1 allows initially generated outgoing packets from 127.0.0.1 to 192.0.2.2 not to be treated as martian sources (=> drop) in the output path's routing decision, before they have a chance to be "un-DNATed" back to the originally intended source address (192.168.0.1) by netfilter (conntrack) alone.
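
The steps above can be put together as the following minimal sketch. It is not OP's actual ruleset: the table and chain names, the drop policy, and the established/related and lo rules are assumptions added to make the fragment self-contained. It also assumes net.ipv4.conf.eth0.route_localnet=1 has been set separately with sysctl.

```
# Sketch only. Assumes net.ipv4.conf.eth0.route_localnet=1 is already set.
table ip nat {
    chain prerouting {
        # -100 is the usual destination NAT priority
        type nat hook prerouting priority -100; policy accept;
        # rewrite destination to 127.0.0.1:3000; input interface stays eth0
        iif "eth0" tcp dport { 80, 443 } counter dnat to 127.0.0.1:3000
    }
}

table ip filter {
    chain input {
        type filter hook input priority 0; policy drop;
        # assumed housekeeping rules, not taken from the question
        ct state established,related accept
        iif "lo" accept
        # unrelated service kept as before
        iifname "eth0" tcp dport 4489 counter accept
        # accept only what this host DNATed itself; direct hits on :3000 are dropped
        ct status dnat counter accept
    }
}
```

With this layout, a direct connection to 192.168.0.1:3000 or to 127.0.0.1:3000 from the LAN never went through the dnat rule, so it does not match ct status dnat and falls through to the chain policy.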


Of course, using route_localnet=1 somewhat relaxes security (not really relevant with adequate firewall rules, but not all systems use a firewall) and requires knowing about its use (e.g. copying the nftables ruleset alone elsewhere won't work without the route_localnet=1 setting).

Now that the security concerns were addressed in the explanations above (see "UPDATE"), if the application were allowed to listen to 192.168.0.1 (or to any address) rather than only 127.0.0.1, an equivalent configuration could be done without enabling route_localnet=1, by changing in ip nat prerouting:

iif eth0 tcp dport { 80, 443 } counter dnat 127.0.0.1:3000

to:

iif eth0 tcp dport { 80, 443 } counter dnat to 192.168.0.1:3000

or simply to:

  iif eth0 tcp dport { 80, 443 } counter redirect to :3000

which don't differ much: redirect changes the destination to the host's primary IP address on the interface eth0, which is 192.168.0.1, so most cases would behave the same.
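
As a sketch of that alternative, assuming the application now listens on 192.168.0.1 or on any address, the nat table could look like the fragment below. The table and chain names are assumed. The filter input rules using ct status dnat described earlier would stay unchanged, since a redirect is also recorded by conntrack as a destination NAT, and route_localnet can stay at its default of 0.

```
# Sketch only: variant that does not need route_localnet=1.
table ip nat {
    chain prerouting {
        type nat hook prerouting priority -100; policy accept;
        # destination becomes the primary address of eth0, port 3000
        iif "eth0" tcp dport { 80, 443 } counter redirect to :3000
    }
}
```

Because the rewritten destination is no longer in 127.0.0.0/8, the routing stack sees an ordinary local address and has no martian check to relax.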
