Slow iptables, reverse DNS
Last week, I investigated an issue[1] where iptables -L (the command to list firewall rules in Linux) took over 60 seconds to complete.
This was surprising because the machine was using very little CPU and there were
only a few firewall rules. So why was it so slow?
The Linux kernel stores firewall rules in memory. From userspace, iptables retrieves the rules from the kernel (via netlink, in the case of iptables-nft). After retrieving the rules, iptables simply writes them to stdout. None of this requires disk or external network access… or so I thought.
One of my coworkers found some posts on the Internet saying that iptables -L uses reverse DNS to resolve IP addresses in the firewall rules to hostnames. Many of these posts were very old, so at first we weren’t sure whether this was still true in 2024.[2] But then I checked the iptables code and found a call to getnameinfo, which is invoked by both iptables-legacy and iptables-nft.[3]
To confirm this theory, we tried passing the -n (numeric) flag to disable reverse DNS lookups, like this:
iptables -nL
and, sure enough, with -n the command completed in a fraction of a second.
In a correctly configured system, reverse DNS should be fast. Even if the DNS server fails to resolve the IP to a hostname, it should at least respond with NXDOMAIN or SERVFAIL so the client stops waiting. However, if the DNS server never responds, or responds slowly, the client will wait, retry, and eventually time out.
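How long the client waits is set by the resolver. With glibc's stub resolver, resolv.conf accepts the options timeout:n (seconds to wait for a reply before retrying, default 5) and attempts:n (how many times to send a query before giving up, default 2). As a hedged sketch, the hypothetical helper below writes a resolv.conf with shorter values; the 1 second and 1 attempt defaults are illustrative choices, not recommendations. This only bounds the wait on an unresponsive server; it does not avoid the lookups the way -n does.

```bash
#!/usr/bin/env bash
# Sketch: write a resolv.conf that bounds how long glibc's stub resolver
# waits on an unresponsive nameserver. write_resolv_conf is a hypothetical
# helper; the timeout/attempts defaults here are illustrative.
write_resolv_conf() {
  local path=$1 nameserver=$2 timeout=${3:-1} attempts=${4:-1}
  # timeout:n  = seconds to wait for a reply before retrying (glibc default 5)
  # attempts:n = times to send a query before giving up (glibc default 2)
  printf 'nameserver %s\noptions timeout:%s attempts:%s\n' \
    "$nameserver" "$timeout" "$attempts" > "$path"
}
```

Pointed at /etc/netns/client/resolv.conf with nameserver 192.168.99.2, it could stand in for the echo into that file in the reproduction below, which should shorten the delay accordingly.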
It’s easy to reproduce the slowdown using Linux network namespaces. Create two network namespaces, one for the DNS client and one for the DNS server, connected by a virtual Ethernet (veth) pair. To simulate an unresponsive DNS server, configure the firewall in the server namespace to drop all inbound traffic.[4]
# Override resolv.conf inside the client netns to control the client's nameserver.
# Also override /etc/nsswitch.conf to prevent the DNS client from using systemd-resolved.
mkdir -p /etc/netns/client
echo "nameserver 192.168.99.2" > /etc/netns/client/resolv.conf
echo "hosts: dns" > /etc/netns/client/nsswitch.conf
# Create network namespaces for the DNS client and server.
ip netns add "client"
ip netns add "server"
# Create veth pair linking DNS client and server netns.
# Assign the client IP 192.168.99.1 and the server IP 192.168.99.2.
ip -n "client" link add dev "eth0" type veth peer name "eth0" netns "server"
ip -n "client" addr add dev "eth0" "192.168.99.1/24"
ip -n "client" link set dev "eth0" up
ip -n "server" addr add dev "eth0" "192.168.99.2/24"
ip -n "server" link set dev "eth0" up
# Configure the firewall on the server to drop everything in the INPUT chain.
# The server won't send any reply to the client, not even an ICMP "port unreachable".
ip netns exec "server" iptables -P INPUT DROP
# Add some firewall rules on the client.
# We could use any IP address in the rules, so arbitrarily choose IPs from 10.0.0.0/8.
ip netns exec "client" iptables -N TEST
for i in {1..3}; do
  ip netns exec "client" iptables -A TEST -p tcp -d "10.0.0.$i" -j ACCEPT
done
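The setup leaves state behind: two namespaces and the override files under /etc/netns/client. Once you're finished with the experiment, a teardown along these lines should remove it. The function name teardown_repro and the DRY_RUN switch are illustrative additions, not part of the original setup; the real commands need root.

```bash
#!/usr/bin/env bash
# Undo the reproduction setup: remove both namespaces and the per-netns
# config overrides. Run as root. Set DRY_RUN=1 to print the commands
# instead of executing them.
teardown_repro() {
  local run=()
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    run=(echo)
  fi
  # Once no process still holds a namespace, deleting it destroys the veth
  # end inside it (which also deletes its peer) along with the iptables
  # rules that were configured in that namespace.
  "${run[@]}" ip netns del client
  "${run[@]}" ip netns del server
  "${run[@]}" rm -rf /etc/netns/client
}
```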
With the setup in place, time the iptables list command:
[root@fedora]# time ip netns exec client iptables -L TEST
Chain TEST (0 references)
target     prot opt source               destination
ACCEPT     tcp  --  anywhere             10.0.0.1
ACCEPT     tcp  --  anywhere             10.0.0.2
ACCEPT     tcp  --  anywhere             10.0.0.3
real 1m0.069s
user 0m0.001s
sys 0m0.004s
Here’s the packet capture from the client side:
[root@fedora]# ip netns exec client tcpdump -nv udp port 53
dropped privs to tcpdump
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:59:22.139083 IP (tos 0x0, ttl 64, id 8849, offset 0, flags [DF], proto UDP (17), length 67)
192.168.99.1.45598 > 192.168.99.2.domain: 4498+ PTR? 1.0.0.10.in-addr.arpa. (39)
09:59:27.144524 IP (tos 0x0, ttl 64, id 8850, offset 0, flags [DF], proto UDP (17), length 67)
192.168.99.1.45598 > 192.168.99.2.domain: 4498+ PTR? 1.0.0.10.in-addr.arpa. (39)
09:59:32.150005 IP (tos 0x0, ttl 64, id 18829, offset 0, flags [DF], proto UDP (17), length 67)
192.168.99.1.39721 > 192.168.99.2.domain: 22173+ PTR? 1.0.0.10.in-addr.arpa. (39)
09:59:37.155323 IP (tos 0x0, ttl 64, id 18830, offset 0, flags [DF], proto UDP (17), length 67)
192.168.99.1.39721 > 192.168.99.2.domain: 22173+ PTR? 1.0.0.10.in-addr.arpa. (39)
09:59:42.160757 IP (tos 0x0, ttl 64, id 42006, offset 0, flags [DF], proto UDP (17), length 67)
192.168.99.1.42513 > 192.168.99.2.domain: 18542+ PTR? 2.0.0.10.in-addr.arpa. (39)
09:59:47.166116 IP (tos 0x0, ttl 64, id 42007, offset 0, flags [DF], proto UDP (17), length 67)
192.168.99.1.42513 > 192.168.99.2.domain: 18542+ PTR? 2.0.0.10.in-addr.arpa. (39)
09:59:52.171543 IP (tos 0x0, ttl 64, id 46047, offset 0, flags [DF], proto UDP (17), length 67)
192.168.99.1.43565 > 192.168.99.2.domain: 15599+ PTR? 2.0.0.10.in-addr.arpa. (39)
09:59:57.176895 IP (tos 0x0, ttl 64, id 46048, offset 0, flags [DF], proto UDP (17), length 67)
192.168.99.1.43565 > 192.168.99.2.domain: 15599+ PTR? 2.0.0.10.in-addr.arpa. (39)
10:00:02.182315 IP (tos 0x0, ttl 64, id 12270, offset 0, flags [DF], proto UDP (17), length 67)
192.168.99.1.45392 > 192.168.99.2.domain: 29590+ PTR? 3.0.0.10.in-addr.arpa. (39)
10:00:07.187633 IP (tos 0x0, ttl 64, id 12271, offset 0, flags [DF], proto UDP (17), length 67)
192.168.99.1.45392 > 192.168.99.2.domain: 29590+ PTR? 3.0.0.10.in-addr.arpa. (39)
10:00:12.193058 IP (tos 0x0, ttl 64, id 22543, offset 0, flags [DF], proto UDP (17), length 67)
192.168.99.1.35956 > 192.168.99.2.domain: 53102+ PTR? 3.0.0.10.in-addr.arpa. (39)
10:00:17.198478 IP (tos 0x0, ttl 64, id 22544, offset 0, flags [DF], proto UDP (17), length 67)
192.168.99.1.35956 > 192.168.99.2.domain: 53102+ PTR? 3.0.0.10.in-addr.arpa. (39)
The client sends PTR queries to resolve each IP to a hostname, waiting in vain for responses that never arrive. Notice that the client queries each IP address four times (initial query plus three retries), with a five-second timeout between retries:
3 IPs x 4 queries/IP x 5 seconds/query = 60 seconds
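The query names in the capture are just the IPv4 octets reversed under in-addr.arpa, and the 60 seconds follows from multiplying out the waits. A small sketch that reproduces both, assuming (as the capture shows) four unanswered queries per address with a 5-second wait on each; ptr_name and worst_case_seconds are hypothetical helper names:

```bash
#!/usr/bin/env bash
# ptr_name: build the reverse-DNS (PTR) query name for an IPv4 address,
# e.g. 10.0.0.1 -> 1.0.0.10.in-addr.arpa.
ptr_name() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  printf '%s.%s.%s.%s.in-addr.arpa.\n' "$d" "$c" "$b" "$a"
}

# worst_case_seconds: approximate time spent in reverse lookups when the
# nameserver never answers. Arguments: number of distinct IPs, queries sent
# per IP, and seconds waited per query (defaults match the capture above).
worst_case_seconds() {
  local ips=$1 queries=${2:-4} timeout=${3:-5}
  echo $(( ips * queries * timeout ))
}

for i in 1 2 3; do
  ptr_name "10.0.0.$i"   # prints 1.0.0.10.in-addr.arpa. and so on
done
worst_case_seconds 3     # prints 60
```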
iptables isn’t the only Linux networking tool that uses reverse DNS by default: tcpdump and traceroute do this too. It’s a curious decision, because these are often the very tools used to debug DNS problems.[5]
In conclusion, if iptables -L is slow, try iptables -nL instead to skip reverse DNS lookups!
[1] This eventually led to a one-character bugfix in Azure CNI.
[2] I later discovered that this behavior is documented in the man page for iptables under the section about -L: “Please note that it is often used with the -n option, in order to avoid long reverse DNS lookups.”
[3] The call path in iptables-nft v1.8.9 is nft_ipv4_print_rule → print_ipv4_addresses → ipv4_addr_to_string → xtables_ipaddr_to_anyname → ipaddr_to_host → getnameinfo.
[4] Without the inbound DROP rule, the Linux kernel will send the client an ICMP “port unreachable” packet because the server netns has no process listening on port 53. The DNS client will see this and stop waiting for a response, so iptables -L still completes relatively quickly.
[5] To their credit, the netfilter developers fixed this in nft, the successor of iptables. By default, nft list won’t perform reverse DNS lookups unless given the -N flag.