Some Fragments are more equal than other fragments


[Update of a post from January 28’th 2010 to fix broken links and add details]

My last post was about being able to receive fragmented DNS answers over UDP.

Last week (that is on January 20’th 2010) ,  I did a minor revision to a script of mine that checks to see if all of the name servers for a zone have the same SOA record and support EDNS0. This will indicate whether the servers are in sync and can serve DNSSEC. During testing of the updated script all the servers for .US timed out. This had never happened before, but .US turned on DNSSEC last month, so I started to investigate.
First I issued this query in dig format:
      dig @a.gtld.biz US. SOA +dnssec +norec
Time-out confirms my script is reporting correctly. Then I tried again: (not asking for DNSSEC records)
      dig @a.gtld.biz US. SOA +norec
This worked, the server is alive and the question has an answer, Then I tried getting the DNSSEC answer using TCP:
        dig @a.gtld.biz US. SOA +norec +dnssec +tc
This gave me a 1558 byte answer (too large for one UDP packet) so fragmentation is needed to get over Ethernet links.  But my system can handle big answers, so what’s wrong?

At this point I had number of questions:

  • “What is NeuStar (the .US operator) doing?”
  • “Are any other signed domains showing the same behavior?”
  • “Are the fragments from these servers different from other fragments?”

First things first; “What is .US returning in their SOA answer?”

  • In the answer section they give out the signed SOA record.
  • In the authority section there is a signed NS set.So far, so good.
  • In an additional section there are the glue A and AAAA records for the name servers, AND a signed DNSKEY set for the US. zone.

Conclusion: NeuStar is not doing anything wrong, but there is no need for the DNSKEY in the additional section. As this is turning up some fragmentation issues, I consider this a good thing.
At this point I moved on to the second question; “Is anyone else showing the same behavior?” All the signed TLD’s that I queried returned answers of less than 1500 bytes, so no fragmentation was taking place. I modified my script to ask for larger answers in order to force fragmentation. Instead of asking for type “SOA”, I ask for type “ANY”. For signed domains this should always give answers that are larger than 1500 bytes, since signed SOA, DNSKEY, NS and NSEC/NSEC3 are guaranteed to be in the answer and possibly others. Now things got interesting.
Almost all of the signed TLDs had one or more servers that timed out. .Org, for example, had 4 out of 6, .CH had 2, and .SE had one. So the issue is not isolated to anything that the .US operator is doing, but an indicator of something bigger.

I copied my script over to another machine I have and ran it there, and lo and behold, I got answers – no time-outs. The first machine is a FreeBSD-7.2 with PF firewall, the second machine is an old FreeBSD-4.12 with IPFW.

Now on to the next question; “Are the fragments different?” A quick tcpdump verified that answers were coming back. I fired up wireshark tool to do full packet capture. To make the capture smaller I activated the built-in DNS filter, and ran my script against the .ORG servers and the .US servers.
Looking at the DNS headers there was nothing different, and the UDP headers looked fine. On the other hand I noticed that the IP header flags field from the servers that timed out all had DF (Do Not Fragment) set on all packets, and on the first packet in the fragmentation sequence both the DF and MF (More Fragments) bits were set. MF tells IP that this is a fragmented upper layer packet and this is not the last packet. At this point I wondered if somehow this bit combination was causing problems in my IP stack or the firewall running on the host.
I disabled the PF firewall and suddenly the packets flowed through. After doing some searching on the Internet I discovered that I can tell PF to not drop these packets.
My old PF configuration had
scrub in all
The new one is
scrub in all no-df

 Removing the scrub command also allows the packets to flow, but I do not recommend that solution.

I started asking around to find out what is causing the DF flag to be set on UDP packets, and the answer was “Linux is attempting MTU discovery”. I also noticed that for the last 2 years Bind releases have added code to attempt to force OS to not set DF on UDP packets. See file: lib/isc/unix/socket.c look for IP_DONTFRAG.
Conclusion: the tools I mentioned in the last post do not test to see if these malformed fragments are getting through. I have updated my perl  script to ask for a large answer from an affected site, and to report on that. Please let me know if you have the same issue and what you did to fix it on your own systems and firewalls.

, ,

Comments are closed.