What is happening with g.root-servers.net.?

Solution 1:

I was always under the impression that the root servers are redundant to the point where downtime is impossible. According to http://root-servers.org there are six locations worldwide where servers are located, so I assumed that impression was correct.

Even if there had not been an undocumented outage for G, that assumption is incorrect:

  • An anycast IP address may represent multiple physical sites, but it is undesirable for an abuse event in one region to cascade into failures in others. If one site buckles, its traffic is not automatically shifted to another.
  • Shared network links carrying abuse traffic directed at a root server may well choke before the infrastructure closer to the root server does.
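The per-instance nature of anycast can be observed directly: a CHAOS-class TXT query for hostname.bind (a BIND convention that many root server operators support) returns an identifier for the specific instance that answered your query. Below is a stdlib-only sketch; build_query is a hypothetical helper name, and the network call is left commented out since reachability depends on your vantage point:

```python
import socket
import struct

def build_query(name, qtype, qclass, qid=0x1234):
    """Build a minimal DNS query packet (header + one question, no EDNS)."""
    # Header: ID, flags (RD set), QDCOUNT=1, AN/NS/ARCOUNT=0.
    header = struct.pack(">HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is length-prefixed, terminated by a zero byte.
    qname = b"".join(bytes([len(p)]) + p.encode() for p in name.split(".") if p) + b"\x00"
    return header + qname + struct.pack(">HH", qtype, qclass)

# CHAOS-class TXT query (TYPE=16, CLASS=3) for hostname.bind identifies the
# answering anycast instance on servers that honor this convention.
query = build_query("hostname.bind", qtype=16, qclass=3)

# Network call commented out; during an outage it would simply time out.
# s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# s.settimeout(3)
# s.sendto(query, ("192.112.36.4", 53))  # g.root-servers.net
# print(s.recvfrom(512))
```

The equivalent probe with dig is `dig @192.112.36.4 hostname.bind chaos txt`.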

Lastly, we have the human element. G was down across the board, but no reason has been officially disclosed at this time. A widespread failure of this type typically points to deliberate action or a catastrophic failure in central administration.

As the users of Server Fault are not the administrators of the root servers, your best bet is to watch for an official statement. In the meantime, the link above is sufficient to demonstrate that there was a total outage for G. The internet continued to operate because a single root server being down has little impact on the larger picture.


Update from the DoD NIC:

Regarding yesterday's G-root outage:

Like many outages, this one resulted from a series of unfortunate events.
These unfortunate events were operational errors;  steps have been taken to
prevent any reoccurrence, and to provide better service in the future.

https://lists.dns-oarc.net/pipermail/dns-operations/2016-April/014765.html

Solution 2:

I was in a meeting with someone from RIPE yesterday afternoon, and my first impression after she showed me the problems was that the root servers were suffering from a firewall misconfiguration.

The things I noticed:

  • TCP responses worked fine. (https://atlas.ripe.net/dnsmon/group/root?dnsmon.session.color_range_pls=0-66-66-99-100&dnsmon.session.exclude-errors=true&dnsmon.type=server-probes&dnsmon.server=192.112.36.4&dnsmon.zone=root&dnsmon.startTime=1460573400&dnsmon.endTime=1460649600&dnsmon.ipVersion=both&dnsmon.isTcp=true)
  • UDP responses were dead.
  • All RIPE Atlas probes reported the same problem; it was not specific to one region.
  • BGP routing towards the network was just fine. No problems there.
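The observed split can be reproduced from any vantage point with a pair of raw-socket probes. The helper names below (build_root_ns_query, query_udp, query_tcp) are illustrative, not RIPE's actual tooling. Note that RFC 1035 frames DNS-over-TCP messages with a two-byte length prefix, so the two transports travel as quite different packets and can be filtered independently:

```python
import socket
import struct

ROOT_G = "192.112.36.4"  # g.root-servers.net (the IP in the dnsmon URL above)

def build_root_ns_query(qid=0x4242):
    """Minimal DNS query for the root zone's NS records (QNAME='.', no EDNS)."""
    header = struct.pack(">HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    return header + b"\x00" + struct.pack(">HH", 2, 1)  # QTYPE=NS, QCLASS=IN

def query_udp(server, packet, timeout=3.0):
    """Send over UDP; returns the raw reply, or None on timeout (the G-root symptom)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(packet, (server, 53))
        try:
            return s.recv(4096)
        except socket.timeout:
            return None

def query_tcp(server, packet, timeout=3.0):
    """Send the same query over TCP, using the RFC 1035 two-byte length prefix."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        s.connect((server, 53))
        s.sendall(struct.pack(">H", len(packet)) + packet)
        length = struct.unpack(">H", s.recv(2))[0]
        reply = b""
        while len(reply) < length:
            chunk = s.recv(length - len(reply))
            if not chunk:
                break
            reply += chunk
        return reply

# During the outage this pattern would have shown query_udp(...) returning None
# while query_tcp(...) returned a normal answer. Network calls left to the reader.
```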

The fact that UDP didn't work while TCP did suggests that someone tried to block UDP packets above a certain size, or something similar.

During the outage I ran several tests, and all UDP tests failed, not only those with an answer size larger than 512 bytes.
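For context on the 512-byte figure: classic DNS over UDP caps responses at 512 bytes. A client advertises a larger buffer by attaching an EDNS0 OPT pseudo-record (RFC 6891), and when an answer exceeds what UDP can carry, the server sets the TC bit to tell the client to retry over TCP. A hypothetical sketch of how size-dependent tests like those above could be built (the helper names are assumptions, not the author's actual commands):

```python
import struct

def build_query(name, qtype=2, qclass=1, qid=0x2A2A, edns_bufsize=None):
    """Minimal DNS query; optionally append an EDNS0 OPT record (RFC 6891)."""
    arcount = 1 if edns_bufsize else 0
    header = struct.pack(">HHHHHH", qid, 0x0100, 1, 0, 0, arcount)
    qname = b"".join(bytes([len(p)]) + p.encode() for p in name.split(".") if p) + b"\x00"
    packet = header + qname + struct.pack(">HH", qtype, qclass)
    if edns_bufsize:
        # OPT pseudo-RR: root name, TYPE=41, CLASS carries the UDP payload size,
        # TTL=0 (extended flags), RDLENGTH=0 (no options).
        packet += b"\x00" + struct.pack(">HHIH", 41, edns_bufsize, 0, 0)
    return packet

def is_truncated(reply):
    """TC bit (0x0200 in the flags word) means: the answer was cut, retry over TCP."""
    flags = struct.unpack(">H", reply[2:4])[0]
    return bool(flags & 0x0200)

# Sending build_query(".") limits the reply to 512 bytes, while
# build_query(".", edns_bufsize=4096) advertises room for a larger answer.
# During this outage, both variants over UDP went unanswered.
```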