How long does negative DNS caching typically last?

Solution 1:

The TTL for negative caching is not arbitrary. It is taken from the SOA record at the top of the zone to which the requested record would have belonged, had it existed. For example:

example.org.    IN      SOA     master-ns1.example.org. Hostmaster.example.org. (
            2012091201 43200 1800 1209600 86400 )

The last value in the SOA record ("86400") is the MINIMUM field: the amount of time clients are asked to cache negative results under example.org. (strictly, RFC 2308 uses the lower of this value and the SOA record's own TTL).

If a client requests doesnotexist.example.org., it will cache the result for up to 86400 seconds.

Solution 2:

This depends on your exact definition of a "negative query", but in either case, this is documented in RFC 2308 «Negative Caching of DNS Queries (DNS NCACHE)»:


NXDOMAIN

  • If the resolution is successful, and results in NXDOMAIN, the response will come with a SOA record, which would contain the NXDOMAIN TTL (traditionally known as the MINIMUM field). rfc2308#section-4

SERVFAIL

  • If the resolution is not successful, and results in a timeout (SERVFAIL), then it may as well not be cached at all, and in all circumstances MUST NOT be cached for longer than 5 minutes. rfc2308#section-7.1

    Note that in practice, caching such results for the full allowable 5 minutes is a great way to degrade the client experience should their cache server occasionally suffer brief connectivity issues (and it effectively makes the server vulnerable to a Denial-of-Service amplification, where a few seconds of downtime would leave certain parts of the DNS unavailable for the full five minutes).

    Prior to BIND 9.9.6-S1 (released in 2014), apparently, SERVFAIL was not cached at all. a878301 (2014-09-04)

    E.g., at the time of your question and in all versions of BIND released prior to 2014, the BIND recursive resolver DID NOT cache SERVFAIL at all, if the above commit and the documentation about the first introduction in 9.9.6-S1 is to be believed.

    In the latest BIND, the default servfail-ttl is 1s, and the setting is hardcoded to a ceiling of 30s (in place of the RFC-mandated ceiling of 300s). 90174e6 (2015-10-17)

    Furthermore, the following are some noteworthy quotes on the matter:

    • https://kb.isc.org/article/AA-01178/ (2014/2016-01-07)

    The outcome of caching SERVFAIL responses has included some situations where it was seen to be detrimental to the client experience, particularly when the causes of the SERVFAIL being presented to the client were transient and from a scenario where an immediate retry of the query would be a more appropriate action.

    • http://cr.yp.to/djbdns/third-party.html (2003-01-11)

    The second tactic is to claim that widespread DNS clients will do something Particularly Evil when they are unable to reach all DNS servers. The problem with this argument is that the claim is false. Any such client is clearly buggy, and will be unable to survive in the marketplace: consider what happens if the client's routers briefly go down, or if the client's network is temporarily flooded.


In summary, an NXDOMAIN response would be cached as specified in the SOA of the applicable zone, whereas SERVFAIL is unlikely to be cached, or, if cached, it'll be at most a double-digit number of seconds.
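
As a rough illustration of that summary, here is a minimal Python sketch of how a resolver might pick a negative cache lifetime. The function name, its parameters, and the sample values are hypothetical and chosen only for illustration; the NXDOMAIN branch uses the lower of the SOA TTL and SOA.MINIMUM (RFC 2308 section 5), and the SERVFAIL branch only enforces the RFC ceiling of 300 seconds. Real resolvers apply their own, often much lower, limits.

```python
# RFC 2308 section 7.1 upper bound for caching a server failure, in seconds.
RFC_SERVFAIL_CAP = 300


def negative_cache_seconds(rcode, soa_ttl=None, soa_minimum=None, servfail_ttl=1):
    """Return how long a negative response might be cached, in seconds.

    Hypothetical helper for illustration only.
    """
    if rcode == "NXDOMAIN":
        # The SOA in the authority section carries
        # min(SOA TTL, SOA.MINIMUM) as the negative TTL.
        return min(soa_ttl, soa_minimum)
    if rcode == "SERVFAIL":
        # Caching is optional; never exceed the RFC ceiling of 5 minutes.
        return max(0, min(servfail_ttl, RFC_SERVFAIL_CAP))
    raise ValueError(f"response code not handled in this sketch: {rcode}")


# Assumed sample values: SOA TTL of 3600 s, SOA.MINIMUM of 86400 s.
print(negative_cache_seconds("NXDOMAIN", soa_ttl=3600, soa_minimum=86400))  # 3600
# A configured SERVFAIL TTL above the RFC ceiling is clamped.
print(negative_cache_seconds("SERVFAIL", servfail_ttl=600))  # 300
```

The default of 1 second for servfail_ttl mirrors the BIND default mentioned above; treat it as a placeholder rather than a recommendation.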


Solution 3:

There is an RFC dedicated to this topic: RFC 2308 - Negative Caching of DNS Queries (DNS NCACHE).

The relevant section to read is 5 - Caching Negative Answers which states:

Like normal answers negative answers have a time to live (TTL). As there is no record in the answer section to which this TTL can be applied, the TTL must be carried by another method. This is done by including the SOA record from the zone in the authority section of the reply. When the authoritative server creates this record its TTL is taken from the minimum of the SOA.MINIMUM field and SOA's TTL. This TTL decrements in a similar manner to a normal cached answer and upon reaching zero (0) indicates the cached negative answer MUST NOT be used again.

First, let's identify the SOA.MINIMUM and SOA TTL described in the RFC. The TTL is the number before the class IN (900 seconds in the example below), while the minimum is the last field in the record (86400 seconds in the example below).

$ dig serverfault.com soa @ns-1135.awsdns-13.org +noall +answer +multiline

; <<>> DiG 9.11.3-1ubuntu1.8-Ubuntu <<>> serverfault.com soa @ns-1135.awsdns-13.org +noall +answer +multiline
;; global options: +cmd
serverfault.com.    900 IN SOA ns-1135.awsdns-13.org. awsdns-hostmaster.amazon.com. (
                1          ; serial
                7200       ; refresh (2 hours)
                900        ; retry (15 minutes)
                1209600    ; expire (2 weeks)
                86400      ; minimum (1 day)
                )

Now let's look at some examples. The serverfault.com zone is illustrative because it has authoritative servers from two different providers that are configured differently.

Let's find the authoritative nameservers for the serverfault.com zone:

$ host -t ns serverfault.com
serverfault.com name server ns-860.awsdns-43.net.
serverfault.com name server ns-1135.awsdns-13.org.
serverfault.com name server ns-cloud-c1.googledomains.com.
serverfault.com name server ns-cloud-c2.googledomains.com.

Then check the SOA record using an AWS nameserver:

$ dig serverfault.com soa @ns-1135.awsdns-13.org | grep 'ANSWER SECTION' -A 1
;; ANSWER SECTION:
serverfault.com.    900 IN  SOA ns-1135.awsdns-13.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400

From this we can see that the TTL of the SOA record is 900 seconds while the negative TTL value is 86400 seconds. The SOA TTL value of 900 is lower so we expect this value to be used.
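
The same comparison can be scripted. The following is a small Python sketch, not part of any DNS library: the helper name expected_negative_ttl is made up for this example, and it assumes the SOA record is given on a single line with an explicit TTL, in the layout dig prints above.

```python
def expected_negative_ttl(soa_line):
    """Parse a one-line SOA record as printed by dig and return
    min(record TTL, SOA.MINIMUM), per RFC 2308 section 5."""
    fields = soa_line.split()
    # Layout: name, TTL, class, type, mname, rname,
    #         serial, refresh, retry, expire, minimum
    if len(fields) != 11 or fields[3].upper() != "SOA":
        raise ValueError("expected a single-line SOA record with an explicit TTL")
    record_ttl = int(fields[1])
    soa_minimum = int(fields[10])
    return min(record_ttl, soa_minimum)


aws_soa = ("serverfault.com. 900 IN SOA ns-1135.awsdns-13.org. "
           "awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400")
print(expected_negative_ttl(aws_soa))  # 900
```

Splitting on whitespace is enough here because none of the SOA fields contain spaces; a multiline (+multiline) rendering with comments would need a real zone-file parser.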

Now if we query an authoritative server for a non-existent domain, we should get a response without an answer and with a SOA record in the authority section:

$ dig nxdomain.serverfault.com @ns-1135.awsdns-13.org

; <<>> DiG 9.11.3-1ubuntu1.8-Ubuntu <<>> nxdomain.serverfault.com @ns-1135.awsdns-13.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 51948
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;nxdomain.serverfault.com.  IN  A

;; AUTHORITY SECTION:
serverfault.com.    900 IN  SOA ns-1135.awsdns-13.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400

;; Query time: 125 msec
;; SERVER: 205.251.196.111#53(205.251.196.111)
;; WHEN: Tue Aug 20 15:49:47 NZST 2019
;; MSG SIZE  rcvd: 135

When a recursive (caching) resolver receives this answer it will parse the SOA record in the AUTHORITY SECTION and use the TTL of this record to determine how long it should cache the negative result (in this case 900 seconds).

Now let's follow the same procedure with a Google nameserver:

$ dig serverfault.com soa @ns-cloud-c2.googledomains.com | grep 'ANSWER SECTION' -A 1
;; ANSWER SECTION:
serverfault.com.    21600   IN  SOA ns-cloud-c1.googledomains.com. cloud-dns-hostmaster.google.com. 1 21600 3600 259200 300

You can see that the Google nameservers have different values for both the SOA TTL and the negative TTL. In this case the negative TTL of 300 is lower than the SOA TTL of 21600. Therefore the Google server should use the lower value in the AUTHORITY SECTION SOA record when returning an NXDOMAIN response:

$ dig nxdomain.serverfault.com @ns-cloud-c2.googledomains.com

; <<>> DiG 9.11.3-1ubuntu1.8-Ubuntu <<>> nxdomain.serverfault.com @ns-cloud-c2.googledomains.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 25920
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;nxdomain.serverfault.com.  IN  A

;; AUTHORITY SECTION:
serverfault.com.    300 IN  SOA ns-cloud-c1.googledomains.com. cloud-dns-hostmaster.google.com. 1 21600 3600 259200 300

;; Query time: 130 msec
;; SERVER: 216.239.34.108#53(216.239.34.108)
;; WHEN: Tue Aug 20 16:05:24 NZST 2019
;; MSG SIZE  rcvd: 143

As expected the TTL of the SOA record in the NXDOMAIN response is 300 seconds.

The example above also demonstrates how easy it is to get different answers to the same query. The answer that an individual caching resolver ends up using depends on which authoritative nameserver was queried.

In my testing I have also observed that some recursive (caching) resolvers do not return an AUTHORITY SECTION with a SOA record with a decrementing TTL for subsequent requests whereas others do.

For example, the Cloudflare resolver does (note the decrementing TTL value):

$ dig nxdomain.serverfault.com @1.1.1.1 | grep 'AUTHORITY SECTION' -A 1
;; AUTHORITY SECTION:
serverfault.com.    674 IN  SOA ns-1135.awsdns-13.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400
$ dig nxdomain.serverfault.com @1.1.1.1 | grep 'AUTHORITY SECTION' -A 1
;; AUTHORITY SECTION:
serverfault.com.    668 IN  SOA ns-1135.awsdns-13.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400

While the default resolver in an AWS VPC will respond with an authority section only on the first request:

$ dig nxdomain.serverfault.com @169.254.169.253 | grep 'AUTHORITY SECTION' -A 1
;; AUTHORITY SECTION:
serverfault.com.    300 IN  SOA ns-cloud-c1.googledomains.com. cloud-dns-hostmaster.google.com. 1 21600 3600 259200 300
$ dig nxdomain.serverfault.com @169.254.169.253 | grep 'AUTHORITY SECTION' -A 1 | wc -l
0
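
The decrementing TTL seen in the Cloudflare output can be modeled with a toy negative cache. This is only a sketch of the general idea, not how any particular resolver is implemented: the NegativeCache class and its methods are hypothetical, and the fake clock values (226 and 232 seconds after the entry was stored) are assumptions picked so the output matches the TTLs shown above.

```python
import time


class NegativeCache:
    """Toy negative cache: stores an expiry per name and reports remaining TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the example can fake time
        self._expires_at = {}        # name -> absolute expiry timestamp

    def store(self, name, negative_ttl):
        self._expires_at[name] = self._clock() + negative_ttl

    def remaining_ttl(self, name):
        """Return whole seconds left, or None if absent or expired."""
        expires_at = self._expires_at.get(name)
        if expires_at is None:
            return None
        remaining = expires_at - self._clock()
        if remaining <= 0:
            # TTL reached zero: the cached negative answer must not be reused.
            del self._expires_at[name]
            return None
        return int(remaining)


now = [0.0]                          # fake clock, advanced by hand
cache = NegativeCache(clock=lambda: now[0])
cache.store("nxdomain.serverfault.com.", 900)
now[0] = 226.0
print(cache.remaining_ttl("nxdomain.serverfault.com."))  # 674
now[0] = 232.0
print(cache.remaining_ttl("nxdomain.serverfault.com."))  # 668
now[0] = 900.0
print(cache.remaining_ttl("nxdomain.serverfault.com."))  # None
```

A resolver that returns the SOA with this remaining value behaves like the Cloudflare example; one that omits the authority section on cache hits simply does not expose the countdown to clients.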

Note: This answer addresses the behavior of NXDOMAIN answers.

Glossary:

  • Zone
  • SOA
  • TTL
  • Recursive NameServer
  • Authoritative Nameserver