Do intermediate subdomains need to exist?

Solution 1:

TL;DR: yes, intermediate subdomains need to exist, at least when queried for, by definition of the DNS; they do not have to appear in the zonefile, though.

A possible confusion to eliminate first; Definition of "Empty Non-Terminal"

You may be confusing two things, as other answers seem also to do. Namely, what happens when querying for names versus how you configure your nameserver and the content of the zonefile.

The DNS is hierarchical. For any leaf node to exist, all components leading to it MUST exist, in the sense that if they are queried for, the responsible authoritative nameserver should reply for them without an error.

As explained in RFC 8020 (which just restates what was always the rule, but some DNS providers needed a reminder), if an authoritative nameserver replies NXDOMAIN to any query (that is: this name does not exist), then any name "below" it does not exist either.

In your example, if a query for intermediate.example.com returns NXDOMAIN, then any proper recursive nameserver will immediately reply NXDOMAIN for leaf.intermediate.example.com, because this name cannot exist unless every name above it exists.
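To make the RFC 8020 rule concrete, here is a minimal Python sketch of how a resolver cache could apply it. The class name `NxdomainCutCache` and its methods are hypothetical, invented for this illustration; real resolvers also handle TTLs, DNSSEC and much more.

```python
class NxdomainCutCache:
    """Hypothetical cache that remembers NXDOMAIN names (RFC 8020 idea only)."""

    def __init__(self):
        # Names are stored as tuples of labels, root-most label first.
        self._nxdomain = set()

    @staticmethod
    def _labels(name):
        # "Leaf.Example.com." -> ("com", "example", "leaf")
        return tuple(reversed(name.rstrip(".").lower().split(".")))

    def record_nxdomain(self, name):
        self._nxdomain.add(self._labels(name))

    def is_known_nonexistent(self, name):
        labels = self._labels(name)
        # NXDOMAIN for the name itself or for any ancestor is enough.
        return any(labels[:i] in self._nxdomain
                   for i in range(1, len(labels) + 1))


cache = NxdomainCutCache()
cache.record_nxdomain("intermediate.example.com")
print(cache.is_known_nonexistent("leaf.intermediate.example.com"))  # True
print(cache.is_known_nonexistent("other.example.com"))              # False
```

Once NXDOMAIN has been seen for intermediate.example.com, the sketch answers "does not exist" for anything below it without asking again.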

This tree model was already spelled out in RFC 4592, about wildcards (which are unrelated here):

The domain name space is a tree structure. Nodes in the tree either
own at least one RRSet and/or have descendants that collectively own
at least one RRSet. A node may exist with no RRSets only if it has
descendants that do; this node is an empty non-terminal.

A node with no descendants is a leaf node. Empty leaf nodes do not exist.

A practical example with .US domain names

Let us take a working example from a TLD with a lot of labels historically, that is .US. Picking any example online, let us use www.teh.k12.ca.us.

Of course if you query for this name, or even teh.k12.ca.us you can get back A records. Nothing conclusive here for our purpose (there is even a CNAME in the middle of it, but we do not care about that) :

$ dig www.teh.k12.ca.us A +short
CA02205882.schoolwires.net.
107.21.20.201
35.172.15.22
$ dig teh.k12.ca.us A +short
162.242.146.30
184.72.49.125
54.204.24.19
54.214.44.86

Let us query now for k12.ca.us (I am not querying the authoritative nameserver of it, but that does not change the result in fact):

$ dig k12.ca.us A

; <<>> DiG 9.11.5-P1-1ubuntu2.5-Ubuntu <<>> k12.ca.us A
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 59101
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1480
;; QUESTION SECTION:
;k12.ca.us.         IN  A

;; AUTHORITY SECTION:
us.         3587    IN  SOA a.cctld.us. hostmaster.neustar.biz. 2024847624 900 900 604800 86400

;; Query time: 115 msec
;; SERVER: 127.0.0.10#53(127.0.0.10)
;; WHEN: mer. juil. 03 01:13:20 EST 2019
;; MSG SIZE  rcvd: 104

What do we learn from this answer?

First, it is a success because the status is NOERROR. If it had been anything else, and specifically NXDOMAIN, then neither teh.k12.ca.us nor www.teh.k12.ca.us could exist.

Second, the ANSWER section is empty. There are no A records for k12.ca.us. This is not an error: this type (A) does not exist for this name, but maybe other record types exist for it, or it is an ENT, aka "Empty Non-Terminal": it is empty, but it is not a leaf, as there are things "below" it (see the definition in RFC 7719), as we already know (normally the resolution is top down, so we would reach this step before going one level below, and not the opposite as we are doing here for demonstration purposes).

This is why, as a shortcut, we say the status code is NODATA: this is not a real status code; it just means NOERROR plus an empty ANSWER section, that is, there is no data for this specific record type but there may be for others.
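The distinction between these outcomes can be sketched in a few lines of Python. The function name `classify` and its string labels are assumptions made for this illustration; it deliberately ignores referrals, which also come back as NOERROR with an empty ANSWER section but carry NS records in the AUTHORITY section.

```python
def classify(rcode, answer_count):
    """Map a response's status and ANSWER count to an informal outcome."""
    if rcode == "NXDOMAIN":
        return "NXDOMAIN"  # the name does not exist, nor anything below it
    if rcode == "NOERROR" and answer_count == 0:
        return "NODATA"    # the name exists, but not with the queried type
    if rcode == "NOERROR":
        return "ANSWER"    # the name exists and has records of that type
    return "OTHER"         # SERVFAIL, REFUSED, ...


# Values taken from the dig output above: status NOERROR, ANSWER: 0
print(classify("NOERROR", 0))  # NODATA
```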

You can repeat the same experiment for the same result if you query with the next "up" label, that is the name ca.us.

Queries' results vs zonefile content

Now, where can the confusion come from? I believe it may come from the false idea that any dot in a DNS name means there is a delegation. This is false. Said differently, your example.com zonefile can look like this, and it is totally valid and working:

example.com. IN SOA ....
example.com. IN NS ....
example.com. IN NS ....

leaf.intermediate.example.com. IN A 192.0.2.37

With such a zonefile, querying this nameserver you will get exactly the behavior observed above: a query for intermediate.example.com will return NOERROR with an empty answer. You do not need to create it specifically in the zonefile (if you do not need it for other reasons), the authoritative nameserver will take care of synthesizing the "intermediate" replies, because it sees it needs this empty non-terminal (and any others "in-between" if there had been other labels) as it sees the leaf name leaf.intermediate.example.com.
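As an illustration of which names become empty non-terminals, the following Python sketch derives them from the owner names present in a zone. The function `empty_non_terminals` is hypothetical, and it assumes every owner name is at or below the zone origin and that there are no delegations.

```python
def empty_non_terminals(origin, owner_names):
    """Return names between the origin and an owner that own no records."""
    origin_labels = origin.rstrip(".").lower().split(".")
    owners = {name.rstrip(".").lower() for name in owner_names}
    ents = set()
    for name in owners:
        labels = name.split(".")
        # Walk from the owner's parent up to (but excluding) the origin.
        for i in range(1, len(labels) - len(origin_labels)):
            ancestor = ".".join(labels[i:])
            if ancestor not in owners:
                ents.add(ancestor)
    return sorted(ents)


zone_owners = ["example.com.", "leaf.intermediate.example.com."]
print(empty_non_terminals("example.com.", zone_owners))
# ['intermediate.example.com']
```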

Note that this is a widespread case in fact in some areas, but you might not see it because it targets more "infrastructure" records that people are not exposed to:

  • in reverse zones like in-addr.arpa or ip6.arpa, and specifically the latter. You will have records like 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.a.1.d.e.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa. 1h IN PTR text-lb.eqiad.wikimedia.org. and there is obviously not a delegation at each dot, nor resource records attached at each label
  • in SRV records, like _nicname._tcp.fr. 12h IN SRV 0 0 43 whois.nic.fr., a domain can have many _proto._tcp.example.com and _proto._udp.example.com SRV records because by design they must have this form, but at the same time _tcp.example.com and _udp.example.com will remain Empty Non-Terminals because never used as records
  • you have in fact many other cases of specific construction of names based on "underscore labels" for various protocols such as DKIM. DKIM mandates you to have DNS records like whatever._domainkey.example.com, but obviously _domainkey.example.com by itself will never be used, so it will remain an empty non-terminal. This is the same for TLSA records in DANE (ex: _25._tcp.somehost.example.com. TLSA 3 1 1 BASE64==), or URI records (ex: _ftp._tcp IN URI 10 1 "ftp://ftp1.example.com/public")

Nameserver behavior and generation of intermediate replies

Why does the nameserver synthesize automatically such intermediate answers? The core resolution algorithm for the DNS, as detailed in RFC 1034 section 4.3.2 is the reason for that, let us take it and summarize in our case when querying the above authoritative nameserver for the name intermediate.example.com (this is the QNAME in protocol below):

  2. Search the available zones for the zone which is the nearest ancestor to QNAME. If such a zone is found, go to step 3, otherwise step 4.

The nameserver finds zone example.com as nearest ancestor of QNAME, so we can go to step 3.

We have now this:

  3. Start matching down, label by label, in the zone. [..]

a. If the whole of QNAME is matched, we have found the node. [..]

b. If a match would take us out of the authoritative data, we have a referral. This happens when we encounter a node with NS RRs marking cuts along the bottom of a zone. [..]

c. If at some label, a match is impossible (i.e., the corresponding label does not exist), look to see if a the "*" label exists. [..]

We can eliminate cases b and c, because our zonefile has no delegation (hence there will be never a referral to other nameservers, no case b), nor wildcards (so no case c).

We only have to deal here with case a.

We start matching down, label by label, in the zone. So even if we had a long sub.sub.sub.sub.sub.sub.sub.sub.example.com name, at some point, we arrive at case a: we did not find a referral, nor a wildcard, but we ended up at the final name we wanted a result for.

Then we apply the rest of the content of case a:

If the data at the node is a CNAME

Not our case, we skip that.

Otherwise, copy all RRs which match QTYPE into the answer section and go to step 6.

Whatever QTYPE we choose (A, AAAA, NS, etc.) we have no RRs for intermediate.example.com as it does not appear in the zonefile. So the copy here is empty. Now we finish at step 6:

Using local data only, attempt to add other RRs which may be useful to the additional section of the query. Exit.

Not relevant for us here, hence we finish with success.

This exactly explains the behavior observed: such queries will return NOERROR but no data either.

Now, you may ask yourself: "but then if I use any name, like another.example.com then by the above algorithm I should get the same reply (no error)", but observations would instead report NXDOMAIN in that case.

Why?

Because the whole algorithm as explained, starts with this:

The following algorithm assumes that the RRs are organized in several tree structures, one for each zone, and another for the cache

This means that the above zonefile is transformed into this tree:

+-----+
| com |  (just to show the delegation, does not exist in this nameserver)
+-----+
   |
   |
   |
+---------+
| example | SOA, NS records
+---------+
   |
   |
   |
+--------------+
| intermediate | no records
+--------------+
   |
   |
   |
+------+
| leaf | A record
+------+

So when following the algorithm from the top, you can indeed find a path: com > example > intermediate (because the path com > example > intermediate > leaf exists). But for another.example.com, after com > example you do not find the label another in the tree as a child node of example. Hence we fall into the rest of case c from above:

If the "*" label does not exist, check whether the name we are looking for is the original QNAME in the query or a name we have followed due to a CNAME. If the name is original, set an authoritative name error in the response and exit. Otherwise just exit.

Label * does not exist, and we did not follow a CNAME, hence we are in case: set an authoritative name error in the response and exit, aka NXDOMAIN.
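The walk through the tree can be sketched in Python as follows. This is only a simplified reading of step 3, not the full RFC 1034 algorithm: the helper names `build_tree` and `lookup` are made up for this example, and zone selection (step 2), delegations (case b), wildcards and CNAMEs are all left out.

```python
def build_tree(records):
    """Build a label tree from (owner_name, rrtype, rdata) tuples."""
    root = {"children": {}, "rrsets": {}}
    for owner, rrtype, rdata in records:
        node = root
        # Insert labels from the right (com) to the left (leaf); this
        # creates the intermediate nodes as a side effect.
        for label in reversed(owner.rstrip(".").lower().split(".")):
            node = node["children"].setdefault(
                label, {"children": {}, "rrsets": {}})
        node["rrsets"].setdefault(rrtype, []).append(rdata)
    return root


def lookup(tree, qname, qtype):
    """Match down label by label; return (status, matching records)."""
    node = tree
    for label in reversed(qname.rstrip(".").lower().split(".")):
        child = node["children"].get(label)
        if child is None:
            return ("NXDOMAIN", [])  # case c, with no "*" label present
        node = child
    # Case a: whole QNAME matched; the copy of matching RRs may be empty.
    return ("NOERROR", node["rrsets"].get(qtype, []))


zone = build_tree([
    ("example.com.", "SOA", "...."),
    ("example.com.", "NS", "...."),
    ("leaf.intermediate.example.com.", "A", "192.0.2.37"),
])
print(lookup(zone, "leaf.intermediate.example.com.", "A"))
print(lookup(zone, "intermediate.example.com.", "A"))
print(lookup(zone, "another.example.com.", "A"))
```

The three calls give a NOERROR with the A record, a NOERROR with no data, and an NXDOMAIN respectively, matching the behavior described above.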

Note that all the above did create confusion in the past. This is collected in some RFCs. See for example this unexpected place (the joy of DNS specifications being so impenetrable) defining wildcards: RFC 4592 "The Role of Wildcards in the Domain Name System" and notably its section 2.2 "Existence Rules", also cited in part at the beginning of my answer but here it is more complete:

Empty non-terminals [RFC2136, section 7.16] are domain names that own no resource records but have subdomains that do. In section 2.2.1,
"_tcp.host1.example." is an example of an empty non-terminal name.
Empty non-terminals are introduced by this text in section 3.1 of RFC 1034:

# The domain name space is a tree structure.  Each node and leaf on
# the tree corresponds to a resource set (which may be empty).  The
# domain system makes no distinctions between the uses of the
# interior nodes and leaves, and this memo uses the term "node" to
# refer to both.

The parenthesized "which may be empty" specifies that empty non-
terminals are explicitly recognized and that empty non-terminals
"exist".

Pedantically reading the above paragraph can lead to an
interpretation that all possible domains exist--up to the suggested
limit of 255 octets for a domain name [RFC1035]. For example,
www.example. may have an A RR, and as far as is practically
concerned, is a leaf of the domain tree. But the definition can be
taken to mean that sub.www.example. also exists, albeit with no data. By extension, all possible domains exist, from the root on down.

As RFC 1034 also defines "an authoritative name error indicating that the name does not exist" in section 4.3.1, so this apparently is not the intent of the original definition, justifying the need for an updated definition in the next section.

And then the definition in next section is the paragraph I quoted at the beginning.

Note that RFC 8020 (on NXDOMAIN really meaning NXDOMAIN, that is if you reply NXDOMAIN for intermediate.example.com, then leaf.intermediate.example.com can not exist) was mandated in part because various DNS providers did not follow this interpretation and that created havoc, or they were just bugs, see for example this one fixed in 2013 in one opensource authoritative nameserver code: https://github.com/PowerDNS/pdns/issues/127

People then needed to put in place specific countermeasures just for them, namely not aggressively caching NXDOMAIN, because for those providers getting NXDOMAIN at some node did not guarantee that nodes below it would also return NXDOMAIN.

And this was making QNAME minimization (RFC 7816) impossible to obtain (see https://indico.dns-oarc.net/event/21/contributions/298/attachments/267/487/qname-min.pdf for longer details), while it was wanted to increase privacy. Existence of empty non-terminals in case of DNSSEC also created problems in the past, around handling of non-existence (see https://indico.dns-oarc.net/event/25/contributions/403/attachments/378/647/AFNIC_OARC_Dallas.pdf if interested, but you really need a good understanding of DNSSEC before).
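To see why a wrong NXDOMAIN on an empty non-terminal breaks QNAME minimization, consider this deliberately simplified Python sketch. The function `minimized_queries` is hypothetical: it pretends the resolver asks for one more label at each step and stops at the first NXDOMAIN, whereas real implementations of RFC 7816 are more subtle (query types, zone cuts, fallbacks).

```python
def minimized_queries(qname, nxdomain_names):
    """List the names a simplified minimizing resolver would ask for."""
    labels = qname.rstrip(".").lower().split(".")
    asked = []
    for i in range(len(labels) - 1, -1, -1):
        name = ".".join(labels[i:])
        asked.append(name)
        if name in nxdomain_names:
            break  # RFC 8020: nothing can exist below an NXDOMAIN
    return asked


# Correct server: the empty non-terminal answers NOERROR, the walk completes.
print(minimized_queries("leaf.intermediate.example.com", set()))

# Broken server: NXDOMAIN on the empty non-terminal hides the leaf.
print(minimized_queries("leaf.intermediate.example.com",
                        {"intermediate.example.com"}))
```

In the second call the walk stops at intermediate.example.com, so the existing leaf name is never reached.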

The following two messages give an example of the problems one provider had in properly enforcing this rule on Empty Non-Terminals; they give some perspective on the issues and on why we were there:

  • https://mailarchive.ietf.org/arch/msg/dnsop/XIX16DCe2ln3ZnZai723v32ZIjE
  • https://lists.dns-oarc.net/pipermail/dns-operations/2019-April/018640.html

Solution 2:

It's possible that I misunderstand Khaled's answer, but the lack of intermediate records should in no wise be a problem with the resolution of the subzoned name. Note that this dig output is not from, nor directed to, an authoritative DNS server for teaparty.net or any subzone thereof:

[me@nand ~]$ dig very.deep.host.with.no.immediate.parents.teaparty.net
[...]
;; ANSWER SECTION:
very.deep.host.with.no.immediate.parents.teaparty.net. 3600 IN A 198.51.100.200

Indeed, you should be able to do that dig yourself, and get that answer - teaparty.net is a real domain, under my control, and really does contain that A record. You can verify that there are no records for any of those zones between very and teaparty.net, and that it has no impact on your resolution of the above hostname.


Solution 3:

If you are directly querying the authoritative DNS server, you will get answers without problems.

However, you will not get a valid answer if you are querying via another DNS server which does not have a valid cache. Querying for intermediate.example.com will result in NXDOMAIN error.