Threat Mitigation for the Root Server System
Hi all,

I noticed this document today, published last month: https://root-servers.org/publications/Threat_Mitigation_For_the_Root_Server_...

Although I haven't gone through it in detail, and could no doubt come up with some comments if I were asked to review it, this looks like a nice document. It'd be nice if it were citable and easier to find.

Is there a reason it wasn't published through the RSSAC process? Or is the audience of this document other root server operators, and its publication a (welcome!) exercise in transparency? The introduction contains the phrase "Root Ops community", but I find that slightly ambiguous.

I realise I'm crossing the streams a bit by asking here, but it seems like the right people will hear the question, and others here might be interested in the answer. Also, if it's entirely obvious to everybody else here why this has happened, the resulting public beating with clue bats will surely be therapeutic.

Joe
Is there a reason it wasn't published through the RSSAC process?
I would personally say that the content of the document is tied very technically to the *implementation* of the RSS at the various root server operators, and to how the current operator deployment architectures address the perceived threats. This is very different from (and subtly the same as, yes) the remit of RSSAC, which is at the policy level: how should the RSS perform as a whole, what are the requirements of the larger system, what does the community need of the RSS, and so on. -- Wes Hardaker USC/ISI
On 01/10/2019 17:48, Wes Hardaker wrote:
Is there a reason it wasn't published through the RSSAC process?
I would personally say that the content of the document is tied very technically to the *implementation* of the RSS at the various root server operators, and to how the current operator deployment architectures address the perceived threats. This is very different from (and subtly the same as, yes) the remit of RSSAC, which is at the policy level: how should the RSS perform as a whole, what are the requirements of the larger system, what does the community need of the RSS, and so on.
I agree with Joe that this document is helpful and interesting, but also that I would have comments or questions were it being developed in an RSSAC process. Thanks Wes for your take on this. Would there be a need or an interest in taking what is in this document into discussions at the policy level? -Mallory -- Mallory Knodel Head of Digital :: article19.org gpg fingerprint :: E3EB 63E0 65A3 B240 BCD9 B071 0C32 A271 BD3C C780
Thanks Wes for your take on this. Would there be a need or an interest in taking what is in this document into discussions at the policy level?
(again, speaking only for myself) I think to a large extent similar things are already being discussed in the RSSAC workshops (specifically, the metrics work party) and probably SSAC work bodies, and will be taken up as RSSAC037 model development goes forward and the GCG gets created by the ICANN board to discuss the future of root governance models. I suspect this will include discussions on measuring and monitoring the root for its effectiveness in meeting system requirements. So, turning the question back around: what aspects do you think should be discussed in the ICANN forums that are currently being missed? -- Wes Hardaker USC/ISI
On 01/10/2019 22:51, Wes Hardaker wrote:
Thanks Wes for your take on this. Would there be a need or an interest in taking what is in this document into discussions at the policy level?
(again, speaking only for myself) I think to a large extent similar things are already being discussed in the RSSAC workshops (specifically, the metrics work party) and probably SSAC work bodies, and will be taken up as RSSAC037 model development goes forward and the GCG gets created by the ICANN board to discuss the future of root governance models. I suspect this will include discussions on measuring and monitoring the root for its effectiveness in meeting system requirements.
So turning the question back around, what aspects do you think should be discussed in the ICANN forums that are currently being missed?
Hi Wes,

So the big overarching question after enumerating threats is: okay, so what do we do about them? And I don't think the answer is always a technical or operational one. There may be some larger policy answers, or answers that could impact an entire ecosystem like the DNS.

That was vague, but I was particularly interested in understanding more about the threat of DoH and what mitigation RSSAC proposes. I also have an interest in DDoS mitigation.

-Mallory -- Mallory Knodel Head of Digital :: article19.org gpg fingerprint :: E3EB 63E0 65A3 B240 BCD9 B071 0C32 A271 BD3C C780
Looking at the document, one thing that jumps out at me is the treatment of both data integrity and route hijacking. I feel this document puts far too much faith in validating resolvers to catch compromised root servers and route hijacking.

When I did work on creating a root server emulation setup for testing root IDNs, generally speaking, unless I went out of my way to force strict compliance, unsigned or invalidly signed data would just be accepted if it came from the roots, even before I inserted my fake KSK into the mix.

The short version is: if I can pop a root server and inject malicious traffic, I can't tamper with the root zone data without causing signature validation failures, but I *can* freely change the NS records to MITM any DNS lookups to a TLD, especially if I'm just echoing signed data. That's a pretty darn effective way to essentially cast a dragnet on all lookups to a TLD.

Furthermore, I'm not convinced that the recursive resolver infrastructure as a whole will properly fail if someone injects a bad zone into the root. The defaults on far too many things are soft-fail rather than hard-fail.

The canonical example I can bring out is .local/.lan. These TLDs are common in private networks, and if my understanding is correct, the roots get pinged for these TLDs all the time. The root servers send NXDOMAIN (with a signed NSEC if DO=1), but these hosts still (presumably) work for the locations that use them. If recursive resolvers were hard-failing for non-existent TLDs, we'd have seen a lot of pain when the root zone was originally signed.

Taking the case where I've popped an RSO and can control one root server's zone file freely: I can delete all the DNSSEC data and just send whatever records I like. How much of the validating recursive infrastructure is going to properly hard-fail if it receives unsigned data, or a mix of signed and unsigned data?
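The NS-rewrite attack described above works because delegation NS RRsets handed out by the root are not signed; only the DS record (or the signed proof of its absence) is. A small helper like this, run over captured dig output, makes the point mechanically (the server address and TLD in the comment are just examples):

```shell
# has_signed_ns: succeed only if the dig transcript contains an RRSIG
# covering an NS RRset. For a plain referral from a root server, it won't:
# delegation NS records carry no signatures of their own.
has_signed_ns() {
  echo "$1" | grep -Eq 'RRSIG[[:space:]]+NS'
}

# Example (needs network access; a.root-servers.net used for illustration):
#   out=$(dig @a.root-servers.net org. NS +norecurse +dnssec)
#   has_signed_ns "$out" || echo "delegation NS RRset is unsigned"
```

This is why echoing otherwise-valid signed data while swapping the NS targets doesn't trip validation at the delegation point itself; the damage only becomes detectable (if at all) further down the chain at the DS/DNSKEY match.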
Speaking from my position on the other end of the stack, I have a bunch of stuff here that either falls over with DNSSEC or actively lies (AD=1) when it's impossible for that to be true. DNSSEC can provide the security we want with regards to data integrity, but I think we're assuming too much of the DNS ecosystem to say that it's effective at preventing root zone tampering without actual data to support that.

I could probably re-purpose some of my original root zone emulator work to actually test this behavior more in-depth, but I think that's a separate discussion (esp. with regards to what to test and such).

Michael

On 9/30/19 10:54 AM, Joe Abley wrote:
Hi all,
I noticed this document today, published last month:
https://root-servers.org/publications/Threat_Mitigation_For_the_Root_Server_...
Although I haven't gone through it in detail and could no doubt come up with some comments if I was asked to review it, this looks like a nice document. It'd be nice if it was citable and easier to find.
Is there a reason it wasn't published through the RSSAC process? Or is the audience of this document other root server operators, and its publication a (welcome!) exercise in transparency? The introduction contains the phrase "Root Ops community" but I find that slightly ambiguous.
I realise I'm crossing the streams a bit by asking here, but it seems like the right people will hear the question, and others here might be interested in the answer. Also if it's entirely obvious to everybody else here why this has happened the resulting public beating with clue bats will surely be therapeutic.
Joe
Michael Casadevall wrote on 2019-10-02 00:30:
Looking at the document, one thing that jumps out at me is the treatment of both data integrity and route hijacking. I feel this document puts far too much faith in validating resolvers to catch compromised root servers and route hijacking.
When I did work on creating a root server emulation setup for testing root IDNs, generally speaking, unless I went out of my way to force strict compliance, non-signed/invalidly signed data would just be accepted if it came from the roots even before I inserted my fake KSK into the mix.
The short version is: if I can pop a root server and inject malicious traffic, I can't tamper with the root zone data without causing signature validation failures, but I *can* freely change the NS records to MITM any DNS lookups to a TLD, especially if I'm just echoing signed data. That's a pretty darn effective way to essentially cast a dragnet on all lookups to a TLD.
Furthermore, I'm not convinced that the recursive resolver infrastructure as a whole will properly fail if someone injected a bad zone into the root. The defaults on far too many things are to soft-fail vs. hard-fail.
it's vital that we find out how common this behaviour is. data exfiltration by unsigned (delegation) NS RR is not concerning, since the data is public, and since in the earlier part of the story a root name server had to be compromised -- so exfiltration is already on the table, be it from one vantage point or another. however, if soft fails are sometimes ignored, we need to study it closely.
The canonical example I can bring out is .local/.lan. These TLDs are common in private networks, and if my understanding is correct, the roots get pinged for these TLDs all the time. The root servers send NXDOMAIN (with a signed NSEC if DO=1), but these hosts still (presumably) work for the locations that use them. If recursive resolvers were hard-failing for non-existent TLDs, we'd have seen a lot of pain when the root zone was originally signed.
generally speaking, local names like .lan or .local (or .tor) are handled as "hosts file fallback" or as a plugin, and those namespaces rely on an NXD response from DNS in order to open up the next namespace in their local logic path.
Taking the case where I've popped an RSO and can control one root server's zone file freely: I can delete all the DNSSEC data and just send whatever records I like. How much of the validating recursive infrastructure is going to properly hard-fail if it receives unsigned data, or a mix of signed and unsigned data?
all of it. the RDNS config includes a binding from a static key to some subtree of the DNS -- often called a "trust anchor". in bind9 as observed, if it gets unsigned data for a subtree (such as ".") that has a trust anchor defined for it, the stub resolver will get a SERVFAIL answer. if _any_ available validating RDNS does not behave that way, we need to set our hair on fire until we get it either fixed or shamed/decertified.
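the behaviour described here is easy to spot-check mechanically against any candidate resolver; a tiny helper that inspects the dig header is enough (the resolver address in the comment is a placeholder):

```shell
# expect_status: succeed if a dig transcript's header carries the given
# RCODE, e.g. SERVFAIL from a validator whose trust anchor can't be
# satisfied because the upstream served stripped or unsigned data.
expect_status() {
  echo "$1" | grep -q "status: $2,"
}

# Example, against a resolver pointed at a root serving unsigned data:
#   out=$(dig @192.0.2.53 example. A +dnssec)
#   expect_status "$out" SERVFAIL || echo "resolver did not hard-fail"
```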
Speaking from my position on the other end of the stack, I have a bunch of stuff here that either falls over with DNSSEC or actively lies (AD=1) when it's impossible for that to be true.
we need details.
DNSSEC can provide the security we want with regards to data integrity, but I think we're assuming too much of the DNS ecosystem to say that it's effective at preventing root zone tampering without actual data to support that.
I could probably re-purpose some of my original root zone emulator work to actually test this behavior more in-depth but I think that's a separate discussion (esp. with regards to what to test and such). Michael

please either continue, or share more detail about prior experiments.
-- P Vixie
Replies inline. Apologies for the wordiness, but there's information here that I didn't want to push out to a pastebin, so it's inline in this email.

On 10/2/19 10:36 AM, Paul Vixie wrote:
Michael Casadevall wrote on 2019-10-02 00:30:
Looking at the document, one thing that jumps out at me is the treatment of both data integrity and route hijacking. I feel this document puts far too much faith in validating resolvers to catch compromised root servers and route hijacking.
When I did work on creating a root server emulation setup for testing root IDNs, generally speaking, unless I went out of my way to force strict compliance, non-signed/invalidly signed data would just be accepted if it came from the roots even before I inserted my fake KSK into the mix.
The short version is: if I can pop a root server and inject malicious traffic, I can't tamper with the root zone data without causing signature validation failures, but I *can* freely change the NS records to MITM any DNS lookups to a TLD, especially if I'm just echoing signed data. That's a pretty darn effective way to essentially cast a dragnet on all lookups to a TLD.
Furthermore, I'm not convinced that the recursive resolver infrastructure as a whole will properly fail if someone injected a bad zone into the root. The defaults on far too many things are to soft-fail vs. hard-fail.
it's vital that we find out how common this behaviour is.
My original framework for testing IDNs exists as a (somewhat ugly) set of bash scripts and Docker instances. It's available here on GitHub: https://github.com/NCommander/idn-root-zone

Please note I'm going from memory here, so some of the details may be wrong. The original work didn't go anywhere, but I'm happy to revive it.

The setup creates a fake root zone signed with its own KSK. Each TLD is delegated from the root zone to a separate server, which in turn subdelegates to a second-level domain, more or less accurately capturing how TLDs are managed. Each zone is signed with its own set of keys. As a test, I defined .internic as a TLD to get everything working, so you ended up with the following:

1. A Docker container with the root zone signed with a fake KSK
2. A container with an authoritative BIND server for .internic, handling key signing via BIND's key management features
3. A second-level domain (nynex.internic) which was authoritative
4. A recursive resolver (hermes.nynex.internic)

Each server is loaded with the proper root hints to point at the fake root, plus bind.keys. I also had part of a .arpa zone set up, but looking at GitHub, I either never finished it or never committed it. This framework is pretty easy to extend to perform a more in-depth test if we can figure out exactly what and how we want to test.
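The fake-root signing step the framework performs can be sketched roughly as below (algorithm choice and filenames are illustrative, and it assumes BIND's dnssec-keygen/dnssec-signzone tools plus a root.zone file are present, skipping itself otherwise):

```shell
#!/bin/sh
# Sketch: generate a KSK/ZSK for a fake root zone and sign it.
# Guarded so it degrades gracefully where BIND's dnssec tools or the
# zone file are absent.
if command -v dnssec-signzone >/dev/null 2>&1 && [ -f root.zone ]; then
  # dnssec-keygen prints the generated key's basename (K.+NNN+NNNNN)
  KSK=$(dnssec-keygen -a ECDSAP256SHA256 -f KSK -n ZONE .)
  ZSK=$(dnssec-keygen -a ECDSAP256SHA256 -n ZONE .)
  cat "$KSK.key" "$ZSK.key" >> root.zone
  # Produces root.zone.signed plus a dsset-. file; the KSK DNSKEY then
  # goes into each container's bind.keys as the emulated trust anchor.
  dnssec-signzone -o . -k "$KSK" root.zone "$ZSK"
  STATUS=signed
else
  STATUS=skipped
fi
echo "fake root signing: $STATUS"
```

With the fake KSK installed as the trust anchor everywhere, the whole lab validates against itself, which is what makes the tampering scenarios below controllable.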
data exfiltration by unsigned (delegation) NS RR is not concerning, since the data is public, and since in the earlier part of the story a root name server had to be compromised -- so exfiltration is already on the table, be it from one vantage point or another.
however, if soft fails are sometimes ignored, we need to study it closely.
Fair enough regarding exfiltration of data. As for soft fails, or root DNSSEC data just being MIA, I can cite specific examples.

The first one I can give is actually my mom's home router. I wrote incorrectly that it returns AD=1; however, it still shows erroneous behavior with DNSSEC. The original writeup as part of my DNSCatcher work (https://github.com/NCommander/dnscatcher/blob/master/doc/technical_overview....) is correct, but I'll summarize it here.

In short, my personal domain (casadevall.pro) is DNSSEC signed. Asking my router for verification returns the following:

$ dig @10.0.0.1 casadevall.pro +dnssec

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7107
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1280

casadevall.pro. 3600 IN A 96.126.124.51
casadevall.pro. 3600 IN RRSIG A 8 2 3600 *snip*

Note that the AD flag is *not* set even though it returned an RRSIG record. It should have returned SERVFAIL if the router failed to validate with CD=0. (I did check with Wireshark, and the router is responding DO=1 in the EDNS section of the query response, so this is clearly erroneous behavior; I can provide pcap files.)

Furthermore, this router's DNS client is broken and can't perform DNSSEC lookups properly. This can be seen if one performs a sigchase (pastebin: https://pastebin.com/A3mKa7rA). If the router did not support DNSSEC at all, it shouldn't be returning DO=1 in the OPT reply. Sigchase passes if I use 8.8.8.8 (pastebin: https://pastebin.com/DDdfiiY6).

The router's response to version.bind:

;; ANSWER SECTION:
version.bind. 0 CH TXT "dnsmasq-2.39"

Short version: the router says it supports DNSSEC, returns DO=1 when queried, fails to actually look up DNSSEC records, and fails to return SERVFAIL regardless of whether the domain is right or wrong (sigfail.verteiltesysteme.net shows the same behavior).

---

Case 2: soylentnews.org

SoylentNews is a small business I own.
For long and irrelevant reasons, the network it runs on uses split-horizon DNS; externally, soylentnews.org and sylnt.us are DNSSEC signed (https://dnssec-analyzer.verisignlabs.com/soylentnews.org). During testing, I noticed an odd discrepancy.

Internally, all servers are networked via IPv6 (there are no A records for most machines) and reference each other via a fake TLD, li694-22. For example:

mcasadevall@boron:/etc/bind$ dig hydrogen.li694-22 AAAA +short
2600:3c00::f03c:91ff:fe6e:1ded

Directory information is handled by hesiod in the IN class:

mcasadevall@boron:~$ dig @helium.li694-22 mcasadevall.passwd.li694-22 TXT +short
"mcasadevall:*:2500:2500:Michael Casadevall,SN,+1-917-716-2585:/home/mcasadevall:/bin/bash"

Password authentication is handled through Kerberos. Boron is one of two recursive resolvers. Helium is the authoritative server for the li694-22 zone; boron is the backup server and replicates via AXFR from helium. The li694-22 zone is *not* signed in any way (no local anchor or DLV). For these tests, I'm accessing the recursive resolvers from another machine, specifically neon, one of our database nodes.

Helium's named options relating to DNSSEC are set only as follows:

named.conf.options: dnssec-validation auto;

When requesting hydrogen's AAAA record with DO=1:

mcasadevall@neon:~$ dig @helium.li694-22 hydrogen.li694-22 AAAA +dnssec

; <<>> DiG 9.9.5-3ubuntu0.15-Ubuntu <<>> @helium.li694-22 hydrogen.li694-22 AAAA +dnssec
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58352
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 3

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
;; QUESTION SECTION:
;hydrogen.li694-22. IN AAAA

;; ANSWER SECTION:
hydrogen.li694-22. 3600 IN AAAA 2600:3c00::f03c:91ff:fe6e:1ded

;; AUTHORITY SECTION:
li694-22. 3600 IN NS helium.li694-22.
li694-22. 3600 IN NS boron.li694-22.

;; ADDITIONAL SECTION:
boron.li694-22. 3600 IN AAAA 2600:3c00::f03c:91ff:fe6e:c4bf
helium.li694-22. 3600 IN AAAA 2600:3c00::f03c:91ff:fe6e:1d88

;; Query time: 0 msec
;; SERVER: 2600:3c00::f03c:91ff:fe6e:1d88#53(2600:3c00::f03c:91ff:fe6e:1d88)
;; WHEN: Wed Oct 02 17:45:52 UTC 2019
;; MSG SIZE rcvd: 171

I get an AA answer, but no signed results. This should be SERVFAIL because there's no chain from the root. Notably, running a test against a known bad domain fails as expected:

mcasadevall@neon:~$ dig @helium.li694-22 sigfail.verteiltesysteme.net AAAA

; <<>> DiG 9.9.5-3ubuntu0.15-Ubuntu <<>> @helium.li694-22
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 35765

So BIND is properly validating DNSSEC requests, given the SERVFAIL, but if it's authoritative for a zone and returns AA, it returns the record anyway. Helium is running the latest LTS, Ubuntu 18.04:

mcasadevall@helium:/etc/bind$ dpkg -l bind9
ii bind9 1:9.11.3+dfsg-1ubun amd64

Short version: a commonly deployed version of BIND shows incorrect behavior and will return a record when it's authoritative, even if DNSSEC information is MIA or broken.

---

Case 2.5: Neon Oddity

When I wrote this up, I was originally testing against the backup recursive resolver and got very strange behavior. Boron is running Ubuntu 14.04 and is overdue for an upgrade. It's pointed at helium. Its DNSSEC options are set somewhat differently:

named.conf.local: auto-dnssec maintain;
named.conf.options: dnssec-enable yes;
named.conf.options: dnssec-validation yes;
named.conf.options: dnssec-lookaside auto;

Neon shows the same behavior resolving the li694-22 domain. It also will resolve the known broken domain:

mcasadevall@boron:/etc/bind$ dig @127.0.0.1 sigfail.verteiltesysteme.net AAAA +short +dnssec
2001:638:501:8efc::139
AAAA 5 3 60 20191230020003 20190930020003 30665 verteiltesysteme.net. //This+RRSIG+is+deliberately+broken///For+more+information+please+go+to/http+//www+verteiltesysteme+net///////////8=

mcasadevall@boron:/etc/bind$ dpkg -l bind9
ii bind9 1:9.9.5.dfsg amd64 Internet Domain Name Server

Boron is slated for upgrade this weekend (I didn't realize it was THAT out of date), but I wanted to highlight a case where DNSSEC validation is not working the way it's supposed to.
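Several of the cases above reduce to one mechanical check: a response carrying signed data with status NOERROR but no ad flag, from a box that advertised DO support, is a red flag worth logging. A helper for scanning captured dig transcripts might look like this (the matching is deliberately crude; a box that honestly doesn't validate also matches, so it flags candidates rather than proving brokenness):

```shell
# suspicious_response: succeed when a dig transcript shows signed data
# (RRSIG present, status NOERROR) but the header flags lack 'ad' --
# the dnsmasq-2.39 behaviour described above.
suspicious_response() {
  echo "$1" | grep -q 'RRSIG' &&
  echo "$1" | grep -q 'status: NOERROR' &&
  ! echo "$1" | grep -Eq 'flags:[^;]* ad[ ;]'
}
```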
The canonical example I can bring out is .local/.lan. These TLDs are common in private networks, and if my understanding is correct, the roots get pinged for these TLDs all the time. The root servers send NXDOMAIN (with a signed NSEC if DO=1), but these hosts still (presumably) work for the locations that use them. If recursive resolvers were hard-failing for non-existent TLDs, we'd have seen a lot of pain when the root zone was originally signed.
generally speaking, local names like .lan or .local (or .tor) are handled as "hosts file fallback" or as a plugin, and those namespaces rely on an NXD response from DNS in order to open up the next namespace in their local logic path.
See counter-example above. I also know that in Microsoft Active Directory, .local (or any domain) is handled through DNS, in addition to LDAP and WINS, and not as a fallback. I'm unfamiliar with any setup that actually depends on an NXDOMAIN from the root; in all cases and with all clients I've worked with, the setup breaks if their resolvers aren't first in resolv.conf (or hosts are loaded from NIS/LDAP via nsswitch; in MS environments, though, it's always been DNS in my experience).
Taking the case where I've popped an RSO and can control one root server's zone file freely: I can delete all the DNSSEC data and just send whatever records I like. How much of the validating recursive infrastructure is going to properly hard-fail if it receives unsigned data, or a mix of signed and unsigned data?
all of it. the RDNS config includes a binding from a static key to some subtree of the DNS -- often called a "trust anchor". in bind9 as observed, if it gets unsigned data for a subtree (such as ".") that has a trust anchor defined for it, the stub resolve will get a SERVFAIL answer.
See above; this isn't the observed behavior in limited testing on a recent BIND9 instance, nor did I run into it with my IDN root zone emulator. I admit that in the case I gave there was no dedicated recursive resolver, but dnssec-validation yes should have caused SERVFAIL regardless of whether AA is true. Setting DO=1 didn't return AD=1 (no path to the root), but it didn't return SERVFAIL either.
if _any_ available validating RDNS does not behave that way, we need to set our hair on fire until we get it either fixed or shamed/decertified.
Speaking from my position on the other end of the stack, I have a bunch of stuff here that either falls over with DNSSEC or actively lies (AD=1) when it's impossible for that to be true.
we need details.
I misremembered about the AD=1. I can build more comprehensive tests here. As far as I understand the specs, in any case where DNSSEC validation is enabled, I should get SERVFAIL on essentially all of the above, not what I'm seeing. I'll admit these are scattershot data points, but between my own work with the root zone emulator and the data I collected for this email, there are enough red flags to raise a real concern that recursive resolvers are either soft-failing or showing outright incorrect behavior.
DNSSEC can provide the security we want with regards to data integrity, but I think we're assuming too much of the DNS ecosystem to say that it's effective at preventing root zone tampering without actual data to support that.
I could probably re-purpose some of my original root zone emulator work to actually test this behavior more in-depth but I think that's a separate discussion (esp. with regards to what to test and such). Michael please either continue, or share more detail about prior experiments.
What I'd like to do, with help from the caucus, is devise a set of scenarios and the expected DNSSEC behavior in each one, then run a full battery of tests to see if resolvers bend: testing the latest versions of popular DNS resolvers (dnsmasq, BIND, PowerDNS, Unbound, and Microsoft AD are probably good candidates; dnsmasq is extremely common in routers) and seeing if they actually go SERVFAIL when they're supposed to. This should also extend to distribution-supplied versions of resolvers (i.e., BIND as shipped in Ubuntu 16.04/18.04/latest).

The framework I'd recommend is my fake Internet root zone with its own KSK, so I fully control the environment and can test various scenarios. The root server software can be kept consistent because we're looking at the behavior of the recursive resolvers. Outside of the control tests, I believe all these cases SHOULD return SERVFAIL. If there are other cases, or cases that should pass/fail differently, please note them.

1. Control: DNSSEC validation pass of an existing domain in the root, from the client to a resolver
2. Control: DNSSEC validation of NSEC+NXDOMAIN for a non-existent domain in the root
3. Hijacked root pointing at an unsigned TLD
4. Hijacked root serving bad data with incorrect RRSIGs
5. Hijacked root not returning RRSIGs
6. TLD hijack serving bad signatures (represents Case 2 above)
7. TLD injection (Case 2 above), with no chain to the root
8. Domain hijack between the root and the recursive resolver (traffic injection, cache poisoning, etc.)
9. Expired/revoked KSK, or a similar scenario

I'm pretty close to the end of my day, but if this sounds like the correct direction to investigate, I'll flesh it out into an actual document, float it on the caucus, and start working on building a test suite for it.

Any corrections welcome,
Michael
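If this direction looks right, the harness itself can stay tiny. A sketch, with placeholder resolver address and lab query names (only the commented-out run_case lines touch the network; the verdict logic is pure):

```shell
#!/bin/sh
# Sketch of a scenario harness: query the resolver under test and compare
# the header status against each scenario's expectation.
RESOLVER=${RESOLVER:-127.0.0.1}   # resolver under test (placeholder)

# verdict EXPECTED_STATUS DIG_OUTPUT -> prints PASS or FAIL
verdict() {
  if echo "$2" | grep -q "status: $1,"; then echo PASS; else echo FAIL; fi
}

# run_case LABEL QNAME EXPECTED_STATUS (needs the fake-root lab running)
run_case() {
  out=$(dig @"$RESOLVER" "$2" A +dnssec +time=2 +tries=1 2>/dev/null)
  printf '%-45s %s\n' "$1" "$(verdict "$3" "$out")"
}

# Scenarios 1-2 are the controls; 3-9 should all end in SERVFAIL, e.g.:
#   run_case "control: valid signed name"       good.internic.  NOERROR
#   run_case "hijacked root, RRSIGs stripped"   good.internic.  SERVFAIL
#   run_case "TLD hijack, bad signatures"       evil.internic.  SERVFAIL
```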
Hi Michael On Wed, Oct 02, 2019 at 02:42:37PM -0400, Michael Casadevall wrote:
Helium is the authoritative name for the li694-22 zone. Boron is the backup server and replicates via AXFR from helium. The li694-22 is not signed in any way. For these tests, I'm accessing the recursive resolvers from another machine, specifically neon, one of our database nodes.
Helium's named options relating to DNSSEC are only set as follows: named.conf.options: dnssec-validation auto;
When requesting hydrogen's AAAA record with DO=1:
mcasadevall@neon:~$ dig @helium.li694-22 hydrogen.li694-22 AAAA +dnssec
; <<>> DiG 9.9.5-3ubuntu0.15-Ubuntu <<>> @helium.li694-22 hydrogen.li694-22 AAAA +dnssec ; (1 server found) ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58352 ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 3
;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags: do; udp: 4096 ;; QUESTION SECTION: ;hydrogen.li694-22. IN AAAA
;; ANSWER SECTION: hydrogen.li694-22. 3600 IN AAAA 2600:3c00::f03c:91ff:fe6e:1ded
;; AUTHORITY SECTION: li694-22. 3600 IN NS helium.li694-22. li694-22. 3600 IN NS boron.li694-22.
;; ADDITIONAL SECTION: boron.li694-22. 3600 IN AAAA 2600:3c00::f03c:91ff:fe6e:c4bf helium.li694-22. 3600 IN AAAA 2600:3c00::f03c:91ff:fe6e:1d88
;; Query time: 0 msec ;; SERVER: 2600:3c00::f03c:91ff:fe6e:1d88#53(2600:3c00::f03c:91ff:fe6e:1d88) ;; WHEN: Wed Oct 02 17:45:52 UTC 2019 ;; MSG SIZE rcvd: 171
I get an AA record, but no signed results. This should be SERVFAIL because there's no chain from root. Notably, running a test against a known bad domain fails as expected:
Isn't "li694-22." a fake domain that only exists on your authoritative server "helium"? And it's unclear whether you're also running a fake root zone, as in the first case you described. It's not entirely clear without seeing all the zone contents and nameserver config.

Anyway, here you're querying helium directly for "hydrogen.li694-22./AAAA". Helium is authoritative for "li694-22.", as you've noted, and the authoritative server algorithm does not perform DNSSEC validation (it is the resolver algorithm that does). Basically, helium is serving the unsigned "li694-22." zone in isolation here. It serves no DNSSEC records because none exist, and it returns AA=1.

A nameserver (one that has both authoritative and resolver functionality) prefers to return authoritative data, when it is available, over cached data.

Mukund
Replies inline. On 10/3/19 3:35 AM, Mukund Sivaraman wrote:
Hi Michael
Isn't "li694-22." a fake domain that only exists on your authoritative server "helium"? And it is unclear if you're also running a fake root zone as in the 1st case you'd described. It's not entirely clear without seeing all the zone's contents and nameserver config.
Anyway, here you're querying helium directly for "hydrogen.li694-22./AAAA". helium is an authoritative for "li694-22." as you've noted - authoritative server algorithm does not perform DNSSEC validation (it is resolver algorithm that does). Basically helium is serving the unsigned "li694-22." zone in this case in isolation. It serves no DNSSEC records because none exist, and returns AA=1.
The nameserver (one that has both authoritative and resolver functionality) prefers to return authoritative data when it is available over cached data.
I admit that this isn't a perfect representation. The point of the case was to highlight potential abnormalities in handling cases where DNSSEC validation should (at least by spec) return SERVFAIL. We depend on a signed root to prevent anyone from hijacking the root servers, whether via BGP, server compromise, etc.

Assuming a compromised root server, an attacker can return AA=1 for whatever they like. Assuming a cold-cache scenario, a recursive resolver might try the following:

Client -> Resolver: I want A IN for icann.org
Resolver -> Root Server: icann.org A IN
Compromised Root: Here's an authoritative answer for icann.org A IN
Resolver: Oh, it's authoritative, I don't need to worry about DNSSEC.

Client gets a compromised record.

In a compromised-root scenario, the root servers can respond with DO=0 and pretend to be unsigned, or generate any sort of packet they want. If there's any case where DNSSEC validation is enabled, CD=0, and an escape hole still allows something other than SERVFAIL back to the stub resolver, DNSSEC has effectively been undermined.

Long story short: I'm seeing abnormalities and soft-fail behaviors regarding DNSSEC and the root. I'm currently rewriting my existing code to establish a better baseline for characterizing this behavior, so we know whether we've got an actual problem.

Michael
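The fourth line of the dialogue is where a correct validator diverges: per the DNSSEC specs (RFC 4035), the AA bit from the server carries no security weight for data covered by a trust anchor; only whether the signatures chain to the anchor decides. Reduced to a one-function sketch:

```shell
# resolver_verdict: what a validator must do with an answer that falls
# under a configured trust anchor. The upstream AA bit is intentionally
# ignored -- only whether the RRSIGs chained to the anchor matters.
resolver_verdict() {
  # $1 = aa flag from the (possibly compromised) server: irrelevant
  # $2 = 1 if signatures validated against the trust anchor, else 0
  if [ "$2" = 1 ]; then echo ACCEPT; else echo SERVFAIL; fi
}
```

Any resolver whose behaviour can't be modelled this way for anchored data is exhibiting exactly the escape hole described above.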
Hi Michael

On Thu, Oct 03, 2019 at 04:45:08AM -0400, Michael Casadevall wrote:
Replies inline.
On 10/3/19 3:35 AM, Mukund Sivaraman wrote:
Hi Michael
Isn't "li694-22." a fake domain that only exists on your authoritative server "helium"? And it is unclear if you're also running a fake root zone as in the 1st case you'd described. It's not entirely clear without seeing all the zone's contents and nameserver config.
Anyway, here you're querying helium directly for "hydrogen.li694-22./AAAA". helium is authoritative for "li694-22." as you've noted - the authoritative server algorithm does not perform DNSSEC validation (it is the resolver algorithm that does). Basically helium is serving the unsigned "li694-22." zone in this case in isolation. It serves no DNSSEC records because none exist, and returns AA=1.
The nameserver (one that has both authoritative and resolver functionality) prefers to return authoritative data when it is available over cached data.
I admit that this isn't a perfect representation. The point of the case was to highlight potential abnormalities in handling cases where DNSSEC validation should (at least by spec) return SERVFAIL. We're dependent on a signed root to prevent anyone from hijacking the root servers either via BGP, server compromise, etc.
Nod, but do note that this particular case you've described is not a bug as validation does not occur. You're directly querying an authority.
Assuming a compromised root server, an attacker can return AA=1 for whatever they like. Assuming a cold cache scenario, a recursive resolver might try the following:
Client -> Resolver: I want A IN for icann.org
Resolver -> Root Server: icann.org A IN
Compromised Root: Here's an authoritative answer for icann.org A IN
Resolver: Oh, it's authoritative, I don't need to worry about DNSSEC.
Client gets compromised record.
In case of a validating resolver, what you are describing will not happen as it will try to construct either a secure or insecure path from the root to the data being validated. A validating resolver has pre-configured trust anchors for the root zone and expects the root zone to be signed. It will then attempt to follow a trust chain to whatever it is validating.

Try to replicate what you're stating as an experiment:

1. Setup authoritative server serving the ICANN root zone.
2. Setup a *separate* validating resolver with the ICANN root zone trust anchor (should be present by default) that has its root hints configured to query your authoritative that serves the root zone.
3. Compromise the example.org. domain by adding records directly into your root zone, and any other changes you want to make to the root zone such as deleting the .org delegation to facilitate this. (The public example.org is a signed zone btw.)
4. At the validating resolver, try to query example.org./A and see if you get an AD=1 answer.

If your root zone does not return a DS record for org. or example.org., it has to return a signed proof of non-existence. An attacker will not be able to create this signature without the root's private DNSKEY. Similarly for a delegation from org. -> example.org, and so on. It applies to any compromised nameserver, not just the root servers.

Mukund
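The reason step 3 above fails against a validating resolver can be sketched as a toy model of the per-zone-cut decision. This is a hypothetical Python sketch, not real DNSSEC code; the `classify_delegation` helper and its response dictionary are invented for illustration:

```python
# Sketch of the delegation check a validator performs at each zone cut.
# Hypothetical data structures, not a real DNSSEC implementation.

def classify_delegation(response):
    """At a zone cut, a validator needs either a signed DS RRset
    (secure delegation) or a signed NSEC/NSEC3 proof that no DS
    exists (insecure delegation). Anything else is bogus."""
    if response.get("ds_present") and response.get("ds_rrsig_valid"):
        return "secure"
    if response.get("nsec_proof_valid"):
        return "insecure"  # provably unsigned child zone
    return "bogus"         # stripped DS without the root's private DNSKEY

# An attacker who deletes the org. DS cannot forge the signed
# non-existence proof, so the delegation classifies as bogus:
tampered = {"ds_present": False, "nsec_proof_valid": False}
print(classify_delegation(tampered))  # "bogus" -> resolver answers SERVFAIL
```

This is why simply deleting or rewriting the .org delegation in a compromised root zone produces SERVFAIL at the validator rather than a trusted answer.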
Replies inline.

On 10/3/19 5:14 AM, Mukund Sivaraman wrote:
I admit that this isn't a perfect representation. The point of the case was to highlight potential abnormalities in handling cases where DNSSEC validation should (at least by spec) return SERVFAIL. We're dependent on a signed root to prevent anyone from hijacking the root servers either via BGP, server compromise, etc.
Nod, but do note that this particular case you've described is not a bug as validation does not occur. You're directly querying an authority.
This is arguable because the server is both authoritative and recursive. I'd expect identical behavior. I'll concede this point in general.
In case of a validating resolver, what you are describing will not happen as it will try to construct either a secure or insecure path from the root to the data being validated. A validating resolver has pre-configured trust anchors for the root zone and expects the root zone to be signed. It will then attempt to follow a trust chain to whatever it is validating.
Just as a sanity check, I poked a hole in boron's ACL and confirmed this behavior, although I got NXDOMAIN instead of SERVFAIL as the error, which surprised me but makes sense: the root zone obviously doesn't have li694-22, so NSEC would dictate NXDOMAIN as the right response. BIND's log was as follows:

Oct 03 11:31:45 junkyard named[2500]: running
Oct 03 11:31:46 junkyard named[2500]: managed-keys-zone: Key 20326 for zone . acceptance timer complete: key now trusted
Oct 03 11:31:46 junkyard named[2500]: resolver priming query complete
Oct 03 11:31:47 junkyard named[2500]: resolver priming query complete
Oct 03 11:31:47 junkyard named[2500]: validating li694-22/SOA: got insecure response; parent indicates it should be secure
Oct 03 11:31:47 junkyard named[2500]: no valid RRSIG resolving 'helium.li694-22/DS/IN': 173.255.194.21#53

This confirms the validating setup is in fact working (and as an added bonus, is the default on Ubuntu 18.04 LTS). No surprise here, but now I know I can attempt to replicate setups.
Try to replicate what you're stating as an experiment:
1. Setup authoritative server serving the ICANN root zone.
2. Setup a *separate* validating resolver with the ICANN root zone trust anchor (should be present by default) that has its root hints configured to query your authoritative that serves the root zone.
3. Compromise the example.org. domain by adding records directly into your root zone, and any other changes you want to make to the root zone such as deleting the .org delegation to facilitate this. (The public example.org is a signed zone btw.)
4. At the validating resolver, try to query example.org./A and see if you get an AD=1 answer.

I did in fact do exactly this for most of yesterday and this morning, and BIND on Ubuntu 18.04 properly returned SERVFAIL or NXDOMAIN no matter how I fiddled with the root. I did learn quite a bit more about how DNSSEC works in practice by manually fiddling with NSEC and other records in my test root.
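For checking the AD=1 condition at the wire level (rather than eyeballing dig's flags line), something like the following stdlib-only sketch works. The sample header bytes are hand-built for illustration, not a captured packet:

```python
import struct

# Minimal sketch: pull the relevant flag bits out of a raw DNS header,
# as seen when checking whether a resolver actually validated an answer.
def header_flags(packet: bytes) -> dict:
    # Bytes 2-3 of a DNS message hold the 16-bit flags field.
    (flags,) = struct.unpack("!H", packet[2:4])
    return {
        "aa": bool(flags & 0x0400),  # Authoritative Answer
        "ad": bool(flags & 0x0020),  # Authentic Data (set by validators)
        "cd": bool(flags & 0x0010),  # Checking Disabled
        "rcode": flags & 0x000F,     # 0=NOERROR, 2=SERVFAIL, 3=NXDOMAIN
    }

# Hand-built header: transaction ID 0x1234, then QR=1 and AD=1 set.
sample = b"\x12\x34" + struct.pack("!H", 0x8000 | 0x0020)
print(header_flags(sample))  # {'aa': False, 'ad': True, 'cd': False, 'rcode': 0}
```

A validated positive answer from the resolver should show ad=True with rcode 0; the tampered-root cases above should instead surface as rcode 2 (SERVFAIL) or 3 (NXDOMAIN) with ad=False.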
I'm happy to be proven wrong and learn something along the way.

I also did some testing with other DNS resolvers shipped in Ubuntu. I found that Unbound and BIND correctly do DNSSEC validation "out of the box", but for PowerDNS Recursor I had to go through its configuration files and edit them before I got DNSSEC-validating behavior. I also noticed slight behavioral differences between Unbound and PowerDNS Recursor in how they handled some invalid records, but no place where they did the wrong thing.

I also did some investigation with dnsmasq (which, granted, is a stub resolver) and found that it requires a lot of setup to do any DNSSEC validation correctly at all, assuming it was even compiled with it. (The version in Ubuntu does have DNSSEC enabled, but doesn't ship with it configured; I need to further investigate things like Yocto Linux and Buildroot to see how they handle dnsmasq.)

I believe a lot of the earlier oddness I reported was the behavior of directly asking BIND for information when it is both a recursive resolver and authoritative for a zone, and that confused me overall. I'm glad to have that mystery solved and to have better confidence in the default behaviors of some resolvers.

There's still likely a fair bit of value in strapping DNS servers to a bench and seeing how they behave, as well as looking deeper into how distributions configure and ship popular DNS resolvers with regard to DNSSEC, and into router distribution firmwares and how they in turn handle DNSSEC configuration and similar questions.

I'm going to likely sleep on this for a bit, but feedback and pointers to low-hanging fruit are most welcome.

Thank you all for your patience,
Michael
On Thu, Oct 03, 2019 at 04:45:08AM -0400, Michael Casadevall wrote:
Resolver: Oh, it's authoritative, I don't need to worry about DNSSEC.
Validating resolvers don't do this. BTW, this is the premise of DNSSEC - the end-to-end validation of data. It doesn't matter how you've received the data, whether through a trusted or untrusted party, whether through a secure or compromised nameserver. A validating client validates the _data_. Mukund
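That transport-independence can be stated as a one-line sketch. The `validate` helper below is purely hypothetical and only illustrates that the channel never enters the decision:

```python
# Sketch: DNSSEC validates the data, not the channel. The same RRset with
# the same signature chain verifies identically no matter which server
# handed it over. Hypothetical helper, not a real library call.

def validate(rrset, rrsig_ok, chain_ok):
    # Transport details (which server answered, UDP vs TCP, trusted vs
    # compromised path) are deliberately absent from the inputs.
    return rrsig_ok and chain_ok

from_trusted_path  = validate("icann.org. A 192.0.2.7", True, True)
from_hijacked_path = validate("icann.org. A 192.0.2.7", True, True)
print(from_trusted_path == from_hijacked_path)  # True: channel irrelevant
```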
participants (6)
- Joe Abley
- Mallory Knodel
- Michael Casadevall
- Mukund Sivaraman
- Paul Vixie
- Wes Hardaker