On Feb 13, 2020, at 5:25 PM, Geoff Huston <gih@apnic.net> wrote:
I am sorry to be late to the party here, but I had some time this morning to go through this document, and I came away with some questions.
Now these three items have probably been discussed in some detail already, so I can quite appreciate that a response may well be “been there, thought of that, nothing more to see” but I thought I should check…
1. UDP Query discard rate
The normal expectation would be that each root server answers all UDP queries (“normal” insofar as discarding queries is a reasonable defence against a DoS attack). The point is that root server instances should be provisioned with enough capacity to answer the “normal” load of root server queries.
I could imagine a test that sends 10 (or some other not too small, not too large number of) back-to-back queries to a root server and checks that every query receives a response. A highly loaded server instance would not necessarily return all 10 responses, while a server instance operating within its designed query load parameters would return all of them.
2. TCP connection completion rate
Similar to the UDP query discard rate, but measuring the same rate using TCP connections to the server instance.
3. ICMP Packet Too Big Compliance
Large DNS responses are troublesome to handle well. Is it a reasonable expectation that root server instances react to ICMP Packet Too Big messages in both IPv4 and IPv6?
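To make the first two proposed tests concrete, here is a rough sketch in Python of what such probes might look like. It is only an illustration under assumptions that are not part of the proposal: the function names, the choice of an NS query for the root zone, the sequential query IDs, the per-receive timeout, and matching responses on the query ID alone are all hypothetical.

```python
import socket
import struct


def build_query(qid, qname=".", qtype=2):
    """Build a minimal DNS query; the default asks for the root zone's NS set."""
    # Header: ID, flags (standard query, RD clear), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", qid, 0, 1, 0, 0, 0)
    labels = [label for label in qname.split(".") if label]
    name = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + name + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN


def udp_burst(server, count=10, timeout=2.0, port=53):
    """Send `count` queries back to back; return how many were answered."""
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    pending = set(range(1, count + 1))  # sequential IDs, for illustration only
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # applies to each receive, not to the whole burst
        for qid in sorted(pending):
            sock.sendto(build_query(qid), (server, port))
        while pending:
            try:
                data, _ = sock.recvfrom(4096)
            except OSError:  # timeout or socket error: stop waiting
                break
            if len(data) >= 2:
                # Responses are matched on the 16-bit query ID only.
                pending.discard(struct.unpack("!H", data[:2])[0])
    return count - len(pending)


def tcp_completions(server, attempts=10, timeout=2.0, port=53):
    """Attempt `attempts` TCP handshakes in sequence; return how many completed."""
    completed = 0
    for _ in range(attempts):
        try:
            with socket.create_connection((server, port), timeout=timeout):
                completed += 1
        except OSError:  # refused, unreachable, or timed out
            pass
    return completed
```

A measurement could then compare, say, `udp_burst("198.41.0.4")` (a.root-servers.net) against the burst size of 10, and likewise for `tcp_completions`. Anything short of the full count would be recorded as a discard or an incomplete connection.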
These are all interesting metrics, and I don't remember anyone bringing them up before now.
For the first two, I don't see how they would differ from simply measuring for timeouts as we do now. Wouldn't a timeout indicate the kind of failure you are looking for? If not, we could certainly consider adding those tests.
For the third, I would want to see a specific test plan before I could feel comfortable adding the test. What does "react" mean to you here?
--Paul Hoffman