48 HOUR LAST CALL: UPDATED RSSAC Advisory on Metrics for the DNS Root Servers and Root Server System

Steve Sheng

Feb. 12, 2020

1:58 a.m.

Dear RSSAC Caucus,

Following Duane's last email, the review period for the updated document has concluded. As we have not received additional feedback on the document, the co-chairs of the WP would like to put this document into 48-hour last call status.

Please provide any last feedback you have by close of business everywhere Friday 14 February 2020.

Please also note that RSSAC026v2 (the terminology document) is being updated concurrently. So prior to publication, there will be a consistency check of terms used in the metrics document against those defined in RSSAC026.

Best, Steve

On 2/4/20, 2:14 AM, "rssac-caucus on behalf of Wessels, Duane via rssac-caucus" <rssac-caucus-bounces@icann.org on behalf of rssac-caucus@icann.org> wrote:

Dear RSSAC Caucus,

As you know, the RSS Metrics document was recently finalized within the Caucus and was to be voted on by RSSAC this week. However, we, the work party leaders, are proposing two small additions to the document in order to resolve some recently discovered deficiencies in the correctness checking rules of section 5.3. The attached PDF file shows the proposed new text, on pages 18 and 19.

While we are somewhat hesitant to make these additions after having gone through a last call phase, we feel that these are important enough to get right for the first version. Briefly, the two changes are: (1) to exclude ARPA/NS queries, and (2) to require at least one "glue" RR in a delegation response. To be clear, if (1) is not addressed at this time, all RSIs would, in all likelihood, fail the correctness thresholds as written through no fault of their own.

Therefore, we are proposing to delay the RSSAC vote on this document by one month and to request the Caucus' approval to proceed with these additions. Russ and I kindly ask you to focus only on the new additions. It is not our intention to open up the entire document for editing or other additions at this time.

Please provide any comments regarding the additions here on the list by Monday, February 10th.

Duane/Russ

Attachments:

REVISEDFINALRSSMetricsDocument31JAN2020.pdf (application/pdf — 353.9 KB)
smime.p7s (application/pkcs7-signature — 4.6 KB)
ATT00001.txt (text/plain — 698 bytes)


Geoff Huston

February 2020

1:25 a.m.

I am sorry to be late to the party here, but I have some time this morning and I was going through this document and I had some questions in my head after reading it.

Now these three items have probably been discussed in some detail already, so I can quite appreciate that a response may well be “been there, thought of that, nothing more to see” but I thought I should check…

1. UDP Query discard rate

It would be the normal expectation that each root server answers all UDP queries (“normal” in so far as it's a reasonable defence to a DOS attack to discard queries). The point is that root server instances should be maintained to provide a capacity of service that answers the “normal” load of root server queries.

I could imagine a test of sending 10 (or some other not too small, not too large number) back-to-back queries to a root server and checking that all queries receive a response. A highly loaded server instance would not necessarily provide all 10 responses, while a server instance operating within its designed query load parameters would provide all the responses.

2. TCP connection completion rate

Similar to UDP Query Discard rate, but looking at the same rate using TCP connections to the server instance.

3. ICMP Packet too Big Compliance

Large DNS responses are troublesome to handle well. Is it a reasonable expectation that root server instances react to ICMP packet too big messages in both IPv4 and IPv6?

thanks,

Geoff
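For illustration only, the burst test suggested in item 1 might look roughly like the sketch below. It is not taken from the metrics document. The function names, the 4-second timeout, the choice of a root NS query, and the use of IPv4 only are all assumptions made for the example.

```python
import socket
import struct
import time


def build_query(qid: int, qname: str = ".", qtype: int = 2) -> bytes:
    """Build a minimal DNS query; defaults to an NS query for the root zone."""
    # Header: ID, flags (all clear), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0000, 1, 0, 0, 0)
    labels = [label for label in qname.split(".") if label]
    encoded = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + encoded + struct.pack("!HH", qtype, 1)  # QCLASS = IN


def burst_response_rate(server: str, burst: int = 10,
                        timeout: float = 4.0, port: int = 53) -> float:
    """Send `burst` queries back to back; return the fraction that were answered.

    `server` is assumed to be an IPv4 address. Responses are matched to
    queries by DNS message ID only, which is enough for a sketch.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    pending = set(range(1, burst + 1))
    try:
        for qid in sorted(pending):
            sock.sendto(build_query(qid), (server, port))
        deadline = time.monotonic() + timeout
        while pending:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            sock.settimeout(remaining)
            try:
                data, _ = sock.recvfrom(4096)
            except socket.timeout:
                break
            if len(data) >= 2:
                pending.discard(struct.unpack("!H", data[:2])[0])
    finally:
        sock.close()
    return (burst - len(pending)) / burst
```

A call such as `burst_response_rate("198.41.0.4")` would return 1.0 if all ten queries were answered within the timeout. A lower value does not by itself distinguish server-side discard from loss in the network path.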


Paul Hoffman

2:01 a.m.

On Feb 13, 2020, at 5:25 PM, Geoff Huston <gih@apnic.net> wrote:

...

I am sorry to be late to the party here, but I have some time this morning and I was going through this document and I had some questions in my head after reading it.

Now these three items have probably been discussed in some detail already, so I can quite appreciate that a response may well be “been there, thought of that, nothing more to see” but I thought I should check…

1. UDP Query discard rate

It would be the normal expectation that each root server answers all UDP queries (“normal” in so far as it's a reasonable defence to a DOS attack to discard queries). The point is that root server instances should be maintained to provide a capacity of service that answers the “normal” load of root server queries.

I could imagine a test of sending 10 (or some other not too small, not too large number) back-to-back queries to a root server and checking that all queries receive a response. A highly loaded server instance would not necessarily provide all 10 responses, while a server instance operating within its designed query load parameters would provide all the responses.

2. TCP connection completion rate

Similar to UDP Query Discard rate, but looking at the same rate using TCP connections to the server instance

3. ICMP Packet too Big Compliance

Large DNS responses are troublesome to handle well. Is it a reasonable expectation that root server instances react to ICMP packet too big messages in both IPv4 and IPv6?

These are all interesting metrics, and I don't remember anyone bringing them up before now.

For the first two, I don't see how they would be different than simply measuring for timeouts like we are now. Wouldn't a timeout indicate the kind of failure you are looking for? If not, we could certainly consider adding those tests.

For the third, I would want to see a specific test plan before I could feel comfortable adding the test. What does "react" mean to you here?

--Paul Hoffman

Wessels, Duane

6:33 p.m.

Hi Geoff,

I think these could be considered for the next version of the RSSAC metrics document. Maybe that's what you had in mind anyway. I'm not sure.

I know Paul has a "todo list" for the next version. Perhaps we can ask the ICANN staff to keep such a list so these don't get missed.

DW


Geoff Huston

6:47 p.m.

That’s fine, Duane. At some point the document needs to be closed and outstanding items pushed forward, and I’m happy with that approach.

Geoff

...

On 19 Feb 2020, at 5:33 am, Wessels, Duane <dwessels@verisign.com> wrote:

Hi Geoff,

I think these could be considered for the next version of the RSSAC metrics document. Maybe that's what you had in mind anyway. I'm not sure.

I know Paul has a "todo list" for the next version. Perhaps we can ask the ICANN staff to keep such a list so these don't get missed.

DW


Fred Baker

6:57 p.m.

...

On Feb 13, 2020, at 5:25 PM, Geoff Huston <gih@apnic.net> wrote:

I could imagine a test of sending 10 (or some other not too small, not too large number) back-to-back queries to a root server and checking that all queries receive a response. A highly loaded server instance would not necessarily provide all 10 responses, while a server instance operating within its designed query load parameters would provide all the responses.

I'll echo Paul and Duane's comments here. On this one, I have a question of statistical validity. RFC 6928 recommends a TCP initial window of ten because that is a number that can be reasonably expected to traverse the open Internet if initiated as a burst. Matt has gone so far as to tell me that his measurements suggest that some TCP Offload Engines appear to successfully send bursts of 60K bytes or about 40 segments back to back. So I wonder whether we would learn anything from a ten packet burst - do we need a 100 packet burst, or something else, and for what reason do we need that?

In any event, I think we would need something resembling suggested text, and some evidence that the measurement tests a case that eluded the existing tests. That's not "push back" as much as "what do we learn if we add this one?"

Geoff Huston

10:29 p.m.

...

On 19 Feb 2020, at 5:57 am, Fred Baker <fred@isc.org> wrote:

I'll echo Paul and Duane's comments here. On this one, I have a question of statistical validity. RFC 6928 recommends a TCP initial window of ten because that is a number that can be reasonably expected to traverse the open Internet if initiated as a burst. Matt has gone so far as to tell me that his measurements suggest that some TCP Offload Engines appear to successfully send bursts of 60K bytes or about 40 segments back to back. So I wonder whether we would learn anything from a ten packet burst - do we need a 100 packet burst, or something else, and for what reason do we need that?

In any event, I think we would need something resembling suggested text, and some evidence that the measurement tests a case that eluded the existing tests. That's not "push back" as much as "what do we learn if we add this one?"

So in terms of "what do we learn if we add this one", if I can address that first: it is the extent to which individual service instances are “coping” with the query load that is imposed on them.

The other questions are more about the details of the measurement. If a train of UDP packets is injected into the network, what is the anticipated success rate of transmission through the network? Will 10 enjoy a higher success probability for all 10 packets than 100? And so on. The ‘signal’ that this measurement would be looking for is missing responses, and the inference would be that an overloaded UDP service would load shed by discarding incoming queries.

It's late in the process for this particular incarnation of the metrics document to bring this up, and if we think that updating metrics of the RSS is an ongoing effort then another response may well be to keep this in mind for the next round of document revisions.

Geoff
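As a rough way to frame the "10 versus 100" question, the sketch below assumes each query/response pair is lost independently with the same probability. That is a simplifying assumption, since real loss is often bursty and a back-to-back train may itself induce queue drops. The function name and the 0.1% loss figure are illustrative only, not measured values.

```python
def all_answered_probability(burst: int, loss: float) -> float:
    """P(every query in a burst gets a response), assuming independent loss.

    `loss` is the assumed per-query probability that either the query or
    its response is dropped somewhere other than at an overloaded server.
    """
    return (1.0 - loss) ** burst


# Compare a 10-packet and a 100-packet train under an assumed 0.1% loss rate.
for n in (10, 100):
    print(n, all_answered_probability(n, 0.001))
```

Under this model the chance of a clean run falls geometrically with burst size, so a longer train produces more incomplete runs from ordinary path loss alone. Any threshold for "missing responses" would need to account for that baseline before inferring load shedding.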

Brian Dickson

10:44 p.m.

On Tue, Feb 18, 2020 at 2:30 PM Geoff Huston <gih@apnic.net> wrote:

The other questions are more about the details of the measurement. If a train of UDP packets is injected into the network what is the anticipated success rate of transmission through the network. Will 10 enjoy a higher success probability for all 10 packets than 100? and so on. The ‘signal’ that this measurement would be looking for is missing responses and the inference would be that an overloaded UDP service would load shed by discarding incoming queries.

Another thing to consider (especially with regard to load shedding or mitigation efforts):

What about sending queries with DNS Cookies set, or differentially querying with/without those, when doing multiple UDP packet trains?

(That would potentially provide useful data on loss rates and whether there is a suspicion of loss due to non-organic traffic levels, such as DoS/DDoS events, as well as discovery of support for DNS Cookies by instances.)

Brian
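A minimal sketch of what the with/without-cookie query pair could look like on the wire, assuming a root NS query and an 8-byte client cookie carried in an EDNS(0) OPT record (COOKIE option code 10, per RFC 7873). The function name and the 1232-byte advertised UDP size are assumptions for the example, not anything proposed in the thread.

```python
import os
import struct

OPT_RR_TYPE = 41     # EDNS(0) OPT pseudo-RR (RFC 6891)
COOKIE_OPTION = 10   # DNS COOKIE option code (RFC 7873)


def build_root_ns_query(qid: int, with_cookie: bool,
                        client_cookie: bytes = None,
                        udp_size: int = 1232) -> bytes:
    """Build a '. NS IN' query, optionally carrying a DNS client cookie."""
    arcount = 1 if with_cookie else 0
    header = struct.pack("!HHHHHH", qid, 0x0000, 1, 0, 0, arcount)
    question = b"\x00" + struct.pack("!HH", 2, 1)  # root name, QTYPE=NS, QCLASS=IN
    if not with_cookie:
        return header + question

    cookie = client_cookie if client_cookie is not None else os.urandom(8)
    if len(cookie) != 8:
        raise ValueError("client cookie must be exactly 8 bytes")
    option = struct.pack("!HH", COOKIE_OPTION, len(cookie)) + cookie
    # OPT RR: root owner name, TYPE=41, CLASS=UDP payload size,
    # TTL=extended RCODE/version/flags (all zero), RDLENGTH, RDATA
    opt_rr = b"\x00" + struct.pack("!HHIH", OPT_RR_TYPE, udp_size, 0, len(option)) + option
    return header + question + opt_rr
```

Sending alternating trains built with `with_cookie=True` and `with_cookie=False` and comparing their response rates would be one way to look for the differential treatment described above. Whether a given instance echoes a server cookie in its response would indicate cookie support.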

