RSS Metrics work party -- purpose of metrics
Dear RSSAC Caucus,

The RSS Metrics work party has been somewhat of two minds about the purpose of the proposed metrics. On one hand, we want metrics that can be used to measure service levels. On the other, we want measurements that could be used to characterize end-user experiences and to support research into resolver behavior. The purpose of the metrics affects decisions such as the location and operation of measurement vantage points.

In order to make progress, the work party chairs and shepherds recommend that we limit the scope to metrics designed to measure service levels. If we can achieve consensus on this point, then the work party can advance to discussions and decisions about vantage point locations.

Resolver behavior and other uses of metrics can be addressed in future revisions of this work, or in another work party if so desired.

If there is any opposition to this proposal, please share your thoughts either on the list or at the upcoming work party meeting on July 22, so we can discuss, reach consensus, and ultimately make progress.

Duane / Russ
I prefer another order:

1. Define metrics from any vantage points, and how to measure them.
2. Perform measurements for a while (and examine the results).
3. Define service levels.

I think the current service of the root servers is very good. For example, 10 root server addresses (of 26) answer within 3 ms at my home, and 22 of 26 answer within 130 ms. (For comparison, in the com/net name server case, 3 of 13 IPv4 servers answer within 3 ms, and all IPv6 servers' RTTs are >= 107 ms, i.e., the US west coast or farther.)

I don't oppose metrics designed to measure service levels first. However, service level should not depend on vantage point locations. I proposed measuring and comparing the servers of other domain names (for example, .net or icann.org) as well as the root servers, in order to avoid the influence of the measurement point. For example, when the smallest RTT to the icann.org/.net/root servers is larger than 300 ms, the vantage point has a poor link to the major part of the Internet.

-- Kazunori Fujiwara, JPRS <fujiwara@jprs.co.jp>
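Fujiwara's vantage-point sanity check could be sketched as a pure function: if the smallest RTT to any of several well-connected reference services exceeds a threshold (300 ms in his example), the vantage point itself likely has poor connectivity, and its root-server measurements should be discounted. This is a minimal illustrative sketch, not a work-party-defined metric; the service names and threshold are assumptions taken from his example.

```python
THRESHOLD_MS = 300.0  # Fujiwara's example cutoff

def vantage_point_ok(rtts_ms: dict[str, list[float]],
                     threshold_ms: float = THRESHOLD_MS) -> bool:
    """rtts_ms maps a reference service (e.g. 'icann.org', '.net',
    'root') to a list of measured RTT samples in milliseconds.
    Returns True when at least one reference service is reachable
    faster than the threshold, i.e. the vantage point is usable."""
    best = min((min(samples) for samples in rtts_ms.values() if samples),
               default=float("inf"))
    return best < threshold_ms

# A vantage point with at least one fast path is usable...
print(vantage_point_ok({"icann.org": [310.0], ".net": [12.5], "root": [45.0]}))   # True
# ...one where every reference service is slow is not.
print(vantage_point_ok({"icann.org": [320.0], ".net": [410.0], "root": [305.0]})) # False
```

Using the minimum over all reference services (rather than any single one) is exactly what avoids penalizing a vantage point for one service's poor routing.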
On Jul 21, 2019, at 10:52 PM, fujiwara@jprs.co.jp wrote:
I don't oppose metrics designed to measure service levels first. However, service level should not depend on vantage point locations.
Speaking strictly for myself, and quite happy to be considered wrong if that's the consensus: I agree.

That said, I think a fair bit of the work party discussion has been about possible metrics that are "interesting" from the viewpoint of a PhD candidate writing a dissertation, or of someone trying to answer a question that is not fundamental, rather than metrics with operational utility. What I would like to focus on first is operational utility. PhD candidates can measure anything they want at any time, using tools such as RIPE Atlas; I'm not sure that's true of anything business-related.

Where I get a little crazy is in discussion of "service level agreements" when none are defined or actually being requested. If we're going to even discuss that, I would personally expect that we first learn what we can accurately measure, decide whether that has any utility in characterizing the service, and talk about service expectations and thresholds once we figure that out.

In the terms of the document at hand, to my mind we want to:

- ensure that we are in fact delivering IANA-signed resource records
- ensure that we are in fact delivering the latest available set of them
- measure the timeliness of delivery within the context the RSO controls
- measure RSO availability

To my mind, the first two can be measured anywhere and may as well be measured at or near the root server. The third and fourth are useful in characterizing an RSO in the context of the vantage point (e.g., the servers that are reachable via anycast from that vantage point), and they characterize the *RSS* at that vantage point if they are the servers one would expect a resolver at the vantage point to use (which might mean the minimum among the set, or the median among the set).

I think the thing that Duane is pushing back on is what I would characterize as research questions: measurements that characterize the service from the viewpoint of a resolver somewhere else. There are several possible problems there, not the least of which is that one is necessarily measuring things that are not fixable by the RSO in question. That's not to say that such measurements are not interesting or useful; it's to say that they lend themselves to "gaming the system", as we have discussed, and are almost by definition an infinite set: there is always something new that someone might think of.

It's also to push back on the concept of measuring user experience; to my way of thinking, the "users" of the RSS are primarily resolvers, of which there are thousands, not laptops or cell phones, of which there are bazillions.

In the context of "delivering IANA-signed resource records", we are discussing verifying the IANA signature. I think there may also be value in proving that the RSO in question delivers *all* of the resource records that IANA includes in a zone transfer, and that it delivers *only* the resource records that IANA would include in a zone transfer. Per reports in the media, there are attacks in the DoH space in which TXT resource records are uploaded and used to control botnets. We have also seen cases historically in which certain resource records have been removed from the system entirely by some entity, or replaced with CNAMEs to other services; I'm referring to Google names in China, and similar events with the same or other names in other domains. RSSAC001 is fairly explicit about the set of resource records we are delivering (the current set of IANA-signed records, all of them and only them), and we should be measuring that we are meeting that expectation.

My two yen...
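The first two of the four checks listed above can be reduced to simple predicates: "latest available set" becomes a comparison of the serial the server reports against the serial IANA published, and "timeliness" becomes a response-time budget. The following is an illustrative sketch under those assumptions; the type, function names, and the 250 ms budget are not work-party definitions.

```python
from dataclasses import dataclass

@dataclass
class ProbeResult:
    answered: bool     # did the server respond at all (availability)
    soa_serial: int    # serial seen in the server's SOA record
    rtt_ms: float      # round-trip time of the query

def is_current(result: ProbeResult, iana_serial: int) -> bool:
    """The server is serving the latest available zone."""
    return result.answered and result.soa_serial == iana_serial

def is_timely(result: ProbeResult, budget_ms: float = 250.0) -> bool:
    """The server answered within the response-time budget."""
    return result.answered and result.rtt_ms <= budget_ms

r = ProbeResult(answered=True, soa_serial=2019072401, rtt_ms=38.0)
print(is_current(r, 2019072401), is_timely(r))  # True True
```

Note that both predicates depend only on what the RSO controls plus the probe's own measurement, which is the property the message argues a service-level metric should have.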
On Jul 22, 2019, at 1:46 AM, Fred Baker <fred@isc.org> wrote:
I think there may also be value in proving that the RSO in question delivers *all* of the resource records that IANA includes in a zone transfer, and that it delivers *only* the resource records that the IANA would include in a zone transfer.
Let me follow up on this. I'm looking at two reports:

- https://dl.acm.org/authorize.cfm?key=N687421 from https://irtf.org/anrw/2019/program.html
- Reports in the DoH space about downloading the IANA zone and distributing it in that fashion, and the service also being used to store TXT records used by botnets (https://www.bleepingcomputer.com/news/security/new-spam-campaign-controlled-...).

The first is not quite an assertion that it is known that DNS requests and/or responses are being changed in flight, but it states that the researchers in question had a "large" service to test against that was doing that. DNSSEC is probably the proper test for that: data changed in flight would no longer have a valid DNSSEC signature.

The second reports that the Google DoH server is being used as a repository and server for TXT records controlling botnets. I'm certain that Google would immediately respond that it is not their intention to support that. That said, there is evidence that it is in fact the case.

A relatively simple solution to that might be to take a note out of the OSPF MIB, which reports a checksum of the LSA Database. I imagine that a DNS server might hold more than one zone, and *a* zone that it holds might be the TXT records in question. But when a DNS server is queried, the request doesn't specify a zone; it merely asks whether the server has a resource record for the indicated name. So I find myself thinking of something a little more complicated than a statement regarding the root zone. Imagine a response listing the zones stored at the server (presumably obtained via an XFR), and for each such zone:

- the relevant sequence number,
- the number of RRs held, and
- an MD5 checksum of those RRs.

For the purposes of the checksum, the TTL in each record needs to be zeroed. The probe can now ask for that table and compare it to what it believes to be correct. It could also contain the sequence number of the dataset transferred in the XFR.

Like the OSPF LSA Database, this should at least in theory be a quick and almost null check ("do you have the right answer? OK, yup"), but in the unusual case that the answer is incorrect we would know quickly and without a lot of doubt.

If there is support for that, I'd be happy to sketch text into the Google Doc.
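The per-zone digest described above could be sketched as follows: zero each RR's TTL, canonicalize, sort, and hash, so that two servers holding the same record set produce the same digest regardless of remaining TTLs. This toy version uses MD5 as the message suggests (the ZONEMD work discussed later in the thread uses stronger hashes), and the simple tuple-based canonicalization is an assumption for illustration, not DNS wire-format canonical form.

```python
import hashlib

def zone_digest(rrs: list[tuple[str, int, str, str]]) -> str:
    """rrs: (owner_name, ttl, rrtype, rdata) tuples. TTLs are zeroed
    and records sorted before hashing, so decrementing TTLs do not
    perturb the digest."""
    canonical = sorted((name.lower(), 0, rrtype, rdata)
                       for name, _ttl, rrtype, rdata in rrs)
    h = hashlib.md5()
    for rec in canonical:
        h.update(repr(rec).encode("ascii"))
    return h.hexdigest()

a = [(".", 86400, "NS", "a.root-servers.net."),
     (".", 86400, "SOA", "a.root-servers.net. nstld.verisign-grs.com. 2019072401 1800 900 604800 86400")]
b = [(name, 3600, t, d) for name, _ttl, t, d in a]  # same data, different TTLs
print(zone_digest(a) == zone_digest(b))  # True
```

The probe would compare the digest it computes over its own reference copy of the zone against the digest the server reports, which is the "quick and almost null check" property the message wants.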
Hi Fred,

I have a couple of comments.

The "Who is answering my queries" research is indeed interesting. It focuses on interception between stub and recursive, and looks only at large recursive providers. I don't think we should draw the conclusion that the level of interception observed there (10-30%) also exists in the recursive-to-authoritative path.

What you propose with respect to a checksum sounds an awful lot like the zone message digest Internet-Draft (aka ZONEMD) that I and others have been working on: https://tools.ietf.org/html/draft-ietf-dnsop-dns-zone-digest-00. As envisioned in the ZONEMD draft, a name server would verify the digest before loading the zone for service. If I understand your proposal correctly, you're thinking more along the lines of calculating the digest on demand when it is queried for. I suppose we'd have to have a conversation with operators and implementors about whether they'd want to incur the CPU hit for that.

IMO such checks could alert us to unwanted changes made by malicious third parties. However, if an RSO itself desired to serve incorrect data, then it could do so while still providing good checksums. I'm not sure there is a reasonable way to catch those, and I don't think this work party should try to do so at this time.

DW
I'm happy enough to use your draft. I haven't read it yet, but one hopes it will have no shortage of review. I do see a threat in services acting as RSOs (whether recognized and in the root hints or not) delivering data that is unsigned or signed by another party. We know that it's happening right now, and it goes to something pretty basic about root service.
participants (4)
- Fred Baker
- Fred Baker
- fujiwara@jprs.co.jp
- Wessels, Duane