Comments on the "RSS Metrics WP presentation to RSSAC Workshop"
A big thank-you to Duane and Russ for getting this document out in time for Caucus comments. Here are a few from me; I hope they help the discussion at the workshop.

The repeated theme for the metrics in the document is that thresholds for "what good looks like" are TBD. That is certainly what I remember from the calls and the face-to-face meeting in Prague: we didn't get anywhere close to agreement on any of them. In fact, we also didn't get anywhere on a process to fill in those TBDs. In the meetings, there were also (unanswered) questions about whether the desired threshold is for "good" or "required". Given that the metrics work is supposed to feed back into RSSAC 037/038, that difference is vitally important, because "good" will most likely be a higher bar than "required".

Looking at the published RSSAC002 data, it is obvious that, today, some RSOs carry more of the query load than others. Looking at the number of instances, and the placement of each instance in the Internet topology, some RSOs are more likely to fail sooner under a broad DDoS attack on the RSS than others. This makes the question of which metrics are "required" more difficult to analyze.

Under "RSO Metric: Correctness", there is the statement "Each response is validated for correct DNSSEC signatures". On the last call and/or in the face-to-face meeting, a few people brought up correctness of the glue records as well, because that would affect non-validating resolver operators, who make up the large majority of the customers of the RSS.

In the calls, there appeared to be mostly confusion about why we were being asked about the RSS metrics. If a concrete use case in the 037/038 realm was given, I don't remember it. What I do remember is the confusion, which often boiled down to "if a resolver can consistently get acceptable answers from even one RSO, what the RSS as a whole looks like becomes immaterial to that resolver, so measuring the RSS seems unnecessary".
If RSSAC wants RSS metrics, it will need to describe the need for them better than it has at this point. --Paul Hoffman
On Mon 2019-04-22 15:51:40+0000 Paul wrote:
In the calls, there appeared to be mostly confusion about why we were being asked about the RSS metrics. If a concrete use case in the 037/038 realm was given, I don't remember it. What I do remember is the confusion, which often boiled down to "if a resolver can consistently get acceptable answers from even one RSO, what the RSS as a whole looks like becomes immaterial to that resolver, so measuring the RSS seems unnecessary".
I don't know if we can 'measure' the RSS, but we can certainly evaluate it based on the measurements gathered from all of the RSOs. The idea that "if a resolver can consistently get acceptable answers from even one RSO [then everything is ok]" was one that I was a proponent of as it relates to the question of whether or not the RSS as a whole was "online". I think there should be another metric, along the lines of "What is the state of the RSS as a whole?". This could be a status (Green / Yellow / Orange / Red) based on N of M RSOs responding in a timely and correct manner.

-- Robert Story <http://www.isi.edu/~rstory> USC Information Sciences Institute <http://www.isi.edu/>
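The N-of-M status idea above could be sketched in a few lines. To be clear, the thresholds below are placeholders for illustration only; the thread explicitly notes that no thresholds have been agreed.

```python
# Sketch of an N-of-M RSS-wide status, per the Green/Yellow/Orange/Red
# idea in the thread. The cutoff fractions are hypothetical placeholders,
# not values agreed by the Caucus.
def rss_status(responding: int, total: int = 13) -> str:
    """Map the count of RSOs answering in a timely, correct manner
    to an overall RSS status color."""
    if total <= 0:
        raise ValueError("total must be positive")
    fraction = responding / total
    if fraction >= 0.9:
        return "Green"
    if fraction >= 0.75:
        return "Yellow"
    if fraction >= 0.5:
        return "Orange"
    return "Red"
```

With 13 RSOs, all 13 responding would be Green, while only 3 responding would be Red; the real question the thread leaves open is where those cutoffs should sit.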
On Apr 23, 2019, at 10:13 PM, Robert Story <rstory@isi.edu> wrote:
I think there should be another metric, along the lines of "What is the state of the RSS as a whole?". This could be a status (Green / Yellow / Orange / Red) based on N of M RSOs responding in a timely and correct manner.
I agree that we need some form of metric in this context. From my perspective, it is a measurement taken by an observer, who might be in a data center or some other "central" point in the topology, and might be in a place of their own choosing and interest. The measurement platform that has been suggested seems like it could be some software downloadable from GitHub or equivalent and used by anyone in any location. We (ISC) get requests for server nodes from time to time, and wind up evaluating the service the party already gets and advising them on what an instance in their location might do for them. Such a measurement might automate part of that, or at least give people access to information from their own perspective.

From my perspective, part of this would be simple access to each of the 26 signal addresses (or however many there are at the time), which could be measured for:

- did I get a response at all?
- was it correct? This might include DNSSEC, comparison to other responses, a "known-good" AXFR, etc.
- if I did, what was the latency?

In reporting it, the response data would be how many responses I got and whatever might be known about them. As you say, there might be an N of M component in that. But it might tell me, for example, that I don't have IPv6 (or IPv4) access where I am, or possibly in some upstream network, that some RSOs are being blocked, that responses are being mimicked or otherwise doctored, etc.

Yes, that is in at least some sense a measurement of an RSO being done 26 times. That's at least one definition of the RSS, though. I personally worry about a situation in which we assume that access to exactly one RSO is equivalent to gaining access to the service; it is, but only in a limited sense. To me, the definition of "access to the RSS" includes being able to access any RSO and get a correct answer, regardless of whether I would *choose* to when latencies etc. are taken into account. My two yen.
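The per-address probe and reporting described above can be sketched as a small aggregation step. The record fields and the report shape here are illustrative assumptions, not a format anyone in the thread proposed; the actual probing (sending queries, validating DNSSEC, comparing against a known-good zone) is left out.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional

# Illustrative sketch: aggregate per-address probe results into the kind
# of report described in the thread (how many responses, correctness,
# latency, which addresses were unreachable). Field names are assumptions.
@dataclass
class ProbeResult:
    address: str                      # one root server service address
    responded: bool                   # did I get a response at all?
    correct: Optional[bool] = None    # DNSSEC / known-good comparison
    latency_ms: Optional[float] = None

def summarize(results: List[ProbeResult]) -> dict:
    answered = [r for r in results if r.responded]
    correct = [r for r in answered if r.correct]
    return {
        "queried": len(results),
        "answered": len(answered),
        "correct": len(correct),
        "median_latency_ms": (median(r.latency_ms for r in correct)
                              if correct else None),
        "unreachable": [r.address for r in results if not r.responded],
    }
```

A report like this would surface exactly the cases mentioned above: an empty IPv6 column suggests no IPv6 access, a consistently unreachable address suggests blocking, and answered-but-incorrect responses suggest mimicked or doctored answers.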
participants (3)
- Fred Baker
- Paul Hoffman
- Robert Story