Ken,

Thanks for this.  After reading the document and your email, I have the following comments and questions.
  1. I believe I understand the general idea of measuring both availability and latency to each of the various root server identities (RSIs) multiple times over a 30-minute period.  I had a bit of trouble understanding the precise specifications.  I am completely comfortable with math and math notation, so my trouble is due to some missing pieces in the description.  I suspect this is easily remedied with a few more words.

  2. What is the justification for the numbers, e.g. 20 times, p = 0.65, etc.?

  3. What conclusion would you draw if the local measurements show extremely high availability but latency that varies between very low and not so low?  For example, there might be one RSI that is topologically close but suffers intermittent availability, and other RSIs that are not as close but extremely reliable.

  4. Adding one or more additional root servers is one way to improve both reliability and availability.  Installing local root service is another way to accomplish the same goals.  How do you expect the measurement process described in this document will affect discussions regarding these two approaches?
Thanks,

Steve


On Sun, Nov 22, 2020 at 10:16 AM Ken Renard <kdrenard2@gmail.com> wrote:
All:

I have brought over the user narrative concerning RSOs wanting to determine underserved areas to the current working document.  Along the way, I overhauled the description, hopefully making it more concise and readable.  I think I have addressed all comments from the Narratives document.  The new text is included below, but the google document will contain the latest edits.  I would appreciate any comments or feedback on this topic and look forward to discussing it with the group at the work party meeting tomorrow (Monday).

https://docs.google.com/document/d/11slZDTqrcwTwywpbi3JwHuU_FoaoN54u0f3B2UFjUv4/edit#heading=h.b3las58y90hk

-Ken Renard


While the RSS has relatively good global coverage, RSOs are interested in deploying additional instances, particularly to areas that are considered “underserved”.  An underserved area has a reasonably sized user base that perceives poor performance of the RSS due to its closest root server instances being topologically distant, resulting in high latency or low availability of the service.  This is a subjective designation and it is not reasonable to expect that a tool could directly measure it.  The metric described in this section serves only to inform a decision on placing new root server instances in an underserved area.  An RSO would require multiple measurements run in diverse geographical locations.


The goal of this metric is to assess performance of the RSS at a measurement point compared to other measurement points.  A local metric that is much worse than others would be an indicator of an underserved area.  Availability and latency are the primary measurements and the impact of slow links should be discarded.  Since the metric is trying to assess the performance of the RSS rather than that of individual RSIs (Root Server Identities), only the best NRSI measurements should be used.  The measurements are direct queries to the individual root servers (rather than queries via a recursive resolver).


A set of measurements should be taken at random intervals during a T=30 minute time period.  All RSIs will be directly queried and their latencies recorded.  Additionally, a normalization factor σ will be measured as the average latency of direct DNS queries to the first NTLD=20 somehow-chosen TLDs.  If fewer than NTLD normalization measurements are available, the entire set of measurements is thrown away.  Availability of an RSI is the success rate of queries to that RSI over the last 20 measurement periods that were not thrown away.
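
As a rough illustration of the paragraph above, the normalization factor and the availability bookkeeping might be sketched in Python as follows.  This is only a sketch under assumptions: the function names are hypothetical, latencies are taken to be in milliseconds, and each retained measurement period is assumed to contribute one success/failure outcome per RSI, which the text does not spell out.

```python
from statistics import mean
from typing import Optional, Sequence

N_TLD = 20    # normalization queries required per measurement period
HISTORY = 20  # retained measurement periods used for availability


def normalization_factor(tld_latencies_ms: Sequence[float]) -> Optional[float]:
    """Return sigma, the mean latency of the first N_TLD TLD queries.

    Returns None when fewer than N_TLD normalization measurements are
    available, meaning the whole measurement set is thrown away.
    """
    if len(tld_latencies_ms) < N_TLD:
        return None
    return mean(tld_latencies_ms[:N_TLD])


def availability(outcomes: Sequence[bool]) -> float:
    """Success rate for one RSI over the last HISTORY retained periods.

    Assumption: `outcomes` holds one True/False per period that was
    not thrown away, oldest first.
    """
    recent = list(outcomes)[-HISTORY:]
    if not recent:
        return 0.0
    return sum(recent) / len(recent)
```

A period whose normalization_factor() is None would simply not be appended to any RSI's outcome history.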


The performance metric is the inverse of the average of the best NRSI=3 RSI latencies, where the availability for each of those RSIs is > p=0.65, divided by the normalization factor.  


Availability of an RSI: percentage of successful responses over the past 20 measurements

Normalization factor:  σ = mean latency of NTLD queries

Normalized Latency  = (mean of the best NRSI latencies where availability > p) / σ

Performance Metric = 1 / Normalized Latency


If fewer than NRSI latency measurements are available (due to timeouts), the performance metric should be multiplied by Navail / NRSI, where Navail is the number of RSIs that have availability > p.
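
The definitions above can be put together in a short Python sketch.  It is not a reference implementation: the function name and the dictionary layout are hypothetical, and it reads Navail as the number of RSIs that both returned a latency and have availability > p, which is one possible interpretation of the text.  Returning 0 when no RSI qualifies is also an assumption.

```python
from statistics import mean
from typing import Mapping, Optional

N_RSI = 3  # number of best RSI latencies that are averaged
P = 0.65   # availability threshold p


def performance_metric(latency_ms: Mapping[str, Optional[float]],
                       availability: Mapping[str, float],
                       sigma: float) -> float:
    """Performance metric = 1 / (mean of best N_RSI latencies / sigma).

    latency_ms maps an RSI name to its latency, or None on timeout.
    availability maps an RSI name to its success rate in [0, 1].
    """
    # Keep only RSIs that answered and whose availability exceeds p.
    eligible = sorted(
        lat for rsi, lat in latency_ms.items()
        if lat is not None and availability.get(rsi, 0.0) > P
    )
    if not eligible:
        return 0.0  # assumption: no usable RSI gives a metric of zero

    best = eligible[:N_RSI]
    normalized_latency = mean(best) / sigma
    metric = 1.0 / normalized_latency

    # Scale down when fewer than N_RSI usable latencies exist.
    if len(best) < N_RSI:
        metric *= len(best) / N_RSI
    return metric
```

For example, with σ = 20 ms and usable latencies of 10, 20 and 30 ms, the normalized latency is 1 and so is the metric; with only the 10 ms and 30 ms answers usable, the metric is scaled by 2/3.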


When comparing performance metrics among locations, the last 20 successful performance metrics should be averaged before comparison.  The bottom P percentile among a large set of local performance metrics or performance metrics below a certain threshold (Θ) may be used to inform a determination of an underserved area.  Determining P or Θ will require further analysis and will not be defined in this document. 
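
To make the comparison step concrete, here is a minimal Python sketch.  The function name, the location labels and any values passed for P or Θ are purely illustrative, since the text deliberately leaves P and Θ undefined, and the way the bottom P percent is cut (rounding down) is an assumption.

```python
from statistics import mean
from typing import Dict, List, Mapping, Optional, Sequence

WINDOW = 20  # successful performance metrics averaged per location


def flag_candidates(history: Mapping[str, Sequence[float]],
                    percentile: Optional[float] = None,
                    threshold: Optional[float] = None) -> List[str]:
    """Return locations whose averaged metric falls in the bottom
    `percentile` percent or below `threshold` (either may be omitted).

    history maps a location label to its successful performance
    metrics, oldest first.
    """
    # Average the last WINDOW successful metrics for each location.
    scores: Dict[str, float] = {
        loc: mean(list(metrics)[-WINDOW:])
        for loc, metrics in history.items() if metrics
    }

    flagged = set()
    if threshold is not None:
        flagged |= {loc for loc, s in scores.items() if s < threshold}
    if percentile is not None:
        ranked = sorted(scores, key=lambda loc: scores[loc])  # worst first
        cutoff = int(len(ranked) * percentile / 100)
        flagged |= set(ranked[:cutoff])
    return sorted(flagged)
```

The output is only a list of candidates; as the text says, the metric informs a determination of an underserved area rather than making it.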


_______________________________________________
rssac-caucus mailing list
rssac-caucus@icann.org
https://mm.icann.org/mailman/listinfo/rssac-caucus
