On Wed, Jan 10, 2018 at 5:33 AM, Petr Špaček <petr.spacek@nic.cz> wrote:
On 5.1.2018 23:12, David Conrad wrote:
On January 5, 2018 at 2:06:10 AM, S Moonesamy (sm+icann@elandsys.com) wrote:
The plan was put on hold because of the data from September 2017. At the moment it is unknown if/when there will be a KSK roll. Is not doing a KSK roll by 2020 [1] a viable option?
Speaking personally, I'm hoping we can do the rollover long before 2020. The key is for the community to provide some sort of guidance to the ICANN Org about how to move forward. So far, my impression is that most of the input from this mailing list has been "do it now", implying we do NOT need to assess "the impact on users" (as mentioned in https://www.icann.org/news/blog/update-on-the-root-ksk-rollover-project). This means that the plan that will be published on 31 January for public comment will say the input we have received suggests the majority of contributors do not believe we need to take potential negative impact of the KSK rollover into account.
I think this is a misunderstanding. I haven't seen anyone saying that "we [do not] need to take potential negative impact of the KSK rollover into account", but rather "people will fix it if it really breaks".
Let me state my interpretation of the discussion (in the following text, "contributors" reads "me"):
Contributors believe that there is no way to reliably measure readiness for the rollover, and that tools for such measurement will not be available in upcoming years.
--- While not having reliable data, contributors believe that the KSK rollover process has already received sufficient publicity and that breakage will be dealt with swiftly, similarly to other security issues or DDoS attacks. For these reasons the risk of postponing the KSK rollover indefinitely is deemed to be higher than the risk of breakage, which will be fixed using the usual methods. ---
I hope it helps to explain how others might read this discussion.
I think there will always be breakage. In the old pre-RFC5011 and KSK design discussions there was one case identified as non-solvable:
--- old OS/Box comes alive i.e.

I think we now have a second class of failures that was not "anticipated":
-- non-persistence, i.e. the resolver cannot store state in a way that will be used if the resolver is restarted.
-- operators hard code keys, i.e. disable RFC5011 (trusted-keys vs managed-keys)

RFC5011 assumes that the timings and state of keys can be stored and will survive a reboot/restart. This seems to be violated by some operators by design (i.e. configuration information is non-writeable by the resolver process) and in some cases by a mixture of an old OS and modern technologies like containers.

Having said this, I'm going to argue that we should proceed with the roll by picking a day and generating PR outreach to try to minimize outages.

Olafur
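
For illustration only, the two newer failure classes described above (keys hard-coded via trusted-keys, and resolver state that cannot be persisted) could be checked for roughly as follows. This is a sketch under stated assumptions, not a tool anyone in this thread has proposed: the function name, the risk labels and the `state_dir` parameter are hypothetical, the config is assumed to use BIND-style `trusted-keys` / `managed-keys` statements, and any `trusted-keys` statement is flagged without checking which zone it covers.

```python
import os
import re
import tempfile


def classify_trust_anchor_risks(config_text, state_dir):
    """Return hypothetical risk labels for a BIND-style resolver setup.

    Assumptions (not from the thread): the config uses `trusted-keys { ... }`
    for static anchors, and RFC 5011 state must be written to `state_dir`.
    """
    risks = []

    # Static anchors: a trusted-keys statement means the key will not be
    # updated automatically. Commented-out lines do not match because the
    # statement must be the first token on its line.
    if re.search(r"^\s*trusted-keys\s*\{", config_text, re.MULTILINE):
        risks.append("hard-coded-keys")

    # Non-persistence: RFC 5011 timers and key state must survive a restart,
    # so the resolver process needs a writable directory to store them.
    # Note: os.access reports on the current user, not the resolver's user.
    if not os.path.isdir(state_dir) or not os.access(state_dir, os.W_OK):
        risks.append("non-persistent-state")

    return risks


if __name__ == "__main__":
    # Key material is a placeholder, not a real trust anchor.
    static_cfg = 'trusted-keys {\n    "." 257 3 8 "PLACEHOLDER";\n};\n'
    managed_cfg = 'managed-keys {\n    "." initial-key 257 3 8 "PLACEHOLDER";\n};\n'

    writable = tempfile.mkdtemp()
    missing = os.path.join(writable, "does-not-exist")

    print(classify_trust_anchor_risks(static_cfg, writable))
    print(classify_trust_anchor_risks(managed_cfg, missing))
```

A check like this only sees what is on disk at the moment it runs; it says nothing about the first non-solvable case (an old box coming back to life), and a read-only container image may pass it in a build environment and still fail in production.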