Re: [ksk-rollover] Starting discussion on acceptable criteria for proceeding with the root KSK roll

Jan. 11, 2018

On Wed, Jan 10, 2018 at 5:33 AM, Petr Špaček <petr.spacek@nic.cz> wrote:
...
On 5.1.2018 23:12, David Conrad wrote:
...
On January 5, 2018 at 2:06:10 AM, S Moonesamy (sm+icann@elandsys.com) wrote:
...
The plan was put on hold because of the
data from September 2017. At the moment it is
unknown if/when there will be a KSK roll. Is not
doing a KSK roll by 2020 [1] a viable option?
Speaking personally, I’m hoping we can do the rollover long before 2020.
The key is for the community to provide some sort of guidance to the
ICANN Org about how to move forward. My impression is that, to date,
most of the input from this mailing list has been “do it now”,
implying we do NOT need to assess “the impact on users” (as mentioned
in https://www.icann.org/news/blog/update-on-the-root-ksk-rollover-project).
This means that the plan that will be published on 31 January for public
comment will say the input we have received suggests the majority of
contributors do not believe we need to take potential negative impact of
the KSK rollover into account.
I think this is a misunderstanding. I haven't seen anyone saying that "we
[do not] need to take potential negative impact of the KSK rollover into
account"; what I have seen is "people will fix it if it really breaks".
Let me state my interpretation of the discussion (in the following text,
"contributors" reads "me"):
Contributors believe that there is no way to reliably measure readiness
for the rollover, and that tools for such measurement will not be
available in upcoming years.
---
Lacking reliable data, contributors believe that the KSK rollover
process has already received sufficient publicity and that breakage will
be dealt with swiftly, as with other security issues or DDoS attacks. For
these reasons, the risk of postponing the KSK rollover indefinitely is
deemed to be higher than the risk of breakage, which will be fixed using
the usual methods.
---
I hope it helps to explain how others might read this discussion.
I think there will always be breakage. In the old pre-RFC5011 and KSK
design discussions there was one case identified as non-solvable:
  -- old OS/Box comes alive i.e.
I think we now have a second class of failures that was not "anticipated":
  -- non-persistence, i.e. the resolver cannot store state in a way that
     will be used if the resolver is restarted.
  -- operators hard-code keys, i.e. disable RFC5011 (trusted-keys vs
     managed-keys)

RFC5011 assumes that the timings and state of keys can be stored and will
survive a reboot/restart. This seems to be violated by some operators by
design (i.e. configuration information is non-writeable by the resolver
process), and in some cases by a mixture of an old OS and the use of
modern technologies like containers.
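
To make the persistence assumption concrete, here is a minimal Python
sketch of the RFC5011 "add hold-down" idea. It is a toy model, not any
resolver's actual implementation: the class name, the simulate() helper
and the JSON state file are all made up for illustration. It keeps only
the 30-day add hold-down from RFC5011 and ignores TTLs, signature checks
and revocation. The point is that a resolver which loses its state on
each restart never finishes the hold-down if it restarts more often than
every 30 days.

```python
import json
import os
import tempfile

ADD_HOLD_DOWN = 30 * 24 * 3600  # RFC 5011 add hold-down time: 30 days, in seconds
NEW_KEY_TAG = 20326             # key tag of KSK-2017, used here only as a label


class TrustAnchorTracker:
    """Hypothetical, simplified tracker for the RFC 5011 add hold-down."""

    def __init__(self, state_path=None):
        # state_path=None models a resolver that cannot write its state
        # (read-only configuration, ephemeral container, and so on).
        self.state_path = state_path
        self.first_seen = {}  # key tag -> time the key was first observed
        if state_path and os.path.exists(state_path):
            with open(state_path) as f:
                self.first_seen = {int(k): v for k, v in json.load(f).items()}

    def observe(self, key_tag, now):
        """Record a sighting of a new key; return True once it is trusted."""
        self.first_seen.setdefault(key_tag, now)
        if self.state_path:
            with open(self.state_path, "w") as f:
                json.dump(self.first_seen, f)
        return now - self.first_seen[key_tag] >= ADD_HOLD_DOWN


def simulate(days, restart_every, persistent):
    """Return True if the new key becomes trusted within `days` days."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "anchors.json") if persistent else None
        tracker = TrustAnchorTracker(path)
        for day in range(days + 1):
            if day > 0 and day % restart_every == 0:
                # Restart: in-memory state is lost, only the file survives.
                tracker = TrustAnchorTracker(path)
            if tracker.observe(NEW_KEY_TAG, day * 86400):
                return True
    return False


if __name__ == "__main__":
    print("persistent, weekly restarts:    ", simulate(60, 7, True))
    print("non-persistent, weekly restarts:", simulate(60, 7, False))
```

With weekly restarts, the persistent case picks up the new key once the
30 days have elapsed, while the non-persistent case restarts the timer
every week and never trusts it.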

Having said this, I'm going to argue that we should proceed with the roll
by picking a day and doing PR outreach to try to minimize outages.

     Olafur

Ólafur Guðmundsson