suggestions for deciding on key roll timing
I just mentioned this at a mic, and I'm re-broadcasting it here:

I believe we should consider two different elements separately for deciding future key-roll strategies:

1) When we should perform the next roll
2) What frequency we should use between future rolls after the next

And for #1, I believe we shouldn't be thinking about time at all, but rather we should create two lists of MUSTs and MAYs for rolling next, and ensure all MUSTs are handled. The MUST and MAY lists should include both "do this" type entries and "think about this (again)" type entries. After thinking about the latter "think about" items, they may, of course, turn into "do this" type entries.

tl;dr: we need a list of blockers that must be dealt with before rolling.

--
Wes Hardaker
USC/ISI
On 28 Mar 2019, at 9:42 am, Wes Hardaker <wjhns1@hardakers.net> wrote:
I just mentioned this at a mic, and I'm re-broadcasting it here:
ditto - here’s what I croaked at the mic.

Learned Lessons:

* We rolled the KSK
* The roll was not without impact (e.g. at least a few hundred million users were affected when their ISP's resolver turned off DNSSEC validation at critical times in the roll process)
* This was not a rehearsal for a catastrophic and unexpected compromise of the KSK. It was a limited exercise in demonstrating that, albeit with some collateral damage, the KSK is malleable under certain conditions
* We have some issues with large UDP responses in the DNS.
* The DNS is determinedly opaque
* Legacy is an issue
* Trust Key management procedures operate in highly constrained scenarios
* Tolerance for risk is highly variable - there is no point that all parties can clearly tolerate
* The DNS continues to be surprising
My own mic comments:

1. Has there been any research yet as to how much overlap exists between the IP addresses that were sending spurious DNSKEY queries after the revocation, and the ones that had already been found via telemetry not to have rolled the keys? This information would be useful in determining how good or bad our telemetry is.

2. Were there significant operational problems observed from the presence of a standby key during the rollover period? (Geoff said yes, there was some impact measured; Paul noted it did not rise to the level of complaints to ICANN.) Given that the effect seems to have been modest, I'd like to suggest reconsideration of the decision not to have a standby key in the zone except during key rolls.

3. A likely candidate for at least some of the spurious DNSKEY traffic is a bug in BIND that was fixed over four years ago. Is there anything we can do to hasten the obsolescence of old broken servers that misbehave during rollovers? Would an algorithm upgrade help, perhaps?

4. While the 2018-2019 key roll did have some unexpected results, they were quite manageable by the existing provisioning, and shouldn't frighten us away from doing it again. Sooner is better, it gets us more data.

--
Evan Hunt -- each@isc.org
Internet Systems Consortium, Inc.
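The overlap asked about in comment 1 could be computed along the following lines. This is only a minimal sketch: the function name `overlap_report`, the report keys, and the two input lists are hypothetical, and the addresses are documentation-range placeholders, not data from the rollover.

```python
from ipaddress import ip_address


def overlap_report(spurious, not_rolled):
    """Compare two collections of resolver addresses given as strings.

    spurious:   addresses seen sending spurious DNSKEY queries (hypothetical input)
    not_rolled: addresses telemetry flagged as not having rolled (hypothetical input)
    """
    # Parse into address objects so differently written forms of the same
    # address (e.g. 2001:db8::1 and 2001:0db8::1) compare equal.
    a = {ip_address(x) for x in spurious}
    b = {ip_address(x) for x in not_rolled}
    both = a & b
    return {
        "both": len(both),
        "spurious_only": len(a - b),
        "telemetry_only": len(b - a),
        # Fraction of the spurious senders that telemetry had already flagged.
        "share_of_spurious_flagged": len(both) / len(a) if a else 0.0,
        # Fraction of the telemetry-flagged addresses that sent spurious queries.
        "share_of_flagged_spurious": len(both) / len(b) if b else 0.0,
    }


if __name__ == "__main__":
    # Placeholder addresses from the documentation ranges, not real measurements.
    report = overlap_report(
        ["192.0.2.1", "192.0.2.2", "2001:db8::1"],
        ["192.0.2.2", "2001:0db8::1", "198.51.100.7"],
    )
    for key, value in report.items():
        print(key, value)
```

A high share of spurious senders that telemetry had already flagged would suggest the telemetry was a reasonable predictor. A low share would suggest it missed much of the affected population.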
On Thu 2019-03-28 01:42:35-0700 Wes wrote:
I just mentioned this at a mic, and I'm re-broadcasting it here:
I believe we should consider two different elements separately for deciding future key-roll strategies:
1) When we should perform the next roll
2) What frequency we should use between future rolls after the next
And for #1, I believe we shouldn't be thinking about time at all, but rather we should create two lists of MUSTs and MAYs for rolling next, and ensure all MUSTs are handled.
I agree that we need requirements, but I think the time discussion should happen in parallel. There will still be plenty of time to delay the next rollover if unresolved blockers come up in the requirements.

Robert

--
Robert Story <http://www.isi.edu/~rstory>
USC Information Sciences Institute <http://www.isi.edu/>
participants (4): Evan Hunt, Geoff Huston, Robert Story, Wes Hardaker