root zone KSK rollover operations workshop planning
Greetings,

Some notes about our upcoming operational discussions regarding future root zone KSK rollover follow, including a heartfelt request for input on the agenda.

Objective

The purpose of this workshop is to discuss the operational considerations of future root zone KSK rollovers, to inform the process of planning and executing with maximum visibility and minimum opportunity for disruption.

Logistics

Attendance: Open, remote participation available
Location: Brentwood Room, Hyatt Regency Century Plaza, LA (ICANN 51 meeting venue)
Date: Thursday, October 16, 2014
Time: TBA (will pass on details as soon as we have them)
Remote Participation: TBA (same with this)

Agenda

Your input and guidance is humbly requested to flesh out a high-level agenda. Possible topics include, in no particular order:

- operational gaps in existing validator infrastructure
- public communications (who needs to know, how should they be told)
- usefulness of a public testbed (what could it tell us, who might use it)
- other opportunities for testing prior to a KSK roll
- timing and mechanics of the actual KSK roll
- how to tell if things Start Going Badly Wrong
- opportunities for rollback if Things Go Badly Wrong
- factors related to future KSK roll schedules
- opportunities for measurement before, during and after a KSK roll
- future venues for similar or related discussion

Please feel free to add to this list, or to add meat to any of these high-level topics. Once we have the final details of the start and end times, we can prioritise and tetris topics into the available window.

The format of the workshop is intended to be that of a guided discussion, rather than a set of canned presentations. Notes will be taken and published after the workshop, without attribution.

Go. :-)

Joe
Joe, all,
The purpose of this workshop is to discuss the operational considerations of future root zone KSK rollovers, to inform the process of planning and executing with maximum visibility and minimum opportunity for disruption.
great, many thanks for taking the lead and organizing this. Without bothering this list with layer 9 issues too much, and at the very high risk of having missed previous discussion on the issue, where can I find some info on the group/org(s) behind this, a timeline and the intended way of decision making (to the extent applicable)? Best regards, Peter
Hi Peter, On Sep 18, 2014, at 12:17 AM, Peter Koch <pk@DENIC.DE> wrote:
where can I find some info on the group/org(s) behind this,
Not entirely sure what you mean. ICANN is (obviously) sponsoring the workshops and I think I can say folks involved in managing the root KSK are interested in the outcome. Do you mean who is participating in the workshops?
a timeline and the intended way of decision making (to the extent applicable)?
At this point, we’re in exploratory mode, primarily focused on understanding the potential issues. Neither a timeline nor a decision making process have been nailed down (as far as I know). Regards, -drc
Hi David,
Do you mean who is participating in the workshops?
that's probably too early to tell, isn't it. I was looking for the "forces" behind this and I read your response as "an ad-hoc group".
At this point, we’re in exploratory mode, primarily focused on understanding the potential issues. Neither a timeline nor a decision making process have been nailed down (as far as I know).
An interesting question - that should not derail any effort on the ops/tech side of things - is which function is in charge. Oversight and such, topic of the month. -Peter
Hi Peter, On 19 Sep 2014, at 3:08, Peter Koch <pk@denic.de> wrote:
Do you mean who is participating in the workshops?
that's probably too early to tell, isn't it. I was looking for the "forces" behind this and I read your response as "an ad-hoc group".
The context of this effort is not a secret and probably not very surprising. There was some momentum back around the time of the last Berlin IETF to get moving on planning and outreach on the topic of rolling the root zone KSK. I was working for ICANN at that time, and was in some of those meetings. The momentum decayed to near zero around the time when the IANA functions' future detachment from the US government started to be discussed. I don't mean to imply any causality there, because I have no useful, factual insight, and left ICANN around that time anyway. More recently, David Conrad rejoined ICANN, and is putting effort into rejuvenating the project. He asked Roy, Jakob, Dan and me to help out and, distracted by his winning smile, we accidentally said yes. And here we are. I agree with you that the technical/operational aspects that we are here to discuss, and any recommendations we come up with, are orthogonal to layer-9 oversight and governance discussions going on elsewhere. Let's not get distracted :-) Joe
Peter, On Sep 19, 2014, at 12:08 AM, Peter Koch <pk@DENIC.DE> wrote:
I was looking for the "forces" behind this and I read your response as "an ad-hoc group".
Joe answered this pretty well. Especially the winning smile bit.
At this point, we’re in exploratory mode, primarily focused on understanding the potential issues. Neither a timeline nor a decision making process have been nailed down (as far as I know).
An interesting question - that should not derail any effort on the ops/tech side of things - is which function is in charge. Oversight and such, topic of the month.
For the purposes of the discussion related to the implications and effects of key rollover, I don’t think who is in charge/who provides oversight is particularly relevant. When we get to the point of actually changing the key, those questions become more important, but I don’t think we’re there yet (e.g., there are some that argue the key shouldn’t be changed; if that argument wins, questions related to who’s in charge don’t matter).

Regards,
-drc
David,
Joe answered this pretty well. Especially the winning smile bit.
I wish we could elaborate on that a bit further ;-)
For the purposes of the discussion related to the implications and effects of key rollover, I don’t think who is in charge/who provides oversight is particularly relevant. When we get to the point of actually changing the key, those questions become more important, but I don’t think we’re there yet (e.g., there are some that argue the key shouldn’t be changed; if that argument wins, questions related to who’s in charge don’t matter).
Well, if any argument "wins", then that's a decision in the making and the question of who is in charge to call for that decision and determine consensus isn't really part of the equivalence class "don't matter". I'll pause now, though.

-Peter
On Sep 18, 2014, at 12:17 AM, Peter Koch <pk@denic.de> wrote:
great, many thanks for taking the lead and organizing this. Without bothering this list with layer 9 issues too much, and at the very high risk of having missed previous discussion on the issue, where can I find some info on the group/org(s) behind this, a timeline and the intended way of decision making (to the extent applicable)?
+1

Given that there is no cryptographic reason to roll the KSK under the current policies, it would be good to have a list of the perceived operational reasons to roll the KSK. With that, we can come up with a better argument for the timeframe.

--Paul Hoffman
-----Original Message-----
From: ksk-rollover-bounces@icann.org [mailto:ksk-rollover-bounces@icann.org] On Behalf Of Paul Hoffman
Sent: 18 September 2014 16:30
To: ksk-rollover@icann.org
Subject: Re: [ksk-rollover] root zone KSK rollover operations workshop planning

+1
Given that there is no cryptographic reason to roll the KSK under the current policies, it would be good to have a list of the perceived operational reasons to roll the KSK. With that, we can come up with a better argument for the timeframe.
--Paul Hoffman
I believe that it is stated in the DPS, but apart from that, there is no obvious reason afaik. But, with the experience we have, it is better to do it now rather than wait until the root zone and the people dependent on the root key signing key are greater in number. That is one thing that .SE learned when WE were a trust anchor (before the root was signed). You will most certainly need to do it some time, and in my opinion it is better earlier than later.

Anne-Marie

Anne-Marie Eklund Löwinder
Chief Information Security Officer
.SE (The Internet Infrastructure Foundation)
Direct: +46(8)-452 35 17 | Mobile: +46(73)-43 15 310
PO Box 7399, SE-103 91 Stockholm, Sweden
Twitter: @amelsec
Visitors: Ringvägen 100
http://www.iis.se/en/
On 9/18/2014 10:30 AM, Paul Hoffman wrote:
On Sep 18, 2014, at 12:17 AM, Peter Koch <pk@denic.de> wrote:
great, many thanks for taking the lead and organizing this. Without bothering this list with layer 9 issues too much, and at the very high risk of having missed previous discussion on the issue, where can I find some info on the group/org(s) behind this, a timeline and the intended way of decision making (to the extent applicable)? +1
Given that there is no cryptographic reason to roll the KSK under the current policies,
Whenever someone uses "Given" to imply there's no good reason to do something, I tend to try and parse what the writer actually was trying to say. In the above, I would think to translate this as "Given that the cryptographic properties of the key material we're using for the KSKs has sufficient strength to resist attacks on the cryptography for the foreseeable future we have no reason to roll the KSK due to time-outs on key material strength" which is what I think Paul meant. His original statement could be read to imply that "cryptographic reasons" like wanting to shrink signature lengths (e.g. rolling an RSA key to an EC key) don't exist.
it would be good to have a list of the perceived operational reasons to roll the KSK. With that, we can come up with a better argument for the timeframe.
I found this one also written a little too tightly as well. I'd do:

"It would be good to have a list of reasons, beliefs, standards of practices, expert guidance and anecdotal experiences either directly on point or with analogues to the DNS system which touch on whether or not it's a good idea to roll the KSK"

So a good place to start IMHO is NIST SP800-57 Part 1, Rev 3, Section 5.3.4 Cryptoperiods for Asymmetric Keys.

Other places to look are:

a) What is the expected EOL of the hardware currently used for root signing?
   i) Is there a transition plan for transition to new hardware?
   ii) Is the expected security EOL of the hardware earlier than the actual EOL of the hardware (e.g. have hardware attacks improved such as to weaken the security of the root key at rest in the HSM?)
b) What effect on the overall security of the system does transition of personnel have? E.g. replacement of ICANN personnel involved with the KSK, replacement of community representatives? Are there exploitable attack surfaces?
c) What is the expected impact on security if funding for the KSK function were reduced or interrupted for a period of time or permanently?
d) What is the expected impact on security if the KSK is compromised, and we have no way of rolling the KSK? (e.g. single KSK in root zone). Is there an additional real-world cost to end users in this event?
e) Can any of the above be mitigated through a single KSK rollover? Through regularly scheduled KSK rollovers?

This is where I'd go with the "should we do it" analysis.

Later, Mike
--Paul Hoffman
Michael StJohns (msj) writes:
d) What is the expected impact on security if the KSK is compromised, and we have no way of rolling the KSK? (e.g. single KSK in root zone). Is there an additional real-world cost to end users in this event?
e) Can any of the above be mitigated through a single KSK rollover? Through regularly scheduled KSK rollovers?
The operational experience of doing a single e) will surely help determine d). Currently, there's an emergency plan (rolling the KSK), and it has never been tested. That's proof enough we need to do it. Phil
Michael:

If you are not sure what I meant, please ask offline. Everything you imputed to what I said was wrong.

--Paul Hoffman
On 9/18/14 10:10 AM, Paul Hoffman wrote:
Michael:
If you are not sure what I meant, please ask offline. Everything you imputed to what I said was wrong.
Paul, Rather than being intentionally cryptic, why don't you illuminate your argument for the group? FWIW, I agree with Michael's analysis, and am also in agreement with the thought processes that lead to the conclusions that we need to do a KSK rollover, and that sooner is better. If you have solid reasoning that refutes any of those points I for one would like to hear it. Doug
Hi Mike,

Section 6.5 of the Root Zone DPS [1] states that a key rollover will be scheduled on or after five years.

At 08:58 18-09-2014, Michael StJohns wrote:
So a good place to start IMHO is NIST SP800-57 Part 1, Rev 3, Section 5.3.4 Cryptoperiods for Asymmetric Keys.
Other places to look are:
a) What is the expected EOL of the hardware currently used for root signing?
That would be at least five years.
i) Is there a transition plan for transition to new hardware?
In my opinion that was to be covered by the key rollover.
b) What effect on the overall security of the system does transition of personnel have? E.g. replacement of ICANN personnel involved with the KSK, replacement of community representatives? Are there exploitable attack surfaces?
I'll skip this one.
e) Can any of the above be mitigated through a single KSK rollover? Through regularly scheduled KSK rollovers?
I am one of the Crypto Officers (West Coast). The key rollover process has never been exercised. The logistics are non-trivial. I raised the question of a KSK rollover previously with ICANN as there isn't any operational experience for some parts of the DPS.

Regards,
S. Moonesamy

1. https://www.iana.org/dnssec/icann-dps.txt
On Sep 18, 2014, at 8:58 AM, Michael StJohns <msj@nthpermutation.com> wrote:
On 9/18/2014 10:30 AM, Paul Hoffman wrote:
On Sep 18, 2014, at 12:17 AM, Peter Koch <pk@denic.de> wrote:
great, many thanks for taking the lead and organizing this. Without bothering this list with layer 9 issues too much, and at the very high risk of having missed previous discussion on the issue, where can I find some info on the group/org(s) behind this, a timeline and the intended way of decision making (to the extent applicable)? +1
Given that there is no cryptographic reason to roll the KSK under the current policies,
Whenever someone uses "Given" to imply there's no good reason to do something, I tend to try and parse what the writer actually was trying to say.
In the above, I would think to translate this as "Given that the cryptographic properties of the key material we're using for the KSKs has sufficient strength to resist attacks on the cryptography for the foreseeable future we have no reason to roll the KSK due to time-outs on key material strength" which is what I think Paul meant.
It is not. (Hint: "A said B, which I assume means C which is stupid" is not a popular method for discussions in the IETF.) The current policy is to have the KSK be 2048 bits and the ZSK be 1024 bits. There is ~30 bits of effective strength difference between those. You would be literally insane to attack the KSK. If you had the resources to break the KSK, you could break a ZSK a billion times faster. The lifetime of a ZSK is six months. If you attack the KSK, you are assuming that you have literally 500,000 years to do so; otherwise attacking the ZSK would make more sense. Said another way, if we are worried about an attacker who can break the KSK in 100 years, we should be more worried that the same attacker could break the current ZSK in about 3 seconds.
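The relative-cost argument above can be checked with a few lines of arithmetic. This is a minimal sketch that takes the message's "~30 bits of effective strength difference" as its only assumption; the function names are illustrative and not from any library.

```python
# Worked arithmetic for the "attack the weaker key" argument.
# Assumption (taken from the message): about 30 bits of effective strength
# separate a 2048-bit RSA KSK from a 1024-bit RSA ZSK.
STRENGTH_GAP_BITS = 30
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def work_factor(gap_bits: int = STRENGTH_GAP_BITS) -> int:
    """How many times cheaper the weaker key is to attack."""
    return 2 ** gap_bits


def zsk_break_seconds(ksk_break_years: float, gap_bits: int = STRENGTH_GAP_BITS) -> float:
    """Time to break the ZSK for an attacker who needs ksk_break_years for the KSK."""
    return ksk_break_years * SECONDS_PER_YEAR / work_factor(gap_bits)


def ksk_break_years(zsk_break_years: float, gap_bits: int = STRENGTH_GAP_BITS) -> float:
    """Time to break the KSK for an attacker who needs zsk_break_years for the ZSK."""
    return zsk_break_years * work_factor(gap_bits)


if __name__ == "__main__":
    print(f"work factor: {work_factor():,}")
    print(f"KSK in 100 years -> ZSK in {zsk_break_seconds(100):.1f} seconds")
    print(f"ZSK in 6 months  -> KSK in {ksk_break_years(0.5):,.0f} years")
```

With a 30-bit gap the work factor is about a billion, and "KSK in 100 years" does scale to roughly 3 seconds for the ZSK. The same gap scales a six-month ZSK lifetime to roughly 5 x 10^8 years, which is larger than the 500,000 years quoted (that figure would correspond to a gap of about 20 bits). Either way the conclusion of the argument is unchanged.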
His original statement could be read to imply that "cryptographic reasons" like wanting to shrink signature lengths (e.g. rolling an RSA key to an EC key) don't exist.
Or it could be read to imply that we do not assume that attackers are so insane that they would attack the wrong key. Which way do you think is a better way to read what I said? FWIW, the question of changing key type (which I strongly support) has nothing to do with a KSK *rollover*.
it would be good to have a list of the perceived operational reasons to roll the KSK. With that, we can come up with a better argument for the timeframe.
I found this one also written a little too tightly as well. I'd do:
"It would be good to have a list of reasons, beliefs, standards of practices, expert guidance and anecdotal experiences either directly on point or with analogues to the DNS system which touch on whether or not it's a good idea to roll the KSK"
I prefer more concise questions, but your list seems fine too.
So a good place to start IMHO is NIST SP800-57 Part 1, Rev 3, Section 5.3.4 Cryptoperiods for Asymmetric Keys.
Only if you are literally insane. :-)
Other places to look are:
a) What is the expected EOL of the hardware currently used for root signing?
   i) Is there a transition plan for transition to new hardware?
   ii) Is the expected security EOL of the hardware earlier than the actual EOL of the hardware (e.g. have hardware attacks improved such as to weaken the security of the root key at rest in the HSM?)
b) What effect on the overall security of the system does transition of personnel have? E.g. replacement of ICANN personnel involved with the KSK, replacement of community representatives? Are there exploitable attack surfaces?
c) What is the expected impact on security if funding for the KSK function were reduced or interrupted for a period of time or permanently?
d) What is the expected impact on security if the KSK is compromised, and we have no way of rolling the KSK? (e.g. single KSK in root zone). Is there an additional real-world cost to end users in this event?
e) Can any of the above be mitigated through a single KSK rollover? Through regularly scheduled KSK rollovers?
This is where I'd go with the "should we do it" analysis.
I would add: f) If there is a plan to change the signature algorithm, does that fact affect the need for the KSK rollover? If so, how? --Paul Hoffman
On 9/18/2014 6:25 PM, Paul Hoffman wrote:
On Sep 18, 2014, at 8:58 AM, Michael StJohns <msj@nthpermutation.com> wrote:
On 9/18/2014 10:30 AM, Paul Hoffman wrote:
On Sep 18, 2014, at 12:17 AM, Peter Koch <pk@denic.de> wrote:
great, many thanks for taking the lead and organizing this. Without bothering this list with layer 9 issues too much, and at the very high risk of having missed previous discussion on the issue, where can I find some info on the group/org(s) behind this, a timeline and the intended way of decision making (to the extent applicable)? +1
Given that there is no cryptographic reason to roll the KSK under the current policies,
Whenever someone uses "Given" to imply there's no good reason to do something, I tend to try and parse what the writer actually was trying to say.
In the above, I would think to translate this as "Given that the cryptographic properties of the key material we're using for the KSKs has sufficient strength to resist attacks on the cryptography for the foreseeable future we have no reason to roll the KSK due to time-outs on key material strength" which is what I think Paul meant.
It is not. (Hint: "A said B, which I assume means C which is stupid" is not a popular method for discussions in the IETF.)
Hmm... I said that A said B, but that what I thought he meant (because I thought B was a bit overreaching and not necessarily true in all circumstances which I didn't actually mention at the time) was C which is a proper subset of B. Strangely enough C is pretty much "there's no reason to roll the KSK key just due to key strength lifetime issues" which is what you just said below, but in a less roundabout way. And just to be clear... I actually think "C" is correct/not stupid. I went back to my original email, and nowhere did I find the "C is stupid" statement either explicit or implied.
The current policy is to have the KSK be 2048 bits and the ZSK be 1024 bits. There is ~30 bits of effective strength difference between those. You would be literally insane to attack the KSK. If you had the resources to break the KSK, you could break a ZSK a billion times faster. The lifetime of a ZSK is six months. If you attack the KSK, you are assuming that you have literally 500,000 years to do so; otherwise attacking the ZSK would make more sense.
Said another way, if we are worried about an attacker who can break the KSK in 100 years, we should be more worried that the same attacker could break the current ZSK in about 3 seconds.
His original statement could be read to imply that "cryptographic reasons" like wanting to shrink signature lengths (e.g. rolling an RSA key to an EC key) don't exist.
Or it could be read to imply that we do not assume that attackers are so insane that they would attack the wrong key. Which way do you think is a better way to read what I said?
The way I said it. You used the phrase "cryptographic reasons" which AFAIK is not a term of art. So the plain language meaning is "reasons related to cryptography" which cuts a very very large swath. Basically, I disagree with you on your statement that it's a given "there are no cryptographic reasons" to roll the key. I re-wrote what you put out into a statement that I think mostly matches what most of us believe as "given" - that the current KSK key material is probably not vulnerable to cryptographic attacks during its probable lifetime even without rolling - but that's only one small "cryptographic reason".
FWIW, the question of changing key type (which I strongly support) has nothing to do with a KSK *rollover*.
That's an interesting definition. But not one that I would give much credence to. A rollover is nothing more than replacing one key with another. Replacing an RSA key with an EC key is still a rollover.

In this particular instance, the issue with doing a key/signature type change as part of a key rollover has more to do with the limitations of the acceptors (the validating resolvers and clients) than with an artificial distinction between replacing an RSA 2048-bit key with a key of the same type vs an EC key.

We have three choices basically: don't roll the KSK; roll the KSK and replace it with a key of the same signature type; roll the KSK and replace it with a key of a different signature type. And even that isn't exactly correct, as what we're really talking about doing is first adding keys to the root trust anchor set, and then removing keys from that set. It's really not quite the same as rolling a ZSK (due to 5011 timing issues).
it would be good to have a list of the perceived operational reasons to roll the KSK. With that, we can come up with a better argument for the timeframe.
I found this one also written a little too tightly as well. I'd do:
"It would be good to have a list of reasons, beliefs, standards of practices, expert guidance and anecdotal experiences either directly on point or with analogues to the DNS system which touch on whether or not it's a good idea to roll the KSK"
I prefer more concise questions, but your list seems fine too.
So a good place to start IMHO is NIST SP800-57 Part 1, Rev 3, Section 5.3.4 Cryptoperiods for Asymmetric Keys.
Only if you are literally insane. :-)
If you don't like the selection of reference material, pick another something. But make it peer reviewed by experts, widely accepted and generally available. The NIST documents meet those criteria.
Other places to look are:
a) What is the expected EOL of the hardware currently used for root signing?
   i) Is there a transition plan for transition to new hardware?
   ii) Is the expected security EOL of the hardware earlier than the actual EOL of the hardware (e.g. have hardware attacks improved such as to weaken the security of the root key at rest in the HSM?)
b) What effect on the overall security of the system does transition of personnel have? E.g. replacement of ICANN personnel involved with the KSK, replacement of community representatives? Are there exploitable attack surfaces?
c) What is the expected impact on security if funding for the KSK function were reduced or interrupted for a period of time or permanently?
d) What is the expected impact on security if the KSK is compromised, and we have no way of rolling the KSK? (e.g. single KSK in root zone). Is there an additional real-world cost to end users in this event?
e) Can any of the above be mitigated through a single KSK rollover? Through regularly scheduled KSK rollovers?
This is where I'd go with the "should we do it" analysis.
I would add:
f) If there is a plan to change the signature algorithm, does that fact affect the need for the KSK rollover? If so, how?
I'm not quite sure what you're getting at here. DNSSEC pretty much makes it impossible to change a signature algorithm without changing a key, so changing the signature algorithm used in signatures over the root DNSKEY RRSet is going to require first a) adding a trust anchor key with the new signature algorithm, b) signing the DNSKEY RRSet with that key/algorithm, and c) eventually phasing out the old trust anchor keys and signatures. Later, Mike
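The add-then-remove sequence described above (add a new trust anchor, sign the DNSKEY RRSet with it, phase out the old one) can be sketched as a toy model of a validator that tracks trust anchors in the style of RFC 5011. This is a deliberately simplified illustration, not a faithful implementation of the RFC: the `Validator` class, the key labels "old-rsa" and "new-ec", and the dates are all hypothetical, and only the 30-day add hold-down value is taken from RFC 5011.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Assumption: a newly published key is only trusted after it has been seen
# continuously for an "add hold-down" period (30 days in RFC 5011).
ADD_HOLD_DOWN = timedelta(days=30)


@dataclass
class Validator:
    """Toy model of a validating resolver's trust anchor set."""
    trusted: set = field(default_factory=set)    # key labels currently trusted
    pending: dict = field(default_factory=dict)  # key label -> date first seen

    def observe(self, today, published, signed_by, revoked=frozenset()):
        """Process one observation of the root DNSKEY RRSet."""
        if not (set(signed_by) & self.trusted):
            return  # not validated by any current trust anchor: ignore it
        for key in revoked:
            # c) phase out: a revoked key stops being a trust anchor
            self.trusted.discard(key)
            self.pending.pop(key, None)
        for key in published:
            if key in self.trusted or key in revoked:
                continue
            # a) add: start (or continue) the hold-down timer for a new key
            first_seen = self.pending.setdefault(key, today)
            if today - first_seen >= ADD_HOLD_DOWN:
                self.trusted.add(key)
                del self.pending[key]


if __name__ == "__main__":
    start = date(2015, 1, 1)  # arbitrary illustrative date
    v = Validator(trusted={"old-rsa"})
    # a) new key published, RRSet still signed only by the old key
    v.observe(start, {"old-rsa", "new-ec"}, signed_by={"old-rsa"})
    # b) after the hold-down, the RRSet is signed with both keys
    v.observe(start + ADD_HOLD_DOWN, {"old-rsa", "new-ec"},
              signed_by={"old-rsa", "new-ec"})
    # c) old key revoked and withdrawn
    v.observe(start + 2 * ADD_HOLD_DOWN, {"new-ec"},
              signed_by={"new-ec"}, revoked={"old-rsa"})
    print(sorted(v.trusted))
```

The point of the model is the ordering constraint: a validator that has not yet accepted the new key cannot follow a switch to signing with that key alone, which is why the timing differs from an ordinary ZSK roll.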
--Paul Hoffman
I think we disagree on some fundamental points in this discussion, but I could be wrong and don't want to put words in your mouth. Let me state what I think are the cryptographic aspects of the KSK rollover as defined in Joe's message that started this thread.

- It is literally insane to attack a 2048-bit key that is protecting a 1024-bit key that is used for signing the same material (in this case, the root zone). Thus, there is no cryptographic reason to roll the 2048-bit KSK as long as the root is signed with 1024-bit ZSKs.

- Changing the signing algorithm (which I strongly support) is not a KSK rollover and thus out of scope for this discussion except insofar as if there is a planned algorithm change, that could affect the perceived need for the KSK rollover. If changing the signing algorithm *is* in scope for this discussion, the title of the discussion should change.

- Changing the signing policy to be one where the KSK and ZSK are both 2048 would have a huge effect on the decisions about KSK rollover because we then need to discuss the attack resistance of 2048-bit RSA keys and the value of breaking the root key signatures. If possibly changing that signing policy is in scope, that needs to be stated early.

--Paul Hoffman
On 9/19/2014 11:26 AM, Paul Hoffman wrote:
I think we disagree on some fundamental points in this discussion, but I could be wrong and don't want to put words in your mouth. Let me state what I think are the cryptographic aspects of the KSK rollover as defined in Joe's message that started this thread.
- It is literally insane to attack a 2048-bit key that is protecting a 1024-bit key that is used for signing the same material (in this case, the root zone). Thus, there is no cryptographic reason to roll the 2048-bit KSK as long as the root is signed with 1024-bit ZSKs.
There is no reason *based on the cryptographic strength of the KSK* to roll the root irrespective of the strength of the ZSKs.
- Changing the signing algorithm (which I strongly support) is not a KSK rollover and thus out of scope for this discussion
Disagree.
except insofar as if there is a planned algorithm change, that could affect the perceived need for the KSK rollover. If changing the signing algorithm *is* in scope for this discussion, the title of the discussion should change.
- Changing the signing policy to be one where the KSK and ZSK are both 2048 would have a huge effect on the decisions about KSK rollover because we then need to discuss the attack resistance of 2048-bit RSA keys and the value of breaking the root key signatures. If possibly changing that signing policy is in scope, that needs to be stated early.
Irrelevant.
--Paul Hoffman
Hi, On Sep 19, 2014, at 8:26 AM, Paul Hoffman <paul.hoffman@vpnc.org> wrote:
- It is literally insane to attack a 2048-bit key that is protecting a 1024-bit key that is used for signing the same material (in this case, the root zone). Thus, there is no cryptographic reason to roll the 2048-bit KSK as long as the root is signed with 1024-bit ZSKs.
If the argument here is whether or not we should roll the key based on the feasibility of someone brute-forcing a 2048-bit key, I’d agree it is insane. But I don’t think anyone here (or elsewhere) is arguing that. I believe the core idea is that because we don’t know what the future holds (e.g., someone comes up with a novel attack against RSA), we need to be in a position to change the key. In order to change the key, we need:

- the key changing policies and processes.
- experience with operationally changing the key according to those policies and processes.
- some level of assurance that the Internet won’t break if we change the key.

That’s the reason for this mailing list (and workshop during the ICANN LA meeting).
- Changing the signing algorithm (which I strongly support) is not a KSK rollover and thus out of scope for this discussion except insofar as if there is a planned algorithm change, that could affect the perceived need for the KSK rollover. If changing the signing algorithm *is* in scope for this discussion, the title of the discussion should change.
I’m not sure arguing the semantics of the terminology used in the name of this mailing list is a good use of time.
- Changing the signing policy to be one where the KSK and ZSK are both 2048 would have a huge effect on the decisions about KSK rollover because we then need to discuss the attack resistance of 2048-bit RSA keys and the value of breaking the root key signatures. If possibly changing that signing policy is in scope, that needs to be stated early.
At this stage of the game, I personally think everything that could impact changing the key and/or the implications of changing the key should be in scope. Regards, -drc (ICANN CTO, but speaking for myself only)
On Sep 19, 2014, at 9:44 AM, David Conrad <david.conrad@icann.org> wrote:
- Changing the signing algorithm (which I strongly support) is not a KSK rollover and thus out of scope for this discussion except insofar as if there is a planned algorithm change, that could affect the perceived need for the KSK rollover. If changing the signing algorithm *is* in scope for this discussion, the title of the discussion should change.
I’m not sure arguing the semantics of the terminology used in the name of this mailing list is a good use of time.
I am. If as you say below, "everything that could impact changing the key and/or the implications of changing the key should be in scope", then simply calling the workshop one about "KSK change" would be a lot clearer.

And, given that, I propose that there be a major topic on changing the signing algorithm to elliptic curve with 256-bit keys. The CFRG will likely settle on recommendations for fast, constant-time curves for signing; these would be excellent targets for a new signing algorithm. The oft-stated benefits would be:

- much smaller DNS response messages for signatures
- much greater predictability for the cryptographic lifetimes for the keys

--Paul Hoffman
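To put a rough number on the "much smaller DNS response messages" benefit, the sketch below compares raw signature sizes only. It assumes textbook encodings: an RSA signature is as long as the modulus, and an ECDSA signature over a 256-bit curve is the pair (r, s) of 32 bytes each. The curves under discussion at the CFRG had not been chosen at the time of this message, so the 256-bit figure is illustrative, and the function names are made up for this example.

```python
def rsa_signature_bytes(modulus_bits: int) -> int:
    """An RSA signature is the same length as the modulus."""
    return modulus_bits // 8


def ec_signature_bytes(curve_bits: int) -> int:
    """An ECDSA-style signature is two curve-sized integers, r and s."""
    return 2 * (curve_bits // 8)


if __name__ == "__main__":
    sizes = {
        "RSA-2048 (current KSK size)": rsa_signature_bytes(2048),
        "RSA-1024 (current ZSK size)": rsa_signature_bytes(1024),
        "256-bit elliptic curve": ec_signature_bytes(256),
    }
    for name, size in sizes.items():
        print(f"{name}: {size} bytes per signature")
    saving = rsa_signature_bytes(2048) - ec_signature_bytes(256)
    print(f"saving per signature vs RSA-2048: {saving} bytes")
```

These are signature payload sizes only; the rest of each RRSIG record (names, timestamps, key tag) is the same whichever algorithm is used.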
Paul, On Sep 19, 2014, at 10:06 AM, Paul Hoffman <paul.hoffman@vpnc.org> wrote:
I’m not sure arguing the semantics of the terminology used in the name of this mailing list is a good use of time.
I am. If as you say below, "everything that could impact changing the key and/or the implications of changing the key should be in scope", then simply calling the workshop one about "KSK change" would be a lot clearer.
While I’m all in favor of clarity, given the challenges I had in arranging for the workshop to be held in LA, I am somewhat reticent to change anything, including even the title in the agenda (particularly since an announcement has already gone out). I will, however, keep this in mind for future workshops (I anticipate these workshops will be a feature for upcoming meetings until they are no longer necessary/useful). With regards to the mailing list, I’ve changed the subject string (and mailing list short and long descriptions) to reference ‘change’ as opposed to ‘rollover’. I hope this suffices (creating mailing lists at ICANN requires a bit of work and I’d prefer not having to change the mailing list name).
And, given that, I propose that there be a major topic on changing the signing algorithm to elliptic curve with 256-bit keys.
Personally, I’m quite interested in this particular topic (and, in fact (teaser :)), hope there will be some data presented during the workshop related to this). Regards, -drc
Hi, In my personal opinion, KSK rollover falls under two categories. One is when the key is compromised in any way (physically or logically) and the other is when it is operationally better to rotate the key. The former is emergency roll and the latter is planned roll. The rollover we are discussing now falls under the latter. It's something like a disaster recovery exercise (not testing on production :-)) so that we can confirm that we can roll the key in case of emergency without major hiccups. Right now, we don't know that as we've never done it. It will be irresponsible for us engineers not to roll the key and defer the issues to the next generation. On the other hand, if we do this right, people will trust DNSSEC even more. Just my two cents. Cheers, Tomofumi On Fri, Sep 19, 2014 at 10:28 AM, David Conrad <david.conrad@icann.org> wrote:
Paul,
On Sep 19, 2014, at 10:06 AM, Paul Hoffman <paul.hoffman@vpnc.org> wrote:
I’m not sure arguing the semantics of the terminology used in the name of this mailing list is a good use of time.
I am. If, as you say below, "everything that could impact changing the key and/or the implications of changing the key should be in scope", then simply describing the workshop as being about "KSK change" would be a lot clearer.
While I’m all in favor of clarity, given the challenges I had in arranging for the workshop to be held in LA, I am somewhat reticent to change anything, including even the title in the agenda (particularly since an announcement has already gone out). I will, however, keep this in mind for future workshops (I anticipate these workshops will be a feature for upcoming meetings until they are no longer necessary/useful).
With regards to the mailing list, I’ve changed the subject string (and mailing list short and long descriptions) to reference ‘change’ as opposed to ‘rollover’. I hope this suffices (creating mailing lists at ICANN requires a bit of work and I’d prefer not having to change the mailing list name).
And, given that, I propose that there be a major topic on changing the signing algorithm to elliptic curve with 256-bit keys.
Personally, I’m quite interested in this particular topic (and, in fact (teaser :)), hope there will be some data presented during the workshop related to this).
Regards, -drc
_______________________________________________ ksk-rollover mailing list ksk-rollover@icann.org https://mm.icann.org/mailman/listinfo/ksk-rollover
Tomofumi, On Sep 19, 2014, at 11:46 AM, Tomofumi Okubo <tomofumi.okubo@gmail.com> wrote:
The former is emergency roll and the latter is planned roll.
The rollover we are discussing now falls under the latter.
Actually, I don’t think we’ve started the discussion on changing the key as yet :). Is there a need to have a different set of policies/processes for an emergency roll vs. a planned roll? Is a planned roll a proper subset of an emergency roll? Regards, -drc
Hello David, Thanks for your reply.
On Sep 20, 2014, at 17:58, David Conrad <david.conrad@icann.org> wrote:
Actually, I don’t think we’ve started the discussion on changing the key as yet :).
My bad. I thought it did seeing the lively discussion on the list :-)
Is there a need to have a different set of policies/processes for an emergency roll vs. a planned roll?
In general, planned operation and emergency (disaster recovery) do have different sets of policies and processes, but I believe we can come up with something all-in-one. In terms of the mechanics, emergency and planned rollovers are the same; however, processes such as authorization (who gives the green light, in what instances, how it is recorded, etc.) and notification (who notifies whom and how) could be slightly different, as emergency rollovers might have tighter time constraints depending on the situation.
Is a planned roll a proper subset of an emergency roll?
Yes, I think so. It's just that emergency rollovers would have more provisions. Thanks! Tomofumi
On 21 Sep 2014, at 1:39, Tomofumi Okubo <tomofumi.okubo@gmail.com> wrote:
Thanks for your reply.
On Sep 20, 2014, at 17:58, David Conrad <david.conrad@icann.org> wrote:
Actually, I don’t think we’ve started the discussion on changing the key as yet :).
My bad. I thought it did seeing the lively discussion on the list :-)
Is there a need to have a different set of policies/processes for an emergency roll vs. a planned roll?
In general, planned operation and emergency (disaster recovery) do have different sets of policies and processes, but I believe we can come up with something all-in-one.
One way that an emergency roll is different from a planned roll is that a planned roll can make use of existing non-compromised KSKs and their corresponding trust anchors, whereas an emergency roll (where the emergency is a consequence of a key compromise) might not have that luxury. Making the procedures essentially the same might benefit from a standby key whose trust anchor is published long in advance, and whose risk profile of compromise is usefully different from that of the active KSK. Having such a standby key available (e.g. as recommended in RFC 5011, and by Mike StJohns in the past) would help align the two procedures, although an approach for mitigating the compromise of both active and standby keys would still be required for the general case of emergency roll due to compromise. Joe
Hello Joe, On Sun, Sep 21, 2014 at 8:41 AM, Joe Abley <jabley@hopcount.ca> wrote:
Having such a standby key available (e.g. as recommended in RFC 5011, and by Mike StJohns in the past) would help align the two procedures, although an approach for mitigating the compromise of both active and standby keys would still be required for the general case of emergency roll due to compromise.
Yes, I agree. I like the idea of having standby keys; that will help a lot. Although, even with the standby keys, we still need to consider scenarios in which both keys need to be replaced, such as algorithm compromise (if both use the same algorithm) or physical compromise of the key (if both keys sit on the same HSM). Thanks, Tomofumi
On 9/21/2014 1:21 PM, Tomofumi Okubo wrote:
Hello Joe,
On Sun, Sep 21, 2014 at 8:41 AM, Joe Abley <jabley@hopcount.ca> wrote:
Having such a standby key available (e.g. as recommended in RFC 5011, and by Mike StJohns in the past)
Heh... since I wrote 5011, you might expect those to be similar.
would help align the two procedures, although an approach for mitigating the compromise of both active and standby keys would still be required for the general case of emergency roll due to compromise.
Yes, I agree. I like the idea of having standby keys; that will help a lot.
Although, even with the standby keys, we still need to consider scenarios in which both keys need to be replaced, such as algorithm compromise (if both use the same algorithm) or physical compromise of the key (if both keys sit on the same HSM).
Worst case is compromise of all trust anchor keys. 5011 allows you to recover from an N-1 compromise (where you have at least one private key associated with the root trust anchor set that hasn't been compromised). Absent DNS size limitations, you could set N to be 3 or 4 and sequester the N-1 standby private keys in a manner to minimize their compromise. The probability of avoiding the catastrophic worst case is then
Psecure[system] = 1 - PRODUCT[for each key](1 - Psecure[key])
Say the probability for an active key to stay secure over 10 years is 99% and for a standby key is 99.9%. For a three key system with two keys in standby, the math is 1-(.01 * .001 * .001), or a 99.999999% chance of keeping the system secure for 10 years.
Of course the hard part is crafting protections to give you those chances. Later, Mike
Thanks, Tomofumi
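The arithmetic in Mike's three-key example can be checked with a short sketch. It assumes, as the message implicitly does, that key compromises are independent events; the function name `p_system_secure` is made up for illustration, and exact fractions are used only to avoid floating-point rounding.

```python
from fractions import Fraction
from math import prod


def p_system_secure(p_key_secure):
    """Probability that at least one trust anchor key stays uncompromised.

    p_key_secure: per-key probabilities of staying secure over the period.
    Assumes the compromise of each key is an independent event.
    """
    # The system fails only if every key is compromised.
    p_all_compromised = prod(1 - p for p in p_key_secure)
    return 1 - p_all_compromised


# One active key at 99%, two standby keys at 99.9% (figures from the message).
keys = [Fraction(99, 100), Fraction(999, 1000), Fraction(999, 1000)]
print(float(p_system_secure(keys)))
```

If the keys share fate (same HSM, same algorithm), the independence assumption fails and the real figure would be worse than this product suggests.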
Hello Mike, More than 1 standby key sounds even better! I think this is something worth exploring as it will give us more options to logically protect the KSK (e.g. standby keys with different algorithms, different key lengths, etc.). Some CAs have multiple root CAs (for the same assurance level) in the cert store with different specs. This is a bit like that. Thanks! Tomofumi On Sun, Sep 21, 2014 at 10:49 AM, Michael StJohns <msj@nthpermutation.com> wrote:
On 9/21/2014 1:21 PM, Tomofumi Okubo wrote:
Hello Joe,
On Sun, Sep 21, 2014 at 8:41 AM, Joe Abley <jabley@hopcount.ca> wrote:
Having such a standby key available (e.g. as recommended in RFC 5011, and by Mike StJohns in the past)
Heh... since I wrote 5011, you might expect those to be similar.
would help align the two procedures, although an approach for mitigating the compromise of both active and standby keys would still be required for the general case of emergency roll due to compromise.
Yes I agree. I like the idea of having standby keys that will help a lot.
Although, even with the standby keys, we still need to consider scenarios in which both keys need to be replaced, such as algorithm compromise (if both use the same algorithm) or physical compromise of the key (if both keys sit on the same HSM).
Worst case is compromise of all trust anchor keys. 5011 allows you to recover from an N-1 compromise (where you have at least one private key associated with the root trust anchor set that hasn't been compromised). Absent DNS size limitations, you could set N to be 3 or 4 and sequester the N-1 standby private keys in a manner to minimize their compromise. The probability of avoiding the catastrophic worst case is then
Psecure[system] = 1 - PRODUCT[for each key](1 - Psecure[key])
Say the probability for an active key to stay secure over 10 years is 99% and for a stand by key is 99.9%. For a three key system with two keys in standby, then the math is 1-(.01 * .001 * .001) or a 99.999999% chance of keeping the system secure for 10 years.
Of course the hard part is crafting protections to give you those chances.
Later, Mike
Thanks, Tomofumi
On 9/21/2014 2:27 PM, David Conrad wrote:
On Sep 21, 2014, at 11:15 AM, Tomofumi Okubo <tomofumi.okubo@gmail.com> wrote:
More than 1 standby key sounds even better! How would this impact the size of responses?
Regards, -drc
There's some (explicitly designed) weirdness in 5011 related to this. Basically, once a key is added to the trust anchor set, it remains there until revoked. Absence of the key in the DNSKEY RRSet does not affect its inclusion in the TA set. So you could add a deep standby key, leaving it in the DNSKEY RRSet for about 60 days (to ensure its addition as a trust anchor), and then exclude it from further RRSet publications until actually needed. The specific 5011 state is "missing". Mike
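The behaviour described above can be sketched as a small per-key state table. This is a simplified reading of the RFC 5011 trust anchor states and events, not a conforming implementation: timers are reduced to events, and the helper names `step`, `run` and `TRUSTED` are made up for illustration.

```python
# Simplified per-key state table, loosely following the RFC 5011 state names.
# Timers (add hold-down, remove hold-down) are modelled as plain events.
TRANSITIONS = {
    ("Start", "NewKey"): "AddPend",    # key first seen in a validated DNSKEY RRSet
    ("AddPend", "KeyRem"): "Start",    # key vanished before the hold-down elapsed
    ("AddPend", "AddTime"): "Valid",   # add hold-down elapsed: now a trust anchor
    ("Valid", "KeyRem"): "Missing",    # no longer published, but still trusted
    ("Valid", "RevBit"): "Revoked",
    ("Missing", "KeyPres"): "Valid",   # key published again
    ("Missing", "RevBit"): "Revoked",
    ("Revoked", "RemTime"): "Removed",
}

# States in which the key still counts as a trust anchor.
TRUSTED = {"Valid", "Missing"}


def step(state, event):
    """Return the next state; events with no entry leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="Start"):
    """Apply a sequence of events to a key starting in the given state."""
    for event in events:
        state = step(state, event)
    return state


# A deep standby key: published, accepted after the hold-down, then withdrawn.
standby = run(["NewKey", "AddTime", "KeyRem"])
print(standby, standby in TRUSTED)
```

The point of the sketch is the `Valid` to `Missing` transition: withdrawing the key from the DNSKEY RRSet does not remove it from the trust anchor set.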
On 21 sep 2014, at 20:38, Michael StJohns <msj@nthpermutation.com> wrote:
There's some (explicitly designed) weirdness in 5011 related to this. Basically, once a key is added to the trust anchor set, it remains there until revoked. Absence of the key in the DNSKEY RRSet does not affect its inclusion in the TA set. So you could add a deep standby key, leaving it in the DNSKEY RRSet for about 60 days (to ensure its addition as a trust anchor), and then exclude it from further RRSet publications until actually needed. The specific 5011 state is "missing".
I've noticed this feature in the past, and I believe it is more useful and important than one might think at first. jakob
i believe that was added after discussions of the authors of the alternative draft when trying to deal with devices built & warehoused for a period… the specific period resolved to 60 days. /bill On 21September2014Sunday, at 12:27, Jakob Schlyter <jakob@kirei.se> wrote:
On 21 sep 2014, at 20:38, Michael StJohns <msj@nthpermutation.com> wrote:
There's some (explicitly designed) weirdness in 5011 related to this. Basically, once a key is added to the trust anchor set, it remains there until revoked. Absence of the key in the DNSKEY RRSet does not affect its inclusion in the TA set. So you could add a deep standby key, leaving it in the DNSKEY RRSet for about 60 days (to ensure its addition as a trust anchor), and then exclude it from further RRSet publications until actually needed. The specific 5011 state is "missing".
I've noticed this feature in the past, and I believe it is more useful and important than one might think at first.
jakob
Hello David, On Sun, Sep 21, 2014 at 11:27 AM, David Conrad <david.conrad@icann.org> wrote:
On Sep 21, 2014, at 11:15 AM, Tomofumi Okubo <tomofumi.okubo@gmail.com> wrote:
How would this impact the size of responses?
Yes, this will make the keyset a bit larger. I might be wrong, but from what I understand, a bigger payload may or may not cause problems. This is why I suggest exploring the option.
From a security standpoint, a standby key is good. But I also understand that from a DNS engineering standpoint it could be bad.
I'm sure people in the group already read this but just in case... Appendix B – DNS Response Size Considerations (page 31) https://www.icann.org/en/system/files/files/sac-063-en.pdf Cheers! Tomofumi
On Sep 21, 2014, at 10:49 AM, Michael StJohns <msj@nthpermutation.com> wrote:
Worst case is compromise of all trust anchor keys. 5011 allows you to recover from an N-1 compromise (where you have at least one private key associated with the root trust anchor set that hasn't been compromised).
This has always been my problem with 5011-based rollovers. Given the protections specified in the DPS, all the scenarios in which we have to do an emergency key roll seem ridiculously unlikely. However, I assume we have to be prepared for the worst case scenario. Since 5011 can’t help us with that scenario and emergency key rollovers are a superset of planned rollovers, I’ve been unclear as to the advantage 5011 provides. And then there is the issue of (likely permanent) lack of universal implementation of 5011... Am I missing something? Regards, -drc
On 9/21/2014 2:37 PM, David Conrad wrote:
On Sep 21, 2014, at 10:49 AM, Michael StJohns <msj@nthpermutation.com> wrote:
Worst case is compromise of all trust anchor keys. 5011 allows you to recover from an N-1 compromise (where you have at least one private key associated with the root trust anchor set that hasn't been compromised).
This has always been my problem with 5011-based rollovers.
Given the protections specified in the DPS, all the scenarios in which we have to do an emergency key roll seem ridiculously unlikely. However, I assume we have to be prepared for the worst case scenario. Since 5011 can’t help us with that scenario and emergency key rollovers are a superset of planned rollovers, I’ve been unclear as to the advantage 5011 provides.
And then there is the issue of (likely permanent) lack of universal implementation of 5011...
Am I missing something?
I think someone once said that "Democracy is the worst form of government, except for all others". 5011 is pretty much the same with respect to DNSSEC. If you go back to the original requirements document (which was hard fought), you'll see the assumptions we made about what might be possible. 5011 has the advantage of being in-band, is designed to work without dependence on non-DNS (e.g. X.509 PKI) infrastructure, and understands the difference between the DNS transport protocol and the DNS data protocol limitations. It was also designed to work with existing DNSSEC implementations through the addition of a simple program that could do 5011 DNS queries and update the local DNSKEY trust anchor stores.
You say "5011 can't help with that scenario" ... but the truth is, NOTHING can help you with that scenario due to the one-way nature of DNS data. The goal of 5011 was to provide a set of mechanisms that could be used to minimize the chance of a complete root trust anchor compromise and recover from anything but that complete compromise. I think it met that goal.
The problem I have now is that whoever set up the current root of trust either didn't actually read the "protection only applies if you have more than a single root of trust" part or avoided placing a second root of trust for some reason that seemed to make sense at the time. What should have happened when this got started was for 2-3 roots of trust key sets to be created and distributed as part of the initial bring up. Neither 5011 nor DNSSEC requires that more than one of those roots of trust be present in the published RRSet if the argument was about space in the replies, but the current model means that - absent 5011 or a one to two (five?) year publicity campaign to add new trust anchors - we've got a problem with superseding the existing KSK key, emergency *or* planned.
I designed a pretty good protocol. I was somewhat dismayed when the deployment at the root didn't match 5011 requirements, even though the DPS cited 5011 as the operational model. But that's past. What do you want to do next?
WRT 5011, one way to do a rollover exercise without screwing over the non-5011 resolvers is to do the following:
Current KSK is A - it stays.
Add B key. (plus 30 day hold down)
Add C key. (plus 30 day hold down)
Revoke B key.
The TA set goes from {A} to {A, B} to {A, B, C} to {A, C}. The servers that only know A continue to work. Other servers update their TA store and we do some post twiddling analysis to see whether or not the updates made it through.
Later, Mike
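The exercise Mike outlines can be traced with a toy model. This is only a sketch under stated assumptions: hold-down timers are treated as already elapsed, a 5011-aware resolver is assumed to accept every add and revoke, and `rollover_exercise` is a hypothetical helper, not part of any resolver.

```python
def rollover_exercise():
    """Trace the trust anchor set through the proposed A / B / C exercise."""
    static_resolver = {"A"}      # non-5011 resolver: configured anchor never changes
    rfc5011_resolver = {"A"}     # 5011-aware resolver: tracks adds and revocations
    history = [set(rfc5011_resolver)]

    # Steps from the message; each add is assumed to have cleared its hold-down.
    steps = [("add", "B"), ("add", "C"), ("revoke", "B")]
    for action, key in steps:
        if action == "add":
            rfc5011_resolver.add(key)
        else:
            rfc5011_resolver.discard(key)
        history.append(set(rfc5011_resolver))
    return static_resolver, history


static, history = rollover_exercise()
for ta_set in history:
    print(sorted(ta_set))

# Key A is never withdrawn, so a resolver that only knows A keeps validating.
print(all("A" in ta_set for ta_set in history) and static == {"A"})
```

Comparing which resolvers end up with {A, C} against those still at {A} is the kind of after-the-fact analysis the message suggests for measuring 5011 uptake.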
Regards, -drc
Mike, On Sep 21, 2014, at 12:01 PM, Michael StJohns <msj@nthpermutation.com> wrote:
You say "5011 can't help with that scenario" ... but the truth is, NOTHING can help you with that scenario due to the one-way nature of DNS data.
Actually, nothing _in-band to the DNS_ can help. However, IIUC, we must be able to cope with scenarios in which we can’t trust any of the keys. I believe there are two scenarios in which this occurs: bootstrapping and catastrophic compromise of all keys. As far as I am aware, 5011 cannot help either of these cases, so we have to have some mechanism that will allow for key {rollover,change} without the benefit of 5011. Given this, I’m still struggling to see the benefit that 5011 brings. This is not intended as criticism of 5011, rather it is a question related to pragmatics. Regards, -drc
On 9/21/2014 4:03 PM, David Conrad wrote:
Mike,
On Sep 21, 2014, at 12:01 PM, Michael StJohns <msj@nthpermutation.com> wrote:
You say "5011 can't help with that scenario" ... but the truth is, NOTHING can help you with that scenario due to the one-way nature of DNS data.
Actually, nothing _in-band to the DNS_ can help.
However, IIUC, we must be able to cope with scenarios in which we can’t trust any of the keys. I believe there are two scenarios in which this occurs: bootstrapping and catastrophic compromise of all keys. As far as I am aware, 5011 cannot help either of these cases, so we have to have some mechanism that will allow for key {rollover,change} without the benefit of 5011.
Given this, I’m still struggling to see the benefit that 5011 brings. This is not intended as criticism of 5011, rather it is a question related to pragmatics.
Regards, -drc
What I think you're saying above is basically, "I don't want a system that can deal with the most likely single compromise scenarios, but that I want to do a full scale trust reboot every so often and require 100s of 1000s (millions?) of manual updates of trust anchors." You seem to be saying that you want the trust reboot system to be the same as the normal key supersession process. Or am I missing what you're trying to do here?
5011 is for the normal update and supersession of keys short of a complete trust reboot. It provides mechanisms for revoking keys, and for updating trust anchor sets in a secure and automated fashion. It explicitly was not designed for a trust reboot, as no trust reboot with total compromise can be done in a secure and automated fashion. See discussion below.
Axiom: If you lose your last trust anchor, you're screwed. You basically have to reboot trust from scratch.
Discussion: At some point along the path, the person configuring a resolver (or trust validator in non-DNS terms) has to make a leap of faith; to believe (hopefully with sufficient evidence) that the trust anchor(s) he's been handed are really *the* trust anchors for the system (this is what happened when the root was signed for the first time - or more probably when you turned on DNSSEC in your resolvers and used the vendor provided set). Within that system, all downstream trust (e.g. key rollovers validated by that root of trust or a chain back to that root of trust) is derived from that original leap of faith, and given an unbroken chain and sufficient process, you can continue to update your trust anchors with new trust anchors based on that original trust relationship.
Consider what happens if at a single point in time, all of the key material related to current trust anchors becomes compromised: you're now at a clean slate. You can no longer trace your trust back to the original trust relationship. To start again, you need to repeat the original leap of faith and make a conscious (not programmatic) decision to trust a newly minted trust anchor - either using the same process as before, or coming up with a new one. Regardless of process, there's a need to make the leap of faith in the process of populating the replacement trust anchors. This applies whether the trust anchors are DNSSEC KSKs for the root trust anchors, or a CA signed X.509 cert signing a blob containing trust anchors. At some point along the way, you have to manually make a decision to renew trust.
Axiom: If at any time, you have at least one valid trust anchor, you can establish more trust anchors.
Discussion: In DNSSEC 5011 it's actually: as long as you have at least one uncompromised trust anchor private key which remains uncompromised for the Add hold-down time, you can add new trust anchors.
Corollary: If you haven't lost your last trust anchor, you are not screwed.
Discussion: As long as you can keep ahead of compromise of trust anchors, you don't have to do a reboot of trust.
Corollary: Having only one root of trust is bad. Regardless of whether it's PKI or DNSSEC or something else.
Discussion: The problem is scalable; in a small local system where one entity controls both the thing to be trusted and those who have to trust, a single root is probably manageable. That, however, is not DNS. In the current case, one root of trust (one root KSK in the trust anchor set) plus a single compromise equals a need for a trust reboot on a global scale. We should avoid that.
I understand why you're looking at the "oh my god it's all broken" scenario. But you'd be better off looking at how to prevent it to a statistical certainty. There is NO model existing that allows you to recover automatically from a complete compromise of root of trust key material; not in PKI, not in DNSSEC.
From the above, you're looking for the "how do I reboot trust" solution, not the "how do I deal with compromise" solution. That may be useful for appendix F of how to run the root in the event of catastrophic failure, but not so useful day to day. Instead look for "How do I deal with compromise" and "How do I prevent to a high probability the need to reboot trust"? The simplest way to do the second is to increase the number of trust anchors in the root, and sequester the key material needed to sign with those trust anchors in a manner that a) prevents fate sharing (e.g. one compromise does not result in another), and b) limits the probability of loss of the key material (i.e. a trust anchor without a usable private key can't be counted as a usable trust anchor).
Later, Mike
Mike, On Sep 21, 2014, at 2:12 PM, Michael StJohns <msj@nthpermutation.com> wrote:
What I think you're saying above is basically, "I don't want a system that can deal with the most likely single compromise scenarios, but that I want to do a full scale trust reboot every so often and require 100s of 1000s (millions?) of manual updates of trust anchors.”
What I want or don’t want is, of course, irrelevant. It might be interesting to explore assumptions. For example, what do you believe is “the most likely single compromise scenario”? And, what do you think the penetration of 5011 will be in validating resolvers now and in (say) 5 years?
I am assuming:
a. for all intents and purposes, the likelihood of _any_ compromise/loss of the root key is statistically equivalent.
b. regardless of (a), we _must_ be capable of dealing with a statistically unlikely event occurring.
c. touching the root key for any reason increases the probability of catastrophic failure/compromise by an infinitesimal but non-zero amount.
d. changing the root key of the DNS is and will continue to be an infrequent event (both because of (c) but more likely the PITA-ness of changing the key).
In addition, I’m assuming:
e. few large scale organizations will be comfortable with a signal being sent from somewhere out of their control that results in permanent changes to critical configuration information.
f. it is hard to implement 5011 correctly.
g. people will continue to ship crap code.
h. as a result of a combination of (e), (f), and (g), some people won’t be able to enable 5011 support even if it does exist.
And of course (not really an assumption, but),
i. 5011 cannot help in the event of a catastrophic key compromise.
The above assumptions leave me questioning the benefit of assuming any roll can or should be treated as “planned”.
5011 is for the normal update and supersession of keys short of a complete trust reboot.
I guess this is where I get stuck: I don’t see how we will ever (or even should) get to a point where we see superseding the root key as a ‘normal’ thing. If we assume people are dependent upon DNSSEC, I see mucking about with the root key as equivalent to juggling with an armed H-bomb: it isn’t something you want to normalize. Regards, -drc P.S. An honest question: how often do root X.509 CAs roll their root keys?
On 9/21/2014 6:16 PM, David Conrad wrote:
Mike,
On Sep 21, 2014, at 2:12 PM, Michael StJohns <msj@nthpermutation.com> wrote:
What I think you're saying above is basically, "I don't want a system that can deal with the most likely single compromise scenarios, but that I want to do a full scale trust reboot every so often and require 100s of 1000s (millions?) of manual updates of trust anchors.”
What I want or don’t want is, of course, irrelevant.
It might be interesting to explore assumptions. For example, what do you believe is “the most likely single compromise scenario”? And, what do you think the penetration of 5011 will be in validating resolvers now and in (say) 5 years?
The single, simplest compromise scenario is where someone manages to set the revocation bit of the current sole KSK during one of the regular re-signings. That revokes the sole trust anchor and takes DNSSEC offline for a large portion of the network. If I wanted to cause the most mischief for the least amount of effort, I'd try to attack things there. With respect to 5011 penetration - easiest thing is to ask the vendors. AFAIK, it's included in all the commercial for-sale products as well as in the custom built versions used by various providers.
I am assuming:
a. for all intents and purposes, the likelihood of _any_ compromise/loss of the root key is statistically equivalent.
Bad assumption. First, it's "a" root key, not "the" root key - or should be. Second, a standby key locked away in a safe with appropriate physical safeguards and maybe even split into N of K slices and encrypted is probably less vulnerable than using an active key to revoke itself or even using the root KSK to sign a set of ZSKs that weren't supposed to be signed. The probability of the compromise of the root key system is the product of the probability of the compromise of each individual trust anchor private key, and that's the critical metric. So if you have a 1 in a million chance to compromise one key, and the probability is similar between all keys, the probability of compromising the SYSTEM if you're using 2 keys is 1 in a trillion, and 1 in a quintillion for 3 keys.
b. regardless of (a), we _must_ be capable of dealing with a statistically unlikely event occurring.
Yes, but... you already have a way of dealing with clear field trust rebooting. It's well documented. It's just clumsy as hell and requires manual intervention. If you're looking for something better just for that occurrence, you're probably going to be looking for a long while. You can't magically bootstrap trust from no trust.
OTOH dealing with less than catastrophic single key compromises seems to be well within the possibility of automated and secure and is exactly what 5011 was designed to accomplish.
c. touching the root key for any reason increases the probability of catastrophic failure/compromise by an infinitesimal but non-zero amount.
No. Touching the *only* root key does that. Touching one root key where the others are locked away decouples the fate of the system from the fate of the key.
d. changing the root key of the DNS is and will continue to be an infrequent event (both because of (c) but more likely the PITA-ness of changing the key).
This is a circular argument: we won't change the key, because we haven't changed the key, because it's painful to change the key, so we won't change the key. Then there's where you measure pain. At the signing end, it's just one more key generation ceremony followed by the appropriate KSK signing ceremonies. At the validating end, it's having 5011 working correctly (and ideally automatically). There will be early pain - there always is when you start doing something new. But the more often you do it the better you get at it, which is the whole point of doing a key cycle earlier rather than later.
In addition, I’m assuming:
e. few large scale organizations will be comfortable with a signal being sent from somewhere out of their control that results in permanent changes to critical configuration information.
How many large scale organizations do you know that manually provide a list of trusted CAs for the browsers their employees use? How many of them do you know check each and every browser release revision for the inclusion of new CAs? It's not exactly a signal, but it has similar effects. Another item in this space is anti-virus data. I know of no large scale organization that breaks apart and manually verifies the virus signatures provided to it before passing them on to its employees.
f. it is hard to implement 5011 correctly. g. people will continue to ship crap code. h. as a result of a combination of (e), (f), and (g), some people won’t be able to enable 5011 support even if it does exist.
f. It is hard to implement DNSSEC correctly. g. people will continue to ship crap code including outdated DNSSEC trust anchor information. h. as a result of f and g, some websites will be unreachable that should be reachable and vice versa.
And of course (not really an assumption, but),
i. 5011 cannot help in the event of a catastrophic key compromise.
Repeat - NOTHING can help in the event of a compromise of the entire root key set except doing a complete and total trust reboot.
The above assumptions leave me questioning the benefit of assuming any roll can or should be treated as “planned”.
I believe that your assumptions are flawed or missing supporting data that would tend to support those assumptions.
5011 is for the normal update and supersession of keys short of a complete trust reboot. I guess this is where I get stuck: I don’t see how we will ever (or even should) get to a point where we see superseding the root key as a ‘normal’ thing. If we assume people are dependent upon DNSSEC, I see mucking about with the root key as equivalent to juggling with an armed H-bomb: it isn’t something you want to normalize.
Stop saying "THE" root key. As long as we stick with a single root key, all of your single key catastrophic predictions will come true. It's the major vulnerability to the system as currently implemented.
Regards, -drc
P.S. An honest question: how often do root X.509 CAs roll their root keys?
It's kind of irrelevant, but somewhere between 5 and 20 years. It's irrelevant since there are something like 50+ of them in common use (and probably another 50 in slightly sparser use). The compromise of any one of them is unlikely to have the same effect as taking out the single DNSSEC KSK trust anchor key.
Later, Mike
ps: One of the more useful analysis methods I've seen for thinking about things like this is Schneier's attack tree. First come up with what you want to do in an attack, next list the possible ways to get there. For each of the ways, list what you need to accomplish (either an "or" or an "and" branch for each level you recurse - e.g. either steal the password from the guy's wallet, use social engineering, or break the passwd file). Then assign cost and probability of success to each branch of the tree and its subtrees. Then use the tree to calculate probability and cost for accomplishing the overall attack. For "or" branches, you take the best probability or lowest cost - you may end up calculating both versions of the tree. For "and" branches, it's the sum of the costs of the subtree and the product of the probabilities. The "attack" I listed at the beginning (revoke the root trust anchor set) has several branches - only one of which is sneak in the revocation bit as part of the normal signing. Other approaches are brute force, key extraction, compromise of the key backups, etc. Of course, this doesn't necessarily help if someone comes up with an unanticipated attack. All you can do then is hope your defense in depth has mitigated at least some branches of that attack's attack tree.
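The attack-tree arithmetic described in the postscript can be sketched in a few lines of Python. This is only an illustration: the `Node` and `evaluate` names are hypothetical, and every cost and probability below is made up for the password example rather than taken from any real analysis.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Tuple


@dataclass
class Node:
    """One step in an attack tree. kind is "leaf", "or", or "and"."""
    name: str
    kind: str = "leaf"
    cost: float = 0.0      # only meaningful for leaves
    prob: float = 0.0      # only meaningful for leaves
    children: List["Node"] = field(default_factory=list)


def evaluate(node: Node, prefer: str = "prob") -> Tuple[float, float]:
    """Return (cost, probability) for the subtree rooted at node.

    "and" branches: sum of child costs, product of child probabilities.
    "or" branches: pick the child with the best probability
    (prefer="prob") or the lowest cost (prefer="cost").
    """
    if node.kind == "leaf":
        return node.cost, node.prob
    results = [evaluate(child, prefer) for child in node.children]
    if node.kind == "and":
        return sum(c for c, _ in results), prod(p for _, p in results)
    if prefer == "prob":
        return max(results, key=lambda r: r[1])
    return min(results, key=lambda r: r[0])


# Hypothetical numbers for the "get the password" example.
tree = Node("get password", "or", children=[
    Node("steal it from the wallet", cost=100, prob=0.1),
    Node("social engineering", cost=500, prob=0.3),
    Node("break the passwd file", "and", children=[
        Node("obtain the file", cost=1000, prob=0.5),
        Node("crack the hashes", cost=2000, prob=0.4),
    ]),
])

print("most likely path:", evaluate(tree, prefer="prob"))
print("cheapest path:   ", evaluate(tree, prefer="cost"))
```

Evaluating the tree twice, once per preference, corresponds to "calculating both versions of the tree": the cheapest path and the most likely path need not be the same branch.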
Mike, [snipping for brevity, not necessarily agreement] On Sep 21, 2014, at 4:54 PM, Michael StJohns <msj@nthpermutation.com> wrote:
a. for all intents and purposes, the likelihood of _any_ compromise/loss of the root key is statistically equivalent. Bad assumption. [...] So if you have a 1 in a million chance to compromise one key, and the probability is similar between all keys, the probability of compromising the SYSTEM if you're using 2 keys is 1 in a trillion and 1 in a quintillion for 3 keys.
If the risk of compromise is constrained to key handling (which I gather you are assuming), given the processes and frequency of key use, then I will again assert that the probability of compromise of 1, 2, and 3 keys is pragmatically speaking, statistically equivalent and essentially zero. Even 1-in-a-million failure rate when you exercise the system once a quarter is beyond any reasonable timeframe that we might consider. Out of curiosity, how do you deal with the keyset size issue?
OTOH dealing with less than catastrophic single key compromises seems to be well within the possibility of automated and secure and is exactly what 5011 was designed to accomplish.
You appear to have much more faith than I that code will operate as intended when it is exercised with a frequency appropriate to dealing with critical infrastructure. Or do you believe we should revise the key handling policies and processes to roll _much_ more frequently?
c. touching the root key for any reason increases the probability of catastrophic failure/compromise by an infinitesimal but non-zero amount. No. Touching the *only* root key does that. Touching one root key where the others are locked away decouples the fate of the system from the fate of the key.
It does NOT fully decouple the fate of the system since you have to begin using the locked away key, which means you have to exercise (likely little used) processes and place that formerly-locked away key into use. All of that increases risk “by an infinitesimal but non-zero amount”.
d. changing the root key of the DNS is and will continue to be an infrequent event (both because of (c) but more likely the PITA-ness of changing the key). This is a circular argument: we won't change the key because we haven't changed the key, because it's painful to change the key, so we won't change the key.
No. We won’t change the key frequently because (a) there is no operational reason that forces the key to change, (b) there is a risk, no matter how slight, that we might screw up, (c) it is expensive and time consuming to drag the necessary people into the secure facilities to spend the 2+ hours necessary to do the key handling appropriately, and (d) it is likely that rolling the key _will_ break things, the only question is how much and who will be affected. 5011, at least theoretically, gives us the ability to roll the key frequently and/or for non-critical reasons. However, given operational realities, it isn’t clear to me that is necessary/useful/helpful. Since we have to deal with a “full trust reboot” and that provides a superset of functionality to 5011, I’m still unclear as to why we care about 5011.
P.S. An honest question: how often do root X.509 CAs roll their root keys? It's kind of irrelevant, but somewhere between 5 and 20 years.
I’ve been told (informally) the X.509 root CAs do not roll their root keys, period. It might be useful to get an authoritative answer on that question.
It's irrelevant since there are something like 50+ of them in common use
If you’re one of those CAs and your root’s private key is compromised, the fact that you have 49 competitors is unlikely to be much of a consolation. The point being that from the perspective of the CA, the loss of the key is an existential risk and your policies and processes are designed to deal with that risk. I see some parallels in the handling of the root key. It might be useful to understand how the CAs deal with that risk. Regards, -drc
On 9/21/2014 10:55 PM, David Conrad wrote:
Mike,
[snipping for brevity, not necessarily agreement]
On Sep 21, 2014, at 4:54 PM, Michael StJohns <msj@nthpermutation.com> wrote:
a. for all intents and purposes, the likelihood of _any_ compromise/loss of the root key is statistically equivalent. Bad assumption. [...] So if you have a 1 in a million chance to compromise one key, and the probability is similar between all keys, the probability of compromising the SYSTEM if you're using 2 keys is 1 in a trillion and 1 in a quintillion for 3 keys. If the risk of compromise is constrained to key handling (which I gather you are assuming), given the processes and frequency of key use, then I will again assert that the probability of compromise of 1, 2, and 3 keys is pragmatically speaking, statistically equivalent and essentially zero. Even 1-in-a-million failure rate when you exercise the system once a quarter is beyond any reasonable timeframe that we might consider.
*sigh* There is a non-zero chance for compromise of any given key. See for example the CA compromises of 2011. Those compromises generally take a non-zero time to accomplish so you mostly have some time between the exploit being public and the possible compromise of the next key. Except that we have no next key (nor did most of the CAs who either issued a new root and re-signed all the subordinates or went bankrupt).
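The numbers traded in this exchange can be checked with a small sketch. It assumes, as the 1-in-a-million example does, that each key's compromise is an independent event with the same probability, and it treats each quarterly exercise as one independent trial; both are simplifying assumptions, and the function names are hypothetical.

```python
from fractions import Fraction


def all_keys_compromised(p_single: Fraction, n_keys: int) -> Fraction:
    """Probability that every one of n independent keys is compromised."""
    return p_single ** n_keys


def expected_years_to_first_failure(p_per_event: Fraction,
                                    events_per_year: int) -> Fraction:
    """Mean waiting time (in years) until the first failure when each
    event fails independently with probability p_per_event."""
    return 1 / (p_per_event * events_per_year)


p = Fraction(1, 10**6)  # the 1-in-a-million figure from the thread

for n in (1, 2, 3):
    print(f"{n} key(s): 1 in {1 / all_keys_compromised(p, n)}")

# One exercise of the system per quarter = 4 events per year.
print("expected years to first failure:",
      expected_years_to_first_failure(p, 4))
```

Under these assumptions both positions are arithmetically consistent: adding keys multiplies the exponents, while a single key at that failure rate already implies a very long expected waiting time. The disagreement is about whether the independence and 1-in-a-million assumptions hold, which the sketch cannot settle.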
Out of curiosity, how do you deal with the keyset size issue?
You mean the over the air size of the keyset? You live with the occasional need to fall back to TCP. Or you move from RSA to EC to reduce the byte size of the key set while increasing the number of keys. That has to be done at some point.
OTOH dealing with less than catastrophic single key compromises seems to be well within the possibility of automated and secure and is exactly what 5011 was designed to accomplish. You appear to have much more faith than I that code will operate as intended when it is exercised with a frequency appropriate to dealing with critical infrastructure.
Or do you believe we should revise the key handling policies and processes to roll _much_ more frequently?
I'd suggest 1-2 years. Or basically once every 4-8 times you do a ZSK replacement.
c. touching the root key for any reason increases the probability of catastrophic failure/compromise by an infinitesimal but non-zero amount. No. Touching the *only* root key does that. Touching one root key where the others are locked away decouples the fate of the system from the fate of the key. It does NOT fully decouple the fate of the system since you have to begin using the locked away key, which means you have to exercise (likely little used) processes and place that formerly-locked away key into use. All of that increases risk “by an infinitesimal but non-zero amount”.
Until the safe is opened and the key is placed into service it does not share fate. Once in service it *may* share fate depending on the configuration of the processes. Ideally, if the compromise was an attack that can be mitigated, then the new key shares the benefit of that mitigation.
d. changing the root key of the DNS is and will continue to be an infrequent event (both because of (c) but more likely the PITA-ness of changing the key). This is a circular argument: we won't change the key because we haven't changed the key, because it's painful to change the key, so we won't change the key. No. We won’t change the key frequently because (a) there is no operational reason that forces the key to change, (b) there is a risk, no matter how slight, that we might screw up, (c) it is expensive and time consuming to drag the necessary people into the secure facilities to spend the 2+ hours necessary to do the key handling appropriately, and (d) it is likely that rolling the key _will_ break things, the only question is how much and who will be affected.
5011, at least theoretically, gives us the ability to roll the key frequently and/or for non-critical reasons. However, given operational realities, it isn’t clear to me that is necessary/useful/helpful. Since we have to deal with a “full trust reboot” and that provides a superset of functionality to 5011, I’m still unclear as to why we care about 5011.
I think you're underestimating by perhaps several orders of magnitude the cost of a "full trust reboot". Either that or the cost isn't important because you expect we'll never do one. If you're going to do a full manual trust reboot in the next year - it should be interesting to see how long it takes to a) get the new key out, and b) what happens when you revoke the old one. (Or if you're testing the emergency version, how bad it gets when you revoke the existing key before having the next one deployed) Or are you going to wait to do the full manual trust reboot until something happens - hopefully after we're all dead? If the end result of the discussion is already pre-disposed to be one of these results, I literally have nothing else to contribute. I think they're both bad ideas, and based on a flawed risk/reward analysis.
P.S. An honest question: how often do root X.509 CAs roll their root keys? It's kind of irrelevant, but somewhere between 5 and 20 years. I’ve been told (informally) the X.509 root CAs do not roll their root keys, period. It might be useful to get an authoritative answer on that question.
*sigh* Language issues. Generally what happens is that a new CA (using the same or a new CA instance name) is placed into service using a new key pair. No further signatures are made with the old key and the old CA certificate remains in the various browsers until revoked, removed from distribution or the CA certificate expires. The new CA certificate is distributed to the browsers several years or so in advance of the need to depend upon it. You can see this pattern by examining the browser CA lists, and looking at what CAs are signing which servers over time. The largest group of these was the replacement of the old 1024-bit RSA keys. There's also the occasional re-sign of a self-signed CA certificate (changing the validity time without changing keys or other contents of the CA certificate). The new certificate is basically chained to the old certificate and replaces the old one in the browser CA trust store when it's seen.
It's irrelevant since there are something like 50+ of them in common use
If you’re one of those CAs and your root’s private key is compromised, the fact that you have 49 competitors is unlikely to be much of a consolation. The point being that from the perspective of the CA, the loss of the key is an existential risk and your policies and processes are designed to deal with that risk. I see some parallels in the handling of the root key. It might be useful to understand how the CAs deal with that risk.
They mostly don't, or aren't prepared for it. DigiNotar being a prime example and bankruptcy the result. Depending on the risk structure and CPS of the CA, a compromise of the key material might result in a new CA key being created, and all the subordinate CA certificates of the old CA being re-signed. That has the nice benefit that the addition of the new CA to the CA cert stores rehabilitates potentially millions of certificates. The comment about irrelevancy of the CA model is that none of these are universal global roots of trust. They compete and mostly that causes really interesting interoperability problems. Failure of one of them is not going to have the universal/broad impact that the failure of the single DNSSEC root of trust would have.
Regards, -drc
Going back to trust reboot - think about the timeline used for the original key creation and signing ceremonies. Pretend a compromise happens "now". How long until DNSSEC is back up using the trust reboot process? Oh yeah - the compromise happened because the HSM you're using was found to be insecure. Ready,..... GO! By the way, your attacker *is* using 5011. Since he now has access to the trust anchor private key, he's using it to place new trust anchors for the BOA, Google and the IRS local resolvers by intercepting and replacing root zone queries. Mike
Mike, On Sep 21, 2014, at 9:02 PM, Michael StJohns <msj@nthpermutation.com> wrote:
*sigh*
Unhelpful.
Or do you believe we should revise the key handling policies and processes to roll _much_ more frequently? I'd suggest 1-2 years. Or basically once every 4-8 times you do a ZSK replacement.
The current DPS has 5 years. On what analysis do you base your suggestion?
I think you're underestimating by perhaps several orders of magnitude the cost of a "full trust reboot”.
Actually, not. I believe we have to be prepared for a "full trust reboot” _regardless of 5011 support_ and part of the exercise with the key change exercise we’re discussing/planning a workshop for is to ensure that preparation.
If the end result of the discussion is already pre-disposed to be one of these results,
As far as I am aware, nothing has been decided. The point of this mailing list (and the workshops) is to understand the issues more fully, expose assumptions, and establish community consensus in how to move forward.
The comment about irrelevancy of the CA model is that none of these are universal global roots of trust. They compete and mostly that causes really interesting interoperability problems. Failure of one of them is not going to have the universal/broad impact that the failure of the single DNSSEC root of trust would have.
Which would seem to argue that one must be extremely careful, much more careful than CAs, if you have a single root of trust and not expose that trust to potential risk unnecessarily. You appear to be arguing that rolling the key every 1-2 years would not increase risk over rolling it less frequently. I do not agree.
Going back to trust reboot - think about the timeline used for the original key creation and signing ceremonies. Pretend a compromise happens "now". How long until DNSSEC is back up using the trust reboot process? Oh yeah - the compromise happened because the HSM you're using was found to be insecure. Ready,..... GO!
What is the difference in this scenario without 5011 and with 5011 if you assume a compromise of all keys? Do you believe we do NOT need to be prepared for the latter? You also seem to assume there will be universal deployment of 5011 and that everyone will allow 5011 to operate on their infrastructure. Neither of these assumptions seem realistic to me.
By the way, your attacker *is* using 5011. Since he now has access to the trust anchor private key, he's using it to place new trust anchors for the BOA, Google and the IRS local resolvers by intercepting and replacing root zone queries.
_Exactly_ the reason I would see some folks choosing not to support 5011. Regards, -drc
On 9/22/2014 1:46 AM, David Conrad wrote:
Mike,
On Sep 21, 2014, at 9:02 PM, Michael StJohns <msj@nthpermutation.com> wrote:
*sigh* Unhelpful.
Or do you believe we should revise the key handling policies and processes to roll _much_ more frequently? I'd suggest 1-2 years. Or basically once every 4-8 times you do a ZSK replacement. The current DPS has 5 years. On what analysis do you base your suggestion?
Skill degradation, memory loss, personnel turn over. I'd do it every 1-2 years or never.
I think you're underestimating by perhaps several orders of magnitude the cost of a "full trust reboot”. Actually, not. I believe we have to be prepared for a "full trust reboot” _regardless of 5011 support_ and part of the exercise with the key change exercise we’re discussing/planning a workshop for is to ensure that preparation.
To clarify this: I believe you need to retain the capability to do a "full trust reboot" for the life of DNSSEC. I also believe that if you ever have to do it, the results will be catastrophic. My third belief is that the process for doing the FTR (new acronym as of now) will need to be maintained and updated and probably won't adequately be.
The comment about irrelevancy of the CA model is that none of these are universal global roots of trust. They compete and mostly that causes really interesting interoperability problems. Failure of one of them is not going to have the universal/broad impact that the failure of the single DNSSEC root of trust would have. Which would seem to argue that one must be extremely careful, much more careful than CAs, if you have a single root of trust and not expose that trust to potential risk unnecessarily. You appear to be arguing that rolling the key every 1-2 years would not increase risk over rolling it less frequently. I do not agree.
I understand your lack of agreement. However, there is risk to everything. As I said above, having to do an FTR will be catastrophic. That could change over time if you socialize it and keep socializing it so that, every 5-7 years when you do it, people understand why it's necessary and "nothing bad" (tm) happens. The risk you have with the status quo is that a completely unlikely set of events happens (e.g. root compromise) and you're midway through your cycle. No one knows where the knobs are to replace their root trust anchor configuration, everyone yells, and the root gets taken away from ICANN because it hasn't been a good caretaker. The major risk for putting together a key replacement cycle will be when you revoke the current existing sole root of trust key. That's when things have the most potential to break because it will be the first time we've done it. That applies both to 5011 and FTR. Get past that and a 1-2 year replacement cycle that's handled on an automated basis near universally is pretty much risk negative.
Going back to trust reboot - think about the timeline used for the original key creation and signing ceremonies. Pretend a compromise happens "now". How long until DNSSEC is back up using the trust reboot process? Oh yeah - the compromise happened because the HSM you're using was found to be insecure. Ready,..... GO! What is the difference in this scenario without 5011 and with 5011 if you assume a compromise of all keys?
You keep over constraining things. There are at least four different scenarios:
1) 5011, multiple roots of trust and a trust anchor replacement cycle.
2) 5011, multiple roots of trust and an N-1 key compromise.
3) 5011 or FTR-only and a 100% compromise.
4) FTR only and a planned key replacement.
In the first one, I can schedule the addition of the new key and the revocation of the old one to coincide with one of the ZSK ceremonies. All actions are ICANN's. Updates occur in the field automatically.
In the second one, I need to get the signers together quickly to revoke the old one. (And I can actually figure out a way to do that with the signers being remoted instead of the current process). All the actions take place on behalf of ICANN.
In the third one, with 5011 I can revoke all the old keys so at least they can't be used - with FTR, I'm limited by how fast I can pass the word and how much people are paying attention. ICANN actions are dwarfed by the number of manual changes and updates necessary by resolver managers. DNSSEC goes down for the time needed to reboot it. Anything that relies on DNSSEC is now insecure.
In the fourth one, you can generate the keys in advance, publish them and wait for 6 months for everyone to get the word and do the update. At some point you have to stop signing with the old key. At that point some chaos ensues because Comcast missed all the resolvers on the east coast, or an update process failed or the guy who was responsible for doing the updates was laid off. And this will happen without fail every time an FTR is done. If you do it frequently enough, people will learn (but that applies to 5011 as well), but mostly there will always be someone that didn't get the word, or someplace where no one was responsible for doing the update actions.
I want humans out of the loop to the greatest extent possible on the resolver side. It's the only way to scale this.
Do you believe we do NOT need to be prepared for the latter?
Have I said anywhere anything that would lead you to make that statement? Shit happens. The world can come to the end, and we may need to do an FTR. It doesn't mean I like the idea, nor does it mean that I think we should settle for that as our only tool in the toolbox. To be blunt and clear - _*Yes, I think we need to be prepared to do an FTR*_.
You also seem to assume there will be universal deployment of 5011 and that everyone will allow 5011 to operate on their infrastructure. Neither of these assumptions seem realistic to me.
This is where the *sigh* creeps back in. ICANN has specified in the DPS that 5011 is the method for doing key replacements. If they don't want to do 5011, you've pointed them to where the root key files are and they're responsible for tracking them manually. That doesn't require that everyone has 5011, it does require that everyone be responsible for their own deployments. If you now change the DPS and say "we were only kidding, we're not going to use 5011", then you run into the whole problem of systems that were "relying" (legal term) on your assertions and not realizing they need to do something different when you update the root sets.
By the way, your attacker *is* using 5011. Since he now has access to the trust anchor private key, he's using it to place new trust anchors for the BOA, Google and the IRS local resolvers by intercepting and replacing root zone queries. _Exactly_ the reason I would see some folks choosing not to support 5011.
Hmm - how is this any different from a resolver automatically accepting new zone keys for, say, .COM because they were updated and re-signed and the new chains go through new keys? The design of DNSSEC is that keys, contents and signatures can change over time and that resolvers won't burp. Why do you think that 5011 is so much of a difference from that? Regards, -drc
Hi Mike, At 10:48 23-09-2014, Michael StJohns wrote:
To clarify this: I believe you need to retain the capability to do a "full trust reboot" for the life of DNSSEC. I also believe that if you ever have to do it, the results will be catastrophic. My third belief is that the process for doing the FTR (new acronym as of now) will need to be maintained and updated and probably won't adequately be.
I'll read "full trust reboot" as having a state similar to start (2010). In my opinion, it is not possible. I agree that the process still needs to be maintained.
I understand your lack of agreement. However, there is risk to everything. As I said above, having to do an FTR will be catastrophic. That could change over time if you socialize it and keep socializing it so that, every 5-7 years when you do it, people understand why it's necessary and "nothing bad" (tm) happens. The risk you have with the status quo is that a completely unlikely set of events happens (e.g. root compromise) and you're midway through your cycle. No one knows where the knobs are to replace their root trust anchor configuration, everyone yells, and the root gets taken away from ICANN because it hasn't been a good caretaker.
The major risk for putting together a key replacement cycle will be when you revoke the current existing sole root of trust key. That's when things have the most potential to break because it will be the first time we've done it. That applies both to 5011 and FTR. Get past that and a 1-2 year replacement cycle that's handled on an automated basis near universally is pretty much risk negative.
Yes.
This is where the *sigh* creeps back in. ICANN has specified in the DPS that 5011 is the method for doing key replacements. If they don't want to do 5011, you've pointed them to where the root key files are and they're responsible for tracking them manually. That doesn't require that everyone has 5011, it does require that everyone be responsible for their own deployments.
If you now change the DPS and say "we were only kidding, we're not going to use 5011", then you run into the whole problem of systems that were "relying" (legal term) on your assertions and not realizing they need to do something different when you update the root sets.
I'll list two items: (a) When the key roll-over is scheduled (b) How it will be done. I read item (a) as the five-year mark (2015). Item (b) is RFC 5011. Another person might interpret item (a) as ten years; it postpones doing item (b). If an improbable event [1] occurs within the next five years the Security Director might have to provide an explanation. Regards, S. Moonesamy 1. http://www.onderzoeksraad.nl/uploads/investigations/Press_release_DigiNotar_...
On 22 sep 2014, at 06:02, Michael StJohns <msj@nthpermutation.com> wrote:
There's also the occasional re-sign of a self-signed CA certificate (changing the validity time without changing keys or other contents of the CA certificate). The new certificate is basically chained to the old certificate and replaces the old one in the browser CA trust store when it's seen.
Mike, What browser implements the CA certificate update mechanism described above? My experience is that the only way the common browser CA trust stores are updated is when the statically configured CAs are updated as a result of a software update - never based on what's seen. jakob
Hello David, On Sun, Sep 21, 2014 at 7:55 PM, David Conrad <david.conrad@icann.org> wrote:
P.S. An honest question: how often do root X.509 CAs roll their root keys? It's kind of irrelevant, but somewhere between 5 and 20 years.
I’ve been told (informally) the X.509 root CAs do not roll their root keys, period. It might be useful to get an authoritative answer on that question.
What usually happens is they create a second generation Root CA cert and put it in the cert store, then simply stop issuing under the old Root CA and start issuing certs under the second generation Root CA. It's a bit like the 5011 standby key mechanism in the sense that you pre-publish a new trust anchor and then start using it when you want/need to.
If you’re one of those CAs and your root’s private key is compromised, the fact that you have 49 competitors is unlikely to be much of a consolation. The point being that from the perspective of the CA, the loss of the key is an existential risk and your policies and processes are designed to deal with that risk. I see some parallels in the handling of the root key. It might be useful to understand how the CAs deal with that risk.
If the Root CA is compromised, I'd think the CA would go out of business. Not because of the lack of a mechanism to recover but from the catastrophic reputation damage and profound loss of trust. In addition, I bet the browsers will pull them out from the cert store in a heartbeat if it's really bad. CAs deal with these risks by establishing and implementing rigorous security controls around key management and undergoing third party audits to verify that the controls remain effective and are actually followed. This is kind of funny but they also transfer this risk by buying insurance but I'm not sure this helps. Certainly not applicable to us. I think the huge difference between the CA business and Root DNSSEC is that there is no going out-of-business for Root DNSSEC. It doesn't matter how ugly it gets, we have no option but to recover and keep on providing the service at all costs. Thanks, Tomofumi
Tomofumi, On Sep 21, 2014, at 9:41 PM, Tomofumi Okubo <tomofumi.okubo@gmail.com> wrote:
I think the huge difference between the CA business and Root DNSSEC is that there is no going out-of-business for Root DNSSEC.
I agree 100%, which is why I tend to be (perhaps overly) concerned with minimizing risks.
It doesn’t matter how ugly it gets, we have no option but to recover and keep on providing the service at all costs.
I don’t think there is any disagreement here on this issue. The question is how risks are mitigated. AFAICT, there is an assumption that there are two modes of potential failure: (a) a catastrophic failure in which the only option is re-bootstrapping and (b) a non-catastrophic failure in which 5011 is a (potentially) viable solution. Is anyone arguing that we do not need to be prepared for (a), regardless of how unlikely it might be? What exactly does (b) look like? That is, what is a non-catastrophic failure that would necessitate a key roll? Regards, -drc
Hello David, On Sun, Sep 21, 2014 at 11:05 PM, David Conrad <david.conrad@icann.org> wrote:
AFAICT, there is an assumption that there are two modes of potential failure: (a) a catastrophic failure in which the only option is re-bootstrapping and (b) a non-catastrophic failure in which 5011 is a (potentially) viable solution.
Yes, I fully agree.
Is anyone arguing that we do not need to be prepared for (a), regardless of how unlikely it might be?
Given the importance of the service, we definitely need to be prepared for the worst case. Does it hurt to be overly prepared? Absolutely not.
What exactly does (b) look like? That is, what is a non-catastrophic failure that would necessitate a key roll?
Off the top of my head, circumstances for a planned (non-catastrophic) rollover are things like a "theoretical" algorithm compromise (white paper), a change of recommended algorithm or key length, an HSM vendor change and a periodic KSK roll (if we choose to do so). It's more like a due diligence thing. Thanks! Tomofumi
On 9/21/2014 10:55 PM, David Conrad wrote:
Since we have to deal with a “full trust reboot” and that provides a superset of functionality to 5011, I’m still unclear as to why we care about 5011.
By the way, I just realized that the above is somewhat equivalent to "If we can just buy a new car when one breaks, then why would we need repair shops." Just saying. 5011 is the repair shop. Later, Mike
On 9/21/14 10:29 PM, Michael StJohns wrote:
On 9/21/2014 10:55 PM, David Conrad wrote:
Since we have to deal with a “full trust reboot” and that provides a superset of functionality to 5011, I’m still unclear as to why we care about 5011.
By the way, I just realized that the above is somewhat equivalent to "If we can just buy a new car when one breaks, then why would we need repair shops." Just saying. 5011 is the repair shop.
I'm pro-5011, and I think we should definitely be working towards a key succession strategy (more on that later). But we should be clear that while 5011 is the best tool we have, and will be effective for a significant percentage of end users, it won't be anywhere close to universally effective. Even things like dnsmasq, which is widely used and recently gained DNSSEC support, will not be helped by 5011; it's still a manual process. There are (of course) many other examples. Doug
On 9/21/2014 11:41 AM, Joe Abley wrote:
One way that an emergency roll is different from a planned roll is that a planned roll can make use of existing non-compromised KSKs and their corresponding trust anchors, whereas an emergency roll (where the emergency is a consequence of a key compromise) might not have that luxury.
This is probably not the definition the rest of us are using. Emergency: Action taken because a key is compromised. Planned: Action taken because it's time to do so. The state of the key data set is orthogonal to whether a key change is planned or emergency, but has a substantial effect on the actions taken during the key change. Mike
On Sep 21, 2014, at 8:41 AM, Joe Abley <jabley@hopcount.ca> wrote:
One way that an emergency roll is different from a planned roll is that a planned roll can make use of existing non-compromised KSKs and their corresponding trust anchors, whereas an emergency roll (where the emergency is a consequence of a key compromise) might not have that luxury.
Just a placeholder here, but one that some people care about: A planned rollover could turn into an emergency rollover during the ceremony if it is discovered that the signing hardware for the current key (or all the current keys, if there is more than one) cannot be used. You can't tell if signing hardware that is not being used (because it is purposely offline, maybe in a safe) will be usable until you try. --Paul Hoffman
On 21 sep 2014, at 21:17, Paul Hoffman <paul.hoffman@vpnc.org> wrote:
A planned rollover could turn into an emergency rollover during the ceremony if it is discovered that the signing hardware for the current key (or all the current keys, if there are more than one) cannot be used.
... and the off-site key backup turns out to be usable. jakob
On 9/21/2014 3:17 PM, Paul Hoffman wrote:
On Sep 21, 2014, at 8:41 AM, Joe Abley <jabley@hopcount.ca> wrote:
One way that an emergency roll is different from a planned roll is that a planned roll can make use of existing non-compromised KSKs and their corresponding trust anchors, whereas an emergency roll (where the emergency is a consequence of a key compromise) might not have that luxury. Just a placeholder here, but one that some people care about:
A planned rollover could turn into an emergency rollover during the ceremony if it is discovered that the signing hardware for the current key (or all the current keys, if there are more than one) cannot be used.
I had to read this a few times to get what I think you meant. Specifically, if a) a signature is expiring over one of the groups of keys in the trust chain, and b) the hardware breaks so that the signature will expire before you can do the re-signing, then c) it's an emergency. I'm stating it that way because keys don't actually have a defined EOL, so whether we're in an emergency situation or not is tied to signature expiration rather than the time you're trying to do the re-sign. In the above scenario you have the time between your attempt and the signature expiration to recover the keys and complete the signature. It's an internally triggered event that, if completed successfully, has no external implications. If you're unable to re-sign the root DNSKEY RRSet in time with one of the keys in the root trust anchor set, then it's not actually an emergency rollover (keys aren't compromised, no one else can use them for faking data in the zone), but a failure of process. The question is then how do you recover/reboot your trust anchor set so you can reestablish a chain of trust. I think they're two very different things to consider. Mike
You can't tell if signing hardware that is not being used (because it is purposely offline, maybe in a safe) will be usable until you try.
--Paul Hoffman _______________________________________________ ksk-rollover mailing list ksk-rollover@icann.org https://mm.icann.org/mailman/listinfo/ksk-rollover
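A minimal sketch of the timing argument in Mike's message, in Python: the room for recovery is the gap between the failed signing attempt and the expiration of the signature currently published over the DNSKEY RRSet. The function names, the dates, and the recovery estimates are illustrative assumptions, not values taken from the root zone or from any ceremony procedure.

```python
from datetime import datetime, timedelta, timezone


def recovery_window(attempt_time, rrsig_expiration):
    """Time left between a failed signing attempt and signature expiry.

    A zero or negative result means the published signature has already
    expired, so there is no time left to recover the keys and re-sign.
    """
    return rrsig_expiration - attempt_time


def can_recover(attempt_time, rrsig_expiration, estimated_recovery):
    """True if key recovery is expected to finish before the signature expires."""
    return recovery_window(attempt_time, rrsig_expiration) >= estimated_recovery


# Hypothetical dates, chosen only to illustrate the comparison.
attempt = datetime(2014, 10, 1, 17, 0, tzinfo=timezone.utc)
expiry = datetime(2014, 10, 15, 0, 0, tzinfo=timezone.utc)

window = recovery_window(attempt, expiry)
print(window)
print(can_recover(attempt, expiry, timedelta(days=7)))
print(can_recover(attempt, expiry, timedelta(days=21)))
```

Under this framing, a hardware failure only becomes externally visible when the estimated recovery time exceeds the window.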
On Sep 21, 2014, at 12:33 PM, Michael StJohns <msj@nthpermutation.com> wrote:
On 9/21/2014 3:17 PM, Paul Hoffman wrote:
On Sep 21, 2014, at 8:41 AM, Joe Abley <jabley@hopcount.ca> wrote:
One way that an emergency roll is different from a planned roll is that a planned roll can make use of existing non-compromised KSKs and their corresponding trust anchors, whereas an emergency roll (where the emergency is a consequence of a key compromise) might not have that luxury. Just a placeholder here, but one that some people care about:
A planned rollover could turn into an emergency rollover during the ceremony if it is discovered that the signing hardware for the current key (or all the current keys, if there are more than one) cannot be used.
I had to read this a few times to get what I think you meant. Specifically, if a) a signature is expiring over one of the groups of keys in the trust chain, and b) the hardware breaks so that the signature will expire before you can do the re-signing, then c) it's an emergency.
It could also happen if this group decides to have a KSK rollover to test the system, even though the signature is not expiring. So, what you said, plus: Specifically, if a) ICANN has decided to expire one of the groups of keys in the trust chain as an operational exercise, and b) the hardware breaks so that the signing cannot be done during the exercise, then c) it's an emergency.
I'm stating it that way because keys don't actually have a defined EOL, so whether we're in an emergency situation or not is tied to signature expiration rather than the time you're trying to do the re-sign. In the above scenario you have the time between your attempt and the signature expiration to recover the keys and complete the signature. It's an internally triggered event that if completed successfully, has no external implications.
It will clearly have publicity (and thus policy) implications if there is a non-emergency ceremony happening for the KSK that has a giant "whoopsie" in the middle.
If you're unable to re-sign the root DNSKEY RRSet in time with one of the keys in the root trust anchor set, then it's not actually an emergency rollover (keys aren't compromised, no one else can use them for faking data in the zone), but a failure of process. The question is then how do you recover/reboot your trust anchor set so you can reestablish a chain of trust.
I think they're two very different things to consider.
Fully agree, but I would call both of them emergencies. It sounds like you are defining "emergency" as a key compromise; others (including me) are defining it as that plus other bad things (like inability to sign with an uncompromised key). To be clearer, maybe we have two (gad, I hope only two) terms: "compromise emergency" and "operational emergency". In a compromise emergency, you can sign but you really don't want to; in an operational emergency, you cannot sign but you really wanted to be able to. --Paul Hoffman
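The two proposed terms can be sketched as a small decision function in Python. The names `RollKind` and `classify` are hypothetical, and treating a key that is both compromised and unusable as a compromise emergency is an assumption made here; the thread does not address that case.

```python
from enum import Enum


class RollKind(Enum):
    PLANNED = "planned"
    COMPROMISE_EMERGENCY = "compromise emergency"
    OPERATIONAL_EMERGENCY = "operational emergency"


def classify(key_compromised: bool, can_sign: bool) -> RollKind:
    """Map the state of the current KSK onto the terms proposed above."""
    # Compromise emergency: you may be able to sign, but you really don't want to.
    # Assumption: compromise takes precedence even if signing is also impossible.
    if key_compromised:
        return RollKind.COMPROMISE_EMERGENCY
    # Operational emergency: the key is fine, but you cannot sign with it.
    if not can_sign:
        return RollKind.OPERATIONAL_EMERGENCY
    # Neither condition holds: any roll is a planned one.
    return RollKind.PLANNED


for compromised in (False, True):
    for signing_ok in (True, False):
        print(compromised, signing_ok, classify(compromised, signing_ok).value)
```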
This got stuck in my outbound queue for a few days.
On Saturday, September 20, 2014, David Conrad <david.conrad@icann.org> wrote:
Tomofumi, On Sep 19, 2014, at 11:46 AM, Tomofumi Okubo <tomofumi.okubo@gmail.com> wrote:
> The former is emergency roll and the latter is planned roll.
> The rollover we are discussing now falls under the latter.
Actually, I don’t think we’ve started the discussion on changing the key as yet :). Is there a need to have a different set of policies/processes for an emergency roll vs. a planned roll? Is a planned roll a proper subset of an emergency roll?
Yes and no. If you're using 5011 as the model, and you have two keys as trust anchors, and one associated private key is compromised, then there really isn't a lot of difference in procedures. You revoke the compromised key, and start the process of getting a new key accepted as a trust anchor. The emergency thing shows up when all of your existing trust anchor keys are compromised. And there really isn't a way to deal with that contingency, planned or not. Basically, if A[public] is your only root trust anchor and A[private] is compromised, you're dead in the water. You can attempt to add new B[public] keys to the trust anchor set using 5011, but there's a good chance that your attacker is attempting to do the same thing. If the attacker revokes A[public] by setting the bit and signing the root DNSKEY RRSet with that key, what probably happens is that ALL data without subordinate trust anchors is considered invalid by resolvers and rejected. 5011 gave very specific guidance on what needed to go into the root DNSKEY RRSet to avoid this case - but the current RRSet and Trust Anchor set are missing the second KSK. Mike
Regards, -drc
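A rough Python sketch of the single-anchor versus two-anchor point. It models only the bookkeeping of a resolver's trust anchor set using the DNSKEY flags field; the REVOKE bit value (0x0080) is from RFC 5011. The anchor names "A" and "B" and the helper functions are hypothetical, and signature validation and the 5011 hold-down timers are deliberately left out.

```python
KSK_FLAGS = 257      # Zone Key (256) + Secure Entry Point (1)
REVOKE_BIT = 0x0080  # RFC 5011 REVOKE flag; a revoked KSK carries flags 385


def revoke(flags):
    """Return the DNSKEY flags value with the REVOKE bit set."""
    return flags | REVOKE_BIT


def usable_anchors(anchors):
    """Names of trust anchors that have not been revoked."""
    return {name for name, flags in anchors.items() if not flags & REVOKE_BIT}


# Single trust anchor: once A is revoked (by the operator or by an attacker
# holding A's private key), nothing is left to validate against.
single = {"A": KSK_FLAGS}
single["A"] = revoke(single["A"])
print(usable_anchors(single))

# Two trust anchors: revoking a compromised A still leaves B to anchor trust
# while a replacement key is introduced.
pair = {"A": KSK_FLAGS, "B": KSK_FLAGS}
pair["A"] = revoke(pair["A"])
print(usable_anchors(pair))
```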
On Thu, Sep 18, 2014 at 1:46 AM, Joe Abley <jabley@hopcount.ca> wrote:
Greetings,
Some notes about our upcoming operational discussions regarding future root zone KSK rollover follow, including a heartfelt request for input on the agenda.
Joe, My understanding of the specs of the HSMs used in the current key ceremonies is that they have a lifetime of around 5 years. Does this need to be factored into the planning for KSK rollover? Other equipment used in the process may also fail. andy
Hi Andy, On 22 Sep 2014, at 6:53, Andy Linton <asjl@lpnz.org> wrote:
My understanding of the specs of the HSMs used in the current key ceremonies is that they have a lifetime of around 5 years. Does this need to be factored into the planning for KSK rollover?
Other equipment used in the process may also fail.
Tomofumi will be able to speak to this better than I can, but my recollection is that the AEP Keypers have an expected battery lifetime of ten years, but in order to remain in warranty need to be reconditioned every five years. As we know from ceremony 1, it's entirely possible to clone an individual HSM onto a new HSM that has arrived in a tamper-evident bag from the factory: we did this in order to replicate the crypto assets from Culpeper to El Segundo before the keys went into production. So the existing HSMs could be replaced with brand new ones (there's an acceptance process already documented and exercised) and the old ones could be erased, tampered and securely destroyed, without any key materials leaving either facility, all within a ceremony. There were thoughts in the past that the 5 year warranty period was nicely aligned with the expectation of rolling the key within 5 years of production use, and hence perhaps the hardware replacement and the key roll could happen concurrently. That'd be an optimisation of the two processes, though, and not a requirement. Other hardware *has* failed since 2009; laptops have been replaced, for example. I think it's reasonable to leave equipment failure/replacement as implementation details for the ICANN staff that run the ceremonies, and not to factor it into our thinking on how to roll the KSK. Joe
Hi Andy, On Sep 22, 2014, at 3:53 AM, Andy Linton <asjl@lpnz.org> wrote:
My understanding of the specs of the HSMs used in the current key ceremonies is that they have a lifetime of around 5 years. Does this need to be factored into the planning for KSK rollover?
The short answer is no. Irrespective of the work on KSK rollover, our intention is to either replace or augment the existing HSMs with new units next year. If there is a key rollover and it nicely aligns to this schedule, great, but I don't think the two activities are necessarily aligned. The HSMs report their battery status upon use and all are reporting good battery condition. We have spoken with the vendor and they have indicated that the batteries are expected to last more than 5 years. There seems to be no reason for immediate concern; however, we are taking a conservative approach to it. kim
Andy- I pretty much agree with Kim. However, just for the record, as many of you know my personal opinion has been to either roll the KSK (as the community seems to desire) or replace the HSMs in a key ceremony with the 7 RKSHs present sooner rather than later. As 5 years was the recommended replacement interval for the HSMs (and what we therefore put in the DPS as a sensible rollover date), IMHO the conservative approach would be to replace the units before the 5 year mark, as the units were purchased about a year prior to being put into service. As you know the HSMs are rarely "plugged-in", meaning that we are operating in the worst case scenario for the batteries, which do not charge when plugged in (they are "primary" cells). So the tamper circuitry is continually drawing current. So, although the shelf-life of the battery is 10 years according to the data sheet, the lifetime will be much less. That being said, Kim is right that conversations with the HSM vendor indicate that we are probably ok past 5 years. At minimum I have suggested that we review our HSM failure recovery procedures and be prepared. So I believe we are covered even in the worst case. I hope that helps. -Rick -----Original Message----- From: ksk-rollover-bounces@icann.org [mailto:ksk-rollover-bounces@icann.org] On Behalf Of Kim Davies Sent: Monday, September 22, 2014 7:02 PM To: Andy Linton Cc: ksk-rollover@icann.org Subject: Re: [ksk-change] [ksk-rollover] root zone KSK rollover operations workshop planning Hi Andy, On Sep 22, 2014, at 3:53 AM, Andy Linton <asjl@lpnz.org> wrote:
My understanding of the specs of the HSMs used in the current key ceremonies is that they have a lifetime of around 5 years. Does this need to be factored into the planning for KSK rollover?
The short answer is no. Irrespective of the work on KSK rollover, our intention is to either replace or augment the existing HSMs with new units next year. If there is a key rollover and it nicely aligns to this schedule, great, but I don't think the two activities are necessarily aligned. The HSMs report their battery status upon use and all are reporting good battery condition. We have spoken with the vendor and they have indicated they are expected to last more than 5 years. There seems to be no reason for immediate concern however we are taking a conservative approach to it. kim
participants (15)
- Andy Linton
- Anne-Marie Eklund-Löwinder
- David Conrad
- Doug Barton
- Jakob Schlyter
- Joe Abley
- Kim Davies
- manning bill
- Michael StJohns
- Paul Hoffman
- Peter Koch
- Phil Regnauld
- Richard Lamb
- S Moonesamy
- Tomofumi Okubo