Re: [ksk-change] planned vs. emergency (was Re: [ksk-rollover] root zone KSK ...)
Hi Tomofumi, At 21:41 21-09-2014, Tomofumi Okubo wrote:
CAs deal with these risks by establishing and implementing rigorous security controls around key management and undergoing third-party audits to verify that the controls remain effective and are actually followed. This is kind of funny, but they also transfer this risk by buying insurance, though I'm not sure this helps. Certainly not applicable to us.
I think the huge difference between the CA business and Root DNSSEC is that there is no going out-of-business for Root DNSSEC. It doesn't matter how ugly it gets, we have no option but to recover and keep on providing the service at all costs.
Thanks for the above explanation. It seems that the CA business is being conflated with Root DNSSEC. There is supposed to be redundancy as part of the DNSSEC practice to reduce the risks. The HSMs are offline. The risk there is physical access [1]. An emergency roll-over could, in simple terms, be when a private key is lost or compromised. A planned roll-over reduces the likelihood of that happening.
The reluctance to do that planned roll-over is probably because:
(a) It has never been done before.
(b) There will be an operational impact.
It is difficult to assess (b) because of (a). What there is now is "the root key" [2]. It is not a good idea, in my opinion, to have "the root key" [3].
Regards, S. Moonesamy
1. I'll skip a discussion of that.
2. Credits to Michael StJohns
3. I am aware that it is a shared key.
On Sep 21, 2014, at 11:37 PM, S Moonesamy <sm+icann@elandsys.com> wrote:
It seems that the CA business is being conflated with Root DNSSEC.
No.
There is supposed to be redundancy as part of the DNSSEC practice to reduce the risks. The HSMs are offline. The risk there is physical access [1]. An emergency roll-over could, in simple terms, be when a private key is lost or compromised. A planned roll-over reduces the likelihood of that happening.
If the risk is physical access, then the implication of a planned rollover is that that physical access occurs (much) more frequently than if the physical access is limited to the times when emergency rollover is needed. As such, it actually increases the likelihood of it happening. What a planned rollover does do is provide more experience in the hopes that we can recover more easily. Of course, if the private key is lost or compromised, you can’t use 5011 for a rollover.
The reluctance to do that planned roll-over is probably because: (a) It has never been done before.
No. We never signed the root before, yet we did that.
(b) There will be an operational impact.
Repeating part of a previous message: "(a) there is no operational reason that forces the key to change, (b) there is a risk — no matter how slight — that we might screw up, (c) it is expensive and time consuming to drag the necessary people into the secure facilities to spend the 2+ hours necessary to do the key handling appropriately, and (d), it is likely that rolling the key _will_ break things, the only question is how much and who will be affected."
It is difficult to assess (b) because of (a). What there is now is "the root key" [2]. It is not a good idea, in my opinion, to have "the root key"[3].
I agree that having a single key (shared or not) is not ideal. However, current limitations in algorithms/lengths imply some risk of resolver failure if multiple keys are used. Regards, -drc
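The point that RFC 5011 cannot be used once the private key is lost or compromised follows from how 5011 updates trust anchors: a resolver only acts on a DNSKEY RRset validated by a key it already trusts, a new key must be seen continuously for the add hold-down period before it is trusted, and a revocation has to be signed by the revoked key itself. The sketch below is a heavily simplified, hypothetical resolver-side model, not an implementation of RFC 5011: the class and method names are invented, both hold-down periods are fixed at 30 days counted in whole days, and RRSIG verification, TTLs and query timers are abstracted into plain sets of key ids.

```python
from dataclasses import dataclass, field

HOLD_DOWN_DAYS = 30  # simplified: RFC 5011 add and remove hold-down as whole days


@dataclass
class TrustAnchorStore:
    """Hypothetical, simplified resolver-side view of RFC 5011 key states."""
    state: dict = field(default_factory=dict)  # key id -> "AddPend", "Valid", ...
    since: dict = field(default_factory=dict)  # key id -> day the state was entered

    def observe(self, day, published, signed_by, revoked=frozenset()):
        """Process one fetch of the trust point's DNSKEY RRset.

        published: key ids present in the RRset
        signed_by: key ids whose RRSIGs over the RRset verified
        revoked:   key ids published with the REVOKE bit set
        Returns False if the RRset was ignored.
        """
        valid = {k for k, s in self.state.items() if s == "Valid"}
        # Updates are accepted only from an RRset validated by an already trusted key.
        if not (signed_by & valid):
            return False
        for key in published:
            current = self.state.get(key)
            if key in revoked:
                # Revocation must be self-signed, so it needs the key's private half.
                if current == "Valid" and key in signed_by:
                    self._enter(key, "Revoked", day)
            elif current is None:
                self._enter(key, "AddPend", day)  # start the add hold-down timer
            elif current == "AddPend" and day - self.since[key] >= HOLD_DOWN_DAYS:
                self._enter(key, "Valid", day)
            elif current == "Missing":
                self._enter(key, "Valid", day)  # a missing key that reappears
        for key, current in list(self.state.items()):
            if key in published:
                continue
            if current == "Valid":
                self._enter(key, "Missing", day)  # still trusted, just not seen
            elif current == "AddPend":
                del self.state[key]  # must be seen continuously; start over
                del self.since[key]
            elif current == "Revoked" and day - self.since[key] >= HOLD_DOWN_DAYS:
                self._enter(key, "Removed", day)
        return True

    def _enter(self, key, new_state, day):
        self.state[key] = new_state
        self.since[key] = day
```

With a single trust anchor whose private key is lost, anything the operator can still publish is signed only by new keys, so `observe` ignores it; that is the case where trust has to be re-bootstrapped by other means.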
On Sep 22, 2014, at 4:29 PM, David Conrad <david.conrad@icann.org> wrote:
"(a) there is no operational reason that forces the key to change, (b) there is a risk — no matter how slight — that we might screw up, (c) it is expensive and time consuming to drag the necessary people into the secure facilities to spend the 2+ hours necessary to do the key handling appropriately, and (d), it is likely that rolling the key _will_ break things, the only question is how much and who will be affected."
I believe that the argument to roll to make sure bad operational habits are not ossified (paraphrased from RFC 6781) is an operational reason (I’ve pasted it below for reference). So I do not (fully) agree with (a). I would agree if you would say inherent operational reason.
The question is really what sort of breakage (and associated costs) do we accept now versus when we do have an inherent operational reason. I believe, but that is not very helpful I realize, that by accepting some breakage today (mainly accepting (d)) we will reduce the fraction of folk that suffer (d) in the future. At some point that argument will not hold because the number of people in the (d) category becomes too large, extending beyond the early deployers who still track the technology developments.
—Olaf
RFC 6781 3.2.2: However, the "operational habit" argument also applies to trust anchor reconfiguration at the clients' validators. If a short key effectivity period is used and the trust anchor configuration has to be revisited on a regular basis, the odds that the configuration tends to be forgotten are smaller.
In fact, the costs for those users can be minimized by automating the rollover with RFC 5011 [RFC5011] and by rolling the key regularly (and advertising such) so that the operators of validating resolvers will put the appropriate mechanism in place to deal with these stability costs: In other words, budget for these costs instead of incurring them unexpectedly. — — — — — — — — — — Olaf Kolkman (on personal title)
On Mon, Sep 22, 2014 at 03:37:07PM +0000, Olaf Kolkman wrote:
The question is really what sort of breakage (and associated costs) do we accept now versus when we do have an inherent operational reason. I believe, but that is not very helpful I realize, that by accepting some breakage today (mainly accepting (d)) we will reduce the fraction of folk that suffer (d) in the future. At some point that argument will not hold because the number of people in the (d) category becomes too large, extending beyond the early deployers who still track the technology developments.
I've had a lot of sympathy for the 'operational habit' school of thought when we discussed this for 4641 and, subsequently, 6781. I'm less convinced today, seeing that all of the tedious rollovers tend to have been "fully automated" to the extent that all details are hidden from the real operator. Now, maybe we've trained SW vendors this way, for what it's worth, but the early idea to drill for regular rollovers to ease emergency ones probably doesn't hold. -Peter
Hi David, At 07:29 22-09-2014, David Conrad wrote:
If the risk is physical access, then the implication of a planned rollover is that that physical access occurs (much) more frequently than if the physical access is limited to the times when emergency rollover is needed. As such, it actually increases the likelihood of it happening. What a planned rollover does do is provide more experience in the hopes that we can recover more easily.
Of course, if the private key is lost or compromised, you can't use 5011 for a rollover.
Based on publicly available information there is physical access every six months per KMF. I suggested to IKOS to have any planned key roll-over within that event. That is to avoid any additional physical access [1].
Repeating part of a previous message:
"(a) there is no operational reason that forces the key to change, (b) there is a risk, no matter how slight, that we might screw up, (c) it is expensive and time consuming to drag the necessary people into the secure facilities to spend the 2+ hours necessary to do the key handling appropriately, and (d), it is likely that rolling the key _will_ break things, the only question is how much and who will be affected."
Nobody will want to authorize an emergency roll-over as (a) and (b) will weigh heavily against doing that.
I am personally aware of (c). I have never viewed the time as an issue; I am there to perform a task and I would like to see it done correctly.
I agree that it is likely that rolling a key (d) will break things. The discussions (not on this mailing list) about that have been about how much will break and who will be affected.
Regards, S. Moonesamy
1. http://data.iana.org/ksk-ceremony/18/KSK18-CAM1.mp4
Hello Subramanian, I like the idea of assessing the risks of doing the KSK rollover. Do you think it would help to facilitate this discussion if we performed a high-level risk assessment for the KSK rollover in this group, so we can form a rough consensus around what exactly the risks are and how they should be treated? Risks could actually be dealt with in four ways: mitigate, accept, transfer or avoid. Avoiding (not doing it) is one way but not the only way. Also, I'd like to stress that we are not just talking about the present but the future. The day the algorithm (or key length) is going to be obsolete is coming no matter what. Sometimes, not taking precautions could be seen as a lack of due diligence. I believe if we don't take action now, the issue is going to get bigger in the future. Cheers! Tomofumi
Tomofumi, On Sep 22, 2014, at 11:40 AM, Tomofumi Okubo <tomofumi.okubo@gmail.com> wrote:
Risks could actually be dealt with in four ways: mitigate, accept, transfer or avoid. Avoiding (not doing it) is one way but not the only way.
Perhaps there is some confusion: is anyone actually suggesting we don’t do key rollovers? I certainly am not (if there is any doubt). I am trying to draw assumptions and implications out so we can discuss them more fully, with some vague hope of reaching consensus. Regards, -drc
Hello David, On Mon, Sep 22, 2014 at 3:04 PM, David Conrad <david.conrad@icann.org> wrote:
Perhaps there is some confusion: is anyone actually suggesting we don’t do key rollovers?
I had an impression that some people are not too enthusiastic about it. I'm sorry if I misread it.
I certainly am not (if there is any doubt).
No doubt there.
I am trying to draw assumptions and implications out so we can discuss them more fully, with some vague hope of reaching consensus.
Sounds good to me. Thanks, Tomofumi
at one point in early discussions, there was a suggestion that the signature validity period expire in 2040, after the unix epoch. the thinking was, the # of conversions that would need to be done then would make it an ideal time to also roll the keys. and in some alternate universe, that might make lots of sense. RFC 5011 does make certain presumptions. The nature of what it means to be connected and what, if any, field upgrade capability exists have changed. So will you acknowledge O’Dell’s law or not? Will those of you who have a dog in this fight want regular existence proof that the backup plans work or not? (The data center analogy of regularly testing the backup generators comes to mind.) Mike has drilled down on a very good point re emergency rollover. /bill PO Box 12317 Marina del Rey, CA 90295 310.322.8102
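Bill's 2040 date presumably refers to the point after which a signed 32-bit Unix timestamp can no longer represent the current time (January 2038). As a minimal sketch under that assumption (the helper names are hypothetical), the cutoff can be computed directly. The RRSIG inception and expiration fields themselves are unsigned 32-bit values compared with serial number arithmetic (RFC 4034, RFC 1982), so the limit mainly bites software that stores them in a signed 32-bit time type.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit seconds-since-epoch counter


def signed_32bit_limit():
    """Last instant representable as signed 32-bit seconds since the epoch."""
    return EPOCH + timedelta(seconds=INT32_MAX)


def fits_signed_32bit(when):
    """True if a post-1970 timestamp still fits in a signed 32-bit time value."""
    return (when - EPOCH) <= timedelta(seconds=INT32_MAX)
```

A validity period ending in 2040 would therefore fall on the far side of the 32-bit limit for any component that still uses signed 32-bit time.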
[no substance below, only ‘process’]
On Sep 23, 2014, at 12:48 AM, Tomofumi Okubo <tomofumi.okubo@gmail.com> wrote:
Perhaps there is some confusion: is anyone actually suggesting we don’t do key rollovers?
I had an impression that some people are not too enthusiastic about it. I'm sorry if I misread it.
I certainly am not (if there is any doubt).
No doubt there.
Well, I started to misinterpret the thread, so I am happy this stake is put firmly in the ground: It is not about if we will be doing a key rollover but about when. Can we also narrow down the parameters around ‘when’? (Further down this thread I see SM attempting that.)
I am trying to draw assumptions and implications out so we can discuss them more fully, with some vague hope of reaching consensus.
Fair. It is probably good for somebody to conclude that brainstorming phase at some point and summarize what has been said.
—Olaf — — — — — — — — — — Olaf Kolkman (on personal title)
Olaf, On Sep 23, 2014, at 1:05 AM, Olaf Kolkman <kolkman@isoc.org> wrote:
It is not about if we will be doing a key rollover but about when.
Actually, I’d say it is about:
- what do we want to do in addition to rolling the key (e.g., longer key size, change algorithms, add more keys, etc)
- the exact methodology by which we will roll the key.
- how frequently will we roll the key
- what’s going to break when we roll the key (and how do we mitigate/remedy that breakage)
I see the “when” bit as a relatively minor detail once we get the above ironed out.
Fair. It is probably good for somebody to conclude that brainstorming phase at some point and summarize what has been said.
I’m still seeing “brain squalls”, so I don’t think we’re at a summarization point as yet (but that might just be me). Regards, -drc
On Sep 23, 2014, at 9:58 AM, David Conrad <david.conrad@icann.org> wrote:
Actually, I’d say it is about:
- what do we want to do in addition to rolling the key (e.g., longer key size, change algorithms, add more keys, etc)
- the exact methodology by which we will roll the key.
- how frequently will we roll the key
- what’s going to break when we roll the key (and how do we mitigate/remedy that breakage)
I see the “when” bit as a relatively minor detail once we get the above ironed out.
+1. In fact, the "when" is dependent on some of the earlier bits. For example, doing a key roll after adding a second key has completely different operational properties for ICANN, and for the relying parties, than rolling the single current key. --Paul Hoffman
On 23 sep. 2014, at 19:11, Paul Hoffman <paul.hoffman@vpnc.org> wrote:
On Sep 23, 2014, at 9:58 AM, David Conrad <david.conrad@icann.org> wrote:
Actually, I’d say it is about:
- what do we want to do in addition to rolling the key (e.g., longer key size, change algorithms, add more keys, etc)
- the exact methodology by which we will roll the key.
- how frequently will we roll the key
- what’s going to break when we roll the key (and how do we mitigate/remedy that breakage)
I see the “when” bit as a relatively minor detail once we get the above ironed out.
+1. In fact, the "when" is dependent on some of the earlier bits. For example, doing a key roll after adding a second key has completely different operational properties for ICANN, and for the relying parties, than rolling the single current key.
OK that clarifies. --OK
On Sep 22, 2014, at 8:50 AM, S Moonesamy <sm+icann@elandsys.com> wrote:
Based on publicly available information there is physical access every six months per KMF. I suggested to IKOS to have any planned key roll-over within that event. That is to avoid any additional physical access [1].
Sounds reasonable. Of course, if there are multiple keys, it may mean additional KMFs are necessary which means the current physical access strategy may not be applicable.
Nobody will want to authorize an emergency roll-over as (a) and (b) will weigh heavily against doing that.
It isn’t a question of want. You are assuming there is an option to not authorize an emergency key roll?
I am personally aware of (c). I have never viewed the time as an issue; I am there to perform a task and I would like to see it done correctly.
Thank you (sincerely) for your efforts. However, I’ve been told by several people that it is becoming increasingly challenging to get sufficient TCRs to show up for key ceremonies.
I agree that it is likely that rolling a key (d) will break things. The discussions (not on this mailing list) about that have been about how much will break and who will be affected.
I believe topics like what will break, who it will affect, and mitigation/recovery strategies would be useful topics for the workshop. Regards, -drc
On 9/22/2014 2:53 PM, David Conrad wrote:
I am personally aware of (c). I have never viewed the time as an issue; I am there to perform a task and I would like to see it done correctly. Thank you (sincerely) for your efforts. However, I’ve been told by several people that it is becoming increasingly challenging to get sufficient TCRs to show up for key ceremonies.
Hmm... this may be something else to consider. Right now, we're doing pretty much straight signature stuff. The policy enforced by the HSM is simply "if enabled by sufficient smart cards, then sign the data offered". That requires everyone to mostly be in one room. Instead, imagine an HSM with a policy wrapper around the KSK that says "I will sign something if I can verify that N of K other authorized entities have signed the exact thing". Each of the TCRs does a signature using their own key (using a smart card etc.) over the new KSK data. Each of those signed blobs is fed into the HSM one at a time until the HSM is able to identify N valid signatures, from different signers, over the same data, at which point it emits its own signature over that object. Feeding in the signed blobs becomes a local administrative issue rather than a critical cryptographic process. Doing something like this with Java Card would be simple. It also means that the partial signers get an extended chance to examine the data to be signed prior to the signing action. Just another possibility. Mike
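Mike's N-of-K policy wrapper can be sketched as follows. This is a hypothetical illustration, not an existing HSM or Java Card feature: the class and method names are invented, and HMAC-SHA256 with per-TCR secrets stands in for the smart-card public-key signatures (and for the HSM's own KSK signature) that a real design would use, purely so the sketch runs with the Python standard library.

```python
import hashlib
import hmac


class ThresholdSigningPolicy:
    """Hypothetical HSM policy wrapper: emit a signature over a blob only after
    N distinct authorized TCRs have each approved that exact blob.

    HMAC-SHA256 stands in for both the TCR smart-card signatures and the HSM's
    own KSK signature so the sketch runs with the standard library only.
    """

    def __init__(self, tcr_keys, threshold, hsm_key):
        self.tcr_keys = dict(tcr_keys)  # TCR name -> verification key (stand-in)
        self.threshold = threshold      # N in "N of K"
        self._hsm_key = hsm_key         # stand-in for the KSK private key
        self._approvals = {}            # SHA-256 of blob -> set of approving TCRs

    def submit(self, blob, tcr, approval):
        """Feed one TCR's signed blob; return the HSM signature once N is reached."""
        key = self.tcr_keys.get(tcr)
        if key is None:
            return None  # not an authorized entity
        expected = hmac.new(key, blob, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, approval):
            return None  # approval does not cover this exact blob
        signers = self._approvals.setdefault(hashlib.sha256(blob).digest(), set())
        signers.add(tcr)  # a set, so repeat submissions by one TCR count once
        if len(signers) < self.threshold:
            return None
        return hmac.new(self._hsm_key, blob, hashlib.sha256).digest()
```

Approvals are keyed by a digest of the blob, so only approvals over identical data count toward the threshold, and the order and timing of submissions do not matter; that is what turns the feeding-in step into an administrative task rather than a synchronized ceremony.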
Hi David, At 11:53 22-09-2014, David Conrad wrote:
Sounds reasonable. Of course, if there are multiple keys, it may mean additional KMFs are necessary which means the current physical access strategy may not be applicable.
Within the scope of the current discussion I do not think that the current physical access strategy is a problem. I suggest thinking very carefully about "additional KMFs" before even floating that idea.
It isn't a question of want. You are assuming there is an option to not authorize an emergency key roll?
It is a choice. In general, it is not an option which a person(s) would blindly authorize. The planned key roll-over is also about avoiding having to face such a choice.
Thank you (sincerely) for your efforts. However, I've been told by several people that it is becoming increasingly challenging to get sufficient TCRs to show up for key ceremonies.
I am aware that event scheduling is a challenge. I have commented about that on the relevant mailing list.
I believe topics like what will break, who it will affect, and mitigation/recovery strategies would be useful topics for the workshop.
Agreed. Regards, S. Moonesamy
On 9/22/2014 10:29 AM, David Conrad wrote:
simple terms, be when a private key is lost or compromised. A planned roll-over reduces the likelihood of that happening.
If the risk is physical access, then the implication of a planned rollover is that that physical access occurs (much) more frequently than if the physical access is limited to the times when emergency rollover is needed. As such, it actually increases the likelihood of it happening. What a planned rollover does do is provide more experience in the hopes that we can recover more easily.
Of course, if the private key is lost or compromised, you can’t use 5011 for a rollover.
I'm going to use "the private key is permanently unavailable to the operator (and generally anyone else) for cryptographic operations and can't be recovered" for "lost", and "the private key is available to a second party for cryptographic operations" for "compromised". "Lost and compromised" means it can't be used by the operator, but an attacker can use it. Feel free to indicate if those differ from what you meant.
Scenarios:
- Single trust anchor key, and the associated private key is lost. Impact: No further signing of the root is possible; a complete trust reboot is necessary. Chaos ensues.
- Multiple trust anchor keys, and the associated private key for the active key is lost. Impact: The active key can't be revoked, but another existing trust anchor key may be activated to sign the DNSKEY RRSet. Using 5011, a replacement for the lost previously active key can be added to the trust anchor key set. The lost key is removed from the DNSKEY RRSet and transitions to the "missing" state permanently. Removing it from the resolver configuration requires a manual action on the part of the resolvers, but there is no security implication for leaving it there as the private key is not available.
- Single trust anchor key, and the associated private key is compromised. Impact: The root operator revokes the trust anchor key, and then initiates a complete trust reboot. Chaos ensues.
- Multiple trust anchor keys, and the associated private key for the active key is compromised. Impact: The root operator revokes the trust anchor key to resolve the compromise. It places one of the standby keys in active mode and re-signs the root DNSKEY RRSet. It uses 5011 to add a standby replacement for the revoked active key. After 30 days, the revoked key is removed from the DNSKEY RRSet and transitions from the revoked state to the removed state after the hold-down period expires. The root zone is back to where it was prior to the compromise.
- Single trust anchor key, and the associated key is both lost and compromised. No further signing of the root is possible by the root operator, but an attacker can sign the root DNSKEY RRSets and possibly force the acceptance of a new trust anchor in some portion of the resolvers. The root operator needs to initiate a complete trust reboot, similar to the first scenario, but unlike that scenario can't force the DNS into an appropriate error state to prevent the attacker from presenting data as valid. Chaos ensues.
- Multiple trust anchor keys, and the active key is both lost and compromised. The root operator can't revoke the active key, but can add other trust anchors using 5011 and controls the publication of the root zone. Some resolvers will be populated with rogue trust anchors by the attacker. This will require a partial trust reboot. There won't be a total outage period, but resolver operators will need to manually remove the lost key. Chaos ensues.
In general, the probability of losing the key material at the same time it's compromised is far past astronomical given the current controls. There are multiple backups in multiple locations, and the chances of all of those being unavailable to reconstitute the private key are near 0. You *might* be able to do it by placing large explosive devices, but even then it's hit and miss.
So the scenarios where 5011 is useful are multiple keys with at most N-1 keys either lost OR compromised, but not both. You need to use a trust reboot in any single-key scenario. The only time you need to use it in a multi-key scenario is if you both lose and compromise the key. None of these scenarios are equally likely.
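The six scenarios above reduce to a small decision table. The sketch below is a hypothetical encoding of Mike's classification only (the names `OUTCOMES` and `needs_trust_reboot` are invented); it does not take a position on how disruptive a trust reboot would be, and "trust reboot" here simply means re-establishing resolver trust anchors by means other than RFC 5011.

```python
# Hypothetical encoding of the scenarios above, keyed by
# (multiple_keys, lost, compromised) for the active key.
OUTCOMES = {
    (False, True, False): "complete trust reboot",
    (True, True, False): "activate standby key; add replacement via RFC 5011; lost key stays Missing",
    (False, False, True): "revoke key, then complete trust reboot",
    (True, False, True): "revoke key; activate standby key; add replacement via RFC 5011",
    (False, True, True): "complete trust reboot; key cannot be revoked; attacker may push rogue anchors",
    (True, True, True): "add anchors via RFC 5011; partial trust reboot to remove rogue anchors",
}


def needs_trust_reboot(multiple_keys: bool, lost: bool, compromised: bool) -> bool:
    """Any single-key failure, or a multi-key case where the key is both lost and compromised."""
    if not (lost or compromised):
        return False  # nothing failed: this would be a planned rollover, not a recovery
    if not multiple_keys:
        return True
    return lost and compromised
```

The rule in `needs_trust_reboot` is the summary sentence above restated as code; the table keeps the per-scenario detail next to it so the two can be checked against each other.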
On Sep 22, 2014, at 10:25 AM, Michael StJohns <msj@nthpermutation.com> wrote:
Chaos ensues.
Or out of band works fine. One of the purposes of this workshop is to work out how such a key change can happen. RFC 5011 is one way that has been brought up; out of band updates is another. --Paul Hoffman
Mike, On Sep 22, 2014, at 10:25 AM, Michael StJohns <msj@nthpermutation.com> wrote:
- Single trust anchor key, and the associated private key is lost. Impact: No further signing of the root is possible, a complete trust reboot is necessary.
Yep.
Chaos ensues.
Nope. Loss of key does not cause the Internet to stop. In this scenario, we have some amount of time in which to re-bootstrap trust. OS and resolver vendors push out a software update via the methodologies they already use to update their software (e.g., “Software Update” on Macs).
- Multiple trust anchor keys, and the associated private key for the active key is lost. Impact: The active key can't be revoked, but another existing trust anchor key may be activated to sign the DNSKEY RRSet. Using 5011, a replacement for the lost previously active key can be added to the trust anchor key set.
Sure, assuming 5011 is implemented and local security policy allows for the trust anchor configuration data to be overwritten spontaneously by an external party. I do not believe we can rely on either of those, so we have to be able to fall back to other mechanisms such as a “Software Update”-like system.
[rest of scenarios deleted as they are all variations on the same theme]
I believe part of the confusion is that we’re mixing things here:
1) single vs. multiple keys
2) transitioning between keys
I am not arguing single keys are desirable. I would love to have multiple keys (it’s one of the reasons I’ve been pushing to get the status of ECC support tested). However, there is the keyset size issue. I do not feel comfortable handwaving away fallback to TCP or fragmentation or middleboxes that react poorly to large response sizes for the root of the DNS. Call it a character flaw. However, this particular concern can probably be addressed with data. If keyset size can be shown not to break things (at least too much) or we can move to something that allows smaller keys with the same (or better) security level without blowing size limits, it’s all good.
It would seem our real disagreement is around transitioning the keys. I believe that regardless of the number of keys, we need to be prepared for re-bootstrapping trust. I gather you feel otherwise. I also feel that we cannot rely on 5011 and as such, we have to assume the lowest common denominator, namely out-of-band key update mechanisms. This of course does not preclude the use of 5011 in environments where it exists and is permitted to operate (e.g., large managed networks); rather, I simply do not believe it is something we can assume in the real world.
Regards, -drc
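The keyset size concern can be made concrete with a back-of-the-envelope estimate of the root DNSKEY response. This is a hypothetical estimator, not a measurement: it assumes every owner and signer name is the root (1 byte, no name compression), RSA keys use exponent 65537 in the RFC 3110 encoding, ECDSA keys and signatures follow RFC 6605, one EDNS0 OPT RR without options is present, and each signing key contributes one RRSIG. The 1232-byte bound (the IPv6 minimum MTU of 1280 minus 40-byte IPv6 and 8-byte UDP headers) is used here as an assumed fragmentation-safe limit.

```python
# Hypothetical estimator for a root-zone DNSKEY response with DNSSEC records.
RR_FIXED = 1 + 2 + 2 + 4 + 2                 # NAME (root), TYPE, CLASS, TTL, RDLENGTH
DNSKEY_FIXED = 2 + 1 + 1                     # flags, protocol, algorithm
RRSIG_FIXED = 2 + 1 + 1 + 4 + 4 + 4 + 2 + 1  # type covered .. signer's name (root)
HEADER, QUESTION, OPT_RR = 12, 1 + 2 + 2, RR_FIXED

# Public key field sizes: RSA per RFC 3110 (exponent length byte, 3-byte
# exponent, modulus); ECDSA per RFC 6605 (x | y of the curve point).
PUBKEY = {"rsa1024": 1 + 3 + 128, "rsa2048": 1 + 3 + 256,
          "ecdsap256": 64, "ecdsap384": 96}
SIGNATURE = {"rsa1024": 128, "rsa2048": 256, "ecdsap256": 64, "ecdsap384": 96}


def dnskey_response_size(published, signing):
    """Bytes in the response: every published key plus one RRSIG per signing key."""
    size = HEADER + QUESTION + OPT_RR
    size += sum(RR_FIXED + DNSKEY_FIXED + PUBKEY[k] for k in published)
    size += sum(RR_FIXED + RRSIG_FIXED + SIGNATURE[k] for k in signing)
    return size


def transport_risk(size):
    if size <= 512:
        return "under the classic 512-byte UDP limit"
    if size <= 1232:
        return "needs EDNS0, should avoid fragmentation"
    return "fragmentation or TCP fallback likely"
```

For hypothetical configurations: one 2048-bit RSA KSK plus one 1024-bit RSA ZSK, signed by the KSK, comes to 736 bytes; adding a second 2048-bit RSA KSK that also signs pushes it to 1297 bytes, past the assumed bound; three ECDSA P-256 keys with two signing KSKs need only 453 bytes. That is the trade-off between multiple keys and smaller keys discussed above.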
participants (8)
- David Conrad
- manning bill
- Michael StJohns
- Olaf Kolkman
- Paul Hoffman
- Peter Koch
- S Moonesamy
- Tomofumi Okubo