Where should IDN translation happen?

Michael Casadevall

Nov. 13, 2018

7:23 p.m.

So, here is a question for the list: where in the stack should IDN/EAI translation happen? Should it happen in the user application, or lower in the stack, such as in the core libraries that handle things like TLS connections?

The reason I ask is that I once managed to blow up Python with a 10-line program (https://gist.github.com/NCommander/04e34b3fcc4af347bcfedbc053b08c06 - with error output) that uses the built-in ssl library to connect to an IDN website. I’ve filed a bug with Python upstream (https://bugs.python.org/issue35234), but I think it raises a larger question.

Python’s ssl library is a fairly thin wrapper over OpenSSL, which is what started this investigation. During an earlier conversation on this list, and during ICANN63, I began to have concerns about the health of WebPKI in relation to internationalized domain names, and I spoke to Don about investigating these issues in more depth. As a follow-up to that conversation, I began poking OpenSSL with a stick and found, much to my surprise, that it has absolutely no support for IDNs* or EAI. See https://gist.github.com/NCommander/d68f66f74ba0122f8bb4567ca10a6a40 for an example.

This leads library makers and applications to handle IDNs manually, which in the case of Python, if they make a mistake, can lead to the above error, namely blowing up Python.

Now, in practice, OpenSSL not directly supporting IDNs seems to have relatively little effect on its own. After taking a very deep dive through the RFCs, it appears that in all relevant places, everything relating to web TLS certificates takes IA5String encoding and thus requires punycode representation. However, as I’ve just shown, there are obviously places where things have fallen through the cracks, and I think it warrants a deeper investigation.

The thing is, though, and just to reiterate the question: where in the stack should IDN translation happen? In the above case, had OpenSSL supported IDNs directly, it would have prevented this bug in the first place. That being said, since TLS essentially only uses A-labels as far as I can tell, I can’t necessarily say it’s wrong that OpenSSL doesn’t support IDNs. I think, though, that this is an area which, in general, needs more attention, especially if I can break a popular programming language with a trivial example.

Thoughts and comments welcome,
Michael

* - there is one special case for wildcards
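
A minimal Python sketch of doing the translation in the application, before the name reaches the TLS layer. The helper name `to_a_label` and the host `bücher.example` are illustrative assumptions and are not taken from the gists linked above. The built-in `idna` codec follows the older IDNA2003 rules, so this only shows where the conversion could sit, not which IDNA version to use.

```python
import ssl


def to_a_label(hostname: str) -> str:
    """Convert a possibly internationalized hostname to its ASCII (A-label) form.

    Hypothetical helper; relies on Python's built-in "idna" codec (IDNA2003 rules).
    """
    return hostname.encode("idna").decode("ascii")


# The application converts first, then hands only ASCII to the TLS layer.
ascii_host = to_a_label("bücher.example")
print(ascii_host)  # xn--bcher-kva.example

# Building a context needs no network access. The A-label would later be
# passed as server_hostname when wrapping a connected socket, for example:
#   context.wrap_socket(sock, server_hostname=ascii_host)
context = ssl.create_default_context()
```

With this layout the TLS library only ever sees A-labels, which matches the observation that certificates carry names in ASCII form.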


John Levine

November 2018

1:19 a.m.

New subject: OpenSSL, was Where should IDN translation happen?

In article <8ac39f55-d9ae-7c8a-f50c-4535e0428263@casadevall.pro> you write:

...

This leads library makers and applications to handle IDNs manually, which in the case of Python, if they made a mistake can lead to the above error, namely blowing up Python.

It's actually worse than that. Python's built-in IDNA support is IDNA2003 which, as we saw in a recent thread, can give some fairly wrong results. There is a correct idna library which is a drop-in replacement but I doubt many programmers know the difference.

...

In the above case, had OpenSSL supported IDNs directly, it would have prevented this bug in the first place. That being said, since TLS essentially only uses A-labels as far as I can tell, I can’t necessarily say it’s wrong that OpenSSL doesn’t support IDNs.

RFCs 8398 and 8399 allow EAI mail addresses as Alternative Names and suggest pretty strongly that even though the domains in certs are A-labels, libraries should handle U-labels and convert where needed. Since they have to handle U-labels in the EAI addresses, the domains aren't a lot of extra work. I presume that at some point OpenSSL will catch up with those RFCs but I don't know what the schedule is. R's, John

Dmitry Belyavsky

8:04 a.m.

New subject: OpenSSL, was Where should IDN translation happen?

Dear John,

...

In the above case, had OpenSSL supported IDNs directly, it would have prevented this bug in the first place. That being said, since TLS essentially only uses A-labels as far as I can tell, I can’t necessarily say it’s wrong that OpenSSL doesn’t support IDNs.

RFCs 8398 and 8399 allow EAI mail addresses as Alternative Names and suggest pretty strongly that even though the domains in certs are A-labels, libraries should handle U-labels and convert where needed. Since they have to handle U-labels in the EAI addresses, the domains aren't a lot of extra work.

I presume that at some point OpenSSL will catch up with those RFCs but I don't know what the schedule is.

As I wrote before, I've started to implement RFC 8399 and the show-stopper for now is obtaining a set of test cases. OpenSSL team does not want to link OpenSSL with, say, libidn (and to implement IDN conversion inside the library for domains). I've found out that 2-3 functions inherited from RFC 3492 will fit all the purposes necessary to implement RFC 8399. -- SY, Dmitry Belyavsky

John Levine

12:59 p.m.

New subject: OpenSSL, was Where should IDN translation happen?

On Wed, 14 Nov 2018, Dmitry Belyavsky wrote:

...

OpenSSL team does not want to link OpenSSL with, say, libidn (and to implement IDN conversion inside the library for domains). I've found out that 2-3 functions inherited from RFC 3492 will fit all the purposes necessary to implement RFC 8399.

Wait -- surely you know that you can't just punycode any old UTF-8 and expect it to work. I can understand why openssl wouldn't want all of libidn2 but at least you need to check that the strings are all valid IDNA2008 code points. If you don't, you're going to have hard to find bugs with names that look the same but aren't normalized so comparisons will fail. Regards, John Levine, john.levine@standcore.com Standcore LLC
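
The concern can be illustrated with Python's raw `punycode` codec, which applies RFC 3492 with no IDNA validation or normalization. The two spellings of `exámple` below are an illustration of my own: they render the same but contain different code points, so a bare encoder turns them into different A-labels unless they are normalized first. The helper name `raw_a_label` is hypothetical.

```python
import unicodedata

precomposed = "ex\u00e1mple"   # á as a single code point (U+00E1)
decomposed = "exa\u0301mple"   # a followed by a combining acute accent (U+0301)


def raw_a_label(label: str) -> str:
    """Punycode a label with no checks at all, as a bare RFC 3492 encoder would."""
    return "xn--" + label.encode("punycode").decode("ascii")


# Same appearance, different code points, therefore different A-labels.
print(raw_a_label(precomposed) == raw_a_label(decomposed))  # False

# Normalizing to NFC first makes the two inputs identical before encoding.
nfc = unicodedata.normalize("NFC", decomposed)
print(raw_a_label(nfc) == raw_a_label(precomposed))  # True
```

NFC normalization alone is not a full IDNA2008 validity check; it is only the smallest step that removes this particular mismatch.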

Dmitry Belyavsky

3:28 p.m.

New subject: OpenSSL, was Where should IDN translation happen?

Dear John, On Wed, Nov 14, 2018 at 3:59 PM John Levine <john.levine@standcore.com> wrote:

...

On Wed, 14 Nov 2018, Dmitry Belyavsky wrote:

...
OpenSSL team does not want to link OpenSSL with, say, libidn (and to implement IDN conversion inside the library for domains). I've found out that 2-3 functions inherited from RFC 3492 will fit all the purposes necessary to implement RFC 8399.

Wait -- surely you know that you can't just punycode any old UTF-8 and expect it to work. I can understand why openssl wouldn't want all of libidn2 but at least you need to check that the strings are all valid IDNA2008 code points.

If you don't, you're going to have hard to find bugs with names that look the same but aren't normalized so comparisons will fail.

If I read the RFC 8398 correctly, to verify the chain we do not need to punycode anything. We need to unpunycode to compare email with nameConstraints. -- SY, Dmitry Belyavsky

John Levine

4:07 p.m.

New subject: OpenSSL, was Where should IDN translation happen?

On Wed, 14 Nov 2018, Dmitry Belyavsky wrote:

...

If I read the RFC 8398 correctly, to verify the chain we do not need to punycode anything. We need to unpunycode to compare email with nameConstraints.

I suppose, if you are 100% sure that the UTF-8 email you're comparing it with has the domain part fully normalized according to IDNA2008 specs. Regards, John Levine, john.levine@standcore.com Standcore LLC

Dmitry Belyavsky

4:59 p.m.

New subject: OpenSSL, was Where should IDN translation happen?

On Wed, Nov 14, 2018 at 7:07 PM John Levine <john.levine@standcore.com> wrote:

...

On Wed, 14 Nov 2018, Dmitry Belyavsky wrote:

...
If I read the RFC 8398 correctly, to verify the chain we do not need to punycode anything. We need to unpunycode to compare email with nameConstraints.

I suppose, if you are 100% sure that the UTF-8 email you're comparing it with has the domain part fully normalized according to IDNA2008 specs.

Got your point. If nameConstraints and email itself are encoded with the same errors, it will work; otherwise we get nasty errors. -- SY, Dmitry Belyavsky

Dmitry Belyavsky

8:29 a.m.

New subject: OpenSSL, was Where should IDN translation happen?

Dear John, On Wed, Nov 14, 2018 at 7:59 PM Dmitry Belyavsky <beldmit@gmail.com> wrote:

...

On Wed, Nov 14, 2018 at 7:07 PM John Levine <john.levine@standcore.com> wrote:

...
On Wed, 14 Nov 2018, Dmitry Belyavsky wrote:

...
If I read the RFC 8398 correctly, to verify the chain we do not need to punycode anything. We need to unpunycode to compare email with nameConstraints.

I suppose, if you are 100% sure that the UTF-8 email you're comparing it with has the domain part fully normalized according to IDNA2008 specs.

Got your point.

If nameConstraints and email itself are encoded with the same errors, it will work; otherwise we get nasty errors.

I've got a response from Viktor Dukhovni. His position is:

1. It's better to ask OpenSSL about their plans :) via openssl-project@openssl.org
2. (Limiting scope to EAI certificates) OpenSSL must trust the CA software that has provided the punycode representation of the domain name. So we can decode A-labels and compare them. So the certificate itself can be verified, and the question of whether the EAI address matches the address in the From: header is out of scope of the certificate validation process.

-- SY, Dmitry Belyavsky

Michael Casadevall

11:54 a.m.

New subject: OpenSSL, was Where should IDN translation happen?

On 11/15/18 3:29 AM, Dmitry Belyavsky wrote:

...

Dear John,

On Wed, Nov 14, 2018 at 7:59 PM Dmitry Belyavsky <beldmit@gmail.com <mailto:beldmit@gmail.com>> wrote: I've got a response from Victor Dukhovni. His position is:

1. It's better to ask OpenSSL about their plans :) via openssl-project@openssl.org <mailto:openssl-project@openssl.org> 2. (Limiting scope to EAI certificates) OpenSSL must trust the CA software that has provided punycode representation of the domain name. So we can decode A-labels and compare them. So the certificate itself can be verified, and questions whether the EAI address matches the address in From: header is out of scope of the certificate validation process.

That doesn't sound right at all ...

RFC 7508 (https://tools.ietf.org/html/rfc7508) specifically handles how the email headers are protected by S/MIME verification (and OpenSSL specifically supports embedding these headers with their own command line tool).

RFC 8398 (https://tools.ietf.org/html/rfc8398) specifically lists out the cases where either an A-label or a U-label goes in the certificate; it's complicated, but basically, depending on the LDR component, it can be an A-label or a U-label.

Depressingly, it doesn't look like OpenSSL supports SmtpUTF8Mailbox at all (or at least I can't find it with a grep). As to how this all works, Dmitry, here's my understanding.

The short version here is that the From: and To: fields are specifically set by the user, and are used in SMTP specifically as the MAIL FROM and RCPT TO commands. These could be A- or U-labels as they're user entered. Both should generally support UTF-8, and are used for bounced emails (which can be S/MIME signed if the MTA supports it). Because of this, the From/To fields should always be U-labels, and the MIME fields should be those resulting U-labels; they shouldn't ever be converted to A-labels directly*

I haven't gone through the RFCs to make sure of this, but this is what I recall off the top of my head. For an EAI-to-EAI email the following steps need to happen:

- MUA sets From and To fields as U-labels
- MUA sends email to their outbound SMTP server
- The SMTP server breaks the email address into the user part and the domain part; the domain part is converted to an A-label
- If DANE is being used, the outbound server checks TLSA records after STARTTLS
- IDN translation happens if necessary, standard SMTP processing happens here. MX records are downloaded, checks against DKIM/SPF run against the A-label of the From field address
- Receiving SMTP server gets MAIL FROM/RCPT TO in U-label form
- Mail is delivered.

S/MIME adds the following additional steps:

- When an email is sent, *before* it is submitted to the SMTP server, MIME headers are duplicated, and signed within the message body. This prevents the headers from being tampered with in such a way that they're undetectable. The email (with headers) is signed with the receiver's public key.
- When the email is received, the email is decrypted, and the From field is matched to the CN/SAN fields in the S/MIME certificate to know which public key to load/verify. If this mismatches, the email gets a broken lock.

For verification to work as the OpenSSL guys say, verification would fail because the From label would be a U-label, and the cert would have an A-label. From where I'm sitting, validation should pass if the From field is punycoded for compatibility with old/broken MUAs/MTAs as long as it comes out to the right thing, and the cert should always be what the spec says.

If I got anything wrong from the RFCs, corrections welcome, but this is how I understand it should work, though I could very easily have gotten the finer points wrong.

Michael

* - an exception exists for Forwarded-By/Path headers amended by SMTP which could reasonably put an A-label in to allow emails to pass through non EAI-aware SMTP servers; that should work as long as the starting and end point are both EAI aware.

Dmitry Belyavsky

1:55 p.m.

New subject: OpenSSL, was Where should IDN translation happen?

Dear Michael, I suggest leaving aside the delivery and corresponding verification of addresses by MTA. It seems to be out of the scope of OpenSSL. Please correct me if I am wrong. On Thu, Nov 15, 2018 at 2:54 PM Michael Casadevall <michael@casadevall.pro> wrote:

...

On 11/15/18 3:29 AM, Dmitry Belyavsky wrote:

...
Dear John,

On Wed, Nov 14, 2018 at 7:59 PM Dmitry Belyavsky <beldmit@gmail.com <mailto:beldmit@gmail.com>> wrote:

I've got a response from Victor Dukhovni. His position is:

1. It's better to ask OpenSSL about their plans :) via openssl-project@openssl.org <mailto:openssl-project@openssl.org> 2. (Limiting scope to EAI certificates) OpenSSL must trust the CA software that has provided punycode representation of the domain name. So we can decode A-labels and compare them. So the certificate itself can be verified, and questions whether the EAI address matches the address in From: header is out of scope of the certificate validation process.

That doesn't sound right at all ...

RFC 7508 (https://tools.ietf.org/html/rfc7508) specifically handles how the email headers are protected by S/MIME verification (and OpenSSL specifically supports embedding these headers with their own command line tool).

I did not find this special handling, at least by 'grep -ri SecureHeader'. Neither did I find it in man cms/man smime.

...

RFC 8398 (https://tools.ietf.org/html/rfc8398) specifically lists out the cases of where either an A-label or U-label go in the certificate; it's complicated but basically depending on the LDR component, it can be an A-Label or U-Label.

Depressingly, it doesnt look like OpenSSL supports SmtpUTF8Mailbox at all (or at least I can't find it with a grep). As to how this all works, Dmitry, here's my understanding.

Sure. I do not know about other implementors of RFC 8398 except me. And my implementation is very limited too.

...

The short version here is that the From: and To: field is specifically set by the user, and is used in SMTP specifically as the MAIL FROM and RCPT TO commands. These could be A or U labels as they're user entered. Both should generally support UTF-8, is used for bounced emails (which can be S/MIME signed if the MTA supports it). Because of this, the From/To field should always be U-Labels, and the MIME fields be those resulting U-Labels; they shouldn't ever be converted to A-Labels directly*

I haven't gone through the RFCs to make sure of this, but this is what I recall off the top off my head. For a EAI-to-EAI email the following steps need to happen - MUA sets From and To fields as U-labels - MUA to sends email to their outbound SMTP server - The SMTP server breaks the email into the user part and the domain part, the domain part is converted to an A-label - If DANE is being used, the outbound server checks TLSA records after STARTTLS - IDN translation happens if necessary, standard SMTP processing happens here. MX records are downloaded, checks against DKIM/SPF run against the A-label of the From field address - Receiving SMTP server gets MAIL FROM/RCPT TO in U-Label form - Mail is delivered.

This seems to be out of scope of the OpenSSL toolkit but in scope of the MTA in whole.

...

S/MIME adds the following additional steps: - When an email is sent, *before* it is submitted to the SMTP server, MIME headers are duplicated, and signed within the message body. This prevents the headers from being tampered with in such a way that they're undetectable. The email (with headers) is signed with the receiver's public key.

Do you mean signing with the sender's private key or encrypting with the receiver's public key?

...

- When the email is received, the email is decrypted, and the From field is matched to the CN/SAN fields in the S/MIME certificate to know which public key to load/verify. If this mismatches, the email gets a broken lock.

There may be more than one reason to get a broken lock icon.

1. The trust chain cannot be established. RFC 8398 provides extra cases when this happens. But there we can only rely on the CA software that provides A-labels and U-labels in nameConstraints and SAN. I do not think that it's a good idea to correct CA behaviour in OpenSSL.
2. The chain of trust is established, but the certificate used for signing does not match the From: address. This can happen in case of a bad EAI representation either in the certificate or in the MTA/MUA of the sender. I think that the MUA is a legitimate place to ensure IDNA2008 compliance here.
3. Everything is OK at stages 1-2, but the signature is broken. Nothing seems to depend on EAI here, because the signature depends on the data, not on its semantics. It seems to be verified at the OpenSSL level too.

...

For verification to work as the OpenSSL guys say, verification would fail because the From label would be a U-label, and the cert would have an A-label. From where I'm sitting, validation should pass if the From field is punycoded for compatibility with old/broken MUAs/MTAs as long as it comes out to the right thing and the cert should always what spec says.

As I wrote before, there are at least 3 possible ways to fail the verification.

...

If I got anything wrong from the RFCs, corrections welcome, but this is how I understand how it should work but I could very easily gotten the finer points wrong. Michael

* - an exception exists for Forwarded-By/Path headers amended by SMTP which could reasonably put an A label in to allow emails to pass through non EAI-aware SMTP servers; that should work as long as the starting and end point are both EAI aware.

So I think if we can work out suggestions together on how OpenSSL can meet our requirements, we can then either provide them to the OpenSSL team or I can implement some patches to be merged at some point. -- SY, Dmitry Belyavsky

Michael Casadevall

2:26 p.m.

New subject: OpenSSL, was Where should IDN translation happen?

On 11/15/2018 08:55 AM, Dmitry Belyavsky wrote:

...

Dear Michael,

I suggest leaving aside the delivery and corresponding verification of addresses by MTA. It seems to be out of the scope of OpenSSL.

Please correct me if I am wrong.

I was referring to how the From header in the email is used by MTAs and how it relates to S/MIME, and why the u-label needs to be in the certificate *or* verification has to allow for conversion on the fly. I'll get on the OpenSSL mailing list at some point, but I'm waiting for John to weigh in on my write up. Michael

John Levine

3:29 p.m.

New subject: OpenSSL, was Where should IDN translation happen?

On Thu, 15 Nov 2018, Michael Casadevall wrote:

...

I was referring to how the From header in the email is used by MTAs and how it relates to S/MIME, and why the u-label needs to be in the certificate *or* verification has to allow for conversion on the fly.

Actually, it needs both. I agree with Viktor that when you're creating the certificate you can assume the CA is sending you good data. But when you're using it, you're going to be testing it against whatever junk the MUA or the user provides. For example, assume the name in the cert is exámple, and the user checks it against exámple except that the user's UTF-8 has an unnormalized a' rather than a precomposed á. One possibility would be to normalize it and compare and say yes. Another would be to check the code points and reject it as not a valid U-label. But it would be wrong to decode the punycode, compare the UTF-8, and say nope, they're different. Regards, John Levine, john.levine@standcore.com Standcore LLC
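
A rough Python sketch of the two acceptable behaviours described above, next to the naive comparison that gets it wrong. The function names are hypothetical, and NFC normalization is used here as a simplified stand-in for full IDNA2008 validation, which is an assumption on my part. The A-label is computed at run time from `exámple` rather than written out by hand.

```python
import unicodedata

# Stand-in for the A-label stored in the certificate by the CA.
cert_a_label = "xn--" + "ex\u00e1mple".encode("punycode").decode("ascii")
# What that A-label decodes to (precomposed á).
cert_u_label = cert_a_label[4:].encode("ascii").decode("punycode")
# What the user or MUA supplies: "a" plus a combining acute accent.
user_label = "exa\u0301mple"


def naive_match(cert: str, presented: str) -> bool:
    """Wrong: compares raw code points, so look-alike input is reported as different."""
    return cert == presented


def normalizing_match(cert: str, presented: str) -> bool:
    """First option: normalize the presented label, then compare."""
    return cert == unicodedata.normalize("NFC", presented)


def strict_match(cert: str, presented: str) -> bool:
    """Second option: refuse input that is not already in normalized form."""
    if unicodedata.normalize("NFC", presented) != presented:
        raise ValueError("presented label is not normalized")
    return cert == presented


print(naive_match(cert_u_label, user_label))        # False
print(normalizing_match(cert_u_label, user_label))  # True
```

Calling `strict_match(cert_u_label, user_label)` raises `ValueError` for this input, which corresponds to rejecting the name as not a valid U-label instead of silently reporting a mismatch.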

John Levine

3:56 p.m.

New subject: OpenSSL, was Where should IDN translation happen?

On Thu, 15 Nov 2018, Michael Casadevall wrote:

...

The short version here is that the From: and To: field is specifically set by the user, and is used in SMTP specifically as the MAIL FROM and RCPT TO commands.

No, you're confusing the message body, which is what S/MIME signs, and the SMTP envelope, which is outside the scope of S/MIME. In an EAI message the addresses in the To: and From: headers should use U-labels. A-labels are allowed but not encouraged.

...

- If DANE is being used, the outbound server checks TLSA records after STARTTLS

That is completely unrelated, verifying the name of the mail server which has nothing at all to do with any To: or From: header.

...

- IDN translation happens if necessary, standard SMTP processing happens here. MX records are downloaded, checks against DKIM/SPF run against the A-label of the From field address

Sorry, more confusion. SPF checks against the message envelope, DKIM checks against the DKIM-Signature header. Neither looks at the addresses in the To: or From: fields. DMARC tries to match the From: address with the SPF or DKIM identity but that's even less related to S/MIME. Regards, John Levine, john.levine@standcore.com Standcore LLC

Michael Casadevall

2:18 p.m.

New subject: OpenSSL, was Where should IDN translation happen?

Replies inline. On 11/14/18 3:04 AM, Dmitry Belyavsky wrote:

...

Dear John,

As I wrote before, I've started to implement RFC 8399 and the show-stopper for now is obtaining a set of test cases.

The UASG document talking about library support has a list of test cases although I'm not sure they're exhaustive. It's a starting point anyway.

...

OpenSSL team does not want to link OpenSSL with, say, libidn (and to implement IDN conversion inside the library for domains). I've found out that 2-3 functions inherited from RFC 3492 will fit all the purposes necessary to implement RFC 8399.

Is there an email conversation or bug report I can read to catch up on upstream's current state of mind on this? Secondly, what's your current progress on this? It was your original posting that inspired me to look at this (and I think I commented on it then). OpenSSL is under a weird license so they really can't link to external libraries and not to (L)GPL code so adding the necessary support for U-labels will likely require rolling your own code or finding an implementation in the public domain and cutting it down to size for direct embedding in the BIO module of OpenSSL. Getting support for U-labels will be a major win for IDNs as it simplifies IDNs for all OpenSSL applications, and opens the door to getting EAI S/MIME working. I'd also like to see a fairly extensive shakedown of TLS in general with IDNs to see if we can shake loose any bugs especially in regards to revocation, OCSP stapling, AIA, and certificate transparency. Michael

Dmitry Belyavsky

3:18 p.m.

New subject: OpenSSL, was Where should IDN translation happen?

Dear Michael, On Wed, Nov 14, 2018 at 5:18 PM Michael Casadevall <michael@casadevall.pro> wrote:

...

Replies inline.

On 11/14/18 3:04 AM, Dmitry Belyavsky wrote:

...
Dear John,

As I wrote before, I've started to implement RFC 8399 and the show-stopper for now is obtaining a set of test cases.

The UASG document talking about library support has a list of test cases although I'm not sure they're exhaustive. It's a starting point anyway.

Yes. The problem is to convert them into the test certificates :)

...

OpenSSL team does not want to link OpenSSL with, say, libidn (and to implement IDN conversion inside the library for domains). I've found out that 2-3 functions inherited from RFC 3492 will fit all the purposes necessary to implement RFC 8399.

Is there an email conversation or bug report I can read to catch up on upstream's current state of mind on this?

Sure. https://www.ietf.org/mail-archive/web/ietf/current/msg101105.html Viktor refers to libicu; it's not so hard, I wanted to link just with libidn :) This letter is from somewhere in the middle of the thread starting at https://www.ietf.org/mail-archive/web/ietf/current/msg100694.html Plus I have some personal mail from Viktor Dukhovni.

...

Secondly, what's your current progress on this? It was your original posting that inspired me to look at this (and I think I commented on it then). OpenSSL is under a weird license so they really can't link to external libraries and not to (L)GPL code so adding the necessary support for U-labels will likely require rolling your own code or finding an implementation in the public domain and cutting it down to size for direct embedding in the BIO module of OpenSSL.

My current branch is here: https://github.com/beldmit/openssl/tree/rfc8398 I am currently able to recognize the EAI in a certificate and (badly) display it. I lack examples to test the chain limitations described in the RFC.

...

Getting support for U-labels will be a major win for IDNs as it simplifies IDNs for all OpenSSL applications, and opens the door to getting EAI S/MIME working. I'd also like to see a fairly extensive shakedown of TLS in general with IDNs to see if we can shake loose any bugs especially in regards to revocation, OCSP stapling, AIA, and certificate transparency.

Well, for now the A-labels seem to fit here more or less reasonably. IDN transformation can be done at a higher level, I think. -- SY, Dmitry Belyavsky

Michael Casadevall

3:11 p.m.

New subject: OpenSSL, was Where should IDN translation happen?

Replies inline. Note to John: I got a Spamhaus bounce back, but none of the IPs I control seem to be on the list. I'm not sure what happened. List emails seem to be going through though. On 11/13/18 8:19 PM, John Levine wrote:

...

In article <8ac39f55-d9ae-7c8a-f50c-4535e0428263@casadevall.pro> you write:

...
This leads library makers and applications to handle IDNs manually, which in the case of Python, if they made a mistake can lead to the above error, namely blowing up Python.

It's actually worse than that. Python's built-in IDNA support is IDNA2003 which, as we saw in a recent thread, can give some fairly wrong results. There is a correct idna library which is a drop-in replacement but I doubt many programmers know the difference.

Ow. That really hurts. I went looking through the Python bug tracker, and it looks like the core team will accept patches to fix this, but doesn't have the knowledge or expertise to implement it on their own. Relevant link is here: https://bugs.python.org/issue17305

I spent some time looking through the bug reports, and Python's internal implementation, but it looks like they want to keep IDNA2003 support, and implement IDNA2008 as 'utf48' as an encoding. I think it can be argued on security grounds that the default 'idna' needs to change once support is added, but 2003 support can be retained if necessary.

I've tinkered with CPython's core before, and I'm pretty comfortable with attempting to write patches that upstream will accept, and I'm willing to sit and see if I can bang Python 3.8 into IDNA2008 compliance. Unfortunately, pyidna can't just be "dropped in" to the core library; Python has a pretty extensive system for dealing with Unicode, and encode("idna") basically works around that instead of reinventing the wheel as import idna did, which prevented it from landing in core.

I'm willing to write a proposal for the group to put together a plan to try and fix Python's IDN support, as my work with OpenSSL seems to be covered already, and the problem/impact isn't as large as I feared initially. Fixing Python fixes a very large class of internet systems, and then we can focus on backporting the fix to older Python branches, increasing IDN support across the internet.

...

...
In the above case, had OpenSSL supported IDNs directly, it would have prevented this bug in the first place. That being said, since TLS essentially only uses A-labels as far as I can tell, I can’t necessarily say it’s wrong that OpenSSL doesn’t support IDNs.

RFCs 8398 and 8399 allow EAI mail addresses as Alternative Names and suggest pretty strongly that even though the domains in certs are A-labels, libraries should handle U-labels and convert where needed. Since they have to handle U-labels in the EAI addresses, the domains aren't a lot of extra work.

I presume that at some point OpenSSL will catch up with those RFCs but I don't know what the schedule is.

I completely forgot emails could be in SANs now, though technically the commonName can be encoded in UTF-8 per specification, although I don't believe this is CA/B compliant (as much as the CAB deals with S/MIME issuance). This creates a technical pain point because I think it is legal for the From: headers to be written as UTF-8 or in ASCII, so for validation to work, on-the-fly conversion has to happen. That means you might get a unicode string, or a punycode string, or a unicode mailbox with a punycode domain name. Lots of fun edge cases. Do you know which RFCs specifically define how From labels should get written for EAIs? And what clients actually do in practice? Michael
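
The edge cases listed above can be sketched in Python as a helper that brings every spelling of a mailbox to one shape before comparing. The name `canonical_mailbox` and the address `用户@bücher.example` are invented for illustration. The built-in `idna` codec (IDNA2003) is used only because it ships with Python, and treating the local part as opaque is an assumption of this sketch, not a reading of the RFCs.

```python
def canonical_mailbox(address: str) -> str:
    """Return local-part@domain with the domain in lower-case U-label form."""
    # Split on the last "@"; the local part is treated as opaque.
    local, _, domain = address.rpartition("@")
    if not local or not domain:
        raise ValueError(f"not a mailbox: {address!r}")
    # Round-trip through the A-label form so that U-label and A-label
    # spellings of the same domain end up identical.
    a_label = domain.encode("idna")              # bytes, ASCII only
    u_label = a_label.decode("idna").lower()     # back to Unicode
    return f"{local}@{u_label}"


forms = [
    "用户@bücher.example",          # Unicode mailbox, Unicode domain
    "用户@xn--bcher-kva.example",   # Unicode mailbox, punycode domain
    "用户@BÜCHER.example",          # same domain, different case
]
canonical = {canonical_mailbox(form) for form in forms}
print(len(canonical))  # 1
```

Comparing canonical forms keeps the conversion in one place, so callers do not need to know which spelling the MUA or the certificate happened to use.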

John Levine

3:34 p.m.

New subject: OpenSSL, was Where should IDN translation happen?

On Wed, 14 Nov 2018, Michael Casadevall wrote:

...

...
It's actually worse than that. Python's built-in IDNA support is IDNA2003 which, as we saw in a recent thread, can give some fairly wrong results. There is a correct idna library which is a drop-in replacement but I doubt many programmers know the difference.

...

Relevant link is here: https://bugs.python.org/issue17305

I spent some time looking through the bug reports, and Python's internal implementation, but it looks like they want to keep IDNA2003 support, and implement IDNA2008 as 'utf48' as an encoding. I think it can be argued on security grounds that the default 'idna' needs to change once support is added, but 2003 support can be retained if necessary.

I presume you mean uts46, which is Unicode's unfortunate attempt to put bandaids on the difference between idna2003 and idna2008. The key point is that idna2003 is obsolete, and the stuff that 2008 removed was removed for good reasons. For example, 2003 turned German ß into "ss" which turned out to be a bad idea, since German people want to use ß in their IDNs. I'd think that the right way forward is to make the idna codec do idna2008 with a flag to turn on 2003 mode if you have some data that depends on 2003 rules. For the EAI OpenSSL stuff, I asked Russ Housley if he knows of work to update crypto libraries to support 8398 and 8399. As to what TLS software does with EAI addresses now, I expect for the most part it just doesn't work. The RFCs to support it are new, and for most people there's little incentive to do anything about it. Regards, John Levine, john.levine@standcore.com Standcore LLC
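
The ß mapping is visible from the standard library alone, since Python's built-in `idna` codec still applies the IDNA2003 rules. The domain `straße.example` is my own illustration, and the separate third-party idna library mentioned earlier in the thread is deliberately not used here.

```python
# Python's built-in codec follows IDNA2003 (nameprep), which folds ß to "ss".
encoded = "straße.example".encode("idna")
print(encoded)  # b'strasse.example'

# The result is indistinguishable from a name that was typed with "ss",
# so the original ß cannot be recovered from the encoded form.
print(encoded == "strasse.example".encode("idna"))  # True
print(encoded.decode("idna"))  # strasse.example
```

Under IDNA2008 the two spellings are distinct names, which is why a silent fold like this can send a user to a different domain than the one they typed.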

Dmitry Belyavsky

3:44 p.m.

New subject: OpenSSL, was Where should IDN translation happen?

Dear John, On Wed, Nov 14, 2018 at 6:34 PM John Levine <john.levine@standcore.com> wrote:

...

For the EAI OpenSSL stuff, I asked Russ Housley if he knows of work to update crypto libraries to support 8398 and 8399.

As to what TLS software does with EAI addresses now, I expect for the most part it just doesn't work. The RFCs to support it are new, and for most people there's little incentive to do anything about it.

Could you please clarify how TLS is affected by EAI? Domain names are processed as A-labels, and I can't recall how we interact with emails in TLS. -- SY, Dmitry Belyavsky

Dmitry Belyavsky

8:16 a.m.

Dear Michael, On Tue, Nov 13, 2018 at 10:24 PM Michael Casadevall <michael@casadevall.pro> wrote:

...

So, here is a question for the list: Where in the stack should IDN/EAI translation happen? Should it happen in the user application, or lower in the stack such as the core libraries that handle things like TLS connections?

...

This leads library makers and applications to handle IDNs manually, which in the case of Python, if they made a mistake can lead to the above error, namely blowing up Python. Now, in practice, OpenSSL not directly supporting IDNs seems to have relatively little effect on its own. After taking a very deep dive through the RFCs, it appears that in all relevant places, everything relating to web TLS certificates takes IA5String encoding and thus requires punycode representation. However, as I’ve just shown, there are obviously places where things have fallen through the cracks and I think it warrants a deeper investigation.

I think that errors in IDN processing in OpenSSL itself could have much worse consequences than errors in any single application using OpenSSL. So I understand the position of the OpenSSL team when they do not want to implement such support there.

...

The thing is though, and just to reiterate the question, just where in the stack should IDN translation happen?

In the above case, had OpenSSL supported IDNs directly, it would have prevented this bug in the first place. That being said, since TLS essentially only uses A-labels as far as I can tell, I can’t necessarily say it’s wrong that OpenSSL doesn’t support IDNs. I think, though, this is an area which, in general, that needs more attention, especially if I can break a popular programming language with a trivial example.

Thoughts and comments welcome, Michael

* - there is one special case for wildcards

In my (rather limited) practice I usually use A-labels for internal representation and U-labels (or both) for external. Scripting languages, such as Python, Perl, PHP, etc., usually provide wrappers for IDN converters, so it's much simpler to do this conversion outside, in applications. -- SY, Dmitry Belyavsky
