Replies inline.

Note to John: I got a Spamhaus bounce back, but none of the IPs I control seem to be on the list. I'm not sure what happened. List emails seem to be going through, though.

On 11/13/18 8:19 PM, John Levine wrote:
In article <8ac39f55-d9ae-7c8a-f50c-4535e0428263@casadevall.pro> you write:
This leads library makers and applications to handle IDNs manually, which in the case of Python, if they made a mistake can lead to the above error, namely blowing up Python.
It's actually worse than that. Python's built-in IDNA support is IDNA2003 which, as we saw in a recent thread, can give some fairly wrong results. There is a correct idna library which is a drop-in replacement but I doubt many programmers know the difference.
Ow. That really hurts. I went looking through the Python bug tracker, and it looks like the core team will accept patches to fix this, but they don't have the knowledge or expertise to implement it on their own. Relevant link is here: https://bugs.python.org/issue17305

I spent some time looking through the bug reports and Python's internal implementation, and it looks like they want to keep IDNA2003 support and implement IDNA2008 as 'utf48' as an encoding. I think it can be argued on security grounds that the default 'idna' needs to change once support is added, but 2003 support can be retained if necessary.

I've tinkered with CPython's core before, and I'm pretty comfortable attempting to write patches that upstream will accept. I'm willing to sit down and see if I can bang Python 3.8 into IDNA 2008 compliance. Unfortunately, pyidna can't just be "dropped in" to the core library: Python has a pretty extensive system for dealing with Unicode, and encode("idna") basically works around that instead of reinventing the wheel as import idna did, which prevented it from landing in core.

I'm willing to write a proposal for the group laying out a plan to fix Python's IDN support, as my work with OpenSSL seems to be covered already and the problem/impact isn't as large as I feared initially. Fixing Python fixes a very large class of internet systems, and then we can focus on backporting the fix to older Python branches, increasing IDN support across the internet.
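For anyone who wants to see the IDNA2003 behaviour of the built-in codec for themselves, here is a minimal sketch. The helper name stdlib_idna and the sample domains are just for illustration, and the comparison against the third-party idna package only runs if that package happens to be installed.

```python
def stdlib_idna(name: str) -> bytes:
    """Encode a domain with CPython's built-in codec (IDNA2003 rules)."""
    return name.encode("idna")


# IDNA2003 maps the German sharp s to "ss", so the name silently changes.
print(stdlib_idna("faß.de"))
# A name without such mappings is simply punycoded.
print(stdlib_idna("bücher.example"))

try:
    import idna  # third-party IDNA2008 package; may not be installed
except ImportError:
    idna = None

if idna is not None:
    # IDNA2008 treats ß as a valid character in its own right, so this
    # yields an xn-- label instead of collapsing to "fass.de".
    print(idna.encode("faß.de"))
```

The point is that the same input can resolve to two different names depending on which library did the conversion, which is why the default matters.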
In the above case, had OpenSSL supported IDNs directly, it would have prevented this bug in the first place. That being said, since TLS essentially only uses A-labels as far as I can tell, I can’t necessarily say it’s wrong that OpenSSL doesn’t support IDNs.
RFCs 8398 and 8399 allow EAI mail addresses as Alternative Names and suggest pretty strongly that even though the domains in certs are A-labels, libraries should handle U-labels and convert where needed. Since they have to handle U-labels in the EAI addresses, the domains aren't a lot of extra work.
I presume that at some point OpenSSL will catch up with those RFCs but I don't know what the schedule is.
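To make the "convert where needed" part concrete, here is a rough sketch of normalizing an address so that the domain is always an A-label before comparison. The function name normalize_address is hypothetical, the local part is deliberately left untouched, and it uses the built-in codec, so it inherits the IDNA2003 behaviour mentioned earlier in the thread. It is not a full RFC-compliant address parser.

```python
def normalize_address(addr: str) -> str:
    """Return addr with its domain converted to lowercase A-label form.

    Assumption: the address is a bare "local@domain" string. The local
    part is kept as-is, since only the receiving system can interpret it.
    """
    local, sep, domain = addr.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a mailbox address: {addr!r}")
    try:
        # Lowercase first: the codec passes pure-ASCII labels through
        # unchanged, so "XN--..." would otherwise keep its case.
        ascii_domain = domain.lower().encode("idna").decode("ascii")
    except UnicodeError as exc:
        raise ValueError(f"invalid domain in {addr!r}") from exc
    return f"{local}@{ascii_domain}"


# A U-label domain and its A-label form normalize to the same string.
print(normalize_address("用户@bücher.example"))
print(normalize_address("用户@xn--bcher-kva.example"))
```

With something like this, a validator can compare the address from a message against the one in a certificate regardless of which form each side used for the domain.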
I completely forgot emails could be in SANs now, though technically the commonName can be encoded in UTF-8 per specification, although I don't believe this is CA/B compliant (as much as the CA/B Forum deals with S/MIME issuance).

This creates a technical pain point, because I think it is legal for the From: headers to be written in UTF-8 or in ASCII, so for validation to work, on-the-fly conversion has to happen. That means you might get a Unicode string, or a punycode string, or a Unicode mailbox with a punycode domain name. Lots of fun edge cases.

Do you know which RFCs specifically define how From: headers should get written for EAIs? And what clients actually do in practice?

Michael