Hi,

I promised to write something about email address syntax. It’s actually a touch more complicated than I mentioned yesterday. As John says, email addresses are more complicated than one usually sees, and there exist syntax features that can be used to confuse people. But I think there are three reduced grammars:

1. One might say that banning all the obs- productions in RFC 5322 produces a reduced grammar. Those productions are obsolete (that’s why they’re called obs-whatever) and should not be used. No one uses them anyway; I’m only mentioning this for completeness.

2. The WHATWG HTML specification contains a simple grammar, which web browsers implement. Since one must expect email addresses to be typed into HTML forms, IMO following this specification is strongly advisable. The spec is at https://html.spec.whatwg.org/multipage/input.html#email-state-(type=email) and so far isn’t UA-ready. I fear that getting it UA-ready will shortly be one of my tasks.

3. \X+ is an extremely simple PCRE (a regular expression) that IMO should match all addresses. I suspect we’ll see it in some well-known web software in the near future.

IMO it’s safe to advise anyone to use only addresses that match the WHATWG spec (when extended) and also match \X+. The two suggestions complement each other: " "@example.com matches \X+, but I’m not brave enough to use an address like that 😉

--
Arnt Gulbrandsen
UA Technology Sr. Manager, ICANN
+32 492 374706
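To make point 2 concrete: the HTML spec gives a JavaScript- and Perl-compatible regular expression for a "valid email address". The Python sketch below transcribes it (worth checking against the current spec text before relying on it); the name is_whatwg_valid is hypothetical. It shows why the grammar isn't UA-ready, since every character class is ASCII, and why the two filters complement each other: the quoted-space address is rejected here even though \X+ accepts it.

```python
import re

# Regex given in the WHATWG HTML spec for a "valid email address"
# (JavaScript form, with its escaped "\/" written as "/" for Python).
# ASCII only: no quoted local parts, no non-ASCII characters.
WHATWG_EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)


def is_whatwg_valid(address: str) -> bool:
    """True if the whole string matches the WHATWG pattern."""
    return WHATWG_EMAIL.fullmatch(address) is not None


if __name__ == "__main__":
    for addr in ["user@example.com", '" "@example.com', "用户@例子.广告"]:
        print(repr(addr), is_whatwg_valid(addr))
```

Python's built-in re module does not support \X; the third-party regex module does, so the \X+ half of the suggested check is left out of this sketch.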
It appears that Arnt Gulbrandsen via UA-EAI <arnt.gulbrandsen@icann.org> said:
> 1. One might say that banning all the obs- productions in RFC 5322 produces a reduced grammar. Those productions are obsolete (that’s why they’re called obs-whatever) and should not be used. No one uses them anyway; I’m only mentioning this for completeness.
Agreed.
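Point 1 can be made concrete. Below is a minimal Python sketch, assuming the addr-spec rule of RFC 5322 section 3.4.1 with every obs- production removed, so that local-part is dot-atom or quoted-string and domain is dot-atom or domain-literal. Two further simplifications are assumptions of this sketch, not of the RFC: comments and folding whitespace (CFWS) around the parts are not accepted, and FWS inside quoted strings is approximated as spaces and tabs without CRLF folding. The names are hypothetical.

```python
import re

# RFC 5322 section 3.2.3: atext (characters allowed unquoted).
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
DOT_ATOM = rf"{ATEXT}+(?:\.{ATEXT}+)*"
# qtext = %d33 / %d35-91 / %d93-126 (section 3.2.4) and
# quoted-pair = "\" (VCHAR / WSP) (section 3.2.1); obs-qtext, obs-qp omitted.
QCONTENT = r"(?:[\x21\x23-\x5b\x5d-\x7e]|\\[\x20-\x7e\t])"
# Simplification: FWS approximated as spaces/tabs, no CRLF folding.
QUOTED_STRING = rf'"(?:[ \t]*{QCONTENT})*[ \t]*"'
# Section 3.4.1: dtext = %d33-90 / %d94-126 (obs-dtext omitted).
DOMAIN_LITERAL = r"\[(?:[ \t]*[\x21-\x5a\x5e-\x7e])*[ \t]*\]"

# addr-spec = local-part "@" domain, minus obs-local-part and obs-domain.
ADDR_SPEC = re.compile(
    rf"(?:{DOT_ATOM}|{QUOTED_STRING})@(?:{DOT_ATOM}|{DOMAIN_LITERAL})"
)


def is_addr_spec(address: str) -> bool:
    """True if the whole string is an addr-spec under this reduced grammar."""
    return ADDR_SPEC.fullmatch(address) is not None


if __name__ == "__main__":
    for addr in ['" "@example.com', '"a".b@example.com', "user@[192.0.2.1]"]:
        print(repr(addr), is_addr_spec(addr))
```

With the obs- productions gone, '"a".b@example.com' (an obs-local-part mixing a quoted string and an atom) is rejected, while '" "@example.com' is still valid under the reduced grammar.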
> 2. The WHATWG HTML specification contains a simple grammar, which web browsers implement. Since one must expect email addresses to be typed into HTML forms, IMO following this specification is strongly advisable. The spec is on https://html.spec.whatwg.org/multipage/input.html#email-state-(type=email) and so far isn’t UA-ready. I fear that getting that UA-ready will shortly be one of my tasks.
That pattern deliberately allows strings that are invalid under RFC 5322, apparently because WHATWG imagines that they understand mail better than the IETF does. In particular, it allows two dots in a row, which really does not work in a lot of mail systems.
> 3. \X+ is an extremely simple PCRE (a regular expression) that IMO should match all addresses. I suspect we’ll see that in some well-known web software in the near future.
> IMO it’s safe to advise anyone to use only addresses that match the WHATWG spec (when extended) and also match \X+.
See above. I would say the WHATWG spec minus the invalid stuff.

As you probably recall, when the W3C and WHATWG had competing web specs, the W3C tried to make their pattern UA-compatible, but they did it in a totally naive way, allowing any sequence of UTF-8 code points.

I don't know what kind of local parts real EAI mail systems allow. Perhaps people who run them can tell us what they do.

R's,
John
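One possible reading of "WHATWG spec minus the invalid stuff", sketched in Python: keep the WHATWG domain part but require the local part to be an RFC 5322 dot-atom, which rules out the leading, trailing and consecutive dots that the WHATWG character class lets through. This is an assumption about what the suggestion means, not a rule from either spec. Like the WHATWG pattern it is ASCII-only, so it says nothing about EAI local parts; the constant and function names are hypothetical.

```python
import re

# RFC 5322 atext: the WHATWG local-part character class without ".".
ATEXT = r"[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]"
# dot-atom-text: dots only between non-empty runs of atext, which rules
# out leading, trailing and consecutive dots.
LOCAL_PART = rf"{ATEXT}+(?:\.{ATEXT}+)*"
# Domain labels as in the WHATWG pattern: ASCII letters, digits and
# hyphens, at most 63 characters, no leading or trailing hyphen.
LABEL = r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
STRICT_EMAIL = re.compile(rf"{LOCAL_PART}@{LABEL}(?:\.{LABEL})*")


def is_valid_strict(address: str) -> bool:
    """WHATWG pattern with the local part tightened to an RFC 5322 dot-atom."""
    return STRICT_EMAIL.fullmatch(address) is not None


if __name__ == "__main__":
    for addr in ["a.b@example.com", "a..b@example.com", ".a@example.com"]:
        print(addr, is_valid_strict(addr))
```

Only the local part changes, so every address this accepts is also accepted by the WHATWG pattern; the difference is exactly the dot placements RFC 5322 forbids.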
Thanks for raising it. It is a difficult issue, and it would be better to solve it.

Jiankang Yao

From: Arnt Gulbrandsen via UA-EAI
Date: 2023-04-26 20:18
To: ua-eai@icann.org
Subject: [UA-EAI] Email address syntax
participants (3)
- Arnt Gulbrandsen
- Jiankang Yao
- John Levine