On Tue, 12 Sep 2017 22:43:09 +0200, Don Hollander <don.hollander@icann.org> wrote:
Thanks Rubens. Which raises the question as to when the validation takes place. Before or after a punycode transformation.
I would generally like validation to take place after punycode conversion. First, because there are strings that match the regex but not punycode constraints. Likewise, I agree with Rubens that assuming TLDs are not domains and that email must go to a subdomain seems less than prescient with hindsight.
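As a rough illustration of "strings that match the regex but not punycode constraints", here is a minimal JavaScript sketch. It assumes the regex quoted further down this thread (with a closing "/" added) and uses the built-in URL parser as a stand-in for a proper IDNA library. The helper names domainToAscii and validateAfterConversion are made up for this example, and the 63/253 length limits are applied by hand because the URL parser is not guaranteed to enforce them.

```javascript
// Regex quoted in this thread (closing "/" added so it is a valid literal).
const looseEmail = /^.+@(?:[^.]+\.)+(?:[^.]{2,})$/;

// Hypothetical helper: returns the ASCII (punycode) form of a domain,
// or null if conversion fails or the result breaks DNS length limits.
function domainToAscii(domain) {
  // Reject characters the URL parser would treat as delimiters.
  if (/[\s\/\\?#@:\[\]]/.test(domain)) return null;
  let host;
  try {
    // Assumption: the URL parser's host handling is close enough to
    // IDNA ToASCII for a sketch; a real system may want a dedicated library.
    host = new URL('http://' + domain).hostname;
  } catch (err) {
    return null;
  }
  if (host.length === 0 || host.length > 253) return null;
  const labelsOk = host
    .split('.')
    .every((label) => label.length >= 1 && label.length <= 63);
  return labelsOk ? host : null;
}

// Shape check first, then conversion and length checks on the domain part.
function validateAfterConversion(address) {
  if (!looseEmail.test(address)) return false;
  const domain = address.slice(address.lastIndexOf('@') + 1);
  return domainToAscii(domain) !== null;
}

// A 70-character label passes the regex, but its punycode form is
// longer than the 63 octets a DNS label may have.
const tooLong = 'user@' + '例'.repeat(70) + '.example';
console.log(looseEmail.test(tooLong));              // true
console.log(validateAfterConversion(tooLong));      // false
console.log(validateAfterConversion('user@münchen.example')); // true
```

The point is only the ordering: the regex is a cheap first filter, and the checks that depend on the encoded form run after conversion.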
And David, thanks for the article. The UASG has long advocated turning validation off - but very few active practitioners seem willing to think outside that box.
I'm not entirely convinced by that approach either. I think there is value in validation - first, to determine whether an email address is real - if it isn't, you are probably better off getting a warning than trying to send it. Second, I find it very helpful, including as a protection against phishing emails, to be told if an email is not recognised as a contact to whom I have *sent* an email, which is a stricter validation check. Applications that do that for me - especially for scripts I don't read fluently like Chinese - are common, and I would be upset if they were to stop validating.
On the other hand, incorrect validation, e.g. of an address in a form, with no punycode conversion run first and no reason not to accept an internationalised email is clearly a bad idea - largely since it fails to actually validate whether something is a valid email address.
A given application or toolchain may be incapable of handling some valid email addresses, but I think a campaign to convince developers to produce a statement like "this application is second-rate and obsolete" would face significant challenges. Whether it is worth pushing for such applications to state that they do not yet support appropriate standards may be worth considering...
cheers
Chaals
D
On 13/09/2017, at 8:31 AM, Rubens Kuhl <rubensk@nic.br> wrote:
On Sep 12, 2017, at 3:44 PM, Don Hollander <don.hollander@icann.org> wrote:
Please note that this is a Geeky post - so carry on if that’s not you.
Email validation is an area where many websites fall short as we found in our study on Website UA Readiness (nearing publication)
The technologies behind these websites generally use a Regular Expression as their first line of defence against rubbish data. The issue is that most of these RegExs are overly restrictive.
As an appendix to the Website review, we looked at some of the technologies behind the websites to see if there were common denominators for good and bad experiences.
One RegEx has stood out as being simple and correct. I’d like the UASG to consider recommending this in our documentation. Toward that end, this thread is for discussion.
/^.+@(?:[^.]+\.)+(?:[^.]{2,})$/
Regular expression check in Javascript. This accepts any Unicode characters, only insisting that the domain must have more than one label and the TLD is 2 characters or longer. Your thoughts?
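For anyone who wants to poke at it, a small sketch of how the expression behaves on a few addresses. The sample addresses are invented for illustration, and the closing "/" on the literal is assumed.

```javascript
// The expression under discussion, written as a JavaScript regex literal.
const emailShape = /^.+@(?:[^.]+\.)+(?:[^.]{2,})$/;

// Illustrative inputs only; none of these are real mailboxes.
const samples = [
  'user@example.com',       // plain ASCII address
  '用户@例子.中国',           // Unicode local part, label and TLD
  'user@localhost',         // single label, no dot
  'user@example.c',         // one-character TLD
  'no-at-sign.example.com', // no "@" at all
  'user@example..com',      // empty label between two dots
  'user@exa mple.com',      // space inside a label
];

for (const address of samples) {
  console.log(address, emailShape.test(address));
}
```

The first two are accepted and the next four are rejected. The last one is accepted as well, since `[^.]` matches anything that is not a dot, so the expression is a shape check rather than a full validity check.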
Single-character IDN TLDs for some scripts are something being considered for subsequent procedures, so I would think of 1 or more, and prevent the same UA challenges that previous rounds' TLDs are suffering.
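A sketch of what the "1 or more" variant might look like next to the original, assuming the only change is the quantifier on the last label. The single-character TLD in the example is hypothetical.

```javascript
// Original: TLD must be 2 or more characters.
const twoPlus = /^.+@(?:[^.]+\.)+(?:[^.]{2,})$/;
// Suggested relaxation: TLD must be 1 or more characters.
const onePlus = /^.+@(?:[^.]+\.)+(?:[^.]+)$/;

// Hypothetical address under a single-character IDN TLD.
const address = '用户@例子.中';

console.log(twoPlus.test(address)); // false
console.log(onePlus.test(address)); // true

// Both still require a non-empty final label.
console.log(onePlus.test('user@example.')); // false
```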
Rubens
Don Hollander Universal Acceptance Steering Group Skype: don_hollander
-- Chaals is Charles McCathie Nevile find more at http://yandex.com