Re: [UA-discuss] Regular Expression

Sept. 12, 2017

On Tue, 12 Sep 2017 22:43:09 +0200, Don Hollander <don.hollander@icann.org> wrote:
...
Thanks Rubens. That raises the question of when the validation takes place: before or after a punycode transformation.
I would generally like validation to take place after punycode conversion.
First, because there are strings that match the regex but do not meet
punycode constraints. I also agree with Rubens that assuming TLDs are not
domains, and that email must therefore go to a subdomain, seems less than
prescient in hindsight.
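A minimal sketch of validating after punycode conversion, assuming the WHATWG `URL` parser (available in browsers and Node.js) is used as the conversion step; that choice of tooling is an assumption, not something the thread specifies, and the helper names `toAsciiAddress` and `isPlausibleAfterPunycode` are hypothetical. Only the domain part is converted, since punycode applies to domain labels rather than the local part. The last lines show a string the regex accepts on raw input but that fails conversion.

```javascript
// The permissive pattern quoted in this thread.
const EMAIL_RE = /^.+@(?:[^.]+\.)+(?:[^.]{2,})$/;

// Characters that would change how "http://" + domain is split into URL
// components (so the parsed host would not be the whole domain), plus
// whitespace, which the URL parser strips or rejects depending on position.
const URL_STRUCTURE_CHARS = /[\s\/?#\\:]/;

// Hypothetical helper: returns the address with its domain part converted to
// ASCII (punycode) by the WHATWG URL host parser, or null if that fails.
function toAsciiAddress(address) {
  const at = address.lastIndexOf("@");
  if (at < 1 || at === address.length - 1) return null; // empty local or domain
  const local = address.slice(0, at);
  const domain = address.slice(at + 1);
  if (URL_STRUCTURE_CHARS.test(domain)) return null;
  try {
    // The host parser applies UTS #46 mapping (including lower-casing) and
    // punycode-encodes non-ASCII labels; invalid hosts make it throw.
    const host = new URL("http://" + domain).hostname;
    return local + "@" + host;
  } catch {
    return null;
  }
}

// Hypothetical helper: apply the regex to the converted form, not the raw input.
function isPlausibleAfterPunycode(address) {
  const ascii = toAsciiAddress(address);
  return ascii !== null && EMAIL_RE.test(ascii);
}

console.log(toAsciiAddress("user@español.com"));            // user@xn--espaol-zwa.com
console.log(EMAIL_RE.test("user@exa<mple.com"));            // true
console.log(isPlausibleAfterPunycode("user@exa<mple.com")); // false
```

Converting first also changes what the "2 characters or longer" rule means: a single-character Unicode TLD becomes a longer `xn--` label, so it passes the length check after conversion but not before.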
...
And David, thanks for the article. The UASG has long advocated turning
validation off, but very few active practitioners seem willing to think
outside that box.
I'm not entirely convinced by that approach either.

I think there is value in validation - first, to determine whether an
email address is real - if it isn't, you are probably better off getting a
warning than trying to send it.

Second, I find it very helpful, including as a protection against phishing
emails, to be told if an email comes from an address that is not among the
contacts I have *sent* email to, which is a stricter validation check.
Applications that do this for me, especially for scripts I don't read
fluently such as Chinese, are common, and I would be upset if they stopped
validating.

On the other hand, incorrect validation, e.g. rejecting an internationalised
address in a form without first running punycode conversion, and with no
reason not to accept it, is clearly a bad idea, largely because it fails to
check whether something actually is a valid email address.

A given application or toolchain may be incapable of handling some valid
email addresses, but I think a campaign to convince developers to produce
a statement like "this application is second-rate and obsolete" would face
significant challenges. Pushing for such applications to state that they
do not yet support the appropriate standards may be worth considering...

cheers

Chaals
...
D
...
On 13/09/2017, at 8:31 AM, Rubens Kuhl <rubensk@nic.br> wrote:
...
On Sep 12, 2017, at 3:44 PM, Don Hollander <don.hollander@icann.org> wrote:
Please note that this is a geeky post, so carry on if that’s not you.
Email validation is an area where many websites fall short, as we found
in our study on Website UA Readiness (nearing publication).
The technologies behind these websites generally use a regular
expression as their first line of defence against rubbish data.
The issue is that most of these RegExes are overly restrictive.
As an appendix to the website review, we looked at some of the
technologies behind the websites to see if there were common
denominators for good and bad experiences.
One RegEx has stood out as being simple and correct. I’d like the
UASG to consider recommending this in our documentation. Toward
that end, this thread is for discussion.
/^.+@(?:[^.]+\.)+(?:[^.]{2,})$/
This is a regular expression check in JavaScript. It accepts any Unicode
characters, insisting only that the domain have more than one label and
that the TLD be 2 characters or longer.
Your thoughts?
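The proposed check can be exercised directly. A short sketch using the pattern as quoted, with its closing `/` in place; the wrapper name `isPlausibleEmail` is hypothetical.

```javascript
// The pattern from the thread; the name isPlausibleEmail is hypothetical.
const EMAIL_RE = /^.+@(?:[^.]+\.)+(?:[^.]{2,})$/;

function isPlausibleEmail(address) {
  // ^.+@            one or more characters (other than line breaks) before "@"
  // (?:[^.]+\.)+    one or more non-empty labels, each followed by a dot
  // (?:[^.]{2,})$   a final label (the TLD) of at least two characters
  return EMAIL_RE.test(address);
}

console.log(isPlausibleEmail("用户@例子.中国"));    // true: Unicode is accepted
console.log(isPlausibleEmail("user@example.com")); // true
console.log(isPlausibleEmail("user@example"));     // false: only one label
console.log(isPlausibleEmail("user@example.c"));   // false: one-character TLD
```

Because the pattern has no `u` flag, `.` and `[^.]` match single UTF-16 code units, so a TLD consisting of one character outside the Basic Multilingual Plane (stored as a surrogate pair) would still satisfy `{2,}`.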
Single IDN TLDs for some scripts are being considered for subsequent
procedures, so I would think of 1 or more, to prevent the same UA
challenges that TLDs from previous rounds are suffering.
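The "1 or more" suggestion can be read as relaxing either the TLD length or the number of labels, so a sketch of both variants follows for comparison. The constant names are hypothetical, and `中` is used only as a stand-in for a hypothetical single-character IDN TLD.

```javascript
// The pattern as proposed: two or more labels, final label of 2+ characters.
const PROPOSED_RE = /^.+@(?:[^.]+\.)+(?:[^.]{2,})$/;

// Variant 1 (hypothetical): a final label of one or more characters.
const ONE_CHAR_TLD_RE = /^.+@(?:[^.]+\.)+[^.]+$/;

// Variant 2 (hypothetical): also allow a single-label domain, i.e. mail
// addressed directly to a TLD.
const SINGLE_LABEL_RE = /^.+@(?:[^.]+\.)*[^.]+$/;

const samples = ["user@例子.中", "user@中"];
for (const s of samples) {
  console.log(s, PROPOSED_RE.test(s), ONE_CHAR_TLD_RE.test(s), SINGLE_LABEL_RE.test(s));
}
// user@例子.中 false true true
// user@中 false false true
```

The second variant also accepts intranet-style names such as `user@localhost`, which is the trade-off of treating a TLD as a possible mail domain.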
Rubens
Don Hollander
Universal Acceptance Steering Group
Skype: don_hollander
-- 
Chaals is Charles McCathie Nevile
find more at http://yandex.com

Chaals McCathie Nevile