Re : Re: Regular Expression
This is what Microsoft suggests for EAI Validation.

https://blogs.msdn.microsoft.com/shawnste/2014/04/01/eai-email-address-inter...

Dr. Ajay DATA | Founder & CEO
Get email id like अजय@डाटा.भारत in your own language, visit www.xgenplus.com

From: Don Hollander
MailId: [73397993]
To: Mark Svancarek
Cc: Universal Acceptance
Subject: Re: [UA-discuss] Regular Expression
Date: 13 Sep 2017 10:37:07 PM

Mark.

What would such a RegEx look like?

D

> On 14/09/2017, at 4:26 AM, Mark Svancarek via UA-discuss wrote:
>
> Depending on your mail client, you may have experienced a linkification error in my response. Weird.
>
> -----Original Message-----
> From: Mark Svancarek
> Sent: Wednesday, September 13, 2017 9:21 AM
> To: 'Vittorio Bertola'; Chaals McCathie Nevile; ua-discuss@icann.org
> Subject: RE: [UA-discuss] Regular Expression
>
> I believe that validation should be as light as possible. "Contains '@'" is about the extent of it unless you are willing to look at bidi and IFS. Just capture the string and send a test message.
>
> -----Original Message-----
> From: ua-discuss-bounces@icann.org [mailto:ua-discuss-bounces@icann.org] On Behalf Of Vittorio Bertola
> Sent: Wednesday, September 13, 2017 1:34 AM
> To: Chaals McCathie Nevile; ua-discuss@icann.org
> Subject: Re: [UA-discuss] Regular Expression
>
>> On 13 September 2017 at 0:01, Chaals McCathie Nevile wrote:
>>
>> On Tue, 12 Sep 2017 22:43:09 +0200, Don Hollander wrote:
>>
>> I think there is value in validation - first, to determine whether an email address is real - if it isn't, you are probably better off getting a warning than trying to send it.
>
> But this is nothing you can do just with a regexp. The regexp could allow you to intercept blatant mistakes - e.g., there are national keyboards where typing "@" requires pressing Alt or some uncommon combination of keys, so it's easy to mistype it and you can easily warn the user that their entered string does not have a "@" - but anything beyond that is unnecessary, because, if a user mistypes an email address in any other way, it's very likely that he will still end up entering a valid email address that no regexp will be able to tell as non-existing, or that could even exist but belong to someone else.
>
> On the other hand, if you try to implement a complex regexp, and especially if you try to figure it out on your own, it's almost certain that you will mark as invalid several valid email addresses that are corner cases but should be accepted, as well as many future developments of the standards which are invalid now but will be valid in the future.
>
>> Second, I find it very helpful, including as a protection against phishing emails, to be told if an email is not recognised as a contact to whom I have *sent* an email, which is a stricter validation check. Applications that do that for me - especially for scripts I don't read fluently like Chinese - are common, and I would be upset if they were to stop validating.
>
> But this, again, is a validation that cannot be done via a regexp (can you write a regexp representing your entire contact book?) and that, on the other hand, poses an additional stricter condition than just "the email address is valid". We are just discussing how to check that the email address is syntactically valid, any other checks could still be implemented however appropriate.
>
> The point here is that you should not try to determine whether an email address is valid by checking its syntax, other than checking that it has a "@" and possibly a "." on the right of it (but even this latter condition is already too strict, as the ideographic full stop "。" should be accepted in place of the ASCII dot, if you check strings in IDN form). Anything beyond that is going to exclude some valid addresses while not increasing in any significant way your chances of intercepting user input error at this stage - and you will still intercept any user error a few seconds later, when you send the validation/confirmation message.
>
> Regards,
> --
> Vittorio Bertola | Research & Innovation Engineer
> vittorio.bertola@open-xchange.com
> Open-Xchange Srl - Office @ Via Treviso 12, 10144 Torino, Italy

Don Hollander
Universal Acceptance Steering Group
Skype: don_hollander
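Vittorio's suggestion quoted above (check for an "@" and possibly a dot to its right, while accepting the ideographic full stop) could be sketched roughly as follows. This is only an illustration, not code from the thread: the function name `has_at_and_dot` is made up, the separator list is the set of dot-like characters that RFC 3490 treats as label separators, and "to the right" is taken in logical (storage) order, which sidesteps the bidi display question raised elsewhere in the discussion.

```python
# Dot-like characters that IDNA treats as label separators (RFC 3490, section 3.1):
# full stop, ideographic full stop, fullwidth full stop, halfwidth ideographic full stop.
DOT_SEPARATORS = "\u002E\u3002\uFF0E\uFF61"


def has_at_and_dot(address: str) -> bool:
    """Hypothetical helper: True if there is text on both sides of the last "@"
    and a dot-like separator somewhere after it (in logical order)."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    return any(ch in DOT_SEPARATORS for ch in domain)
```

Note that even this small check rejects dotless domains such as `user@localhost`, which is the kind of over-strictness the message warns about.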
On 9/13/2017 10:32 AM, Dr. AJAY D A T A wrote:
This is what Microsoft suggests for EAI Validation.
https://blogs.msdn.microsoft.com/shawnste/2014/04/01/eai-email-address-inter...

"^([a-zA-Z0-9.!#$%&'*+/=?^_`{|}~\u00A0-\uD7FF\uE000-\uFFFF-]|([\uD800-\uDBFF][\uDC00\uDFFF]))+$"
This would allow most of the ASCII range and all of UTF-16 beyond ASCII. It would have been cleaner/clearer to express the reverse, that is, all code points not allowed, such as {@, ", controls, space, NBSP, etc.}.

The blog post suggests splitting the address at the @ and separately using the validation regex on the localpart and translation to punycode for the host (using a validating converter).

A./
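The split-then-validate approach described above might look something like this in Python. It is a sketch under several assumptions rather than the blog post's code: `check_address` is a hypothetical name, the local-part pattern is adapted to Python (whose strings are code points, not UTF-16 units, so the surrogate-pair alternative becomes a single range up to U+10FFFF), and the standard library's `idna` codec (IDNA 2003) stands in for the "validating converter". As quoted, the low-surrogate class `[\uDC00\uDFFF]` has no hyphen and was presumably meant as a range; the adaptation avoids that question entirely.

```python
import re

# Local-part pattern adapted from the quoted regex (an adaptation, not the original).
LOCAL_PART = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~\u00A0-\uD7FF\uE000-\U0010FFFF-]+"
)


def check_address(address: str) -> bool:
    """Hypothetical helper: split at the last "@", check the local part against
    the pattern, and require the host to convert to its ASCII (punycode) form."""
    local, sep, host = address.rpartition("@")
    if not sep or not local or not host:
        return False
    if LOCAL_PART.fullmatch(local) is None:
        return False
    try:
        # Python's built-in IDNA 2003 codec; raises UnicodeError on labels it rejects.
        host.encode("idna")
    except UnicodeError:
        return False
    return True
```

For example, `check_address("अजय@डाटा.भारत")` passes both halves, while `check_address("a b@example.com")` fails because the space is not in the allowed local-part set.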
I still feel this is too complex for the average web developer. I would just look for <anytext> + @ + <anytext>, and send a test email. Looking for dots in the domain part requires you to understand which is the domain part, which requires you to understand bidi rules.

Sending a test email pushes the complexity to your email program, of course, and non-UA behavior in the email ecosystem remains. But it's easier for UASG to measure and inform the relatively small set of email ecosystem players than it is to inform every website developer that uses regexes, let alone to influence them to consistently change.

Does that make sense?

From: ua-discuss-bounces@icann.org [mailto:ua-discuss-bounces@icann.org] On Behalf Of Asmus Freytag
Sent: Wednesday, September 13, 2017 3:17 PM
To: ua-discuss@icann.org
Subject: Re: [UA-discuss] Re : Re: Regular Expression

> On 9/13/2017 10:32 AM, Dr. AJAY D A T A wrote:
>
> This is what Microsoft suggests for EAI Validation.
>
> https://blogs.msdn.microsoft.com/shawnste/2014/04/01/eai-email-address-internationalization-address-validation/
>
> "^([a-zA-Z0-9.!#$%&'*+/=?^_`{|}~\u00A0-\uD7FF\uE000-\uFFFF-]|([\uD800-\uDBFF][\uDC00\uDFFF]))+$"
>
> This would allow most of the ASCII range and all of UTF-16 beyond ASCII. It would have been cleaner/clearer to express the reverse, that is, all code points not allowed, such as {@, ", controls, Space, NBSP, etc).
>
> The blog post suggests splitting the address at the @ and separately using the validation regex on the localpart and translation to punycode for the host (using a validating converter).
>
> A./
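Mark's suggestion in this message (look for <anytext> + @ + <anytext>, then send a test email) could be sketched along these lines. Everything here is hypothetical scaffolding: the names `minimally_plausible` and `PendingConfirmations` are invented, the store is in-memory only, and actually sending the message is left out, since the point is that the mail system, not the web form, does the real validation.

```python
import secrets
from typing import Dict, Optional


def minimally_plausible(text: str) -> bool:
    """<anytext> + @ + <anytext>: some text on each side of the first "@"."""
    at = text.find("@")
    return 0 < at < len(text) - 1


class PendingConfirmations:
    """Hypothetical in-memory store; a real site would persist and expire entries."""

    def __init__(self) -> None:
        self._by_token: Dict[str, str] = {}

    def start(self, address: str) -> Optional[str]:
        """Return a token to embed in the test message, or None if the input
        fails even the minimal check."""
        if not minimally_plausible(address):
            return None
        token = secrets.token_urlsafe(16)
        self._by_token[token] = address
        return token

    def confirm(self, token: str) -> Optional[str]:
        """Return the address once the recipient presents the token (single use)."""
        return self._by_token.pop(token, None)
```

The address is stored exactly as entered, so no script-specific or bidi-aware logic is needed at the form; a mistyped address simply never gets confirmed.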
Don, thanks for asking the group for opinions.

My recommendation is to not offer a regex for validating email and instead the report must emphasize in its conclusion that developers must assure that their code does not

1) treat top level domains longer than 3 characters as invalid or
2) treat domains with non-international characters as invalid or
3) treat email addresses with non-international characters in the user part as invalid

They can use the data in the study for quality assurance purposes.

Further, the report should identify there is a need (and has been for many years) for reference code for proper validation of email addresses since so few people have gotten it right.

My arguments for this approach are:

1) The position that a good solution may be too complex for web or other developers, ignores that a good solution can be packaged as well as we would be needlessly handicapping capable developers.

2) Although I appreciate the case made for the minimal <stuff>@<stuff> validation coupled with rigorous server side validation, some costs can be reduced by stronger client side validation as well as providing a better user experience. And although I know it can be worked around by the malicious, I still like to filter out addresses that might have deleterious effects - embedded html, sql or other commands. i.e. I don’t care if your email is “delete *”@example.com, I will invalidate it. Therefore, many of us will have filters regardless, and the minimal one is not helpful or worthy of endorsement in that context. (Yes, I understand that I still need to protect against malicious code on the server side.)

3) Promoting the minimal regex hides the real problem, that there is a lack of a good, referenceable answer, whether it is a regex or other implementation. The question simply moves to how to do proper validation on the server side. Providing the minimal regex hides the fact we are not really addressing the community’s problem of how to correctly validate an email address.

We should simply make developers clear on the requirements for UA, and at the same time urge the community to define a reference set for the solution.

tex
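The three requirements listed in this message lend themselves to a small quality-assurance check that a developer can run against whatever validator they already use. The sketch below is illustrative only: the sample addresses stand in for the study data, items 2 and 3 are read as referring to non-ASCII characters (an assumption about the intended meaning), `false_rejections` is a made-up helper name, and `LEGACY` is merely a typical example of the kind of pattern the requirements rule out.

```python
import re
from typing import Callable, Iterable, List

# Illustrative samples only; real QA would use the data from the study.
MUST_ACCEPT = [
    "user@example.photography",  # top-level domain longer than 3 characters
    "user@डाटा.भारत",  # non-ASCII domain
    "अजय@डाटा.भारत",  # non-ASCII user part and domain
]

# A typical legacy pattern, assumed here for illustration.
LEGACY = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,3}")


def false_rejections(validator: Callable[[str], bool], samples: Iterable[str]) -> List[str]:
    """Return the samples that the validator wrongly rejects."""
    return [s for s in samples if not validator(s)]
```

Calling `false_rejections(lambda s: LEGACY.fullmatch(s) is not None, MUST_ACCEPT)` returns all three samples, which is the failure the requirements are meant to expose.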
These seem reasonable. Just accepting random strings has side effects (security risks) beyond universal acceptance.

On 9/14/2017 3:16 AM, Tex Texin wrote:

> Don, thanks for asking the group for opinions.
>
> My recommendation is to not offer a regex for validating email and instead the report must emphasize in its conclusion that developers must assure that their code does not
>
> 1) treat top level domains longer than 3 characters as invalid or

IDN TLDs may also be 1 character long

> 2) treat domains with non-international characters as invalid or

?? are you referring to ASCII mixing

> 3) treat email addresses with non-international characters in the user part as invalid

?? are you referring to ASCII mixing

> They can use the data in the study for quality assurance purposes.
>
> Further, the report should identify there is a need (and has been for many years) for reference code for proper validation of email addresses since so few people have gotten it right.
>
> My arguments for this approach are:
>
> 1) The position that a good solution may be too complex for web or other developers, ignores that a good solution can be packaged as well as we would be needlessly handicapping capable developers.
>
> 2) Although I appreciate the case made for the minimal <stuff>@<stuff> validation coupled with rigorous server side validation, some costs can be reduced by stronger client side validation as well as providing a better user experience. And although I know it can be worked around by the malicious, I still like to filter out addresses that might have deleterious effects - embedded html, sql or other commands. i.e. I don’t care if your email is “delete *”@example.com, I will invalidate it. Therefore, many of us will have filters regardless, and the minimal one is not helpful or worthy of endorsement in that context. (Yes, I understand that I still need to protect against malicious code on the server side.)

Would you do that by black-list filters that describe what is to be prohibited? Instead of some massive Regex that describes what is allowed?

> 3) Promoting the minimal regex hides the real problem, that there is a lack of a good, referenceable answer, whether it is a regex or other implementation. The question simply moves to how to do proper validation on the server side. Providing the minimal regex hides the fact we are not really addressing the community’s problem of how to correctly validate an email address.
>
> We should simply make developers clear on the requirements for UA, and at the same time urge the community to define a reference set for the solution.
>
> tex
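The deny-list idea raised in this reply (describe what is prohibited rather than what is allowed) might be sketched as below. It assumes the code points named earlier in the thread as candidates for exclusion ("@", the double quote, controls, space, NBSP); the "etc." is deliberately left open, and the name `denied_code_points` is hypothetical. It is not a security filter and does not replace server-side protection.

```python
import unicodedata

# Code points named in the thread as candidates for exclusion; the list is not complete.
DENIED_CHARS = {"@", '"', " ", "\u00A0"}


def denied_code_points(local_part: str) -> list:
    """Return the characters in local_part that the deny-list rejects:
    the listed characters plus anything in Unicode category Cc (controls)."""
    return [
        ch for ch in local_part
        if ch in DENIED_CHARS or unicodedata.category(ch) == "Cc"
    ]
```

An empty result means nothing on the deny-list was found; returning the offending characters rather than a bare boolean makes it easier to tell the user what to fix.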
participants (4)
- Asmus Freytag
- Dr. AJAY D A T A
- Mark Svancarek
- Tex Texin