Regular Expression To Validate EAI Addresses
Dear colleagues, I hope that you are doing well. For a long time I have been trying to use a regex to validate EAI addresses for several UA-related projects. I used the tool at https://rubular.com/ with the regex "^[\p{L}.%+-]+@[\p{L}.-]+\.[\p{L}]{2,}$" to validate some EAI addresses, and it works well, as in the screenshot below.
[Screenshot: image010.png]
Thanks a lot.
All the Best, Abdalmonem Tharwat Galila, Deputy Manager, Dot Masr Registry, Operation Sector.
National Telecommunication Regulatory Authority (NTRA)
Office Tel.: +2 02 35341582 - +2 02 35341300
Mobile: +2 010 00049068
Fax: +2 02 35370537
Website: http://www.mcit.gov.eg : http://www.tra.gov.eg
E-mail: agalila@mcit.gov.eg : atharwat@tra.gov.eg
Skype: abdalmonem.galila
DISCLAIMER This e-mail and any files transmitted with it are confidential and intended solely for the use of the individual or entity to which they are addressed. If you have received this email in error please notify your system support manager. Please note that any views or opinions presented in this email are solely those of the author and do not necessarily represent those of the National Telecom Regulatory Authority (NTRA). Finally, the recipient should check this email and any attachments for the presence of viruses. The NTRA accepts no liability for any damage caused by any virus transmitted by this email.
On 14 Aug 2020, at 11:32, Abdelmeniem Tharwat wrote:
Dear colleagues, I hope that you are doing well. For a long time I have been trying to use a regex to validate EAI addresses for several UA-related projects. I used the tool at https://rubular.com/ with the regex "^[\p{L}.%+-]+@[\p{L}.-]+\.[\p{L}]{2,}$" to validate some EAI addresses, and it works well, as in the screenshot below.
\p{L} matches the Unicode property Letter. So:
- for IDNA, it is close (IDNA is based on the Unicode Letter property) but not quite right. See RFC 5892.
- for EAI, it is very restrictive, since the mailbox can be almost any UTF-8 string. See RFC 6531.
So you may want to use that regex, but be aware of its side effects, including not accepting some valid domains and mailboxes.
Finally, not all regex engines support Unicode properties, so make sure the one you use supports them.
Regards, Marc.
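As an illustration of the point about engine support (this sketch is not from the thread): Python's standard-library re module is one engine that, in current releases, treats \p{L} as a bad escape and refuses to compile the pattern. The helper name engine_supports is made up for this example. A quick compile test like this is one way to check an engine before relying on the pattern.

```python
import re

# The pattern proposed at the start of this thread, unchanged.
PATTERN = r"^[\p{L}.%+-]+@[\p{L}.-]+\.[\p{L}]{2,}$"


def engine_supports(pattern):
    """Return True if the standard-library `re` engine can compile `pattern`."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Raised for syntax the engine does not understand, e.g. \p{...}.
        return False


if __name__ == "__main__":
    print("Unicode-property pattern compiles:", engine_supports(PATTERN))
    print("Plain pattern compiles:", engine_supports(r"^[^@\s]+@[^@\s]+$"))
```

Engines that do support Unicode properties (the Ruby engine behind Rubular is one, as the original test shows) would report the opposite for the first pattern.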
_______________________________________________ UA-discuss mailing list UA-discuss@icann.org https://mm.icann.org/mailman/listinfo/ua-discuss _______________________________________________ By submitting your personal data, you consent to the processing of your personal data for purposes of subscribing to this mailing list accordance with the ICANN Privacy Policy (https://www.icann.org/privacy/policy) and the website Terms of Service (https://www.icann.org/privacy/tos). You can visit the Mailman link above to change your membership status or configuration, including unsubscribing, setting digest-style delivery or disabling delivery altogether (e.g., for a vacation), and so on.
Could we have some examples so I can fix that?
On 14 Aug 2020, at 11:51, Abdelmeniem Tharwat wrote:
Could we have some examples so I can fix that?
For IDNA, find any codepoint which does not have the Unicode property Letter but is PVALID in IDNA. Digits (\p{N}) come to mind immediately, but that is not the end of the story: many PVALID codepoints have neither the Unicode property L nor N. Again, see RFC 5892 and the IANA IDNA registry. As I wrote and discussed in the Java tutorial for UA, if you want to handle IDNA fully and correctly in a regex, you end up coding the full IDNA rules into the regex, which is, if not impossible, very complicated and not worth the work.
For EAI, find any codepoint that does not have the Unicode property Letter, and it won't work with the proposed regex.
The danger here is again promoting a regex which « kind of works, but not quite », so that it becomes the « standard » everybody uses; one then essentially creates a fork of the RFCs, with more limitations, from an implementation point of view. One may argue that we should have designed IDNA2008 so that it could be implemented in a regex, but that did not happen.
The best way would be to modify the regex engine itself to embed the IDNA protocol inside it and then define a new regex token for IDNA, and then we would be in business… Not an easy task.
Regards, Marc.
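A few concrete cases, as a sketch rather than anything posted in the thread. Because Python's standard re module lacks \p{L}, the code below re-implements the proposed pattern by hand with unicodedata; the names is_letter and matches_thread_regex and the sample addresses are made up for illustration. A letters-only address passes, while addresses containing digits, A-labels, or Devanagari combining marks (general category M, not L) are rejected.

```python
import unicodedata


def is_letter(ch):
    # Same test as \p{L}: general category Lu, Ll, Lt, Lm or Lo.
    return unicodedata.category(ch).startswith("L")


def matches_thread_regex(address):
    # Hand-written stand-in for ^[\p{L}.%+-]+@[\p{L}.-]+\.[\p{L}]{2,}$
    # (assumes a single-line string; line-anchor subtleties are ignored).
    local, at, domain = address.partition("@")
    head, dot, tld = domain.rpartition(".")
    return (
        at == "@"
        and dot == "."
        and len(local) > 0
        and all(is_letter(c) or c in ".%+-" for c in local)
        and len(head) > 0
        and all(is_letter(c) or c in ".-" for c in head)
        and len(tld) >= 2
        and all(is_letter(c) for c in tld)
    )


SAMPLES = [
    "почта@пример.рф",             # letters only
    "user1@example.com",           # digit in the mailbox
    "info@example2.com",           # digit in the domain
    "info@xn--e1afmkfd.xn--p1ai",  # A-labels contain digits and hyphens
    "नाम@उदाहरण.भारत",              # Devanagari vowel signs are marks, not letters
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(sample, matches_thread_regex(sample))
```

Only the first sample is accepted; the other four are syntactically plausible addresses that the letters-only character classes turn away.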
In article <A1234DFB-AE2B-4FA4-9FB7-B65D365651BD@viagenie.ca> you write:
The best way would be to modify the regex engine itself to embed the IDNA protocol inside it and then define a new regex token for IDNA and then we will be in business… Not an easy task.
Good catch on the digits, I missed that. To validate host names, I think the least bad plan is not to try to build IDNA into regexes but to do it in two steps: use a loose RE, then validate the domain through an actual IDN library.
-- Regards, John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies", Please consider the environment before reading this e-mail. https://jl.ly
On 14 Aug 2020, at 12:15, John Levine wrote:
Good catch on the digits, I missed that. To validate host names, I think the least bad plan is not to try to build IDNA into regexes but to do it in two steps: use a loose RE, then validate the domain through an actual IDN library.
Yes, that is exactly what I said in the java-ua tutorial. Marc.
Hello Mark, others,
On 15/08/2020 01:08, Marc Blanchet wrote:
The best way would be to modify the regex engine itself to embed the IDNA protocol inside it and then define a new regex token for IDNA and then we will be in business… Not an easy task.
I'm not sure I agree. IDNA changes with new versions of Unicode, and that would mean that such regex engines need to be updated. If that's easy (e.g. it's part of the properties provided by Unicode, which are needed for version updates anyway), that might work, but otherwise, I'd guess it won't.
Regards, Martin.
On 15 Aug 2020, at 3:11, Martin J. Dürst wrote:
I'm not sure I agree. IDNA is changing for new versions of Unicode, and that would mean that such regex engines need to be updated. If that's easy (e.g. it's part of the properties provided by Unicode, which are needed for version updates anyway), that might work,
Well, it depends on whether and how the regex engine uses underlying OS services, which may deliver what it needs.
but otherwise, I'd guess it won't.
As I wrote, that would be the best way (from a UA point of view), but it is far from sure that it would work, and it is certainly not an easy task. I'm not saying we should undertake it; I was just throwing out an idea in case someone wants to try it ;-) Marc.
Marc, I doubt that there are engines that cannot understand this! Especially since it works for Arabic and has also been tested for Russian. If you have the names of such engines, we should contact them to support this. What do you think?
On 14 Aug 2020, at 13:19, Abdalmonem Tharwat Galila wrote:
Marc, I doubt that there are engines that cannot understand this!
You are top-posting, so I'm not sure what you are referring to. I'm guessing that you mean my comment that « not all regex engines support Unicode properties ». I don't have data on all regex engines, but in the past I've encountered regex engines that did not support Unicode properties. Sometimes support requires an extension or flags to enable it. Sometimes the syntax is different (I think POSIX uses the [:className:] format, for example, and I don't know how it maps to Unicode). How many engines, and which ones, is something to figure out. It may be less of an issue nowadays; I don't know. My point is just: make sure your regex engine supports them. Experience has shown varying degrees of support for regex format enhancements across regex engines. Marc.
Writing a single regexp is probably forbiddingly complicated (and I say that as someone who once wrote a 25-line regexp and used it in production), but writing two might work well: one that accepts most valid names and accepts nothing invalid, and one that rejects common (or assumedly common) errors and accepts everything that's valid (and some errors). These should be simple to write, and yield three answers: good, bad and uncertain.
Arnt
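The two-regexp idea can be sketched roughly as follows. This is an illustration under stated assumptions, not a vetted pair of patterns: STRICT is deliberately ASCII-only (so every internationalized address lands in "uncertain" and still needs a library check), LOOSE ignores rare forms such as quoted mailboxes containing spaces or "@", and neither enforces length limits. The names STRICT, LOOSE and classify are made up for this example.

```python
import re

# Conservative pattern: aims to accept only addresses that are clearly fine.
# Assumption: ASCII letters/digits, single separators in the mailbox,
# LDH labels in the domain and an alphabetic top-level label.
STRICT = re.compile(
    r"[A-Za-z0-9]+(?:[._+-][A-Za-z0-9]+)*"
    r"@(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,63}"
)

# Permissive pattern: aims to reject only obvious mistakes
# (no "@", several "@", whitespace, empty domain labels, no dot in the domain).
LOOSE = re.compile(r"[^@\s]+@[^@\s.]+(?:\.[^@\s.]+)+")


def classify(address):
    """Return 'good', 'bad' or 'uncertain' for a candidate address."""
    if STRICT.fullmatch(address):
        return "good"
    if not LOOSE.fullmatch(address):
        return "bad"
    return "uncertain"


if __name__ == "__main__":
    for candidate in (
        "alice@example.com",
        "alice@@example.com",
        "alice@example..com",
        "почта@пример.рф",
    ):
        print(candidate, "->", classify(candidate))
```

The "uncertain" bucket is where a real IDNA library, or simply attempting delivery, would have to take over.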
So, like many UASG discussions, this is very enlightening from an academic point of view. However, my question after all of these considerations is: what are the reasonable next steps to take in relation to making this into some sort of guideline that we can recommend to third parties who need a simple answer?
PS: There are 3 very active Marks/Marc on this group, so I'd suggest the list always use the first letter of our surnames to make it less confusing.
Regards, -- Mark W. Datysgeld from Governance Primer [www.markwd.website] In partnership with AR-TARC and the Brazilian Association of Software Companies (ABES)
On 15 Aug 2020, at 7:04, Mark W. Datysgeld wrote:
So, like many UASG discussions, this is very enlightening from an academic point of view. However, my question after all of these considerations is: what are the reasonable next steps to take in relation to making this into some sort of guideline that we can recommend to third parties who need a simple answer?
The recommendation I made during the Java tutorial on UA is as follows:
- use a simple regex to throw out the basic errors (example: an EAI email address must have a « @ » and at least one « . » on the right side; you can enhance this a bit, but if you go too far, you start getting into trouble).
- use a library for verifying IDNA.
- there is not that much one can do on the left side of an EAI email address; maybe just verify that it is a valid UTF-8 string.
John Levine also wrote similarly in a previous email in this thread.
Marc.
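The three steps of this recommendation might look roughly like the Python sketch below. It is an illustration, not the code from the Java tutorial. One loud assumption: to stay self-contained it uses Python's built-in "idna" codec, which implements the older IDNA2003 rules (RFC 3490), not IDNA2008 (RFC 5890 to RFC 5892); a real deployment would substitute an IDNA2008 library at that step. The names BASIC_SHAPE and check_address are made up for this example.

```python
import re

# Step 1 pattern: exactly one "@", no whitespace, and at least one "."
# between non-empty labels on the right-hand side. Nothing more.
BASIC_SHAPE = re.compile(r"[^@\s]+@[^@\s.]+(?:\.[^@\s.]+)+")


def check_address(raw):
    """Check an address given as bytes, e.g. as received from a form or socket."""
    # Left side (and everything else): must at least be a valid UTF-8 string.
    try:
        address = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False

    # Simple regex, only to throw out the basic errors.
    if not BASIC_SHAPE.fullmatch(address):
        return False

    # Hand the domain to an IDNA implementation instead of a regex.
    # ASSUMPTION: the stdlib codec is IDNA2003, used here as a stand-in.
    domain = address.rpartition("@")[2]
    try:
        domain.encode("idna")
    except UnicodeError:
        return False
    return True


if __name__ == "__main__":
    print(check_address("почта@пример.рф".encode("utf-8")))
    print(check_address(b"user1@example2.com"))
    print(check_address(b"\xff\xfe@example.com"))
    print(check_address(("a@" + "x" * 64 + ".com").encode("utf-8")))
```

Keeping the regex this small is the point: the domain rules live in the library, which can track new Unicode versions without the pattern changing.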
In article <14D6CA6C-2F5F-4FAB-8602-41B700DB727E@viagenie.ca>, Marc Blanchet <marc.blanchet@viagenie.ca> wrote:
The recommendation I made during the Java tutorial on UA is as follows:
- use a simple regex to throw out the basic errors (example: an EAI email address must have a « @ » and at least one « . » on the right side; you can enhance this a bit, but if you go too far, you start getting into trouble)
- use a library for verifying IDNA.
- there is not that much one can do on the left side of an EAI email address; maybe just verify that it is a valid UTF-8 string.
John Levine also wrote similarly in a previous email in this thread.
You can do a little more validation of the mailbox to be sure it follows the syntax rules, e.g., doesn't have two unquoted dots in a row. For practical purposes I would also exclude % and ! which used to be (ab)used for source routing and are likely to fail. -- Regards, John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies", Please consider the environment before reading this e-mail. https://jl.ly
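The layered approach described above (basic regex, then a library for IDNA, then light checks on the mailbox) might look something like the sketch below. The function name check_eai_address and the BASIC pattern are hypothetical. One loud assumption: the domain check uses Python's built-in "idna" codec, which implements the older IDNA 2003 rules, so a production system would likely want a dedicated IDNA 2008 library instead.

```python
import re

# Basic shape only: exactly one "@", and at least one "." to its right.
BASIC = re.compile(r"[^@\s]+@[^@\s.]+(?:\.[^@\s.]+)+")


def check_eai_address(raw):
    """Take the address as bytes; return a list of problems (empty if none found)."""
    # Left side: about all we can insist on is that the bytes are valid UTF-8.
    try:
        address = raw.decode("utf-8")
    except UnicodeDecodeError:
        return ["not valid UTF-8"]

    if not BASIC.fullmatch(address):
        return ["basic shape check failed"]

    local, domain = address.split("@")  # BASIC guarantees exactly one "@"
    problems = []

    # Mailbox checks mentioned in the thread.
    if ".." in local:
        problems.append("consecutive dots in local part")
    if "%" in local or "!" in local:
        problems.append("'%' or '!' in local part")

    # Domain: delegate to a library rather than a regex.
    # Assumption: the stdlib codec (IDNA 2003) stands in for a real IDNA library.
    try:
        domain.encode("idna")
    except UnicodeError:
        problems.append("domain rejected by IDNA codec")

    return problems
```

The function collects problems rather than returning a boolean so a caller can decide which findings are fatal and which only deserve a warning.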
So now we are back to the short regex for EAI address validation. At the same time, it would be good to have more patterns to test against, so that we can make any necessary changes. FYI, I used this regex with Bootstrap and jQuery for client-side validation, and inside the Java code for server-side validation.
Thanks, Marc. -- Mark W. Datysgeld from Governance Primer [www.markwd.website] In partnership with AR-TARC and the Brazilian Association of Software Companies (ABES) On August 15, 2020 11:34:36 AM GMT-03:00, Marc Blanchet <marc.blanchet@viagenie.ca> wrote:
On 15 Aug 2020, at 7:04, Mark W. Datysgeld wrote:
So, like many UASG discussions, this is very enlightening from an academic point of view. However, my question after all of these considerations is: what are the reasonable next steps to take in relation to making this into some sort of guideline that we can recommend to third parties who need a simple answer?
The recommendation I made during the Java tutorial on UA is as follows:
- use a simple regex to throw out the basic errors (example: an EAI email address must have a « @ » and at least one « . » on the right side; you can enhance this a bit, but if you go too far, you start getting into trouble)
- use a library for verifying IDNA.
- there is not that much one can do on the left side of an EAI email address; maybe just verify that it is a valid UTF-8 string.
John Levine also wrote similarly in a previous email in this thread.
Marc.
PS: There are 3 very active Marks/Marc on this group, so I'd suggest the list always use the first letter of our surnames to make it less confusing.
Abdel, thank you greatly for this. One of the coders at Governance Primer has been studying that code since you showed it on the EAI call, and came up with the following variant:
^(([\p{L}.%+-]+@[\p{L}.-]+\.[\p{L}]{2,})|([\p{L}]{2,}\.[\p{L}.-]+@[\p{L}.%+-]+))$
What do you think about this one?
Regards, -- Mark W. Datysgeld from Governance Primer [www.markwd.website] In partnership with AR-TARC and the Brazilian Association of Software Companies (ABES)
On August 14, 2020 12:32:24 PM GMT-03:00, Abdelmeniem Tharwat <atharwat@tra.gov.eg> wrote:
Dear colleagues, I hope that you are doing well. For a long time I have tried to use regex to validate EAI addresses for the many UA-related projects I have. I used the tool here<https://rubular.com/> with this regex "^[\p{L}.%+-]+@[\p{L}.-]+\.[\p{L}]{2,}$" to validate some EAI addresses, and it works well, as in the screenshot below.
[Screenshot of the Rubular test results]
Thanks a lot.
All the Best, Abdalmonem Tharwat Galila Deputy Manager, Dot Masr Registry, Operation Sector.
Thanks Mark, I tried the regex below, but it gave similar results. Could we hear the idea behind this change? There may be some patterns we have to take into account in future.
Thanks a lot.
All the Best, Abdalmonem Tharwat Galila Deputy Manager, Dot Masr Registry, Operation Sector.
From: UA-EAI <ua-eai-bounces@icann.org> On Behalf Of Mark W. Datysgeld
Sent: Friday, August 14, 2020 05:56 PM
To: ua-discuss@icann.org; Abdelmeniem Tharwat <atharwat@tra.gov.eg>; uaeai <ua-eai@icann.org>; ua <ua-discuss@icann.org>; TF-AIDN (tf-aidn@meswg.org) <tf-aidn@meswg.org>
Cc: savyovm@gmail.com
Subject: Re: [UA-EAI] [UA-discuss] Regular Expression To Validate EAI Addresses
Abdel, thank you greatly for this. One of the coders at Governance Primer has been studying that code since you showed it on the EAI call, and came up with the following variant:
^(([\p{L}.%+-]+@[\p{L}.-]+\.[\p{L}]{2,})|([\p{L}]{2,}\.[\p{L}.-]+@[\p{L}.%+-]+))$
What do you think about this one?
In article <58D262E2-7B30-4F9B-882F-49865CDC4B2A@governanceprimer.com> you write:
Abdel, thank you greatly for this.
One of the coders at Governance Primer has been studying that code since you showed it on the EAI call, and came up with the following variant:
^(([\p{L}.%+-]+@[\p{L}.-]+\.[\p{L}]{2,})|([\p{L}]{2,}\.[\p{L}.-]+@[\p{L}.%+-]+))$
What do you think about this one?
I don't understand what this one is trying to do but it's clearly wrong since it allows % in host names. If it's trying to handle RTL scripts, your coder misunderstands the way RE matching works. It matches characters in the order they're stored, not the order they're displayed. -- Regards, John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies", Please consider the environment before reading this e-mail. https://jl.ly
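The point about stored versus displayed order can be checked with a small sketch. The Arabic-script address below is a made-up example written with escapes so that the source shows the stored (logical) order regardless of how an editor renders it, and the pattern is a deliberately crude local@host.tld shape, not the regex under discussion.

```python
import re
import unicodedata

# Made-up right-to-left address, written in stored (logical) order:
# local part, then "@", then host, then ".", then top-level label.
address = (
    "\u0645\u062b\u0627\u0644"   # local part
    "@"
    "\u0645\u0648\u0642\u0639"   # host label
    "."
    "\u0645\u0635\u0631"         # top-level label
)

# A plain left-to-right pattern: the engine walks the string in stored
# order and never looks at how it is rendered on screen.
match = re.fullmatch(r"([^@]+)@([^@]+)\.([^@.]+)", address)
local, host, tld = match.groups()

# The first stored character is an Arabic letter (bidi class "AL"),
# even though a display would place it at the right-hand edge.
first_char_bidi_class = unicodedata.bidirectional(address[0])
```

Since the ordinary local@domain shape already matches in stored order, a mirrored "domain@local" alternative does not seem necessary for RTL scripts.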
So John, which one could we follow or modify to handle Marc's concern?
I see what you are saying. I think that's precisely the case. -- Mark W. Datysgeld from Governance Primer [www.markwd.website] In partnership with AR-TARC and the Brazilian Association of Software Companies (ABES)
In article <aabfc6a29ba14bdf82a2197a3c211e92@ntra-mbx1.TRA.GOV.EG> you write:
Regex "^[\p{L}.%+-]+@[\p{L}.-]+\.[\p{L}]{2,}$" to validate some EAI addresses and it works well like the below screenshot.
I would take the % out of the local part. While it is technically allowed, historically it has been used to do source routing and been widely abused, so that most mail systems reject addresses with % as a security risk.
-- Regards, John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies", Please consider the environment before reading this e-mail. https://jl.ly
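For anyone wanting to try this suggestion outside Java or Ruby, here is a rough Python sketch of the quoted regex with and without % in the local part. Python's standard re module has no \p{L}, so the sketch assumes [^\W\d_] as a stand-in; that class is close to, but not identical with, \p{L}. The names ORIGINAL and WITHOUT_PERCENT are hypothetical.

```python
import re

# Assumption: [^\W\d_] approximates \p{L} (word characters minus decimal
# digits and underscore); it also admits a few non-letter numeric characters.
LETTER = r"[^\W\d_]"

# The regex as posted, with "%" allowed in the local part.
ORIGINAL = re.compile(
    rf"(?:{LETTER}|[.%+-])+@(?:{LETTER}|[.-])+\.{LETTER}{{2,}}"
)

# The same shape with "%" removed from the local part, as suggested above.
WITHOUT_PERCENT = re.compile(
    rf"(?:{LETTER}|[.+-])+@(?:{LETTER}|[.-])+\.{LETTER}{{2,}}"
)


def accepts(pattern, address):
    """True if the whole address matches the given compiled pattern."""
    return pattern.fullmatch(address) is not None
```

With these definitions, accepts(ORIGINAL, "user%relay@example.com") is true and accepts(WITHOUT_PERCENT, "user%relay@example.com") is false. One side effect worth knowing: because the classes admit only letters, an address containing digits, such as user1@example.com, matches neither pattern.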
participants (7)
- Abdalmonem Tharwat Galila
- Abdelmeniem Tharwat
- Arnt Gulbrandsen
- John Levine
- Marc Blanchet
- Mark W. Datysgeld
- Martin J. Dürst