Educational session on existing variant practices
The topic of variants isn't new and the issues we are trying to examine here in this group may have already been defined and/or dealt with somewhere on this planet.
We have experts from different fields here, and not necessarily all of us are well informed about the existing practices related to variants. Hence an idea: how about holding an educational session for the entire group where we will review how variant issues are managed in:
1. Existing ICANN TLD programs (Fast Track, Applicant Guidebook)
2. IDN ccTLD registries - variants on the top level
3. TLD registries - variants on the second level
4. DNS standards (some background on DNAMEs might be helpful)
5. RFCs that deal with string similarity (4790, 3743 for Chinese, 5992 for Cyrillic, others?)
6. (possibly) Trademark laws
I'm happy to contribute to part 3 by sharing the .tel experience with managing variants. We support IDNs in 15 languages with different degrees of variant management in Chinese, Japanese and Russian.
Would the group support such an effort?
Best regards, Vladimir Shadrunov
Dear Vladimir,
Thank you very much for the excellent idea. I can try to prepare a short summary on the 1st point.
Best regards, Oksana
2011/6/30 Vladimir Shadrunov <vlad.london.uk@gmail.com> wrote:
Would the group support such an effort?
Vladimir, a very good idea, and I support it. I could contribute to points 1, 2, and 3 (if I can make it to the session).
Fahd A. Batayneh
Team Lead, National Domain Names Division, Data and Network Security Department
National Information Technology Center, Amman, Jordan
Vladimir,
I would be glad to share our experience with #3.
-Behnam
-- Behnam Esfahbod (بهنام اسفهبد)
I will share Pakistan's experiences regarding items #1 and #2.
Regards, Syed Iftikhar H. Shah
--- On Thu, 6/30/11, Behnam Esfahbod <behnam@esfahbod.info> wrote:
I would be glad to share our experience with #3.
Seems the idea has some support and we have a few people who are willing to share their knowledge and experience. Here's the list of those who responded on 1, 2 and 3.
1. Existing ICANN TLD programs (Fast Track, Applicant Guidebook)
Oksana Prykhodko, Fahd Batayneh, Iftikhar Shah
2. IDN ccTLD registries - variants on the top level
Fahd Batayneh, Iftikhar Shah
3. TLD registries - variants on the second level
Fahd Batayneh, Vladimir Shadrunov, Behnam Esfahbod, Andrzej Bartosiewicz
If someone active in the IETF community would be willing to give a presentation on how existing DNS standards and related RFCs handle string similarity, that would also be great. Additionally, some feedback from the IP lawyers would be appreciated as to whether the variant problem exists in the IP realm.
May I defer it to the chair or to the moderator of this list to respond regarding how a group-wide webinar can be implemented.
Thanks, Vladimir Shadrunov
On 3 July 2011 15:20, Andrzej Bartosiewicz <andrzej@yonita.com> wrote:
On 6/30/2011 4:45 PM, Vladimir Shadrunov wrote:
3. TLD registries - variants on the second level
Happy to share my experience with variants on TLD level.
Best, Andrzej
-- Dr. Andrzej Bartosiewicz phone (US): +1 650 2493707 phone (Poland): +48 518 235209
Dear colleagues, I think this is an excellent suggestion worthy of pursuit. I have one remark, however. On Mon, Jul 04, 2011 at 11:04:07AM +0100, Vladimir Shadrunov wrote:
If someone active in the IETF community would be willing to give a presentation on how existing DNS standards and related RFC handle string similarity that would also be great. Additionally some feedback from the IP lawyers would be appreciated as to whether the variant problem exists in the IP realm.
I can certainly provide an overview of how various DNS technologies can be used to help with label-string similarity issues. There are some tricks one can use in the DNS that make some of these issues in some ways easier, and in some ways trickier, to deal with.
But the short answer to the question as phrased above is much more blunt, and I want to put it here on the list so that it is not lost. The existing DNS standards and related RFCs do nothing at all about string similarity. DNS is an exact-match technology. You send a query for a QNAME, QCLASS, and QTYPE. If there is an entry in the authoritative name server you happen to talk to, or in some intermediate cache, that matches _exactly_ your requested combination, you get back an answer. If there is the same name but something else doesn't match, you get back an empty answer with no error (or, sometimes, a redirection). And if there is no such name in the authoritative servers, you get back "NXDOMAIN" (that is, RCODE=3 Name Error).
This is a deep and fundamental part of the DNS, and it is important we not lose sight of it; it is not something we can change without effectively replacing the DNS itself. Anything that we cannot ultimately simulate by adding to the number of exact matches in the global DNS -- that is, making the DNS bigger -- is just not a policy that can ever be deployed.
Best regards,
A
-- Andrew Sullivan ajs@anvilwalrusden.com
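To make the exact-match point concrete, here is a minimal sketch in Python. It is a toy model, not a resolver: the `ZONE` table, the `lookup` function, and the `example.test.` names are all invented for illustration, and names are assumed to be already in their ASCII form.

```python
# Toy model of DNS lookup: data exists only under exact (name, class, type) keys.
# The zone contents are made up for this example.
ZONE = {
    ("example.test.", "IN", "A"): ["192.0.2.1"],
    ("example.test.", "IN", "MX"): ["10 mail.example.test."],
}


def lookup(qname, qclass, qtype):
    """Return (rcode, answers) for a query, using exact matching only."""
    qname = qname.lower()  # ASCII case is the only thing DNS ignores when comparing names
    key = (qname, qclass, qtype)
    if key in ZONE:
        return "NOERROR", ZONE[key]
    # The name exists but not with this class/type: empty answer, no error.
    if any(name == qname for name, _, _ in ZONE):
        return "NOERROR", []
    # No such name at all: RCODE=3, Name Error.
    return "NXDOMAIN", []


print(lookup("example.test.", "IN", "A"))     # exact match, data returned
print(lookup("example.test.", "IN", "AAAA"))  # same name, other type: empty answer
print(lookup("examp1e.test.", "IN", "A"))     # look-alike name: Name Error
```

In this sketch `examp1e.test.` gets a Name Error even though it looks almost the same as `example.test.`; the only way to make it resolve is to add another exact entry to the table.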
Hi Andrew and thanks for your comment.
I perfectly understand your points. However, I heard a number of times that DNAME or some other xNAME may be part of the solution for the problem we are trying to define. There may be members of this group who do not know well what DNAME is and I think a 5-10 minute presentation may fill this gap.
With regards to the work of IETF there are RFCs that consider variants and string similarity, at least for Chinese and Cyrillic scripts. I believe this is very relevant to the work of this group and as these are finished work products I believe it may be useful for the group to learn if the variant issues were defined and dealt with in these RFCs.
Best regards, Vladimir Shadrunov
On 4 July 2011 22:03, Andrew Sullivan <ajs@anvilwalrusden.com> wrote:
The existing DNS standards and related RFCs do nothing at all about string similarity. DNS is an exact-match technology.
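For those who do not know DNAME well: a DNAME record (RFC 2672) redirects every name below its owner name to the corresponding name below a target, but it does not redirect the owner name itself. Below is a minimal sketch of that name substitution in Python. The `apply_dname` helper and the zone names `variant.test.` and `primary.test.` are hypothetical and are not taken from any DNS library.

```python
def apply_dname(qname, owner, target):
    """Rewrite qname according to a DNAME at `owner` pointing to `target`.

    Returns the rewritten name, or None when the DNAME does not apply.
    """
    q = qname.lower().rstrip(".").split(".")
    o = owner.lower().rstrip(".").split(".")
    t = target.lower().rstrip(".")
    # A DNAME covers only names strictly below its owner name.
    if len(q) <= len(o) or q[-len(o):] != o:
        return None
    prefix = q[:-len(o)]  # the labels to the left of the owner name
    return ".".join(prefix + [t]) + "."


print(apply_dname("www.variant.test.", "variant.test.", "primary.test."))
print(apply_dname("variant.test.", "variant.test.", "primary.test."))
```

The first call yields `www.primary.test.`; the second yields `None`, because the owner name itself is left alone. That gap is one reason a DNAME on its own does not make two names behave as one.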
Hi Vladimir and all,
I have been in discussion with the DNSEXT group of the IETF for quite a long time because (in my opinion) an xNAME solution is necessary in many cases for a good user experience of the IDN variants. That discussion has not finished, but it hasn't progressed significantly since the SF ICANN meeting.
I fully support your proposal about the presentation, since many are unaware of the difficulties of administering multiple TLDs (when they should perform as one) while trying to help the user minimize his costs and troubles.
I believe Andrew could tell us what the present situation of the xNAME discussion is. I had to take some unavoidable time off lately and I have been a little out of touch since, so please forgive me for not giving you a clear picture of the current status of this issue.
Kind Regards, Vaggelis Segredakis
On Tuesday, July 05, 2011 12:07 PM, Vladimir Shadrunov wrote:
However, I heard a number of times that DNAME or some other xNAME may be part of the solution for the problem we are trying to define.
Vaggelis Segredakis wrote:
I believe Andrew could tell us what the present situation of the xNAME discussion is. I had to take some unavoidable time of absence lately and I am a little uninformed since so please forgive me for not giving you a clear picture of the current status of this issue.
As far as I know, there is still no clear picture of what the requirements are for "variants". Furthermore, it seems that a "quick fix" in the style of more alias mechanisms is good enough or even doable. Hence this discussion about variants: to analyse the problem and, hopefully, come up with a better understanding.
jaap
Furthermore, it seems that a "quick fix" in the style of more alias mechanisms is good enough or even doable.
Oops, I of course meant "is not good enough" ... Apologies
jaap
On 4 Jul 2011, at 11:04, Vladimir Shadrunov wrote:
Seems the idea has some support and we have a few people who are willing to share their knowledge and experience. Here's the list of those who responded on 1, 2 and 3.
+1
Desiree
Dear all, the IDN variant project team appreciates your contribution. Based on the request, we will organize a webinar.
Here is a draft agenda:
1. An overview of existing TLD programs related to variants by ICANN staff [15 minutes]
2. Experience of variant handling in ccTLDs [50 minutes]
Five presentations, 7 mins each + 15 mins Q&A
* Fahd Batayneh, Iftikhar H. Shah, Behnam Esfahbod, Andrzej Bartosiewicz, Vladimir Shadrunov
4. Overview of DNS mechanisms and status from IETF standards. [7 mins]
5. Outline of RFC discussions about variant handling. [15 mins]
6. 3 min wrap up.
Before we close the queue, are there any others willing to share their experiences?
We will also proceed to arrange a time that works for the speakers and will announce the dates shortly. Given the global nature of our audience, we may have to run two identical webinars.
Warm regards, IDN Variant Project Team
On 7/4/11 2:46 PM, "Desiree Miloshevic" <dmiloshevic@afilias.info> wrote:
+1
Dear Steve,
Here are my draft slides for the webinar:
http://yonita.com/ICANN-IDN-Variants-Bartosiewicz.pdf
http://yonita.com/ICANN-IDN-Variants-Bartosiewicz.pptx
Best, Andrzej
On 7/15/2011 7:53 PM, Steve Sheng wrote:
Dear all, the IDN variant project team appreciates your contribution. Based on the request, we will organize a webinar.
Nice work, Andrzej-
You might want to expose the tæst1234.pl (xn--tst1234-mxa.pl) and taest1234.pl homograph potential in this, which is something that has occurred since.
https://bugzilla.mozilla.org/show_bug.cgi?id=618051#c12
-Jothan
Jothan Frakes
On Sun, Jul 24, 2011 at 1:43 PM, Andrzej Bartosiewicz <andrzej@yonita.com> wrote:
Here are my draft slides for webinar: http://yonita.com/ICANN-IDN-Variants-Bartosiewicz.pdf http://yonita.com/ICANN-IDN-Variants-Bartosiewicz.pptx
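As an illustration of the tæst1234.pl / taest1234.pl pair mentioned in this exchange, here is a short Python sketch that converts both spellings to the ASCII form used in the DNS. It relies on Python's built-in `idna` codec, which implements the older IDNA 2003 rules; the `to_ascii` helper name is an assumption made for this example.

```python
def to_ascii(name):
    """Convert a domain name to its ASCII form with Python's built-in IDNA 2003 codec."""
    return name.encode("idna").decode("ascii")


ligature = to_ascii("tæst1234.pl")   # spelled with U+00E6
digraph = to_ascii("taest1234.pl")   # spelled with the two letters "a" and "e"

print(ligature)             # xn--tst1234-mxa.pl
print(digraph)              # taest1234.pl
print(ligature == digraph)  # False
```

To the DNS these are two unrelated names. Any link between them has to come from registry policy or from the way software displays them.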
On 7/24/2011 11:03 PM, Jothan Frakes wrote:
Nice work, Andrzej-
You might want to expose the tæst1234.pl (xn--tst1234-mxa.pl) and taest1234.pl homograph potential in this which is something that has occurred since.
Dear Jothan,
I have also commented on the discussion @mozilla.org list.
I'm really confused about Mozilla's strategy regarding "variants" or look-alike domains. I have no problem with "æ" and "ae", just as I have no problem with "O" and "0". It's insane to try to protect us against every similarity; that will lead to very strange and complicated policies.
As far as I know, nobody in Europe has ever maliciously used the case of "æ" (which is allowed by many ccTLDs), so maybe exploring such cases is a dead end for security experts? Maybe Mozilla and we should rather focus on real-life examples, not theoretical ones?
As I mentioned in Singapore, I would prefer a discussion based on a list of existing "pairs" of look-alike / variant characters (or combinations of characters), not theoretical discussions of what is a variant and what is not. If we create a list of "pairs" (including the example of U+00E6), we can go through the list and make recommendations.
Maybe I'm wrong, but we can make our job much easier and more useful in practice if we follow the EXAMPLES, not DEFINITIONS.
Andrzej
-- Dr. Andrzej Bartosiewicz, CEO & President, Yonita Inc. <http://www.yonita.com> phone (US): +1 650 2493707 phone (Poland): +48 518 235209
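A list of pairs like the one proposed here could be worked through mechanically. The following is only a sketch under stated assumptions: the `PAIRS` table holds just the two pairs named in this message, and the `lookalikes` function is a hypothetical helper, not an existing registry tool.

```python
# Example pairs taken from this message; a real list would be agreed per script.
PAIRS = [("æ", "ae"), ("o", "0")]


def lookalikes(label, pairs=PAIRS):
    """Return every spelling reachable by swapping one side of a pair for the other."""
    seen = {label}
    todo = [label]
    while todo:
        current = todo.pop()
        for left, right in pairs:
            # Try the swap in both directions at every position where it fits.
            for src, dst in ((left, right), (right, left)):
                start = current.find(src)
                while start != -1:
                    candidate = current[:start] + dst + current[start + len(src):]
                    if candidate not in seen:
                        seen.add(candidate)
                        todo.append(candidate)
                    start = current.find(src, start + 1)
    return seen


print(sorted(lookalikes("tæst1234")))
print(sorted(lookalikes("foo")))
```

Going through such a list entry by entry is one way to turn the EXAMPLES approach into concrete recommendations; which pairs belong in the list remains a policy question.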
Hi Andrzej-
I saw that you responded to the mozilla ticket. Thank you for taking the time to do this.
I understand that the visual ae issue or other ligature type combinations were not considered to be a variant in the very good work that you did with NASK, and I have also heard from DENIC about the manner in which a similar circumstance exists with the sharp s character.
I think personally that I have heard compelling anecdotal descriptions that justify the case where there could and rightly should be two separate websites for two separate domains with two separate meanings.
There are also crafty entrepreneurial participants on the internet that don't always have the best interest of the end user in mind who could leverage the visual similarity between the strings in a manner that is arguably bad for the end-user, either in a confusing manner or in some cases worse.
There is always ongoing discussion about the evolution of 'doing the right thing' with Mozilla as far as the approach taken with addressing visual variations. The objective is to ensure the least end-user confusion.
Without saying it is right or wrong how some software behaves in the presence of ligature or other visual variants that split one character into more than one, I think that, for the purposes and context of VIP, we should simply expose those as distinct variant types.
Let's simply document this as a type of variant.
This was the context and intent of my mention.
-Jothan
Jothan Frakes
On Mon, Jul 25, 2011 at 2:32 AM, Andrzej Bartosiewicz <andrzej@yonita.com> wrote:
Maybe I'm wrong, but we can make our job much easier and more useful in practice if we follow the EXAMPLES, not DEFINITIONS.
Jothan,
I fully agree that we should document "ae" as a type of "variant" and think about how to address it.
I think the discussion that is taking place with Mozilla is a good example for us of what we (ICANN) should expect when new TLDs are opened and variant TLDs are submitted.
What I don't actually understand with Mozilla (and this is out of the scope of ICANN VIP - sorry to mention this here) is why exactly the same policies (DENIC for .DE and ARNES for .SI) are treated by Mozilla in completely different ways: the first one is "whitelisted" and the second one is "blacklisted". Maybe ICANN VIP experts can also take a look at this problem and help our colleagues from the .SI ccTLD to solve the problem of Mozilla's different treatment of different TLDs with the same policies.
Best,
-- Dr. Andrzej Bartosiewicz, CEO & President, Yonita Inc. <http://www.yonita.com> phone (US): +1 650 2493707 phone (Poland): +48 518 235209
On 7/25/2011 8:40 PM, Jothan Frakes wrote:
Let's simply document this as a type of variant.
Andzrej- I hear you, and I am glad that we're aligned on the documentation capture. I believe that the presence of a variant at the top level is different than at the second level (or deeper). Just to respond on the manner in which the Mozilla community curation / approval process works, there is a large developer community that works to put together. Changes under mozilla require multiple approvals, and the approval must come from someone other than the submitter or myself. -Jothan Jothan Frakes +1.206-355-0230 tel +1.206-201-6881 fax On Mon, Jul 25, 2011 at 11:56 AM, Andrzej Bartosiewicz <andrzej@yonita.com> wrote:
Jothan,
I fully agree that we should document "ae" as a type of "variant" and think about how to address it.
I think that the discussion taking place with Mozilla is a good example for us of what we (ICANN) should expect when new TLDs are opened and variant TLDs are submitted.
What I don't actually understand with Mozilla (and this is out of the scope of ICANN VIP - sorry to mention this here) is why exactly the same policies (DENIC for .DE and ARNES for .SI) are treated by Mozilla in completely different ways: the first one is "whitelisted" and the second one is "blacklisted". Maybe ICANN VIP experts can also take a look at this problem and help our colleagues from the .SI ccTLD to solve the problem of Mozilla treating TLDs with the same policies differently.
Best, -- Dr. Andrzej Bartosiewicz, CEO & President, Yonita Inc. phone (US): +1 650 2493707 phone (Poland): +48 518 235209
On 25 jul 2011, at 11.32, Andrzej Bartosiewicz wrote:
I have no problem with "æ" and "ae"
I do; they are two different things if you are a Swedish speaker. "æ" and "ä", on the other hand, should be treated as the same. Which is something completely different from "lookalike", of course. Just to show the confusion. Patrik
Let me add an explanation to the above. In English, and in some other languages, the "æ" is a ligature. It is a separate letter, and *not* a ligature, in the Scandinavian languages that use it. So whether something is a ligature, and because of that what counts as "the same", is context dependent. Patrik
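The context dependence described above can be sketched in a few lines of Python. This is only an illustration: the two per-language tables, the function names `fold` and `same_under`, and the sample labels are assumptions made for the example, not any registry's actual variant table. The Swedish entry follows the statement that "æ" and "ä" should be treated as the same; the English entry treats "æ" as a ligature of "a" and "e".

```python
# Hypothetical per-language variant tables (illustrative only, not a real
# registry policy). Keys are single characters, values are what they fold to.
VARIANT_TABLES = {
    "sv": {"\u00e6": "\u00e4"},  # Swedish: ae-letter folds to a-with-diaeresis
    "en": {"\u00e6": "ae"},      # English: ae-ligature folds to "a" + "e"
}


def fold(label: str, lang: str) -> str:
    """Fold a label using the (assumed) variant table for one language."""
    table = VARIANT_TABLES.get(lang, {})
    return "".join(table.get(ch, ch) for ch in label)


def same_under(lang: str, first: str, second: str) -> bool:
    """Return True if two labels are 'the same' under that language's table."""
    return fold(first, lang) == fold(second, lang)


# The same pair of labels gives different answers depending on the language.
print(same_under("sv", "l\u00e6ra", "l\u00e4ra"))  # compared with Swedish rules
print(same_under("en", "l\u00e6ra", "l\u00e4ra"))  # compared with English rules
```

The point of the sketch is that "sameness" is a property of the table chosen, not of the code points themselves, so any such table needs to say which language or context it applies to.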
In such a case, the solution seems to be to use a table where the visual geometric symbol æ can be freely used by people according to the orthotypography of their own language, without caring about the ways of other languages, cultures, typographies or orthotypographies. Either it is possible to bridge such a graphcode with Unicode, and we have to do it, or it is not, and here is the problem we (VIP, PRECIS, IUCG, ...) have to address. Best jfc
First, let's agree that we discuss variants not because of computers (DNS, as such), but because of humans. It is humans that can declare something to be a variant or not. For computers, and for DNS in particular, it is all different: they have no problem to resolve here.
When we consider the way humans read, we need to consider the fact that humans do not match characters in a (Unicode? ;) table, but rather interpret what they see and switch context. If the majority of the text is Cyrillic, as indicated by the presence of unique Cyrillic-(only/mostly) characters, then the human considers that text 'Cyrillic' and the letter 'a' is therefore Cyrillic Small Letter A, and not Latin Small Letter A -- however identical those might look. The same holds for Greek, and it is even much easier for the Arabic/Chinese scripts.
One common argument that pops up in such discussions is that 'most of the world uses ASCII already' -- but let me recall the saying "It all looks Greek to me" (or "Graecum est; non legitur"). In the not so distant future, DNS will not be ASCII-mostly anymore. We need to base our work on that assumption, or it will be obsolete in just a few years.
Daniel
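As a rough illustration of the point about context, the sketch below shows that the two look-alike letters are different code points to a computer, and then guesses the dominant script of a label the way a reader might. The helper names `script_of` and `dominant_script` are made up for this example, and taking the first word of the Unicode character name is only a crude stand-in for real script data.

```python
import unicodedata
from collections import Counter


def script_of(ch: str) -> str:
    """Crude script guess: the first word of the Unicode character name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def dominant_script(label: str) -> str:
    """Return the most common script among the letters of a label."""
    counts = Counter(script_of(ch) for ch in label if ch.isalpha())
    if not counts:
        return "UNKNOWN"
    return counts.most_common(1)[0][0]


latin_a = "\u0061"     # LATIN SMALL LETTER A
cyrillic_a = "\u0430"  # CYRILLIC SMALL LETTER A

# For software the two letters are simply different, however alike they look.
print(latin_a == cyrillic_a)
print(unicodedata.name(latin_a), "/", unicodedata.name(cyrillic_a))

# A label made only of Cyrillic letters (escaped here to stay ASCII-safe).
cyrillic_label = "\u043f\u0440\u0438\u043c\u0435\u0440"
print(dominant_script(cyrillic_label))
```

A reader who has settled on "this is Cyrillic" will take an ambiguous glyph as the Cyrillic letter; a tool that wants to mimic that would need some notion of surrounding context like `dominant_script`, which plain code point comparison does not provide.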
I am attaching a draft text that was initially intended for distribution to the Latin script study participants. In light of the discussion in progress here, it now seems more appropriate to post it to the broader group. /Cary
I had previously posted a support brief prepared for the Latin study to the general VIP list. I have now posted an update to that text which you will find at: http://mm.icann.org/pipermail/latin-vip/attachments/20110812/2b5bb0c7/latin-... The differences between the current version 1.1 and the previous 1.0 include the correction of errors in the specimen glyphs in the script table and changes to its scope, additional examples of types of variant relationships, and a number of clarifications and stylistic edits. It is my intention to prepare a final version 2.0 once all of the requirements that might be placed on it in the context of the Latin group's work have become clear, and to reflect other commentary received during the course of the project. /Cary
Dear Cary,
"The code points available for use in IDNs are all taken from the Unicode Character Code Charts. The Latin script is divided there into nine blocks." This clearly gives the environment of your work. Thank you for it, as it seems a very good reference. It shows that you definitely consider Unicode as a partner in the Variant debate.
This means that the work I carried out to consolidate the vocabulary for resolving all the post-IDNA2008 problems has to include the Unicode Glossary. I added it under http://iucg.org/wiki/IDNS_Common_Glossary. This is a fastidious task that is nearly completed. It gives us all the vocabulary that we may need from the existing technologies and SDOs (I may still introduce some additional concepts from the IUTF side).
However, I am afraid there are two generic prerequisites before considering "locale" problems (like those of the Latin script). These two prerequisites are the use-side architectural responses to orthotypography and homography requirements. Through years of stubborn debate Unicode has shown that it has no simple architectural response to either of them. In the Latin case there are two such gaps: (1) French majuscules and (2) the confusion of Roman script letters with Greek, Cyrillic, etc. scripts. IMHO this logically precludes Variants from being documented and supported through any Unicode-related algorithm.
Actually, the true difficulty is that, by nature (one sign, one meaning):
- the Unicode offer is "Saussurian", i.e. a dyadic semiology;
- while the use demand of the whole digital ecosystem (e.g. Variants) is "Peircean", i.e. triadic semiotics (different signs, many people, several meanings or "denotata").
The very idea of "variant" implies a semiosis somewhere. This is why IDNA2008 (and its consequences), the way I see it, does not have much to do with Unicode, except as an accidental [it can change] convenient pointer set to the signs we refer to when talking of scripts.
For me IDNA2008 has to do with the DNS we had to preserve and with the network simplicity to be supported along the RFC 1958 recipes when we have to support semiosis in its four (semantic, syntax, pragmatic and multilinguistic) components. And IMHO we do not know (yet?) how to do it. This definitely is what Fast Track should have tested.
I know scripts are not an IETF, ICANN, ITU, etc. area of expertise. This is however the area of the users' demands (e.g. variants). IDNA2008 has proposed what the IETF could propose: a stable, reliable and Unicode-independent proven basis with the DNS. Vint proposed that ICANN could take over, and you accepted. Now you explain how you see that support in the Latin locale case.
For example you write: "Various terms have been used to designate this concept of equivalence but none has yet been provided with a definition that is adequate in all the contexts where it is needed". I am afraid this is not correct: "Variants are to be people's and DNS equivalent, i.e. all of the variants must resolve the same IP address". This better defines what we want.
Best.
jfc
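The definition proposed at the end of this message ("all of the variants must resolve the same IP address") can be stated as a small check. The sketch below is only one reading of that sentence: the names, the addresses (taken from documentation ranges) and the in-memory table `RESOLVED` are hypothetical stand-ins for real DNS lookups, and the function `resolve_same` is an assumption of this example, not an existing tool.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical resolution results, standing in for real DNS answers.
# The names use the reserved .test TLD; addresses are documentation addresses.
RESOLVED: Dict[str, FrozenSet[str]] = {
    "variant-one.test": frozenset({"192.0.2.10"}),
    "variant-two.test": frozenset({"192.0.2.10"}),
    "variant-three.test": frozenset({"192.0.2.99"}),
}


def resolve_same(names: Iterable[str], table: Dict[str, FrozenSet[str]]) -> bool:
    """True if every name in the set resolves to the same set of addresses."""
    address_sets = {table.get(name, frozenset()) for name in names}
    # Exactly one distinct, non-empty address set means the names agree.
    return len(address_sets) == 1 and frozenset() not in address_sets


print(resolve_same(["variant-one.test", "variant-two.test"], RESOLVED))
print(resolve_same(["variant-one.test", "variant-three.test"], RESOLVED))
```

Under this reading, a variant set is a group of names for which `resolve_same` holds; it says nothing about whether the strings look alike, which is the separate question raised earlier in the thread.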
Daniel, we obviously agree. However, this agreement IS _the_ problem we meet, because:
1. It concerns a much more general area than ICANN and the Internet: actually, the entire digital ecosystem. The only place I can imagine it belongs to is a multilingualisation-oriented ISO 3166-4. I called a meeting a few years ago between ISO 3166/MA, ISO/CS (HQ), AFNOR (French standards), who managed ISO/TC46, which the ISO 3166/MA belongs to, BSI (British Standards), who managed ISO/TC37, responsible for ISO 639 (names of languages), ICANN (two of them came to Paris) and me. The point was that the BSI was introducing an ISO 3166-4 NWIP (new work item proposition) that would have "Anglo-saxonICANNized" the digital ecosystem. Something I obviously opposed. It resulted in an international vote concerning this proposition. It was only actively supported by ... UK, Ireland and, after the limit date, ANSI (USA), and therefore failed. Since the same persons who pushed out David Dalby, the initiator of ISO 639-6 (the needed counterpart of ISO 3166-1 to form a stable 3166-4 document), are still involved, ISO is probably still not the place for 3166-4. So, I reserved 3166-4.org for the IUCG to host that work. To prepare such a document I introduced the ccTAG concept: http://tools.ietf.org/html/draft-mltf-jfcm-cctags-00. This is in line with ISO 3166-1:2006, which introduced the basis for multilingualisation in norms and standards; it led to my IDNA2008 position and consensus, and should lead to the core of the Intersem (Semiotic and semantic Internet) I try to pioneer.
2. Eventually, our agreement will have to technically support, in the simplest and broadest way (RFC 3439), a common digital basis for every subsidiary use-oriented further development.
IMHO, if we do not consider these two points in strict parallel, we will fail.
Best
jfc
In English the "æ" is a ligature, and in some more languages. It is a separate letter and *not* a ligature in the Scandinavian languages that uses it.
The Unicode name for it is LATIN SMALL LETTER AE. It was called LATIN SMALL LIGATURE AE in version 1.0 but that was changed to LETTER in the next version. (It's now at 6.0.) /Cary
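The current character name can be checked with Python's standard `unicodedata` module. The short sketch below also contrasts "æ" with a character that Unicode does classify as a ligature and with "ä"; the variable names are just for this example, and which Unicode version the module reflects depends on the Python build.

```python
import unicodedata

ae = "\u00e6"          # the character under discussion
fi_ligature = "\ufb01" # a character Unicode names as a ligature
a_umlaut = "\u00e4"    # a-with-diaeresis

# The name reported by the Unicode data shipped with Python.
print(unicodedata.name(ae))
print(unicodedata.name(fi_ligature))

# Compatibility normalization splits the fi ligature into two letters,
# but leaves the ae character untouched: it has no decomposition to "ae".
print(unicodedata.normalize("NFKD", fi_ligature) == "fi")
print(unicodedata.normalize("NFKD", ae) == ae)

# Canonical decomposition of a-with-diaeresis: base letter plus combining mark.
print([unicodedata.name(ch) for ch in unicodedata.normalize("NFD", a_umlaut)])
```

So Unicode normalization alone will not relate "æ" to "ae" or to "ä"; any such relationship has to come from a separately defined variant table.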
Sorry, I can not open the web site yet.
Zhang Zhoucai (Joe)
Chinese Case Study Team Member, ICANN/IDN Variant Project
skype: joezhang43 MSN: joezhang43@hotmail.com http://Hanzi.unihan.com.cn
From: vip-bounces@icann.org [mailto:vip-bounces@icann.org] On behalf of Andrzej Bartosiewicz
Sent: 25 July 2011, 4:43
To: vip@icann.org; Steve Sheng; Vladimir Shadrunov
Subject: Re: [vip] Educational session on existing variant practices
Dear Steve,
Here are my draft slides for webinar:
http://yonita.com/ICANN-IDN-Variants-Bartosiewicz.pdf
http://yonita.com/ICANN-IDN-Variants-Bartosiewicz.pptx
Best, Andrzej
On 7/15/2011 7:53 PM, Steve Sheng wrote:
Dear all, the IDN variant project team appreciates your contribution. Based on the request, we will organize a webinar. Here is a draft agenda:
1. An overview of existing TLD programs related to variants by ICANN staff [15 minutes]
2. Experience of variant handling in ccTLDs [50 minutes] Five presentations, 7 mins each + 15 mins Q&A * Fahd Batayneh, Iftikhar H. Shah, Behnam Esfahbod, Andrzej Bartosiewicz, Vladimir Shadrunov
4: Overview of DNS mechanisms and status from IETF standards. [7 mins]
5: Outline of RFC discussions about variant handling. [15 mins]
6. 3 min wrap up.
Before we close the queue, are there any others willing to share their experiences? We will also proceed to arrange a time that speakers can make and announce the dates shortly. Given the global nature of our audience, we may have to run two exact webinars.
Warm regards, IDN Variant Project Team
Tried three times, finally, I got it. Thanks, Zhang Zhoucai (Joe) Chinese Case Study Team Member ICANN/IDN Variant Project
participants (17)
- Andrew Sullivan
- Andrzej Bartosiewicz
- Behnam Esfahbod
- Cary Karp
- Daniel Kalchev
- Desiree Miloshevic
- Fahd Batayneh
- iftakhar shah
- Jaap Akkerhuis
- JFC Morfin
- Jothan Frakes
- Oksana Prykhodko
- Patrik Fältström
- Steve Sheng
- Vaggelis Segredakis
- Vladimir Shadrunov
- Zhang Joe