Variant cross-script analysis worksheets
Dear Latin GP,

The worksheets for cross-script analysis are ready, as follows:

* Cyrillic: https://docs.google.com/spreadsheets/d/1_AvaG3vgNsozyNs1Cj68HivcWmV7HRwwXz4k...
* Greek: https://docs.google.com/spreadsheets/d/1EK9HdI64HXvhGP8XkV9ZSZhePVRWPBxBDXId...
* Armenian: https://docs.google.com/spreadsheets/d/1SpC8hHecvohEeBveFBQz0Fw7wQAxE9HG9Bu-...

I recommend we start with Cyrillic and Greek, as these can be worked on in Google Docs directly. Armenian script support is limited to one font, so we need to decide whether to continue doing the analysis in Google Docs (the preferred method because of its collaboration features) or do it offline in Excel and then merge the different inputs (more work but doable, I think).

CALL TO ACTION: Please respond to this email if you can help do the cross-script analysis. You can sign up for one or for all; let me know.

The mechanics are very simple:

1. Each code point needs to be inspected twice, by different people. A score needs to be assigned for each font type (Times, Arial and Courier). At the end of each inspection there should be 3 distinct score inputs, one per font (Column J for the first inspection, Column L for the second inspection).
2. The scoring system is defined as follows:
   * 1 = Identical = the two glyphs are identical in design
   * 2 = Nearly Identical = the two glyphs are not exactly identical, but the difference may be attributed to font design (Example: Latin “r” and Cyrillic “г”)
   * 3 = Distinguishable = the two glyphs are similar but distinguishable from each other
   * 4 = Different = self-explanatory
   * 5 = N/A = there is no variant candidate found in the Latin LGR

If you want to volunteer for this work, please answer this email no later than noon on Friday 27, Eastern Time. I will do my best to leave the assignments before I go offline for a few days.

Thank you,
Dennis
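For readers who want to check their worksheet entries mechanically, the inspection rules in the message above can be sketched in Python. This is only an illustration under stated assumptions: the names `Score`, `Inspection`, and `is_complete` are hypothetical and are not part of the worksheets, and the sketch assumes that "3 distinct score inputs" means one score per font.

```python
from dataclasses import dataclass
from enum import IntEnum

# The three font types named in the email.
FONTS = ("Times", "Arial", "Courier")


class Score(IntEnum):
    """The 1-5 scale from the email (member names are illustrative)."""
    IDENTICAL = 1
    NEARLY_IDENTICAL = 2
    DISTINGUISHABLE = 3
    DIFFERENT = 4
    NOT_APPLICABLE = 5


@dataclass(frozen=True)
class Inspection:
    """One inspection of one code point: a score for each font."""
    inspector: str
    scores: dict  # font name -> score (1..5)

    def __post_init__(self):
        # Assumption: exactly one score per font is required.
        if set(self.scores) != set(FONTS):
            raise ValueError(f"need exactly one score per font: {FONTS}")
        for value in self.scores.values():
            Score(value)  # raises ValueError for anything outside 1..5


def is_complete(first, second):
    """True when both inspections exist and were done by different people."""
    return (
        first is not None
        and second is not None
        and first.inspector != second.inspector
    )


# Hypothetical reviewers and scores, for illustration only.
first = Inspection("reviewer_a", {"Times": 2, "Arial": 2, "Courier": 3})
second = Inspection("reviewer_b", {"Times": 2, "Arial": 1, "Courier": 3})
print(is_complete(first, second))
```

The sketch only validates the shape of an entry; it says nothing about how the two inspections should later be reconciled.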
Dear Dennis, I would like to contribute to all. I'm OK to take a look at all three if that's helpful. Thanks, Meikal On 25. Apr 2018 at 23:17, Tan Tanaka, Dennis via Latingp <latingp@icann.org> wrote:
_______________________________________________ Latingp mailing list Latingp@icann.org https://mm.icann.org/mailman/listinfo/latingp
Last call for volunteers … So far I have Michael and Meikal. From: Meikal Mumin <meikal@mumin.de> Date: Thursday, April 26, 2018 at 3:03 AM To: Dennis Tan Tanaka <dtantanaka@verisign.com> Cc: Ahmed Bakhat via Latingp <latingp@icann.org> Subject: [EXTERNAL] Re: [Latingp] Variant cross-script analysis worksheets
I missed that you were looking for volunteers, including those of us already on the Variants group. Sorry. Of course I will be glad to help. Bill On Fri, Apr 27, 2018 at 11:03 AM, Tan Tanaka, Dennis via Latingp <latingp@icann.org> wrote:
Thanks Bill. I didn’t want to assume you all will be available. Others? Going once, going twice.... On Apr 27, 2018, at 2:07 PM, Bill Jouris <bill.jouris@insidethestack.com> wrote:
Bill, Meikal and Michael,

Now you can see the assignments in columns K and M. Your input should be recorded in Columns J and L per the scoring system and criteria.

Cyrillic: https://docs.google.com/spreadsheets/d/1_AvaG3vgNsozyNs1Cj68HivcWmV7HRwwXz4k...
Greek: https://docs.google.com/spreadsheets/d/1EK9HdI64HXvhGP8XkV9ZSZhePVRWPBxBDXId...
Armenian: https://docs.google.com/spreadsheets/d/1SpC8hHecvohEeBveFBQz0Fw7wQAxE9HG9Bu-...

As a first step, can I ask you to do the FIRST INSPECTION ONLY? That means filling out Column J only, by Monday May 14. I will have us meet later that week (if there is no Panel meeting) as a checkpoint to work out bugs, gather feedback, etc., and then move on to the second inspection.

If you have any questions, please let me know.

-Dennis

From: Latingp <latingp-bounces@icann.org> on behalf of "Tan Tanaka, Dennis via Latingp" <latingp@icann.org>
Reply-To: Dennis Tan Tanaka <dtantanaka@verisign.com>
Date: Friday, April 27, 2018 at 2:09 PM
To: "bill.jouris@insidethestack.com" <bill.jouris@insidethestack.com>
Cc: Meikal Mumin <meikal@mumin.de>, "Tan Tanaka, Dennis via Latingp" <latingp@icann.org>
Subject: [EXTERNAL] Re: [Latingp] Variant cross-script analysis worksheets
Dear colleagues,

I completed my first- and second-pass reviews of the three scripts, except for cases where I found further possible variant candidates, which I pointed out in comments and which, IMHO, we should consider. Dennis has started looking at some of those comments, but based on some of the answers there also seems to be a general question, at least on my part: can a single code point be in a variant relationship with a sequence of code points or not? I was under the impression that the answer is yes...

Best,
Meikal

On 27 April 2018 at 22:08, Tan Tanaka, Dennis <dtantanaka@verisign.com> wrote:
Bill, Meikal and Michael
Now you can see the assignments in columns K and M. Your input should be recorded in Columns J and L per the scoring system and criteria.
Cyrillic: https://docs.google.com/spreadsheets/d/1_AvaG3vgNsozyNs1Cj68HivcWmV7HRwwXz4k9sN9sGs/edit?usp=sharing
Greek: https://docs.google.com/spreadsheets/d/1EK9HdI64HXvhGP8XkV9ZSZhePVRWPBxBDXIdLB220zk/edit?usp=sharing
Armenian: https://docs.google.com/spreadsheets/d/1SpC8hHecvohEeBveFBQz0Fw7wQAxE9HG9Bu-nwVuL74/edit?usp=sharing
As a first step, can I ask you to do the FIRST INSPECTION ONLY? That means filling out Column J only, by Monday May 14. I will have us meet later that week (if there is no Panel meeting) as a checkpoint to work out bugs, gather feedback, etc., and then move on to the second inspection.
Any questions please let me know.
-Dennis
> a single code point can be in a variant relationship with a sequence of code points or not? I was under the impression that the answer is yes...

I would agree with you only if such a sequence is considered a “single” code point in the repertoire; otherwise we are comparing a code point against a string (i.e. a label).
Well, first of all, a string does not equal a label, since there are further restrictions on the latter. I think a minimum length of 3 characters in the case of A-labels is one.

Secondly, in the case of the Arabic LGR, we defined 16 different sequences which cannot co-occur in labels, only we did that in the form of WLEs rather than variants. But I seem to remember a discussion within the GP and between the GP and the IP, where the IP explained that such confusabilities can be dealt with either in the form of variant rules or in the form of whole label evaluation rules.

Anyhow, RFC 8228 gives an example of variants for sequences that is exactly parallel to some of the cross-script variant candidates I was suggesting:

17 <https://tools.ietf.org/html/rfc8228#section-17>. Variants for Sequences

Variant mappings can be defined between sequences or between a code point and a sequence. For example, one might define a "blocked" variant between the sequence "rn" and the code point "m" because they are practically indistinguishable in common UI fonts.

Since we are discussing cross-script variants, I don't think WLEs will be able to control them. That would mean we must deal with such confusable characters or sequences of characters in the context of variants, and therefore come up with a stronger criterion than "not a 'single' code point in the repertoire".

On 18 May 2018 at 14:22, Tan Tanaka, Dennis <dtantanaka@verisign.com> wrote:
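The "rn" / "m" example quoted from RFC 8228 above can be made concrete with a short Python sketch. It is an illustration only, not an LGR implementation: the mapping table `VARIANTS` and the function `variant_labels` are hypothetical names, the mapping is written out in both directions by hand, and dispositions such as "blocked" are not modelled.

```python
# Hypothetical variant mappings between a sequence and a single code point.
# Both directions are listed explicitly so that the relation is symmetric.
VARIANTS = {"rn": ["m"], "m": ["rn"]}


def variant_labels(label, variants=VARIANTS):
    """Return every label reachable by substituting variants, minus the original."""
    results = set()

    def expand(prefix, rest):
        if not rest:
            results.add(prefix)
            return
        # Option 1: keep the next code point unchanged.
        expand(prefix + rest[0], rest[1:])
        # Option 2: replace a matching source sequence with each of its variants.
        for source, targets in variants.items():
            if rest.startswith(source):
                for target in targets:
                    expand(prefix + target, rest[len(source):])

    expand("", label)
    results.discard(label)
    return results


print(variant_labels("corn"))  # {'com'}
print(variant_labels("com"))   # {'corn'}
```

Because the table maps a two-code-point sequence to a single code point, the two labels differ in length yet are still variants of each other, which is the situation the question in this exchange is about.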
> we must deal with such confusable characters or sequences of characters in the context of variants

No, we don’t. Confusability is not in scope. We established that the Latin panel will deal with homoglyphs or near-homoglyphs (i.e. font variation) in the cross-script context.
It is pretty clear, if one reads the MSR-3 document, that we are supposed to deal with Variants. Which include, but are NOT limited to, homoglyphs. Bill Jouris Inside Products bill.jouris@insidethestack.com 831-659-8360 925-855-9512 (direct) From: "Tan Tanaka, Dennis" <dtantanaka@verisign.com> To: Meikal Mumin <meikal@mumin.de> Cc: "bill.jouris@insidethestack.com" <bill.jouris@insidethestack.com>; Michael Bauland <Michael.Bauland@knipp.de>; "Tan Tanaka, Dennis via Latingp" <latingp@icann.org> Sent: Friday, May 18, 2018 10:20 AM Subject: Re: [Latingp] Variant cross-script analysis worksheets
I believe we delimited the scope of variants for the Latin script in the face to face meeting in Brussels, did we not? From: Bill Jouris <bill.jouris@insidethestack.com> Reply-To: Bill Jouris <bill.jouris@insidethestack.com> Date: Friday, May 18, 2018 at 2:18 PM To: Dennis Tan Tanaka <dtantanaka@verisign.com>, Meikal Mumin <meikal@mumin.de> Cc: Michael Bauland <Michael.Bauland@knipp.de>, "Tan Tanaka, Dennis via Latingp" <latingp@icann.org> Subject: [EXTERNAL] Re: [Latingp] Variant cross-script analysis worksheets
It's been clear for some time, even before Brussels, that you think we should only look at homoglyphs. (Also that you don't think that there are any in-script homoglyphs. See the discussion about the schwa and the turned e.) But there is a world of difference between agreeing, and merely deciding not to waste time arguing with a closed mind. Which, for me, is what happened in the discussion in Brussels. Bill Jouris From: "Tan Tanaka, Dennis" <dtantanaka@verisign.com> To: Bill Jouris <bill.jouris@insidethestack.com>; Meikal Mumin <meikal@mumin.de> Cc: Michael Bauland <Michael.Bauland@knipp.de>; "Tan Tanaka, Dennis via Latingp" <latingp@icann.org> Sent: Friday, May 18, 2018 1:43 PM Subject: Re: [Latingp] Variant cross-script analysis worksheets
Dear All,

This is indeed a complex matter to address, and it therefore requires this continued discussion. It may also be useful here to refer back to the RZ-LGR Procedure <https://www.icann.org/en/system/files/files/lgr-procedure-20mar13-en.pdf>.

The RZ-LGR Procedure, while defining “IDN variants”, says that:

* “An IDN variant, as understood here, is an alternate code point (or sequence of code points) that could be substituted for a code point (or sequence of code points) in a candidate label to create a variant label that is considered the “same” in some measure by a given community of Internet users.”

However, the Procedure also acknowledges, immediately following the definition, that:

* “There is not general agreement of what that sameness requires, and many of the things people seem to want from that sameness are not technically achievable.”

While noting the benefits of defining IDN variants, the Procedure also acknowledges the limitations:

* “The primary benefit of the LGR process is as a mechanism that delivers hands-off evaluation for these aspects.”
* “By doing so, the process may not be able to replace case-by-case analysis altogether: there will still be a role for additional types of review, such as for String Similarity, and which are not included in the LGR process.”

So, not all matters can be settled in the LGR. A line has to be drawn between “same” and “similar”.

The LGR Procedure does note what is desirable to be in the scope of the LGR:

* “the LGR process is designed to clear the table of all the straightforward, non-subjective cases, mainly by returning a “blocked” disposition.”
* “Even for variants based on visual similarity, there exists a subset of evaluation rules that could be applied in an automated manner, obviating the need for further case-by-case or even contextual review.”

But it notes that this should not go too far into the string similarity discussion:

* “While the process described here could be expanded to address cases of visual similarity, that is not the primary intention”
* “Finally, in investigating the possible variant relations, Generation Panels should ignore cases where the relation is based exclusively on aspects of visual similarity.”

One could infer from these statements in the RZ-LGR Procedure that:

1. If two code points are considered the “same” by the user community, these should be included as IDN variants (this is not limited to visual similarity, but could also include semantic equivalence, as in Chinese; orthographic conventions or spelling simplification, as in Arabic; homophonic relations, as in Ethiopic; etc., as determined by the respective script community)
2. The “straightforward, non-subjective cases” of visual similarity could be included as IDN variants and blocked
3. Beyond these, the analysis goes into the realm of string similarity review, which is beyond the intention of the LGR

Generation Panels have been asked to draw the line based on these guidelines provided in the RZ-LGR Procedure. For example, the Cyrillic GP agreed to consider homoglyph relations with other related scripts for this purpose. The Neo-Brahmi GP has used a slightly different technique: it treats as cross-script variants those code points which members of both script communities in question find “indistinguishable”, even if these are not homoglyphs (see the blog <https://www.icann.org/news/blog/the-south-asian-eleven-progress-on-supportin...> for some more details).

Of course, the Latin GP also needs to draw these lines in its analysis for identifying within-script and cross-script IDN variant cases.
Regards, Sarmad From: Latingp [mailto:latingp-bounces@icann.org] On Behalf Of Bill Jouris Sent: Saturday, May 19, 2018 5:28 AM To: Tan Tanaka, Dennis <dtantanaka@verisign.com>; Meikal Mumin <meikal@mumin.de> Cc: Tan Tanaka, Dennis via Latingp <latingp@icann.org> Subject: Re: [Latingp] Variant cross-script analysis worksheets
Dear colleagues,

I thought I'd try to take up this thread again after some silence. Hopefully Michael is back from a nice holiday and can chime in on the discussion too (I think he might not have received Sarmad's earlier email).

Obviously I cannot say what was decided in Brussels, since I could not join the group, and that is why I had tried to put the question to our subgroup. I think Sarmad has provided us with nearly all the additional references we should consider as guidance on how to approach this highly complex task. My conclusion is that it is more complex than reducing things to "homoglyphs", and I do not think that (at least linguistically) we have a strong definition of homoglyphs which would clearly set them apart from confusables, near-confusables, near-homoglyphs (and all the other terms we may have used to reach an understanding of one another and of the issues at hand).

Regarding the 'minority report' by Bill: skimming over it, I thought it would form an excellent basis for our chapter on variants, and in my eyes this work should not go to waste. Equally, we can re-use Sarmad's summary and expand it to integrate it into the introduction of the variant section of the proposal. As I tried to argue in the last teleconference, I believe it is important that we present not only the results of our work but also the way we arrived at them, which means we have to discuss, at least briefly, the different considerations guiding our work.

As a pragmatic step, I would suggest we continue for the moment with the very useful tables Dennis created, adding those few additional variant pairs I had suggested in comments. I don't think including them in our 2-pass review would be too much overhead, and if both reviewers happen to come to the same conclusion that those potential variant pairs have a 3-5 rating, that is, that they are in fact not variants, we also do not need to have a theoretical discussion on the difference between homoglyphs, near-homoglyphs, confusables, etc. In this way, our decisions would be driven by a careful analysis of the data rather than by any preconceptions about what categorical relationship exists between some of these code points; that is, an a posteriori rather than an a priori analysis, if you will.

I hope this is helpful, but let's keep up the discussion. I think we were making good progress with the tables and the 1-5 rating scale (rather than a binary choice).

Best wishes,
Meikal

On 19 May 2018 at 05:10, Sarmad Hussain <sarmad.hussain@icann.org> wrote:
Dear All,
This is indeed a complex matter to address, and it therefore requires this continued discussion. It may also be useful here to refer back to the RZ-LGR Procedure <https://www.icann.org/en/system/files/files/lgr-procedure-20mar13-en.pdf>.
The RZ-LGR Procedure, while defining “IDN variants” says that:
- “An IDN variant, as understood here, is an alternate code point (or sequence of code points) that could be substituted for a code point (or sequence of code points) in a candidate label to create a variant label that is considered the “same” in some measure by a given community of Internet users.”
However, the Procedure also acknowledges immediately following the definition that:
- “There is not general agreement of what that sameness requires, and many of the things people seem to want from that sameness are not technically achievable.”
While noting the benefits of defining IDN variants, the procedure also acknowledges the limitations.
- “The primary benefit of the LGR process is as a mechanism that delivers hands-off evaluation for these aspects.”
- “By doing so, the process may not be able to replace case-by-case analysis altogether: there will still be a role for additional types of review, such as for String Similarity, and which are not included in the LGR process.”
So, not all matters can be settled in the LGR. A line has to be drawn between “same” and “similar”.
The LGR Procedure does note what should be in scope for the LGR:
- “the LGR process is designed to clear the table of all the straightforward, non-subjective cases, mainly by returning a “blocked” disposition.”
- “Even for variants based on visual similarity, there exists a subset of evaluation rules that could be applied in an automated manner, obviating the need for further case-by case or even contextual review.”
But it also notes that this should not go too far into the string similarity discussion:
- “While the process described here could be expanded to address cases of visual similarity, that is not the primary intention”
- “Finally, in investigating the possible variant relations, Generation Panels should ignore cases where the relation is based exclusively on aspects of visual similarity.”
One could infer from these statements in the RZ-LGR Procedure that:
1. If two code points are considered “same” by the user community, these should be included as IDN variants (this is not limited to visual similarity, but could also include semantic equivalence, as in Chinese; orthographic conventions or spelling simplification, as in Arabic; homophonic relations, as in Ethiopic; etc., as determined by the respective script community).
2. The “straightforward, non-subjective cases” of visual similarity could be included as IDN variants and blocked.
3. Beyond these, the analysis goes into the realm of string similarity review, which is beyond the intention of the LGR.
Generation Panels have been asked to draw the line based on these guidelines provided in the RZ-LGR Procedure. For example, Cyrillic GP agreed to consider homoglyph relations with other related scripts for this purpose. Neo-Brahmi GP has used a slightly different technique: it treats as cross-script variants those code points which members of both script communities in question find “indistinguishable”, even if these are not homoglyphs (see the blog <https://www.icann.org/news/blog/the-south-asian-eleven-progress-on-supportin...> for some more details).
Of course, the Latin GP also needs to draw these lines in its analysis to identify within-script and cross-script IDN variant cases.
Regards, Sarmad
*From:* Latingp [mailto:latingp-bounces@icann.org] *On Behalf Of *Bill Jouris *Sent:* Saturday, May 19, 2018 5:28 AM *To:* Tan Tanaka, Dennis <dtantanaka@verisign.com>; Meikal Mumin < meikal@mumin.de> *Cc:* Tan Tanaka, Dennis via Latingp <latingp@icann.org> *Subject:* Re: [Latingp] Variant cross-script analysis worksheets
It's been clear for some time, even before Brussels, that you think we should only look at homoglyphs. (Also that you don't think that there are any in-script homoglyphs. See the discussion about the schwa and the turned e.)
But there is a world of difference between agreeing, and merely deciding not to waste time arguing with a closed mind. Which, for me, is what happened in the discussion in Brussels.
Bill Jouris Inside Products bill.jouris@insidethestack.com 831-659-8360 925-855-9512 (direct)
------------------------------
*From:* "Tan Tanaka, Dennis" <dtantanaka@verisign.com> *To:* Bill Jouris <bill.jouris@insidethestack.com>; Meikal Mumin < meikal@mumin.de> *Cc:* Michael Bauland <Michael.Bauland@knipp.de>; "Tan Tanaka, Dennis via Latingp" <latingp@icann.org> *Sent:* Friday, May 18, 2018 1:43 PM *Subject:* Re: [Latingp] Variant cross-script analysis worksheets
I believe we delimited the scope of variants for the Latin script in the face to face meeting in Brussels, did we not?
*From: *Bill Jouris <bill.jouris@insidethestack.com> *Reply-To: *Bill Jouris <bill.jouris@insidethestack.com> *Date: *Friday, May 18, 2018 at 2:18 PM *To: *Dennis Tan Tanaka <dtantanaka@verisign.com>, Meikal Mumin < meikal@mumin.de> *Cc: *Michael Bauland <Michael.Bauland@knipp.de>, "Tan Tanaka, Dennis via Latingp" <latingp@icann.org> *Subject: *[EXTERNAL] Re: [Latingp] Variant cross-script analysis worksheets
It is pretty clear, if one reads the MSR-3 document, that we are supposed to deal with *Variants*. Which include, *but are NOT limited to*, homoglyphs.
Bill Jouris Inside Products bill.jouris@insidethestack.com 831-659-8360 925-855-9512 (direct)
------------------------------
*From:* "Tan Tanaka, Dennis" <dtantanaka@verisign.com> *To:* Meikal Mumin <meikal@mumin.de> *Cc:* "bill.jouris@insidethestack.com" <bill.jouris@insidethestack.com>; Michael Bauland <Michael.Bauland@knipp.de>; "Tan Tanaka, Dennis via Latingp" <latingp@icann.org> *Sent:* Friday, May 18, 2018 10:20 AM *Subject:* Re: [Latingp] Variant cross-script analysis worksheets
we must deal with such confusable characters or sequences of characters in the context of variants
No, we don’t. Confusability is not in scope. We established the Latin panel will deal with homoglyphs or nearly homoglyphs (i.e. font variation) in the context of cross-scripts.
Dear Meikal, et al The meeting minutes <https://docs.google.com/document/d/13KIVQlkHYc6_ib1ZDSbxRcLVlSHnhAVvc99ZrGwO...> may not answer all the questions, but does provide a light into what transpired during the Brussels workshop and what the panel members agreed on. The following Principles <https://docs.google.com/document/d/1IrT_kfildf1SumYUqjkaIkMT-TYx9IRqtuPMV4Yv...> document (still in development) is based on those decisions in Brussels. -Dennis
From: Meikal Mumin <meikal.mumin@uni-koeln.de> Date: Tuesday, May 29, 2018 at 10:14 AM To: Bill Jouris <bill.jouris@insidethestack.com>, Dennis Tan Tanaka <dtantanaka@verisign.com>, Michael Bauland <Michael.Bauland@knipp.de>, Sarmad Hussain <sarmad.hussain@icann.org> Cc: Latin GP <LatinGP@icann.org> Subject: [EXTERNAL] Re: [Latingp] Variant cross-script analysis worksheets
My conclusion is that it is more complex than reducing things to "homoglyphs" but I do not think that (at least linguistically) we have a strong definition of homoglyphs
On homoglyphs the Latin GP has received the following guidance from IP, in writing and verbally (during the Brussels workshop):
“In the context of the Root Zone, the Procedure is quite clear in that it considers simple similarity of appearance to be outside the scope of the Root Zone LGR. In admitting exact homoglyphs, the IP has been making the argument that ‘e’ in Latin (U+0065) and ‘е’ in Cyrillic (U+0435) are not just visually indistinguishable, but that their distinct code points effectively represent a disunification by script property.” – Email from IP to Latin GP of 18 October 2017 in response to our draft Principles for Inclusion and Exclusion of Code Points in Latin Script for the Root Zone, and in particular to our Analysis of Variants in the Latin Script for the Root Zone.
“The kinds of variants to be defined in the Root Zone LGR are limited to homoglyphs, which are characters with essentially identical appearance by design, instead of merely similar appearance.” – Integration Panel feedback to Latin GP proposal of 22 March 2017.
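For readers who want to check the ‘e’ example from the IP quote above, a minimal Python sketch follows. It only uses the standard `unicodedata` module; the helper name `describe` is introduced here purely for illustration and does not come from any panel document. It shows that the two characters are separate code points with different Unicode names, even though most fonts render them alike.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str]:
    """Return the code point in U+XXXX notation and the Unicode character name."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


latin_e = "\u0065"      # the Latin letter cited in the IP email
cyrillic_ie = "\u0435"  # the Cyrillic letter cited in the IP email

for ch in (latin_e, cyrillic_ie):
    code_point, name = describe(ch)
    print(code_point, name)

# The two strings compare unequal although they usually look the same on screen.
print(latin_e == cyrillic_ie)
```

The character names carry the script ("LATIN ..." versus "CYRILLIC ..."), which is only a rough stand-in for the Unicode Script property the IP refers to; the standard library does not expose that property directly.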
And yet, in speaking with members of the IP (at San Juan) on the subject of the Least Astonishment Principle, what they said was "We are looking to the Generation Panels for guidance."
And one member said further (on the matter of the breve and caron) "When I am typing something that includes one of them, I have to copy and paste because I can't tell which I am looking at." In short, while the two diacritics are clearly not identical, at least with sufficient magnification, as far as he was concerned there was no reason that they could not be classified by us as variants.
What constitutes "identical in appearance" depends enormously on just how much magnification is assumed. The rationale for assuming anything larger than 12 point type is not at all obvious.
Bill
Sent from Yahoo Mail on Android
On Wed, May 30, 2018 at 1:08 PM, Tan Tanaka, Dennis <dtantanaka@verisign.com> wrote:
From: Meikal Mumin <meikal.mumin@uni-koeln.de> Date: Tuesday, May 29, 2018 at 10:14 AM To: Bill Jouris <bill.jouris@insidethestack.com>, Dennis Tan Tanaka <dtantanaka@verisign.com>, Michael Bauland <Michael.Bauland@knipp.de>, Sarmad Hussain <sarmad.hussain@icann.org> Cc: Latin GP <LatinGP@icann.org> Subject: [EXTERNAL] Re: [Latingp] Variant cross-script analysis worksheets
My conclusion is that it is more complex than reducing things to "homoglyphs" but I do not think that (at least linguistically) we have a strong definition of homoglyphs
On homoglyphs the Latin GP has received the following guidance from IP, in writing and verbally (during the Brussels workshop):
“In the context of the Root Zone, the Procedure is quite clear in that it considers simple similarity of appearance to be outside the scope of the Root Zone LGR. In admitting exact homoglyphs, the IP has been making the argument that ‘e’ in Latin (U+0065) and ‘е’ in Cyrillic (U+0435) are not just visually indistinguishable, but that their distinct code points effectively represent a disunification by script property.” – Email from IP to Latin GP of 18 October 2017 in response to our draft Principles for Inclusion and Exclusion of Code Points in Latin Script for the Root Zone, and in particular to our Analysis of Variants in the Latin Script for the Root Zone.
“The kinds of variants to be defined in the Root Zone LGR are limited to homoglyphs, which are characters with essentially identical appearance by design, instead of merely similar appearance.” – Integration Panel feedback to Latin GP proposal of 22 March 2017.
It would be helpful if we could get the IP to issue a statement on the points you are raising below.
-Dennis
From: Bill Jouris <bill.jouris@insidethestack.com> Reply-To: "bill.jouris@insidethestack.com" <bill.jouris@insidethestack.com> Date: Wednesday, May 30, 2018 at 5:24 PM To: Dennis Tan Tanaka <dtantanaka@verisign.com>, Meikal Mumin <meikal.mumin@uni-koeln.de>, Michael Bauland <Michael.Bauland@knipp.de>, Sarmad Hussain <sarmad.hussain@icann.org> Cc: Latin GP <LatinGP@icann.org> Subject: [EXTERNAL] Re: [Latingp] Variant cross-script analysis worksheets
And yet, in speaking with members of the IP (at San Juan) on the subject of the Least Astonishment Principle, what they said was "We are looking to the Generation Panels for guidance."
And one member said further (on the matter of the breve and caron) "When I am typing something that includes one of them, I have to copy and paste because I can't tell which I am looking at." In short, while the two diacritics are clearly not identical, at least with sufficient magnification, as far as he was concerned there was no reason that they could not be classified by us as variants.
What constitutes "identical in appearance" depends enormously on just how much magnification is assumed. The rationale for assuming anything larger than 12 point type is not at all obvious.
Bill
Dear colleagues,
personally, I think we should conduct our own data-driven analysis based on our own interpretation of the rules and guidelines, and then ask IP for a review. After all, we are the community and it is our choice; they are only the ones to judge whether we meet the common criteria. That would be in line with the view they expressed, that the GPs are to guide IP.
Best, Meikal
On 31 May 2018 at 15:14, Tan Tanaka, Dennis <dtantanaka@verisign.com> wrote:
It would be helpful if we could get the IP to issue a statement on the points you are raising below.
-Dennis
Dear Colleagues, My opinion on this matter is the same as Meikal stated in this mail. We should make our own interpretation of rules and guidelines and stick to it. In the further discussion with IP we could modify our findings if there is enough reasons for it. Regards Mirjana From: Latin GP <latingp-bounces@icann.org> on behalf of Meikal Mumin <meikal.mumin@uni-koeln.de> Date: Thursday, May 31, 2018 at 10:00 To: "Tan Tanaka, Dennis" <dtantanaka@verisign.com> Cc: Latin GP <LatinGP@icann.org> Subject: Re: [Latingp] Variant cross-script analysis worksheets Dear colleuages, personally, I think we should conduct our own data-driven analysis based on our own interpretation of the rules and guidelines, and then ask IP for a review. After all, we are the community and it is our choice - they are only the ones to judge wether we meet the common criteria. That would be in line we the view they expressed, that the GPs are to guide IP. Best, Meikal On 31 May 2018 at 15:14, Tan Tanaka, Dennis <dtantanaka@verisign.com<mailto:dtantanaka@verisign.com>> wrote: It would be helpful if we could get the IP to issue an statement on the points you are raising below. 
-Dennis From: Bill Jouris <bill.jouris@insidethestack.com<mailto:bill.jouris@insidethestack.com>> Reply-To: "bill.jouris@insidethestack.com<mailto:bill.jouris@insidethestack.com>" <bill.jouris@insidethestack.com<mailto:bill.jouris@insidethestack.com>> Date: Wednesday, May 30, 2018 at 5:24 PM To: Dennis Tan Tanaka <dtantanaka@verisign.com<mailto:dtantanaka@verisign.com>>, Meikal Mumin <meikal.mumin@uni-koeln.de<mailto:meikal.mumin@uni-koeln.de>>, Michael Bauland <Michael.Bauland@knipp.de<mailto:Michael.Bauland@knipp.de>>, Sarmad Hussain <sarmad.hussain@icann.org<mailto:sarmad.hussain@icann.org>> Cc: Latin GP <LatinGP@icann.org<mailto:LatinGP@icann.org>> Subject: [EXTERNAL] Re: [Latingp] Variant cross-script analysis worksheets And yet, in speaking with members of the IP (at San Juan), on the subject of the Least Astonishment Principle, what they said was "We are looking to the Generation Panels for guidance." And said further (on the matter of the breve and caron) "When I am typing something that includes one of them, I have to copy and paste because I can't tell which I am lookIng at." In short, while the two diacritics are clearly not identical, at least with sufficient magnification, as far as he was concerned there was no reason that they could not be classified by us as varients. What constitutes "identical in appearance" depends enormously on just how much magnification is assumed. The rationale for assuming anything larger than 12 point type is not at all obvious. 
Bill Sent from Yahoo Mail on Androi<https://go.onelink.me/107872968?pid=InProduct&c=Global_Internal_YGrowth_AndroidEmailSig__AndroidUsers&af_wl=ym&af_sub1=Internal&af_sub2=Global_YGrowth&af_sub3=EmailSignature>d On Wed, May 30, 2018 at 1:08 PM, Tan Tanaka, Dennis <dtantanaka@verisign.com<mailto:dtantanaka@verisign.com>> wrote: From: Meikal Mumin <meikal.mumin@uni-koeln.de<mailto:meikal.mumin@uni-koeln.de>> Date: Tuesday, May 29, 2018 at 10:14 AM To: Bill Jouris <bill.jouris@insidethestack.com<mailto:bill.jouris@insidethestack.com>>, Dennis Tan Tanaka <dtantanaka@verisign.com<mailto:dtantanaka@verisign.com>>, Michael Bauland <Michael.Bauland@knipp.de<mailto:Michael.Bauland@knipp.de>>, Sarmad Hussain <sarmad.hussain@icann.org<mailto:sarmad.hussain@icann.org>> Cc: Latin GP <LatinGP@icann.org<mailto:LatinGP@icann.org>> Subject: [EXTERNAL] Re: [Latingp] Variant cross-script analysis worksheets My conclusion is that it is more complex than reducing things to "homoglyphs" but I do not think that (at least linguistically) we have a strong definition of homoglyphs On homoglyphs the Latin GP has received the following guidance from IP, in writing and verbally (during the Brussels workshop) “In the context of the Root Zone, the Procedure is quite clear in that it considers simple similarity of appearance to be outside the scope of the Root Zone LGR. In admitting exact homoglyphs, the IP has been making the argument that ‘e’ in Latin (U+0065) and ‘е’ in Cyrillic(U+0435) are not just visually indistinguishable, but that their distinct code points effectively represent a disunification by script property.” – Email from IP to Latin GP of 18 October 2017 in response to our draft Principles for Inclusion and Exclusion of Code Points in Latin Script for the Root Zone, and in particular to our Analysis of Variants in the Latin Script for the Root Zone. 
“The kinds of variants to be defined in the Root Zone LGR are limited to homoglyphs, which are characters with essentially identical appearance by design, instead of merely similar appearance.” – Integration Panel feedback to Latin GP proposal of 22 March 2017.
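As a side illustration of the code point distinction in the guidance quoted above, here is a minimal Python sketch that prints the code point and Unicode character name for Latin 'e' (U+0065) and Cyrillic 'е' (U+0435), and for a breve/caron pair of the kind raised earlier in the thread. The helper names `describe` and `script_guess` are hypothetical, and the choice of "a with breve" and "a with caron" as the sample pair is an assumption made for illustration only. Python's standard library does not expose the Unicode Script property, so the first word of the character name is used here as a rough stand-in; this is not how the IP or the GP determine script membership.

```python
import unicodedata


def describe(ch):
    """Return (code point, Unicode character name) for one character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


def script_guess(ch):
    # Assumption: the first word of the character name is a rough proxy
    # for the script. The real Script property is not available in the
    # standard library.
    return unicodedata.name(ch).split()[0]


# Pairs to compare: Latin e vs Cyrillic ie, and a-breve vs a-caron.
pairs = [("e", "\u0435"), ("\u0103", "\u01ce")]

for a, b in pairs:
    print(describe(a), describe(b))
    print("  same code point:", a == b)
    print("  same script (name-based guess):", script_guess(a) == script_guess(b))
```

The first pair differs in script but may render identically in many fonts; the second pair stays within Latin and differs only in the diacritic. Whether either pair counts as a variant is a judgment for the panel, not something this sketch can decide.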
Dear colleagues,

Thanks for these additional sources, Dennis. I also looked up the minutes from Brussels, which state that "GP expects that variants in the Latin Script for the RZ LGR be limited to homoglyphs (i.e., visual identical characters)".

The issue I see here is that the term homoglyph suggests a categorical difference, which linguistically does not exist in my opinion. In other words, there is no systematic way of telling apart characters "of merely similar appearance" from "characters with essentially identical appearance by design" in the context of Latin script. There is no categorical difference between a visually identical character and a visually similar character; such a categorization is an arbitrary choice out of a continuum, and that continuum is shaped by different factors including font, context, the user and the user's education, the user's client, as well as technical, linguistic, and neurological aspects relating to human perception.

As for the former email dated October 18, 2017, regarding the theory about disunification by script property: I believe that this is an ahistorical view, which may be correct only from a synchronic point of view, that is, from today's point of view only. Fact of the matter is that for historical reasons, everything is inter-related, so that visual similarity is not accidental but, in most cases, due to a historical connection. All these different shapes were once one and the same, and minute differences once considered mere variation by the hands of individual writers have come to form the basis for forming new letters (which is precisely what I was looking to demonstrate in the draft text of the proposal which the group considered irrelevant because it was discussing the history of the script).

So in my view, sticking to "homoglyphs" does not help us because the term is void of meaning in a linguistic context.
If there is a modern definition in a technical context, I would appreciate if anyone could quote it for me. I hope this helps contextualize the facts from a linguistic point of view. Best, Meikal
The Cyrillic worksheet has been updated, mainly with Meikal’s input. This means that your current assessment may need a revision. Please review at your convenience. Thanks, Dennis

From: Dennis Tan Tanaka <dtantanaka@verisign.com> Date: Friday, April 27, 2018 at 4:08 PM To: "bill.jouris@insidethestack.com" <bill.jouris@insidethestack.com>, Meikal Mumin <meikal@mumin.de>, Michael Bauland <Michael.Bauland@knipp.de> Cc: "Tan Tanaka, Dennis via Latingp" <latingp@icann.org> Subject: Re: [EXTERNAL] Re: [Latingp] Variant cross-script analysis worksheets

Bill, Meikal and Michael,

Now you can see the assignments in columns K and M. Your input should be recorded in Columns J and L per the scoring system and criteria.

Cyrillic: https://docs.google.com/spreadsheets/d/1_AvaG3vgNsozyNs1Cj68HivcWmV7HRwwXz4k...
Greek: https://docs.google.com/spreadsheets/d/1EK9HdI64HXvhGP8XkV9ZSZhePVRWPBxBDXId...
Armenian: https://docs.google.com/spreadsheets/d/1SpC8hHecvohEeBveFBQz0Fw7wQAxE9HG9Bu-...

As a first step, can I ask you to do FIRST INSPECTION ONLY. That means, fill out Column J only by Monday May 14. I will have us meet (if there is no Panel meeting) later that week as a check-point, to work out bugs, gather feedback, etc., and then move on to second inspection. Any questions please let me know.

-Dennis

From: Latingp <latingp-bounces@icann.org> on behalf of "Tan Tanaka, Dennis via Latingp" <latingp@icann.org> Reply-To: Dennis Tan Tanaka <dtantanaka@verisign.com> Date: Friday, April 27, 2018 at 2:09 PM To: "bill.jouris@insidethestack.com" <bill.jouris@insidethestack.com> Cc: Meikal Mumin <meikal@mumin.de>, "Tan Tanaka, Dennis via Latingp" <latingp@icann.org> Subject: [EXTERNAL] Re: [Latingp] Variant cross-script analysis worksheets

Thanks Bill. I didn’t want to assume you all will be available. Others? Going once, going twice....
On Apr 27, 2018, at 2:07 PM, Bill Jouris <bill.jouris@insidethestack.com> wrote:

I missed that you were looking for volunteers including those of us already on the Variants group. Sorry. Of course I will be glad to help.

Bill

On Fri, Apr 27, 2018 at 11:03 AM, Tan Tanaka, Dennis via Latingp <latingp@icann.org> wrote:

_______________________________________________
Latingp mailing list
Latingp@icann.org
https://mm.icann.org/mailman/listinfo/latingp
participants (6)
- Bill Jouris
- Meikal Mumin
- Meikal Mumin
- Mirjana Tasić
- Sarmad Hussain
- Tan Tanaka, Dennis