Re: [Latingp] For consideration at LGP first meeting (Agenda point 4)

Paul Hoffman

Sept. 17, 2015

12:38 a.m.

This request is fairly confusing, at least to me, on three counts:

1) Section B.3.4.2 of the LGR Procedures document says "Finally, in investigating the possible variant relations, Generation Panels should ignore cases where the relation is based exclusively on aspects of visual similarity."

2) The Armenian script does not appear to be related to the Latin script in the sense of the LGR rules. The characters that might have similar usage in the scripts mostly look completely different.

3) Even ignoring the previous two points, the request seems to be for cross-script variants, such as for an Armenian letter that looks like a Latin letter. The "Guidelines for LGR" document indicates that cross-script variants might be created, but gives no hints about why we should do that given that the root zone will consist of labels of a single script (according to Section A.3.1).

This feels to me like the Armenian GP's submission is not related to ours, so we should not be commenting on it.

--Paul Hoffman


Meikal Mumin

September 2015

7:40 a.m.

New subject: For consideration at LGP first meeting (Agenda point 4)

Dear colleagues,

I agree with Paul.

Best wishes,

Meikal

Sent from my iPhone

...

On 17.09.2015 at 03:38, Paul Hoffman <paul.hoffman@icann.org> wrote:

This request is fairly confusing, at least to me, on three counts:

1) Section B.3.4.2 of the LGR Procedures document says "Finally, in investigating the possible variant relations, Generation Panels should ignore cases where the relation is based exclusively on aspects of visual similarity."

2) The Armenian script does not appear to be related to the Latin script in the sense of the LGR rules. The characters that might have similar usage in the scripts mostly look completely different.

3) Even ignoring the previous two points, the request seems to be for cross-script variants, such as for an Armenian letter that looks like a Latin letter. The "Guidelines for LGR" document indicates that cross-script variants might be created, but gives no hints about why we should do that given that the root zone will consist of labels of a single script (according to Section A.3.1).

This feels to me like the Armenian GP's submission is not related to ours, so we should not be commenting on it.

--Paul Hoffman

_______________________________________________
Latingp mailing list
Latingp@icann.org
https://mm.icann.org/mailman/listinfo/latingp

Dillon, Chris

10:54 a.m.

New subject: For consideration at LGP first meeting (Agenda point 4)

Dear Paul & Meikal,

Thank you for your comments.

1) "Finally, in investigating the possible variant relations, Generation Panels should ignore cases where the relation is based exclusively on aspects of visual similarity."

I think this may mean variant code point relations, but certainly needs clarifying. If my understanding is correct it would mean that Latin o and the identical Armenian letter would not be declared universally as variant code points.

2) I shall save my comments on this point, as I would not want to influence people in a particular direction.

3) As you write, and also according to B.3.2 in the Procedure to develop and maintain the Label Generation Rules for the Root Zone in respect of IDNA labels, mixed script labels at the Top Level currently seem unlikely. They do exist at lower levels, usually involving the Latin script or Arabic numerals and another script.

If my understanding is correct, the issue is whole Armenian labels that are visually similar to whole Latin labels. I think this is why we are being asked to comment.

Regards,

Chris.

--

Research Associate in Linguistic Computing, Centre for Digital Humanities, UCL, Gower St, London WC1E 6BT Tel +44 20 7679 1599 (int 31599) www.ucl.ac.uk/dis/people/chrisdillon

From: Paul Hoffman [mailto:paul.hoffman@icann.org]
Sent: 17 September 2015 01:38
To: Dillon, Chris <c.dillon@ucl.ac.uk>
Cc: LatinGP@icann.org
Subject: Re: [Latingp] For consideration at LGP first meeting (Agenda point 4)

This request is fairly confusing, at least to me, on three counts:

1) Section B.3.4.2 of the LGR Procedures document says "Finally, in investigating the possible variant relations, Generation Panels should ignore cases where the relation is based exclusively on aspects of visual similarity."

2) The Armenian script does not appear to be related to the Latin script in the sense of the LGR rules. The characters that might have similar usage in the scripts mostly look completely different.

3) Even ignoring the previous two points, the request seems to be for cross-script variants, such as for an Armenian letter that looks like a Latin letter. The "Guidelines for LGR" document indicates that cross-script variants might be created, but gives no hints about why we should do that given that the root zone will consist of labels of a single script (according to Section A.3.1).

This feels to me like the Armenian GP's submission is not related to ours, so we should not be commenting on it.

--Paul Hoffman

Paul Hoffman

3:03 p.m.

New subject: For consideration at LGP first meeting (Agenda point 4)

...

1) "Finally, in investigating the possible variant relations, Generation Panels should ignore cases where the relation is based exclusively on aspects of visual similarity." I think this may mean variant code point relations, but certainly needs clarifying.

Will the chairs ask the Integration Panel about that? Or do we as a GP need to formulate that message?

...

If my understanding is correct it would mean that Latin o and the identical Armenian letter would not be declared universally as variant code points.

Terms like "identical" are tricky. There is a character in the Armenian script called "Armenian small letter oh" that looks like an italicized "Latin small letter o", but according to Dalby's "Dictionary of Languages", it is *not* the equivalent of "Latin small letter o"; instead, the "Armenian small letter vo" (which looks like an italicized superscripted n) is the equivalent of "Latin small letter o".

...

2) I shall save my comments on this point, as I would not want to influence people in a particular direction.

3) As you write, and also according to B.3.2 in the Procedure to develop and maintain the Label Generation Rules for the Root Zone in respect of IDNA labels, mixed script labels at the Top Level currently seem unlikely. They do exist at lower levels, usually involving the Latin script or Arabic numerals and another script.

Please correct me if I'm wrong, but I thought this GP was for the use of Latin at the root, and our decisions are supposed to be based on that. If I'm right, then we can use the "single script" rule as a way of making our decisions.

...

If my understanding is correct, the issue is whole Armenian labels that are visually similar to whole Latin labels.

Sure, that could be. However, given that instruction to "ignore cases where the relation is based exclusively on aspects of visual similarity", that doesn't seem relevant. Worse, there are numerous scripts that have characters that will look like a vertical line ("Latin small letter l") or a horizontal line ("hyphen minus" and "low line"). I would hope that each GP with such a character is not expected to communicate with every other GP to find if they have such characters.

...

I think this is why we are being asked to comment.

Can we see the original request from the Armenian GP or from the Integration Panel? It would help us decide how best to respond. --Paul Hoffman
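For readers who want to check the character names and the "single script" rule discussed in this message, here is a minimal sketch in Python. It is only an approximation: the standard library does not expose the Unicode Script property, so the hypothetical helper `rough_script` guesses the script from the first word of the character name. That guess would misclassify digits, the hyphen, and similar characters, and it is not how any ICANN tooling is known to work.

```python
import unicodedata


def rough_script(ch):
    """Approximate a character's script from the first word of its Unicode name.

    Assumption: the name prefix ("LATIN", "ARMENIAN", ...) stands in for the
    real Script property, which the standard library does not provide.
    """
    return unicodedata.name(ch).split()[0]


def scripts_in_label(label):
    """Return the set of approximate scripts used by the characters of a label."""
    return {rough_script(ch) for ch in label}


def is_single_script(label):
    """True if every character of the label maps to the same approximate script."""
    return len(scripts_in_label(label)) == 1


# The two Armenian letters named in the message above.
print(unicodedata.name("\u0585"))  # ARMENIAN SMALL LETTER OH
print(unicodedata.name("\u0578"))  # ARMENIAN SMALL LETTER VO

# A pure Latin label versus a label that mixes Latin and Armenian letters.
print(is_single_script("quo"))             # True
print(is_single_script("q\u057d\u0585"))   # False
```

Under this approximation, a label made only of Armenian look-alike letters still counts as single-script, which is why the whole-label case is a separate question from the mixed-script case.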

Dillon, Chris

8:22 a.m.

New subject: For consideration at LGP first meeting (Agenda point 4)

Dear Paul,

Thank you for your comments.

1) I have asked ICANN staff to contact the IP about a clarification of the sentence.

3) This GP is indeed only for the use of Latin in the root.

The original request for us to comment came from ICANN staff when the Armenian proposal was put up for public comment. Please find it below.

Regards,

Chris.

--

Research Associate in Linguistic Computing, Centre for Digital Humanities, UCL, Gower St, London WC1E 6BT Tel +44 20 7679 1599 (int 31599) www.ucl.ac.uk/dis/people/chrisdillon

From: Sarmad Hussain [mailto:sarmad.hussain@icann.org]
Sent: 02 September 2015 07:25
To: Dillon, Chris <c.dillon@ucl.ac.uk>; Eric Brunner-Williams <ebw@abenaki.wabanaki.net>
Cc: Alireza Saleh <alireza.saleh@icann.org>
Subject: Request for comments by Latin GP on Root Zone LGR proposal by Armenian GP

Dear Chris and Eric,

You may be aware that Armenian GP has completed its work and has released the final proposal for Root Zone LGR for the Armenian script. The proposal and its documentation is available for public comments at https://www.icann.org/public-comments/proposal-armenian-lgr-2015-07-22-en as a final step before its evaluation by IP and eventual integration into the Root Zone LGR.

In their proposal documentation they discuss homoglyph relations between Armenian and Latin scripts, listing the following:

=======

զ U+0566 Armenian small letter ZA q U+0071 Latin small letter Q

հ U+0570 Armenian small letter HO h U+0068 Latin small letter H

ո U+0578 Armenian small letter VO n U+006E Latin small letter N

ս U+057D Armenian small letter SEH u U+0075 Latin small letter U

ց U+0581 Armenian small letter CO g U+0067 Latin small letter G

ւ U+0582 Armenian small letter YIWN ɩ U+0269 Latin small letter IOTA

օ U+0585 Armenian small letter OH o U+006F Latin small letter O

Notes: The ARMENIAN SMALL LETTER ZA and the LATIN SMALL LETTER Q are not perfect homoglyphs but the difference may not be perceivable at normal size. A label, such as .զսօ would be readily accepted by users as “the same” as the label .quo (in Latin). The ‘g’ homoglyph situation only exists in sans-serif style, which, however, is a very common choice for user interfaces. Armenian YIWN and Latin IOTA are considered homoglyphs as their visual perception is quite similar. The ARMENIAN LETTER YI and j U+006A LATIN SMALL LETTER J are not considered homoglyphs as “dot” changes the visual perception of the letter. That is why this case is not included into the cross-script variants.

=======

We would request the Latin GP to kindly look at the Armenian proposal documentation and submit a public comment either supporting their proposal or suggesting any changes/additions based on Latin GP point of view. The deadline for public comments is 30 Sept. We look forward to the response by Latin GP.
Best regards,

Sarmad

From: Paul Hoffman [mailto:paul.hoffman@icann.org]
Sent: 17 September 2015 16:03
To: Dillon, Chris <c.dillon@ucl.ac.uk>; LatinGP@icann.org
Cc: Eric Brunner-Williams <ebw@abenaki.wabanaki.net>; Alireza Saleh <alireza.saleh@icann.org>; Sarmad Hussain <sarmad.hussain@icann.org>
Subject: Re: [Latingp] For consideration at LGP first meeting (Agenda point 4)

1) "Finally, in investigating the possible variant relations, Generation Panels should ignore cases where the relation is based exclusively on aspects of visual similarity."

I think this may mean variant code point relations, but certainly needs clarifying.

Will the chairs ask the Integration Panel about that? Or do we as a GP need to formulate that message?

If my understanding is correct it would mean that Latin o and the identical Armenian letter would not be declared universally as variant code points.

Terms like "identical" are tricky. There is a character in the Armenian script called "Armenian small letter oh" that looks like an italicized "Latin small letter o", but according to Dalby's "Dictionary of Languages", it is *not* the equivalent of "Latin small letter o"; instead, the "Armenian small letter vo" (which looks like an italicized superscripted n) is the equivalent of "Latin small letter o".

2) I shall save my comments on this point, as I would not want to influence people in a particular direction.

3) As you write, and also according to B.3.2 in the Procedure to develop and maintain the Label Generation Rules for the Root Zone in respect of IDNA labels, mixed script labels at the Top Level currently seem unlikely. They do exist at lower levels, usually involving the Latin script or Arabic numerals and another script.

Please correct me if I'm wrong, but I thought this GP was for the use of Latin at the root, and our decisions are supposed to be based on that. If I'm right, then we can use the "single script" rule as a way of making our decisions.

If my understanding is correct, the issue is whole Armenian labels that are visually similar to whole Latin labels.

Sure, that could be. However, given that instruction to "ignore cases where the relation is based exclusively on aspects of visual similarity", that doesn't seem relevant. Worse, there are numerous scripts that have characters that will look like a vertical line ("Latin small letter l") or a horizontal line ("hyphen minus" and "low line"). I would hope that each GP with such a character is not expected to communicate with every other GP to find if they have such characters.

I think this is why we are being asked to comment.

Can we see the original request from the Armenian GP or from the Integration Panel? It would help us decide how best to respond.

--Paul Hoffman
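To make the whole-label concern in the forwarded request concrete, the sketch below writes the seven listed Armenian/Latin homoglyph pairs as a lookup table and maps an all-Armenian label to its Latin look-alike. The names `ARMENIAN_TO_LATIN` and `latin_lookalike` are hypothetical and chosen for illustration only; this is not the variant mechanism defined by the LGR Procedure, just a way to see the `.զսօ` versus `.quo` example.

```python
# Cross-script homoglyph pairs as listed in the forwarded request
# (Armenian code point -> Latin code point).
ARMENIAN_TO_LATIN = {
    "\u0566": "\u0071",  # Armenian ZA   -> Latin q
    "\u0570": "\u0068",  # Armenian HO   -> Latin h
    "\u0578": "\u006e",  # Armenian VO   -> Latin n
    "\u057d": "\u0075",  # Armenian SEH  -> Latin u
    "\u0581": "\u0067",  # Armenian CO   -> Latin g
    "\u0582": "\u0269",  # Armenian YIWN -> Latin iota
    "\u0585": "\u006f",  # Armenian OH   -> Latin o
}


def latin_lookalike(label):
    """Return the Latin look-alike of a label built only from listed letters.

    Returns None as soon as a character has no listed homoglyph, because the
    label as a whole then has no complete Latin look-alike under this table.
    """
    out = []
    for ch in label:
        if ch not in ARMENIAN_TO_LATIN:
            return None
        out.append(ARMENIAN_TO_LATIN[ch])
    return "".join(out)


# The example from the request: Armenian ZA + SEH + OH reads like Latin "quo".
print(latin_lookalike("\u0566\u057d\u0585"))  # quo

# ARMENIAN SMALL LETTER AYB (U+0561) is not in the list, so no look-alike exists.
print(latin_lookalike("\u0561\u0585"))  # None
```

Only labels composed entirely of the seven listed letters produce a look-alike, which gives a sense of how narrow the set of affected whole labels is.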

Meikal Mumin

10:01 a.m.

New subject: For consideration at LGP first meeting (Agenda point 4)

Dear all,

On the Arabic GP we never considered, or even commented on, any possible visual similarity between the Arabic-script Alif and the capital Roman/Latin-script I, to give an (albeit less convincing) example. From what I understand, script mixing is not permitted at the root zone level, and in any case the Integration Panel would deal with such cases at integration, as is probably required for the CJK scripts, for example.

Best,

Meikal

2015-09-18 11:22 GMT+03:00 Dillon, Chris <c.dillon@ucl.ac.uk>:

...

Dear Paul,

Thank you for your comments.

1) I have asked ICANN staff to contact the IP about a clarification of the sentence.

3) This GP is indeed only for the use of Latin in the root.

The original request for us to comment came from ICANN staff when the Armenian proposal was put up for public comment. Please find it below.

Regards,

Chris.

--

Research Associate in Linguistic Computing, Centre for Digital Humanities, UCL, Gower St, London WC1E 6BT Tel +44 20 7679 1599 (int 31599) www.ucl.ac.uk/dis/people/chrisdillon

From: Sarmad Hussain [mailto:sarmad.hussain@icann.org]
Sent: 02 September 2015 07:25
To: Dillon, Chris <c.dillon@ucl.ac.uk>; Eric Brunner-Williams <ebw@abenaki.wabanaki.net>
Cc: Alireza Saleh <alireza.saleh@icann.org>
Subject: Request for comments by Latin GP on Root Zone LGR proposal by Armenian GP

Dear Chris and Eric,

You may be aware that Armenian GP has completed its work and has released the final proposal for Root Zone LGR for the Armenian script. The proposal and its documentation is available for public comments at https://www.icann.org/public-comments/proposal-armenian-lgr-2015-07-22-en as a final step before its evaluation by IP and eventual integration into the Root Zone LGR.

In their proposal documentation they discuss homoglyph relations between Armenian and Latin scripts, listing the following:

=======

զ U+0566 Armenian small letter ZA q U+0071 Latin small letter Q

հ U+0570 Armenian small letter HO h U+0068 Latin small letter H

ո U+0578 Armenian small letter VO n U+006E Latin small letter N

ս U+057D Armenian small letter SEH u U+0075 Latin small letter U

ց U+0581 Armenian small letter CO g U+0067 Latin small letter G

ւ U+0582 Armenian small letter YIWN ɩ U+0269 Latin small letter IOTA

օ U+0585 Armenian small letter OH o U+006F Latin small letter O

Notes: The ARMENIAN SMALL LETTER ZA and the LATIN SMALL LETTER Q are not perfect homoglyphs but the difference may not be perceivable at normal size. A label, such as .զսօ would be readily accepted by users as “the same” as the label .quo (in Latin). The ‘g’ homoglyph situation only exists in sans-serif style, which, however, is a very common choice for user interfaces. Armenian YIWN and Latin IOTA are considered homoglyphs as their visual perception is quite similar. The ARMENIAN LETTER YI and j U+006A LATIN SMALL LETTER J are not considered homoglyphs as “dot” changes the visual perception of the letter. That is why this case is not included into the cross-script variants.

=======

We would request the Latin GP to kindly look at the Armenian proposal documentation and submit a public comment either supporting their proposal or suggesting any changes/additions based on Latin GP point of view. The deadline for public comments is 30 Sept. We look forward to the response by Latin GP.

Best regards,

Sarmad

From: Paul Hoffman [mailto:paul.hoffman@icann.org]
Sent: 17 September 2015 16:03
To: Dillon, Chris <c.dillon@ucl.ac.uk>; LatinGP@icann.org
Cc: Eric Brunner-Williams <ebw@abenaki.wabanaki.net>; Alireza Saleh <alireza.saleh@icann.org>; Sarmad Hussain <sarmad.hussain@icann.org>
Subject: Re: [Latingp] For consideration at LGP first meeting (Agenda point 4)

1) "Finally, in investigating the possible variant relations, Generation Panels should ignore cases where the relation is based exclusively on aspects of visual similarity."

I think this may mean variant code point relations, but certainly needs clarifying.

Will the chairs ask the Integration Panel about that? Or do we as a GP need to formulate that message?

If my understanding is correct it would mean that Latin o and the identical Armenian letter would not be declared universally as variant code points.

Terms like "identical" are tricky. There is a character in the Armenian script called "Armenian small letter oh" that looks like an italicized "Latin small letter o", but according to Dalby's "Dictionary of Languages", it is *not* the equivalent of "Latin small letter o"; instead, the "Armenian small letter vo" (which looks like an italicized superscripted n) is the equivalent of "Latin small letter o".

2) I shall save my comments on this point, as I would not want to influence people in a particular direction.

3) As you write, and also according to B.3.2 in the *Procedure to develop and maintain the Label Generation Rules for the Root Zone in respect of IDNA labels*, mixed script labels at the Top Level currently seem unlikely. They do exist at lower levels, usually involving the Latin script or Arabic numerals and another script.

Please correct me if I'm wrong, but I thought this GP was for the use of Latin at the root, and our decisions are supposed to be based on that. If I'm right, then we can use the "single script" rule as a way of making our decisions.

If my understanding is correct, the issue is whole Armenian labels that are visually similar to whole Latin labels.

Sure, that could be. However, given that instruction to "ignore cases where the relation is based exclusively on aspects of visual similarity", that doesn't seem relevant. Worse, there are numerous scripts that have characters that will look like a vertical line ("Latin small letter l") or a horizontal line ("hyphen minus" and "low line"). I would hope that each GP with such a character is not expected to communicate with every other GP to find if they have such characters.

I think this is why we are being asked to comment.

Can we see the original request from the Armenian GP or from the Integration Panel? It would help us decide how best to respond.

--Paul Hoffman
