Dear All,

This is indeed a complex matter, which is why it requires this continued discussion. It may also be useful here to refer back to the RZ-LGR Procedure.

In defining “IDN variants”, the RZ-LGR Procedure says:

“An IDN variant, as understood here, is an alternate code point (or sequence of code points) that could be substituted for a code point (or sequence of code points) in a candidate label to create a variant label that is considered the “same” in some measure by a given community of Internet users.”

However, the Procedure also acknowledges immediately following the definition that:

“There is not general agreement of what that sameness requires, and many of the things people seem to want from that sameness are not technically achievable.”

While noting the benefits of defining IDN variants, the Procedure also acknowledges the limitations of doing so:

“The primary benefit of the LGR process is as a mechanism that delivers hands-off evaluation for these aspects.
“By doing so, the process may not be able to replace case-by-case analysis altogether: there will still be a role for additional types of review, such as for String Similarity, and which are not included in the LGR process.”

So, not all matters can be settled in the LGR. A line has to be drawn between “same” and “similar”.

The LGR Procedure does note what is desirable to have in scope for the LGR:

“the LGR process is designed to clear the table of all the straightforward, non-subjective cases, mainly by returning a “blocked” disposition.
“Even for variants based on visual similarity, there exists a subset of evaluation rules that could be applied in an automated manner, obviating the need for further case-by case or even contextual review.”

But it also notes that this should not go too far into the string similarity discussion:

“While the process described here could be expanded to address cases of visual similarity, that is not the primary intention”
“Finally, in investigating the possible variant relations, Generation Panels should ignore cases where the relation is based exclusively on aspects of visual similarity.”

One could infer from these statements in the RZ-LGR Procedure that:

1. If two code points are considered “same” by the user community, these should be included as IDN variants. This is not limited to visual similarity, but could also include semantic equivalence (as in Chinese), orthographic conventions or spelling simplification (as in Arabic), homophonic relations (as in Ethiopic), etc., as determined by the respective script community.
2. The “straightforward, non-subjective cases” of visual similarity could be included as IDN variants and blocked.
3. Beyond these, the analysis goes into the realm of string similarity review, which is beyond the intention of the LGR.
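
To make the mechanics of the definition concrete, here is a minimal sketch in Python of how code point substitution produces variant labels, and how a "blocked" disposition could then be applied. Everything in it is an assumption for illustration only: the VARIANTS table, the function names, and the choice of the Latin/Cyrillic pairs a/а and o/о are not taken from any actual LGR. It is also simplified, in that the table is one-directional, whereas variant relations in an actual LGR are defined in both directions.

```python
from itertools import product

# Hypothetical variant table (illustration only, not from a real LGR):
# each code point maps to the code points treated as the "same".
VARIANTS = {
    "a": ["\u0430"],  # Latin small a -> Cyrillic small a
    "o": ["\u043e"],  # Latin small o -> Cyrillic small o
}

def variant_labels(label):
    """Return all variant labels made by substituting code points in label."""
    # For each position, the original code point plus any of its variants.
    choices = [[cp] + VARIANTS.get(cp, []) for cp in label]
    labels = {"".join(combo) for combo in product(*choices)}
    labels.discard(label)  # the original label is not counted as its own variant
    return labels

def disposition(applied, candidate):
    """Block the candidate if it is a variant label of the applied label."""
    if candidate in variant_labels(applied):
        return "blocked"
    return "no variant relation"
```

Under these assumptions, a label with two substitutable code points has three variant labels, and each of them would be blocked if applied for against the original. A label with no entries in the table has no variants, and anything else about it would fall to string similarity review.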

Generation Panels have been asked to draw the line based on these guidelines provided in the RZ-LGR Procedure. For example, the Cyrillic GP agreed to consider homoglyph relations with other related scripts for this purpose. The Neo-Brahmi GP has used a slightly different technique: it treats as cross-script variants those code points that members of both script communities in question find “indistinguishable”, even if they are not homoglyphs (see the blog for some more details).

Of course, the Latin GP also needs to draw these lines in its analysis to identify within-script and cross-script IDN variant cases.

Regards,
Sarmad

From: Latingp [mailto:latingp-bounces@icann.org] On Behalf Of Bill Jouris
Sent: Saturday, May 19, 2018 5:28 AM
To: Tan Tanaka, Dennis <dtantanaka@verisign.com>; Meikal Mumin <meikal@mumin.de>
Cc: Tan Tanaka, Dennis via Latingp <latingp@icann.org>
Subject: Re: [Latingp] Variant cross-script analysis worksheets

It's been clear for some time, even before Brussels, that you think we should only look at homoglyphs. (Also that you don't think that there are any in-script homoglyphs. See the discussion about the schwa and the turned e.)

But there is a world of difference between agreeing, and merely deciding not to waste time arguing with a closed mind. Which, for me, is what happened in the discussion in Brussels.

Bill Jouris
Inside Products
bill.jouris@insidethestack.com
831-659-8360
925-855-9512 (direct)

From: "Tan Tanaka, Dennis" <dtantanaka@verisign.com>
To: Bill Jouris <bill.jouris@insidethestack.com>; Meikal Mumin <meikal@mumin.de>
Cc: Michael Bauland <Michael.Bauland@knipp.de>; "Tan Tanaka, Dennis via Latingp" <latingp@icann.org>
Sent: Friday, May 18, 2018 1:43 PM
Subject: Re: [Latingp] Variant cross-script analysis worksheets

I believe we delimited the scope of variants for the Latin script in the face to face meeting in Brussels, did we not?

From: Bill Jouris <bill.jouris@insidethestack.com>
Reply-To: Bill Jouris <bill.jouris@insidethestack.com>
Date: Friday, May 18, 2018 at 2:18 PM
To: Dennis Tan Tanaka <dtantanaka@verisign.com>, Meikal Mumin <meikal@mumin.de>
Cc: Michael Bauland <Michael.Bauland@knipp.de>, "Tan Tanaka, Dennis via Latingp" <latingp@icann.org>
Subject: [EXTERNAL] Re: [Latingp] Variant cross-script analysis worksheets

It is pretty clear, if one reads the MSR-3 document, that we are supposed to deal with Variants. Which include, but are NOT limited to, homoglyphs.

Bill Jouris

From: "Tan Tanaka, Dennis" <dtantanaka@verisign.com>
To: Meikal Mumin <meikal@mumin.de>
Cc: "bill.jouris@insidethestack.com" <bill.jouris@insidethestack.com>; Michael Bauland <Michael.Bauland@knipp.de>; "Tan Tanaka, Dennis via Latingp" <latingp@icann.org>
Sent: Friday, May 18, 2018 10:20 AM
Subject: Re: [Latingp] Variant cross-script analysis worksheets

“we must deal with such confusable characters or sequences of characters in the context of variants”

No, we don’t. Confusability is not in scope. We established that the Latin panel will deal with homoglyphs or near-homoglyphs (i.e., font variation) in the cross-script context.