Variant discussion and samples

March 22, 2018

Dear all,

Here is the information I shared during the meeting, written out again for clarification:
……………………

Adding to Mats's intervention regarding the CJK and IP meeting: the CJK GP has been conducting its variant analysis using a semantic methodology, e.g. 国 and 國 are variants because they have the same meaning and the same pronunciation. However, the IP raised that the CJK GP should also consider sameness of the glyph. The IP therefore pointed to the following text in the LGR procedure: https://www.icann.org/en/system/files/files/draft-lgr-procedure-20mar13-en.p...

A.3.2. Variants
… An IDN variant, as understood here, is an alternate code point (or sequence of code points) that could be substituted for a code point (or sequence of code points) in a candidate label to create a variant label that is considered the “same” in some measure by a given community of Internet users. There is not general agreement of what that sameness requires, and many of the things people seem to want from that sameness are not technically achievable.

The IP said that what counts as ‘sameness’ depends on each GP's analysis.

The IP also suggested avoiding the term ‘visual similarity’, because in this context it is understood not to be a primary objective of the LGR.
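
To make the A.3.2 definition concrete, here is a minimal sketch of how substituting variant code points in a candidate label yields variant labels. The table `VARIANTS` and the function `variant_labels` are hypothetical names for illustration only; they are not part of any LGR tool, and the single 国/國 entry is just the example mentioned above, not an actual GP decision.

```python
from itertools import product

# Hypothetical variant table for illustration only: each code point maps to
# the set of code points treated as the "same", including itself.
VARIANTS = {
    "国": {"国", "國"},
    "國": {"国", "國"},
}


def variant_labels(label):
    """Return every label reachable by substituting variant code points."""
    # Code points without an entry in the table can only stand for themselves.
    choices = [sorted(VARIANTS.get(ch, {ch})) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


if __name__ == "__main__":
    for candidate in sorted(variant_labels("中国")):
        print(candidate, [f"U+{ord(ch):04X}" for ch in candidate])
```

What goes into such a table is exactly the ‘sameness’ question that each GP has to answer; the substitution step itself is mechanical.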

A.3.3. Benefits of a well-defined LGR process
There are a number of aspects of label and variant evaluation for root zone allocation that are not properly the subject of case-by-case analysis. The primary benefit of the LGR process is as a mechanism that delivers hands-off evaluation for these aspects. By doing so, the process may not be able to replace case-by-case analysis altogether: there will still be a role for additional types of review, such as for String Similarity, and which are not included in the LGR process. Instead the LGR process is designed to clear the table of all the straightforward, non-subjective cases, mainly by returning a “blocked” disposition(see Section B.1). Even for variants based on visual similarity, there exists a subset of evaluation rules that could be applied in an automated manner, obviating the need for further case-by-case or even contextual review. The LGR process as drafted would allow such rules, but only within the bounds established for the types of evaluation rules permissible under this process. While the process described here could be expanded to address cases of visual similarity, that is not the primary intention; if such rules are desirable, they should be a result of evolution of the string similarity rules.

At one end, homoglyphs or identical cases should be defined as variants, as stated in A.3.3: “…the LGR process is designed to clear the table of all the straightforward, non-subjective cases”.
At the other end, ‘visual similarity’ could be handled by another mechanism, as also stated in A.3.3: “… the (LGR) process may not be able to replace case-by-case analysis altogether: there will still be a role for additional types of review, such as for String Similarity”.
…………………..

Besides that, I’d like to share a sample from other scripts, Devanagari and Gurmukhi. The two pairs below look very similar. One is defined as a variant and included in the LGR XML, while the other is only added to the appendix for future use by the String Similarity panel and will not be included in the LGR XML.

Variant set:

  Devanagari: प्टे (092A 094D 091F 0947)
  Gurmukhi:   ਏ (0A0F)

Appendix set:

  Devanagari: प्टॆ (092A 094D 091F 0946)
  Gurmukhi:   ਏ (0A0F)

The rationale is that Devanagari users can distinguish the second pair but not the first. The NBGP first identified the identical cases as a baseline variant set, and then also included the indistinguishable cases.
Whether each pair is distinguishable was decided by GP discussion in online meetings. You can find the full list of the Neo-Brahmi cross-script variant analysis on the NBGP wiki page or here (work in progress): https://community.icann.org/download/attachments/53776300/NBGP-cross-script-...
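
For anyone who wants to check where the two Devanagari sequences differ, the following is a small sketch using only Python's standard `unicodedata` module. The helper names (`from_hex`, `describe`, `differing_positions`) are my own and purely illustrative; the code point values are the ones listed in the two sets above.

```python
import unicodedata


def from_hex(seq):
    """Build a string from space-separated hexadecimal code points."""
    return "".join(chr(int(cp, 16)) for cp in seq.split())


def describe(text):
    """List each code point in the text with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def differing_positions(a, b):
    """Return the indexes at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Code point sequences as listed in the variant set and the appendix set.
variant_dev = from_hex("092A 094D 091F 0947")
appendix_dev = from_hex("092A 094D 091F 0946")
gurmukhi = from_hex("0A0F")

if __name__ == "__main__":
    for title, text in [
        ("Variant set, Devanagari", variant_dev),
        ("Appendix set, Devanagari", appendix_dev),
        ("Both sets, Gurmukhi", gurmukhi),
    ]:
        print(title)
        for line in describe(text):
            print("  " + line)
    print("Differing positions:", differing_positions(variant_dev, appendix_dev))
```

The two Devanagari sequences differ only in the final vowel sign (0947 versus 0946). Whether users can tell the resulting shapes apart is still a judgement for the GP; the code only shows which code points are involved.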

This might be one possible method for other GPs to use. However, please note that the NBGP proposals have not been published, and the final decision on analysis methodology rests with each GP.

Regards,
Pitinan

Pitinan Kooarmornpatana
