Dear colleagues,

Thanks for these additional sources, Dennis.

I also looked up the minutes from Brussels, which state that "GP expects that variants in the Latin Script for the RZ LGR be limited to homoglyphs (i.e., visual identical characters)". The issue I see here is that the term "homoglyph" suggests a categorical difference which, in my opinion, does not exist linguistically. In other words, there is no systematic way of telling apart characters "of merely similar appearance" from "characters with essentially identical appearance by design" in the context of the Latin script. Classifying a character as visually identical rather than visually similar is an arbitrary cut in a continuum, and that continuum is shaped by various factors, including the font, the context, the user and the user's education, the user's client, as well as technical, linguistic, and neurological aspects of human perception.

As for the earlier email dated October 18, 2017, regarding the theory of disunification by script property: I believe this is an ahistorical view, which may be correct only from a synchronic point of view, that is, from today's point of view. The fact of the matter is that, for historical reasons, everything is interrelated, so that visual similarity is in most cases not accidental but the result of a historical connection. All these different shapes were once one and the same, and minute differences once considered mere variation between the hands of individual writers have come to form the basis for new letters (which is precisely what I was looking to demonstrate in the draft text of the proposal, which the group considered irrelevant because it discussed the history of the script).

So in my view, sticking to "homoglyphs" does not help us, because the term is devoid of meaning in a linguistic context. If there is a modern definition in a technical context, I would appreciate it if anyone could quote it for me.

I hope this helps contextualize the facts from a linguistic point of view.

Best,

Meikal

On 30 May 2018 at 22:08, Tan Tanaka, Dennis <dtantanaka@verisign.com> wrote:

From: Meikal Mumin <meikal.mumin@uni-koeln.de>
Date: Tuesday, May 29, 2018 at 10:14 AM
To: Bill Jouris <bill.jouris@insidethestack.com>, Dennis Tan Tanaka <dtantanaka@verisign.com>, Michael Bauland <Michael.Bauland@knipp.de>, Sarmad Hussain <sarmad.hussain@icann.org>
Cc: Latin GP <LatinGP@icann.org>
Subject: [EXTERNAL] Re: [Latingp] Variant cross-script analysis worksheets

 

My conclusion is that it is more complex than reducing things to "homoglyphs", but I do not think that (at least linguistically) we have a strong definition of homoglyphs.

 

 

On homoglyphs, the Latin GP has received the following guidance from the IP, in writing and verbally (during the Brussels workshop):

 

“In the context of the Root Zone, the Procedure is quite clear in that it considers simple similarity of appearance to be outside the scope of the Root Zone LGR. In admitting exact homoglyphs, the IP has been making the argument that ‘e’ in Latin (U+0065) and ‘е’ in Cyrillic (U+0435) are not just visually indistinguishable, but that their distinct code points effectively represent a disunification by script property.” – Email from IP to Latin GP of 18 October 2017 in response to our draft Principles for Inclusion and Exclusion of Code Points in Latin Script for the Root Zone, and in particular to our Analysis of Variants in the Latin Script for the Root Zone.

 

“The kinds of variants to be defined in the Root Zone LGR are limited to homoglyphs, which are characters with essentially identical appearance by design, instead of merely similar appearance.” – Integration Panel feedback to Latin GP proposal of 22 March 2017.