[Gnso-latin-diacritics] Re: Scope issues

March 25, 2025

      First, I’d like to thank Michael for his patience, at our meeting at ICANN82, as I tried to put my concern about the phrasing of our into words. Unsuccessfully. I hope I can do better here.
When the Latin Generation Panel was creating the RZ-LGR, with just a couple of exceptions we were looking at visual confusable similarity. That is, we compared one code point (an ASCII letter, or that letter plus one or more diacritics) to another. If, in a side-by-side comparison, we found it very difficult to distinguish between them, they were designated variants. If we could distinguish them, they were not variants. Note that, when one code point was just an ASCII letter, and the other was that letter plus a diacritic, there were no cases where they were variants. None. Zero.
So let's look at the type case which motivated our PDP: Quebec vs Québec. Clearly e and é are not visually confusable; the Latin GP already ruled on that. So what are we even looking at? Why are we here? If a French-speaking resident of Québec is presented with Quebec (that is, without the diacritic), will it occur to him, even for an instant, that he is looking at a different word? Never. He will see it as an error, due to the lack of the diacritic – whether due to ignorance on the part of one of the English speakers that surround Québec in North America, or due to even a French speaker using a keyboard which doesn’t provide for diacritics. Wrong, but NOT different. They are not visually confusable, but rather functionally interchangeable/confusable.
The various other examples given in our direction from the GNSO Council all have the same characteristic. There’s no problem seeing the difference; no visual confusion. It’s just that the difference, in those cases, is not considered significant by the user.
Now there are doubtless some cases, in some languages, where the addition of a diacritic does result in a different word. But the question before us is, I think, do we try to go thru all the hundreds of languages which use the Latin script, trying to identify those cases? Or do we make a rule which can be applied across all cases involving a diacritic, whether they are real words or not? I submit that the latter is the only feasible course.
Bill Jouris
    On Monday, March 24, 2025 at 10:16:37 AM PDT, Amadeu Abril i Abril (CORE) via Gnso-latin-diacritics <gnso-latin-diacritics@icann.org> wrote:  

 Hello all.

I would say “yes” to both your questions.

As I explained in previous mail exchanges, we should not base our decisions and distinctions on linguistic, semantic, morphological or orthographic criteria. TLDs may be grammatically incorrect, or mean just nothing in any language.

And indeed, what is a diacritic varies from context to context. In linguistic terms, it also varies historically: w was, and still is in some languages, a variant, a diacritic version, of v. And u and v were just variants of the same character, as G and C….

The point is not whether they are considered proper characters in a given alphabet or a modification/diacritic of another character. The point, for us, is that a subset of the Latin character repertoire known as ASCII does NOT require special encoding in the DNS, while all the rest does require the IDN/punycode encoding. So when we say ASCII and diacritics we mean ASCII and non-ASCII (if they are similar enough to raise issues under the String Evaluation test which is a requirement for new gTLDs).
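
The ASCII / non-ASCII split described above can be sketched in a few lines of Python. This is only an illustration: the helper name `to_dns_label` is hypothetical, and Python's built-in `idna` codec follows the older IDNA 2003 rules rather than the IDNA2008 rules used for the root zone, so its output should not be taken as what a registry would actually produce.

```python
def to_dns_label(label: str) -> str:
    """Return the ASCII form of a label as it would appear in the DNS.

    Hypothetical helper for illustration only. Pure-ASCII labels are
    returned unchanged (lowercased); anything else goes through the
    standard-library "idna" codec, which yields an "xn--" punycode label.
    """
    if label.isascii():
        # ASCII labels need no special encoding.
        return label.lower()
    # Non-ASCII labels need the IDN/punycode encoding.
    return label.encode("idna").decode("ascii")


if __name__ == "__main__":
    for candidate in ("quebec", "québec"):
        print(candidate, "->", to_dns_label(candidate))
```

Under this sketch, "quebec" passes through untouched while "québec" becomes an "xn--" label, which is the distinction that matters here regardless of whether é counts as a diacritic in any linguistic sense.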

And indeed there may be more than one “similar” IDN/non-ASCII/string with diacritics which falls into the “too similar to be allowed” scenario, so the solution(s) we are to discuss may apply to more than 2 TLD strings at once. Even if unlikely for practical (including financial) reasons.

Regards.

AAA
...
On 24 March 2025, at 15:14, Tapani Tarvainen via Gnso-latin-diacritics <gnso-latin-diacritics@icann.org> wrote:
Dear all,
In preparation for our next call, I wrote down a few issues
I'd like to clarify about our scope:
(1) We use the term "diacritics" differently than linguists do.
If I understand correctly, we're including all non-ASCII characters
that various Latin-based alphabets use, such as æ, ŋ, ß, ð, þ etc.
(2) The base ASCII gTLD version of a diacritic is for us by definition
a single ASCII letter with the sole exception of German ß. Other cases
where the conventional ASCII conversion is two or more letters are not
treated similarly (as per RZ-LGR). Presumably we could, however, take
the existence of such as a factor in our recommendations.
E.g., to pick an example from Finnish, consider ".šakki" vs ".sakki",
should the existence of ".shakki" matter? (Both "šakki" and "sakki"
are Finnish words with completely unrelated meanings, while "shakki" is
an alternative spelling of "šakki" - not a computer-era fallback, but
actually older and still more common spelling even though deprecated
by modern orthography.)
Or consider "æ", which also looks obviously similar to "ae".
(3) Our first question talks about "a base ASCII gTLD and *the* Latin
script diacritic version", in singular. It seems pretty clear to me,
however, that we should also consider cases with multiple diacritic
versions of the same ASCII letter, like the "sjö" / "sjø" case I
brought up earlier.
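
Points (1) and (3) above can be illustrated with a small Python sketch that uses Unicode canonical decomposition (NFD) as a stand-in for the linguist's notion of "base letter plus diacritic". The function name `strip_combining_marks` is hypothetical, and this is not how the RZ-LGR or the working group defines anything; it only shows that some of the letters mentioned decompose into an ASCII letter plus a combining mark while others do not.

```python
import unicodedata


def strip_combining_marks(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD).

    Hypothetical helper for illustration only: letters such as
    "š" or "ö" decompose into an ASCII letter plus a combining mark,
    while letters such as "ø", "æ" or "ß" have no such decomposition
    and are returned unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for word in ("šakki", "sjö", "sjø", "æ", "ŋ", "ß", "ð", "þ"):
        print(word, "->", strip_combining_marks(word))
```

In this sketch "šakki" and "sjö" reduce to "sakki" and "sjo", but "sjø" and the letters æ, ŋ, ß, ð, þ stay as they are, so a purely mechanical definition of "diacritic" would be narrower than the scope described in point (1).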
--
Tapani Tarvainen
_______________________________________________
Gnso-latin-diacritics mailing list -- gnso-latin-diacritics@icann.org
To unsubscribe send an email to gnso-latin-diacritics-leave@icann.org
