Bill, I have to disagree completely with your statements below.

It is a fact that the LGRs are what they are, and that under that set of rules a letter and the same letter with a diacritic are not variants. There is nothing more to discuss on that point.

But the String Similarity rules in the Applicant Guidebook have an autonomous set of criteria for assessing whether a string (a label, a proposed TLD) is confusingly similar to another one. In 2012, .unicorn and .unicom were found confusingly similar, so they could not both proceed. The same happened with .hotels and .hoteis. I am sure that you don't consider r+n and m, or i and l, to be variants. And I am sure you think you can perfectly distinguish the characters in both cases. But this was the result of applying the rules back then.

Additionally, I submit that your assumption that any French-speaking user would never be confused by .quebec and .québec, and would consider .quebec a misspelling, is most likely factually wrong. Most of us who use languages with diacritics have become so overwhelmingly used to NOT using them in domain names, whether at the first or the second level, that this is the DEFAULT behaviour for the vast, overwhelmingly vast, majority of us.

For us, for most of us, in most cases, .quebec and .québec, or policia.info, polícia.info and policía.info, are not different, not even confusingly similar: they are the SAME thing. At the very least, the ASCII version is perfectly identical to the grammatically correct version with diacritics… simply because it is the default version, the one we see used most often, all the time, everywhere: in browser bars, in URLs and domain names, in emails… In fact, a person who does not speak a language with those diacritics is more likely to notice the diacritic; it catches her attention as something odd.

None of what I say above applies universally: in running text, we would all reason as you propose. Everywhere… except in the DNS.

But the main point is the first one: the rules and criteria for what is a "real variant" and for what is "confusingly similar" as a TLD are different.

Amadeu


On 25 March 2025, at 17:17, Bill Jouris <b_jouris@yahoo.com> wrote:


First, I’d like to thank Michael for his patience, at our meeting at ICANN82, as I tried to put my concern about the phrasing of our into words.  Unsuccessfully.  I hope I can do better here. 

When the Latin Generation Panel was creating the RZ-LGR, with just a couple of exceptions we were looking at visual confusability. That is, we compared one code point (an ASCII letter, or that letter plus one or more diacritics) to another. If, in a side-by-side comparison, we found it very difficult to distinguish between them, they were designated variants. If we could distinguish them, they were not variants. Note that, when one code point was just an ASCII letter and the other was that letter plus a diacritic, there were no cases where they were variants. None. Zero.

So let's look at the type case which motivated our PDP: Quebec vs Québec.  Clearly e and é are not visually confusable; the Latin GP already ruled on that.  So what are we even looking at?  Why are we here? 
  
If a French-speaking resident of Québec is presented with Quebec (that is, without the diacritic), will it occur to him, even for an instant, that he is looking at a different word? Never. He will see it as an error caused by the missing diacritic, whether due to ignorance on the part of one of the English speakers who surround Québec in North America, or due to even a French speaker using a keyboard that doesn't provide for diacritics. Wrong, but NOT different. They are not visually confusable, but rather functionally interchangeable/confusable.

The various other examples given in the GNSO Council's direction to us all share the same characteristic. There's no problem seeing the difference; no visual confusion. It's just that the difference, in those cases, is not considered significant by the user.

Now there are doubtless some cases, in some languages, where the addition of a diacritic does result in a different word. But the question before us is, I think: do we try to go through all the hundreds of languages that use the Latin script, trying to identify those cases? Or do we make a rule that can be applied across all cases involving a diacritic, whether they are real words or not? I submit that the latter is the only feasible course.
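One mechanical way to get a rule that applies uniformly to every diacritic, without consulting any language, is to compare labels after stripping combining marks. The sketch below is an illustration only: it assumes Unicode NFD decomposition as the folding method (not a rule the group has adopted), and `fold_diacritics` and `same_after_folding` are hypothetical helpers. It also exposes the limit Tapani raises in his message further down: letters such as ø, æ and ß have no canonical decomposition, so this kind of folding leaves them unchanged.

```python
import unicodedata


def fold_diacritics(label: str) -> str:
    """Hypothetical helper: decompose with NFD, then drop nonspacing combining marks."""
    decomposed = unicodedata.normalize("NFD", label)
    # Accents such as U+0301 (acute), U+030C (caron), U+0308 (diaeresis) are category "Mn".
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def same_after_folding(a: str, b: str) -> bool:
    """Hypothetical rule: treat two labels as colliding if they fold to the same string."""
    return fold_diacritics(a) == fold_diacritics(b)


# é, í, š and ö fold to their base letters; ø, æ and ß have no NFD decomposition
# and therefore pass through unchanged.
for s in ["québec", "polícia", "policía", "šakki", "sjö", "sjø", "æ", "straße"]:
    print(s, "->", fold_diacritics(s))
```

Under this assumed rule, .québec collides with .quebec and polícia with policía, but sjø would not collide with sjo; covering letters like ø, æ or ß would need an explicit mapping table rather than Unicode decomposition alone.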

Bill Jouris

On Monday, March 24, 2025 at 10:16:37 AM PDT, Amadeu Abril i Abril (CORE) via Gnso-latin-diacritics <gnso-latin-diacritics@icann.org> wrote:


Hello all.

I would say “yes” to both your questions.

As I explained in previous mail exchanges, we should not base our decisions and distinctions on linguistic, semantic, morphological or orthographic criteria. TLDs may be grammatically incorrect, or mean nothing at all in any language.

And indeed, what counts as a diacritic varies from context to context. In linguistic terms, it also varies historically: w was, and in some languages still is, a variant, a diacritic version, of v. And u and v were just variants of the same character, as were G and C…

The point is not whether they are considered proper characters in a given alphabet or a modification/diacritic of another character. The point, for us, is that the subset of the Latin character repertoire known as ASCII does NOT require special encoding in the DNS, while all the rest does require IDN/Punycode encoding. So when we say ASCII and diacritics, we mean ASCII and non-ASCII (if they are similar enough to raise issues under the String Evaluation test, which is a requirement for new gTLDs).
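To make the ASCII / non-ASCII distinction concrete: an ASCII label goes into the DNS as written, while a label containing any other character must first be converted to a Punycode "A-label" carrying the xn-- prefix. A minimal sketch, assuming Python's built-in `idna` codec, which implements the older IDNA 2003 rules rather than the IDNA 2008 rules the root zone follows; `to_dns_label` is a hypothetical helper.

```python
def to_dns_label(label: str) -> str:
    """Hypothetical helper: return the form of a label as stored in the DNS.

    Uses Python's built-in "idna" codec (IDNA 2003), so it only approximates
    IDNA 2008 processing; it is meant as an illustration, not a validator.
    """
    try:
        label.encode("ascii")
        return label  # pure ASCII: no special encoding needed
    except UnicodeEncodeError:
        # Any non-ASCII character: convert to an xn-- A-label (Punycode).
        return label.encode("idna").decode("ascii")


for name in ["quebec", "québec", "sjö", "sjø"]:
    print(name, "->", to_dns_label(name))
```

In the DNS itself, .quebec and .québec are therefore two unrelated ASCII strings (quebec and an xn-- label); any judgement that they are "the same" has to come from a policy rule, not from the protocol.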

And indeed there may be more than one "similar" IDN/non-ASCII string with diacritics that falls into the "too similar to be allowed" scenario, so the solution(s) we are to discuss may apply to more than two TLD strings at once, even if that is unlikely for practical (including financial) reasons.

Regards.

AAA


> On 24 March 2025, at 15:14, Tapani Tarvainen via Gnso-latin-diacritics <gnso-latin-diacritics@icann.org> wrote:
>
> Dear all,
>
> In preparation for our next call, I wrote down a few issues
> I'd like to clarify about our scope:
>
> (1) We use the term "diacritics" differently than linguists do.
> If I understand correctly, we're including all non-ASCII characters
> that various Latin-based alphabets use, such as æ, ŋ, ß, ð, þ etc.
>
> (2) The base ASCII gTLD version of a diacritic is for us by definition
> a single ASCII letter with the sole exception of German ß. Other cases
> where the conventional ASCII conversion is two or more letters are not
> treated similarly (as per RZ-LGR). Presumably we could, however, take
> the existence of such as a factor in our recommendations.
>
> E.g., to pick an example from Finnish, consider ".šakki" vs ".sakki",
> should the existence of ".shakki" matter? (Both "šakki" and "sakki"
> are Finnish words with completely unrelated meanings, while "shakki" is
> an alternative spelling of "šakki" - not a computer-era fallback, but
> actually older and still more common spelling even though deprecated
> by modern orthography.)
>
> Or consider "æ", which also looks obviously similar to "ae".
>
> (3) Our first question talks about "a base ASCII gTLD and *the* Latin
> script diacritic version", in singular. It seems pretty clear to me,
> however, that we should also consider cases with multiple diacritic
> versions of the same ASCII letter, like the "sjö" / "sjø" case I
> brought up earlier.
>
> --
> Tapani Tarvainen
> _______________________________________________
> Gnso-latin-diacritics mailing list -- gnso-latin-diacritics@icann.org
> To unsubscribe send an email to gnso-latin-diacritics-leave@icann.org