Amadeu,
I definitely need to work on my communication skills. Because, if I'm understanding your third and fourth paragraphs correctly, we are precisely in agreement: for users of languages with diacritics, TLDs with and without the diacritics are, however regrettably, seen as the same thing.
Although, obviously, I may be wrong in what I think you are saying.
Bill
On Tue, Mar 25, 2025 at 11:15 AM, Amadeu Abril i Abril (CORE) <amadeu.abril@corenic.org> wrote:
Bill,
I have to completely disagree with your statements below.
It is a fact that the LGRs are what they are, and that under that set of rules a letter and the same letter with a diacritic are not variants. There is nothing else to discuss there.
But the String Similarity rules in the Applicant Guidebook have an autonomous set of criteria to assess whether a string, a label, a proposed TLD is confusingly similar to another one. In 2012, .unicorn and .unicom were found confusingly similar, and the two could not both proceed. The same happened with .hotels and .hoteis. I am sure that you don't consider r+n and m, or i and l, to be variants. And I am sure you think you can perfectly distinguish the characters in both cases. But this was the result of applying the rules back then.
Additionally, I submit that your assumption that a French-speaking user would never be confused by .quebec and .québec, and would consider .quebec a misspelling, is most likely factually wrong. Most of us users of languages with diacritics have been so overwhelmingly used to NOT using them in domains, be that at the first level or the second level, that this is the DEFAULT behaviour for the vast, overwhelmingly vast, majority of us. For most of us, in most cases, .quebec and .québec, or policia.info, polícia.info and policía.info, are not different, not even confusingly similar: they are the SAME thing.
At the very least, the ASCII version is perfectly identical to the grammatically correct version with diacritics, simply because it is the default version, the one we see used most often, all the time, everywhere: in browser bars, in URLs and domains, in emails. In fact, a person who does not speak a language with those diacritics is more likely to notice the diacritic; it catches her attention as an odd thing.
What I say above is not a universal situation: we would all reason as you propose in a text. Everywhere… except in the DNS.
But the main point is the first one: the rules and criteria for what is a “real variant” and what is “confusingly similar” as a TLD are different.
Amadeu
On 25 March 2025, at 17:17, Bill Jouris <b_jouris@yahoo.com> wrote:
First, I'd like to thank Michael for his patience, at our meeting at ICANN82, as I tried to put my concern about the phrasing of our into words. Unsuccessfully. I hope I can do better here.
When the Latin Generation Panel was creating the RZ-LGR, with just a couple of exceptions we were looking at visual confusable similarity. That is, we compared one code point (an ASCII letter, or that letter plus one or more diacritics) to another. If, in a side-by-side comparison, we found it very difficult to distinguish between them, they were designated variants. If we could distinguish them, they were not variants. Note that, when one code point was just an ASCII letter and the other was that letter plus a diacritic, there were no cases where they were variants. None. Zero.
So let's look at the type case which motivated our PDP: Quebec vs Québec. Clearly e and é are not visually confusable; the Latin GP already ruled on that. So what are we even looking at? Why are we here?
If a French-speaking resident of Québec is presented with Quebec (that is, without the diacritic), will it occur to him, even for an instant, that he is looking at a different word? Never.
He will see it as an error, due to the lack of the diacritic, whether due to ignorance on the part of one of the English speakers that surround Québec in North America, or due to even a French speaker using a keyboard which doesn't provide for diacritics. Wrong, but NOT different. They are not visually confusable, but rather functionally interchangeable/confusable.
The various other examples given in our direction from the GNSO Council all have the same characteristic. There's no problem seeing the difference; no visual confusion. It's just that the difference, in those cases, is not considered significant by the user.
Now there are doubtless some cases, in some languages, where the addition of a diacritic does result in a different word. But the question before us is, I think: do we try to go through all the hundreds of languages which use the Latin script, trying to identify those cases? Or do we make a rule which can be applied across all cases involving a diacritic, whether they are real words or not? I submit that the latter is the only feasible course.
Bill Jouris
On Monday, March 24, 2025 at 10:16:37 AM PDT, Amadeu Abril i Abril (CORE) via Gnso-latin-diacritics <gnso-latin-diacritics@icann.org> wrote:
Hello all.
I would say “yes” to both your questions.
As I explained in previous mail exchanges, we should not base our decisions and distinctions on linguistic, semantic, morphological or orthographic criteria. TLDs may be grammatically incorrect, or mean just nothing in any language.
And indeed, what is a diacritic varies from context to context. In linguistic terms, it also varies historically: w was, and still is in some languages, a variant, a diacritic version, of v. And u and v were just variants of the same character, as were G and C. The point is not whether they are considered proper characters in a given alphabet or a modification/diacritic of another character.
The point, for us, is that a subset of the Latin character repertoire, known as ASCII, does NOT require special encoding in the DNS, while all the rest does require the IDN/punycode encoding. So when we say ASCII and diacritics we mean ASCII and non-ASCII (if they are similar enough to raise issues under the String Evaluation test, which is a requirement for new gTLDs).
And indeed there may be more than one “similar” IDN/non-ASCII string with diacritics which falls into the “too similar to be allowed” scenario, so the solution(s) we are to discuss may apply to more than two TLD strings at once. Even if that is unlikely for practical (including financial) reasons.
Regards.
AAA
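To make the ASCII versus non-ASCII distinction concrete, here is a minimal sketch of how a label with a diacritic becomes an "xn--" A-label while a plain ASCII label passes through unchanged. It assumes Python's built-in "idna" codec (which follows the older IDNA 2003 rules) is close enough for illustration; the helper names dns_wire_form and needs_idn_encoding are hypothetical and are not part of any ICANN or registry tooling.

```python
# Illustrative sketch only. The built-in "idna" codec implements IDNA 2003,
# which is an assumption made here for convenience, not the root zone's rules.

def needs_idn_encoding(label: str) -> bool:
    """True when the label contains at least one non-ASCII character."""
    return not label.isascii()


def dns_wire_form(label: str) -> str:
    """Return the ASCII form of the label as it would be stored in the DNS."""
    return label.encode("idna").decode("ascii")


if __name__ == "__main__":
    for label in ["quebec", "québec", "policia", "polícia", "policía"]:
        # ASCII labels come back unchanged; the others come back as xn--... labels.
        print(f"{label!r}: needs IDN encoding = {needs_idn_encoding(label)}, "
              f"wire form = {dns_wire_form(label)!r}")
```

Each accented spelling encodes to its own A-label, so at the DNS level polícia and policía are distinct from each other and from policia, whatever users perceive.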
On 24 March 2025, at 15:14, Tapani Tarvainen via Gnso-latin-diacritics <gnso-latin-diacritics@icann.org> wrote:
Dear all,
In preparation for our next call, I wrote down a few issues I'd like to clarify about our scope:
(1) We use the term "diacritics" differently than linguists do. If I understand correctly, we're including all non-ASCII characters that various Latin-based alphabets use, such as æ, ŋ, ß, ð, þ etc.
(2) The base ASCII gTLD version of a diacritic is for us by definition a single ASCII letter, with the sole exception of German ß. Other cases where the conventional ASCII conversion is two or more letters are not treated similarly (as per RZ-LGR). Presumably we could, however, take the existence of such conversions into account in our recommendations.
E.g., to pick an example from Finnish, consider ".šakki" vs ".sakki": should the existence of ".shakki" matter? (Both "šakki" and "sakki" are Finnish words with completely unrelated meanings, while "shakki" is an alternative spelling of "šakki" - not a computer-era fallback, but actually the older and still more common spelling, even though deprecated by modern orthography.)
Or consider "æ", which also looks obviously similar to "ae".
(3) Our first question talks about "a base ASCII gTLD and *the* Latin script diacritic version", in singular. It seems pretty clear to me, however, that we should also consider cases with multiple diacritic versions of the same ASCII letter, like the "sjö" / "sjø" case I brought up earlier.
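One way to see how the linguistic notion of a diacritic differs from the broader scope described in (1): Unicode canonical decomposition (NFD) splits é, š and ö into a base ASCII letter plus a combining mark, but leaves ø, æ, ß, ŋ, ð and þ untouched. The following is a minimal sketch, assuming Python's standard unicodedata module; strip_combining_marks is a hypothetical helper, and nothing here reflects how the RZ-LGR or the String Similarity review actually classifies characters.

```python
import unicodedata


def strip_combining_marks(label: str) -> str:
    """Decompose the label (NFD) and drop combining marks such as accents.

    Letters without a canonical decomposition (for example ø, æ, ß)
    are returned unchanged.
    """
    decomposed = unicodedata.normalize("NFD", label)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for label in ["québec", "šakki", "sjö", "sjø", "æ", "ß"]:
        base = strip_combining_marks(label)
        # A label that still contains non-ASCII characters has no
        # single-letter ASCII base under this purely mechanical approach.
        print(f"{label!r} -> {base!r} (ASCII base found: {base.isascii()})")
```

Under this mechanical test, "sjö" reduces to "sjo" while "sjø" does not reduce at all, which illustrates why a definition based only on combining marks would miss some of the cases listed above.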
-- 
Tapani Tarvainen
_______________________________________________
Gnso-latin-diacritics mailing list -- gnso-latin-diacritics@icann.org
To unsubscribe send an email to gnso-latin-diacritics-leave@icann.org