Dear all, since there will be no Spanish interpretation on our call today (which I consider a truly serious error, highlighting a complex bias and inequity in the debates), I will now make a contribution to the discussion of our document.
It may not matter much to those whose native language does not use diacritical characters, but for those of us who speak Latin languages (Spanish, French, Portuguese, etc.), the diacritical accent is of paramount importance.
As Bill rightly points out, in some words a missing accent merely looks like a spelling mistake, as in his example of Québec, but in others it radically changes the word and its meaning.
To give an example (there are hundreds), one can write "revólver" or "revolver" in Spanish.
"Revolver" is a verb that means to move something from side to side, around, or up and down. It can also mean to mix or stir, for example: to stir my coffee with a spoon.
A "revólver" is a short firearm whose ammunition is housed in a rotating drum.
As you can see, we're not talking about the same thing.
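A minimal Python sketch of the same point, for those who prefer to see it in code. It only shows that the two spellings are different strings to a computer, and that they collapse into one only if the accent is deliberately thrown away. The helper name strip_diacritics is hypothetical and is not taken from any standard or from the RZ-LGR.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


verb = "revolver"        # to stir
noun = "rev\u00f3lver"   # "revólver", the firearm

print(verb == noun)                    # False: different code points, different words
print(strip_diacritics(noun) == verb)  # True: the distinction is lost without the accent
```

Note that this kind of stripping only works for letters that decompose into a base letter plus a combining mark; it is an illustration, not a proposal for how variants should be defined.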
Our challenge is for the internet to evolve toward a true internationalization of the DNS and incorporate these diacritical characters, also considering Ñ, Ç, ü, etc., and considering cases with multiple diacritics.
This implies moving toward an 8-bit (1-byte) character code that can represent up to 256 different characters, instead of the 7-bit code we currently use for 128 characters.
We could take ISO/IEC 8859-1, which contains the Spanish alphabet, as an example.
The key to this is the eighth bit, which was historically often reserved as a check bit (the parity bit); using it for data instead extends the character set from 128 to 256 code positions.
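As a rough illustration of the 7-bit versus 8-bit difference, here is a small Python sketch using the built-in "ascii" and "latin-1" (ISO/IEC 8859-1) codecs. The sample word and the helper name fits_in_ascii are my own assumptions for the example. It only illustrates the character codes themselves; how the DNS carries such characters is a separate question (this thread also mentions the IDN/punycode encoding).

```python
def fits_in_ascii(text: str) -> bool:
    """Hypothetical helper: True if every character fits in the 7-bit ASCII range."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


sample = "\u00d1and\u00fa"  # "Ñandú"

# ISO/IEC 8859-1 stores each of these characters in a single 8-bit byte.
latin1_bytes = sample.encode("latin-1")

print(2 ** 7, 2 ** 8)                       # 128 256
print(fits_in_ascii(sample))                # False
print(len(latin1_bytes))                    # 5
print([b for b in latin1_bytes if b > 0x7F])  # byte values that need the eighth bit
```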
My two cents

Sergio Salinas Porto
President, Internauta Argentina - LACRALO/ICANN
Asociación Argentina de Usuarios de Internet/FeTIA
FUILAC- Federación de Usuarios de Internet de LAC
facebook: salinasporto 
twitter:  sergiosalinas
Mobi:+54 9 223 5 215819
"May we be disobedient every time we receive orders that humiliate our
 conscience or violate our common sense" Eduardo Galeano

On Tue, Mar 25, 2025 at 13:17, Bill Jouris via Gnso-latin-diacritics (<gnso-latin-diacritics@icann.org>) wrote:

First, I’d like to thank Michael for his patience, at our meeting at ICANN82, as I tried to put my concern about the phrasing of our into words. Unsuccessfully. I hope I can do better here.

When the Latin Generation Panel was creating the RZ-LGR, with just a couple of exceptions we were looking at visual confusable similarity. That is, we compared a code point (ASCII letter or that letter plus one or more diacritics) to another. If, in a side by side comparison, we found it very difficult to distinguish between them, they were designated variants. If we could distinguish them, they were not variants. Note that, when one code point was just an ASCII letter, and the other was that letter plus a diacritic, there were no cases where they were variants. None. Zero.

So let's look at the type case which motivated our PDP: Quebec vs Québec. Clearly e and é are not visually confusable; the Latin GP already ruled on that. So what are we even looking at? Why are we here?

If a French-speaking resident of Québec is presented with Quebec (that is, without the diacritic), will it occur to him, even for an instant, that he is looking at a different word? Never. He will see it as an error, due to the lack of the diacritic – whether due to ignorance on the part of one of the English speakers that surround Québec in North America, or due to even a French speaker using a keyboard which doesn’t provide for diacritics. Wrong, but NOT different. They are not visually confusable, but rather functionally interchangeable/confusable.

The various other examples given in our direction from the GNSO Council all have the same characteristic. There’s no problem seeing the difference; no visual confusion. It’s just that the difference, in those cases, is not considered significant by the user.

Now there are doubtless some cases, in some languages, where the addition of a diacritic does result in a different word. But the question before us is, I think, do we try to go thru all the hundreds of languages which use the Latin script, trying to identify those cases? Or do we make a rule which can be applied across all cases involving a diacritic, whether they are real words or not? I submit that the latter is the only feasible course.
Bill Jouris

On Monday, March 24, 2025 at 10:16:37 AM PDT, Amadeu Abril i Abril (CORE) via Gnso-latin-diacritics <gnso-latin-diacritics@icann.org> wrote:

Hello all.

I would say “yes” to both your questions.

As I explained in previous mail exchanges, we should not base our decisions and distinctions on linguistic, semantic, morphological or orthographic criteria. TLDs may be grammatically incorrect, or mean nothing in any language.

And indeed, what is a diacritic varies from context to context. In linguistic terms, it also varies historically: w was, and still is in some languages, a variant, a diacritic version, of v. And u and v were just variants of the same character, as were G and C….

The point is not whether they are considered proper characters in a given alphabet or a modification/diacritic of another character. The point, for us, is that a set of the Latin character repertoire known as ASCII does NOT require special encoding in the DNS, while all the rest does require the IDN/punycode encoding. So when we say ASCII and diacritics we mean ASCII and non-ASCII (if they are similar enough to raise issues under the String Evaluation test which is a requirement for new gTLDs).
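For readers less familiar with that encoding, a minimal Python sketch of the ASCII versus non-ASCII split. It uses Python's built-in "idna" codec, which implements the older IDNA 2003 rules rather than the IDNA 2008 rules used for current registrations, so it should be read as an approximation only. The two labels are simply the example discussed on this list.

```python
ascii_label = "quebec"
idn_label = "qu\u00e9bec"  # "québec"

# An all-ASCII label goes into the DNS unchanged.
ascii_encoded = ascii_label.encode("idna")

# A label with a non-ASCII character is converted to an ASCII-only
# form carrying the "xn--" prefix (punycode under the hood).
idn_encoded = idn_label.encode("idna")

print(ascii_encoded)
print(idn_encoded)
print(idn_encoded.decode("idna") == idn_label)  # True: the conversion is reversible
```

The two encoded forms are different ASCII strings, so at the DNS level the labels are entirely distinct; any relationship between them has to come from policy, not from the encoding.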

And indeed there may be more than one “similar” IDN/non-ASCII/string with diacritics which falls into the “too similar to be allowed” scenario, so the solution(s) we are to discuss may apply to more than 2 TLD strings at once. Even if unlikely for practical (including financial) reasons.

Regards.

AAA

> On 24 March 2025, at 15:14, Tapani Tarvainen via Gnso-latin-diacritics <gnso-latin-diacritics@icann.org> wrote:
>
> Dear all,
>
> In preparation for our next call, I wrote down a few issues
> I'd like to clarify about our scope:
>
> (1) We use the term "diacritics" differently than linguists do.
> If I understand correctly, we're including all non-ASCII characters
> that various Latin-based alphabets use, such as æ, ŋ, ß, ð, þ etc.
>
> (2) The base ASCII gTLD version of a diacritic is for us by definition
> a single ASCII letter with the sole exception of German ß. Other cases
> where the conventional ASCII conversion is two or more letters are not
> treated similarly (as per RZ-LGR). Presumably we could, however, take
> the existence of such as a factor in our recommendations.
>
> E.g., to pick an example from Finnish, consider ".šakki" vs ".sakki",
> should the existence of ".shakki" matter? (Both "šakki" and "sakki"
> are Finnish words with completely unrelated meanings, while "shakki" is
> an alternative spelling of "šakki" - not a computer-era fallback, but
> actually older and still more common spelling even though deprecated
> by modern orthography.)
>
> Or consider "æ", which also looks obviously similar to "ae".
>
> (3) Our first question talks about "a base ASCII gTLD and *the* Latin
> script diacritic version", in singular. It seems pretty clear to me,
> however, that we should also consider cases with multiple diacritic
> versions of the same ASCII letter, like the "sjö" / "sjø" case I
> brought up earlier.
>
> --
> Tapani Tarvainen
> _______________________________________________
> Gnso-latin-diacritics mailing list -- gnso-latin-diacritics@icann.org
> To unsubscribe send an email to gnso-latin-diacritics-leave@icann.org
