Hello all.

I would say “yes” to both of your questions. As I explained in previous mail exchanges, we should not base our decisions and distinctions on linguistic, semantic, morphological or orthographic criteria. TLDs may be grammatically incorrect, or mean nothing at all in any language.

And indeed, what counts as a diacritic varies from context to context. In linguistic terms, it also varies historically: w was, and still is in some languages, a variant (a diacritic version) of v. And u and v were just variants of the same character, as were G and C….

The point is not whether they are considered proper characters in a given alphabet or a modification/diacritic of another character. The point, for us, is that a set of the Latin character repertoire, known as ASCII, does NOT require special encoding in the DNS, while all the rest does require IDN/Punycode encoding. So when we say ASCII and diacritics we mean ASCII and non-ASCII (if they are similar enough to raise issues under the String Evaluation test, which is a requirement for new gTLDs).

And indeed there may be more than one “similar” IDN/non-ASCII string with diacritics that falls into the “too similar to be allowed” scenario, so the solution(s) we are to discuss may apply to more than two TLD strings at once, even if that is unlikely for practical (including financial) reasons.

Regards.
AAA
On 24 March 2025, at 15:14, Tapani Tarvainen via Gnso-latin-diacritics <gnso-latin-diacritics@icann.org> wrote:
Dear all,
In preparation for our next call, I wrote down a few issues I'd like to clarify about our scope:
(1) We use the term "diacritics" differently than linguists do. If I understand correctly, we're including all non-ASCII characters that various Latin-based alphabets use, such as æ, ŋ, ß, ð, þ etc.
(2) The base ASCII gTLD version of a diacritic is for us by definition a single ASCII letter, with the sole exception of German ß. Other cases where the conventional ASCII conversion is two or more letters are not treated similarly (as per RZ-LGR). Presumably we could, however, take the existence of such conversions into account as a factor in our recommendations.
E.g., to pick an example from Finnish, consider ".šakki" vs ".sakki": should the existence of ".shakki" matter? (Both "šakki" and "sakki" are Finnish words with completely unrelated meanings, while "shakki" is an alternative spelling of "šakki": not a computer-era fallback, but actually the older and still more common spelling, even though deprecated by modern orthography.)
Or consider "æ", which also looks obviously similar to "ae".
(3) Our first question talks about "a base ASCII gTLD and *the* Latin script diacritic version", in singular. It seems pretty clear to me, however, that we should also consider cases with multiple diacritic versions of the same ASCII letter, like the "sjö" / "sjø" case I brought up earlier.
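To make the ASCII / non-ASCII distinction concrete for the examples above, here is a minimal Python sketch. The helper names (needs_idn_encoding, to_a_label, strip_combining_marks) are made up for illustration, and the bare punycode codec is only a rough stand-in for full IDNA processing, which involves additional validation. It is not how RZ-LGR or the string evaluation defines anything. It only shows that "š" and "ö" reduce mechanically to a single ASCII letter under Unicode decomposition while "ø" and "æ" do not, so a purely mechanical notion of "base letter" would not cover our whole scope.

```python
import unicodedata


def needs_idn_encoding(label: str) -> bool:
    """Return True if the label contains any non-ASCII character."""
    return not label.isascii()


def to_a_label(label: str) -> str:
    """Illustrative helper: ASCII labels pass through unchanged, anything
    else gets the 'xn--' prefix plus its Punycode form. Real IDNA
    processing applies further checks that are skipped here."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def strip_combining_marks(label: str) -> str:
    """Decompose (NFD) and drop combining marks. This only affects
    characters that Unicode treats as base letter + mark."""
    decomposed = unicodedata.normalize("NFD", label)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for label in ["sakki", "šakki", "shakki", "sjö", "sjø", "æ"]:
        print(
            label,
            needs_idn_encoding(label),
            to_a_label(label),
            strip_combining_marks(label),
        )
```

Running it prints, for each label, whether it needs encoding, its DNS form, and what is left after removing combining marks. "šakki" collapses to "sakki" and "sjö" to "sjo", while "sjø" and "æ" come back unchanged, which is why cases like those need an explicit similarity decision rather than a mechanical rule.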
-- Tapani Tarvainen