[Gnso-latin-diacritics] Scope issues

March 24, 2025

Dear all,

In preparation for our next call, I wrote down a few issues
I'd like to clarify about our scope:

(1) We use the term "diacritics" differently than linguists do.
If I understand correctly, we're including all non-ASCII characters
that various Latin-based alphabets use, such as æ, ŋ, ß, ð, þ etc.
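
To make the difference concrete, here is a minimal Python sketch (my own
illustration, not anything taken from RZ-LGR) that checks whether a
character decomposes under Unicode NFD into an ASCII letter plus
combining marks, which is roughly the narrower, linguists' sense. The
function name is_base_plus_marks is hypothetical.

```python
import unicodedata

def is_base_plus_marks(ch: str) -> bool:
    """True if NFD splits ch into an ASCII letter followed by combining marks."""
    decomposed = unicodedata.normalize("NFD", ch)
    if len(decomposed) < 2:
        # No canonical decomposition: the character is a letter in its own right.
        return False
    base, marks = decomposed[0], decomposed[1:]
    return (
        base.isascii()
        and base.isalpha()
        and all(unicodedata.combining(m) for m in marks)
    )

for ch in "šöæŋßðþø":
    print(ch, is_base_plus_marks(ch))
```

Of these, only š and ö decompose that way; æ, ŋ, ß, ð, þ and ø have no
canonical decomposition, so our scope is clearly wider than "letter plus
mark".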

(2) The base ASCII gTLD version of a diacritic is for us by definition
a single ASCII letter, with the sole exception of German ß. Other cases
where the conventional ASCII conversion is two or more letters are not
treated similarly (as per RZ-LGR). Presumably we could, however, take
the existence of such conversions into account in our recommendations.

E.g., to pick an example from Finnish, consider ".šakki" vs ".sakki":
should the existence of ".shakki" matter? (Both "šakki" and "sakki"
are Finnish words with completely unrelated meanings, while "shakki" is
an alternative spelling of "šakki" - not a computer-era fallback, but
an older and still more common spelling, even though it is deprecated
by modern orthography.)

Or consider "æ", which also looks obviously similar to "ae".

(3) Our first question talks about "a base ASCII gTLD and *the* Latin
script diacritic version", in singular. It seems pretty clear to me,
however, that we should also consider cases with multiple diacritic
versions of the same ASCII letter, like the "sjö" / "sjø" case I
brought up earlier.
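
As a rough sketch of what points (2) and (3) imply together, the
following Python groups labels by their base ASCII form. The mapping
table ASCII_BASE is a hand-picked assumption for illustration only (it is
not the RZ-LGR repertoire), and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical, hand-picked mapping for illustration; not taken from RZ-LGR.
# Single-letter bases, with ß -> "ss" as the one multi-letter exception.
ASCII_BASE = {"ö": "o", "ø": "o", "š": "s", "ß": "ss"}

def ascii_base(label: str) -> str:
    """Replace each mapped character with its base ASCII form."""
    return "".join(ASCII_BASE.get(ch, ch) for ch in label)

def group_by_base(labels):
    """Collect labels that share the same base ASCII form."""
    groups = defaultdict(set)
    for label in labels:
        groups[ascii_base(label)].add(label)
    return dict(groups)

groups = group_by_base(["sjö", "sjø", "sjo", "šakki", "sakki", "shakki"])
for base, members in sorted(groups.items()):
    print(base, sorted(members))
```

Under this assumed table, "sjö" and "sjø" both land in the "sjo" group,
so one ASCII label has two diacritic versions, while "shakki" stays in a
group of its own, unconnected to "šakki".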

-- 
Tapani Tarvainen