[Gnso-latin-diacritics] Re: Diacritics definition

March 31, 2026

Dear all,

Let's consider three alternatives:

(1) Decomposable diacritics only.

This is technically easy: a single, straightforward rule. The downside
is that it excludes several letters with diacritics.

(2) Also include "LATIN SMALL LETTER [A-Z] WITH ..."

Also technically easy (how long did it take Mark to add this to his
tool?), only a bit more complicated in that we'd need two different
rules. It provides an unambiguous, machine-testable set of letters and
of relationships between the diacritics and their ASCII counterparts,
and it adds 15 letters that are used in at least 35 languages.

(3) Also add letters that aren't real diacritics, like ŋ, æ, ð, þ etc.,
but that could still be handled the same way. These would need to be
evaluated individually, and automated processing would have to be based
on tables. It would, however, cover even more languages.
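
To make the comparison concrete, here is a minimal Python sketch of how
the three rules might be tested mechanically. It is only an
illustration, not Mark's tool: the function names are hypothetical, and
the table for (3) holds a few example entries whose ASCII counterparts
are my assumption, not an agreed mapping. Rule (1) relies on canonical
decomposition (NFD), rule (2) on Unicode character names of the form
"LATIN SMALL LETTER X WITH ...".

```python
import unicodedata

# Option (3): hand-maintained table. Entries and their ASCII
# counterparts are illustrative assumptions only.
SPECIAL_BASE = {"ŋ": "n", "æ": "ae", "ð": "d", "þ": "th"}

NAME_PREFIX = "LATIN SMALL LETTER "


def base_by_decomposition(ch):
    """Option (1): ASCII base if ch decomposes to letter + combining marks."""
    decomposed = unicodedata.normalize("NFD", ch)
    if len(decomposed) < 2:
        return None  # no canonical decomposition
    base, marks = decomposed[0], decomposed[1:]
    is_ascii_letter = base.isascii() and base.isalpha()
    # combining() returns a nonzero class for combining marks
    if is_ascii_letter and all(unicodedata.combining(m) for m in marks):
        return base
    return None


def base_by_name(ch):
    """Option (2): ASCII base taken from the Unicode character name."""
    name = unicodedata.name(ch, "")
    if not name.startswith(NAME_PREFIX):
        return None
    letter, sep, _ = name[len(NAME_PREFIX):].partition(" WITH ")
    if sep and len(letter) == 1 and "A" <= letter <= "Z":
        return letter.lower()
    return None


def ascii_base(ch):
    """Try rule (1), then rule (2), then the table for option (3)."""
    return base_by_decomposition(ch) or base_by_name(ch) or SPECIAL_BASE.get(ch)
```

With this sketch, "é" is caught by rule (1), "ø" only by rule (2), and
"ŋ" only by the table, which mirrors the split between the three
alternatives.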

I would argue that (2) is the closest match to our charter, as it
covers almost(?) all diacritics.

I would also argue that the extra letters and languages
supported by (2) are at least symbolically significant.

I do not see a significant difference in technical difficulty
between (1) and (2), but I am open to persuasion here.
Staff input would be appreciated.

I don't think (3) would be technically all that difficult either,
but it would certainly stretch our mandate, at the very least, and
it would require a significant amount of extra work.

Looking forward to an interesting discussion tomorrow,

-- 
Tapani Tarvainen