[Gnso-latin-diacritics] Re: Diacritics definition

March 25, 2026

Dear all,

I would like to simplify the issue a little, skip all the
difficult cases and consider just these two competing
definitions of a diacritical character (simplified here):

(1) Unicode-decomposable
(2) LATIN SMALL LETTER ... WITH

* Both are well-defined and objective: they could be derived
mechanically from the Unicode tables.

* Both make it unambiguous what the matching ASCII letter is.
There is no need to evaluate confusability in any subjective way.

* An eligibility change would be equally simple to implement
mechanically in either case: the character name is no harder to
get from the Unicode tables than decomposability is.

* Both are stable in the sense that they're unlikely to change
in future Unicode versions (apart from possible new additions).
Although I think the former is actually more likely to change
(that is, Unicode could specify a decomposition for a character
that doesn't have it now), that's unlikely enough not to matter.

* The latter includes more letters and supports more languages.
In that sense it would also be fairer.

* The latter is more closely aligned with linguistics.

* The latter is easier for non-specialists to understand.

* The former is what we currently have and we'd save ourselves
some time and work by sticking with it.
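
To illustrate the "mechanically derived" point, here is a rough
Python sketch of both tests using the standard unicodedata module.
It is only an illustration under my own simplifying assumptions, not
a proposed implementation: the function names are made up, only
small letters are handled, and (2) is read as "the name has the form
LATIN SMALL LETTER x WITH ..., where x is a single letter".

```python
import unicodedata


def ascii_base_by_decomposition(ch):
    """Definition (1), simplified: the canonical decomposition (NFD)
    is an ASCII small letter followed only by combining marks."""
    decomposed = unicodedata.normalize("NFD", ch)
    base, marks = decomposed[0], decomposed[1:]
    if (
        marks
        and base.isascii()
        and base.islower()
        and all(unicodedata.combining(m) for m in marks)
    ):
        return base
    return None


def ascii_base_by_name(ch):
    """Definition (2), simplified: the Unicode name has the form
    LATIN SMALL LETTER <single letter> WITH <something>."""
    prefix = "LATIN SMALL LETTER "
    name = unicodedata.name(ch, "")
    if not name.startswith(prefix):
        return None
    letter, sep, _ = name[len(prefix):].partition(" WITH ")
    if sep and len(letter) == 1:
        return letter.lower()
    return None


# e-acute, o-stroke, eng, ae
for ch in ["\u00e9", "\u00f8", "\u014b", "\u00e6"]:
    print(ch, ascii_base_by_decomposition(ch), ascii_base_by_name(ch))
# é e e
# ø None o
# ŋ None None
# æ None None
```

With these assumptions é qualifies under both definitions, ø only
under (2), and ŋ and æ under neither, which matches the scope
described above.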

Am I missing something important?

Having read Mark's scoping exercise, I agree that including all
similarly confusable characters like ŋ or æ would be more difficult
and require extra work, so leaving them for later makes sense.

But it's not a binary choice between the narrowest possible and
the widest possible definition, and I think option (2) above
would be a good compromise.

-- 
Tapani Tarvainen