Dear all,

I support Tapani’s point. The current formulation is too restrictive and insufficiently justified.

From a Francophone and African perspective, many characters in the Latin Extended sets, such as ø, are effectively treated as diacritic variations of base letters in real usage. What matters here is confusability and linguistic function, not whether a character is decomposable in Unicode.

Unicode does not impose such a limitation. This is therefore a policy choice, and it should be clearly explained in the report. The rationale for excluding non-decomposable forms needs to be made explicit.

Tapani’s suggestion to consider the “LATIN SMALL LETTER <x> WITH <y>” pattern offers a more practical and linguistically grounded approach.

Best regards,
Emmanuel

On Mon, 23 Mar 2026 at 09:10, Tapani Tarvainen via Gnso-latin-diacritics <gnso-latin-diacritics@icann.org> wrote:
Dear all,
Looking at the initial report recommendation 1, it gives the chosen definition of diacritics as a fact, without explaining or justifying it in any way.
It should, so that anyone who wonders why ø or ŋ are treated differently can look at the report and find an explanation. I also note the ISOC Norway comment, which asks for a technical reason for the exclusion of characters used in Norwegian and Sami.
I tried to think of how to explain it.
As I recall, the argument was that the charter specifies diacritics only, that Unicode defines them that way, and that including anything else would mean going back to the council, for which there's no time.
But does Unicode really define them that way?
I went to the source to see what Unicode actually says about the matter, and importantly also what it does *not* say.
Not an easy task - there's *a lot* of stuff in there.
But try as I might, I could not find a general definition of "diacritic" anywhere. Perhaps I've missed it - if so, feel free to point it out.
There's lots of talk about combining diacritic marks and decomposable diacritics - but no claim that those are the only kind of diacritics. It's simply that for Unicode's purposes the non-decomposable diacritics don't need any special treatment, so there's no need to even define them.
For Unicode, decomposable diacritics are special because they need special rules for canonicalization. But for our purposes that doesn't matter.
In effect we took the Unicode table of decomposable diacritics and decided to treat it as if those were the only diacritics.
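To make the distinction concrete, here is a minimal sketch using Python's standard unicodedata module. The helper name has_canonical_decomposition is my own, purely for illustration; it only reports whether the Unicode Character Database records a canonical decomposition for a character, which is the property our current definition hinges on.

```python
import unicodedata


def has_canonical_decomposition(ch: str) -> bool:
    """Return True if the character has a canonical decomposition mapping.

    unicodedata.decomposition() returns an empty string when there is no
    mapping, and prefixes compatibility mappings with a tag such as
    "<compat>", so those are excluded here.
    """
    mapping = unicodedata.decomposition(ch)
    return bool(mapping) and not mapping.startswith("<")


if __name__ == "__main__":
    for ch in "éñøŋ":
        print(ch, unicodedata.name(ch), has_canonical_decomposition(ch))
```

Under this check é and ñ count as decomposable, while ø and ŋ do not, even though ø is named as a letter "WITH" a modifying mark just like the others.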
This decision was given to us neither by the charter nor by Unicode; it was our choice.
I see no good reason not to allow all Unicode characters described in Latin-1 Supplement or any of the Latin Extended tables as
"LATIN SMALL LETTER <x> WITH <y>"
where <x> is an ASCII letter and <y> is any glyph that modifies the base letter (which is the general meaning of "diacritic mark") in a way that makes them confusable, regardless of the existence of a decomposed canonical equivalent, as long as they are in the RZ-LGR.
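As a rough illustration of how such a rule could be tested, the sketch below matches Unicode character names against the pattern above. It rests on several assumptions of mine: the function names and the regular expression are hypothetical, the block list covers only Latin-1 Supplement, Latin Extended-A, Latin Extended-B and Latin Extended Additional, and it checks neither RZ-LGR membership nor confusability, both of which would still have to be applied separately.

```python
import re
import unicodedata

# Hypothetical pattern: <x> is a single ASCII letter, <y> is whatever follows "WITH".
NAME_PATTERN = re.compile(r"^LATIN SMALL LETTER ([A-Z]) WITH (.+)$")

# Assumed block ranges; other Latin Extended blocks could be added.
BLOCKS = [
    (0x0080, 0x00FF),  # Latin-1 Supplement
    (0x0100, 0x017F),  # Latin Extended-A
    (0x0180, 0x024F),  # Latin Extended-B
    (0x1E00, 0x1EFF),  # Latin Extended Additional
]


def match_letter_with(ch):
    """Return (base_letter, modifier) if the name fits the pattern, else None."""
    name = unicodedata.name(ch, "")
    m = NAME_PATTERN.match(name)
    return (m.group(1), m.group(2)) if m else None


def candidates():
    """Yield every character in the assumed blocks whose name fits the pattern."""
    for first, last in BLOCKS:
        for cp in range(first, last + 1):
            ch = chr(cp)
            if match_letter_with(ch) is not None:
                yield ch


if __name__ == "__main__":
    for ch in "éøŋæ":
        print(ch, match_letter_with(ch))
```

With this pattern é and ø both qualify (base letters E and O), while ŋ and æ do not, since their names contain no "WITH" clause after a single ASCII letter.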
My favorite example here is of course "latin small letter o with stroke", ø.
If we conclude that there is a good reason for excluding such non-decomposable diacritical characters, we should spell it out.
I don't see any good *technical* reason to exclude even characters like ð, þ, ŋ, æ etc, but they are not diacritical and thus outside of the charter, and in some cases might require additional considerations, so I guess we'd have to go back to the council for those, or wait for another PDP ("Extended Latin Characters PDP" perhaps?).
(Side note: during a coffee break discussion in Mumbai someone said old issues shouldn't be relitigated *unless* there're new facts or new viewpoints to consider. Here I think there are.)
-- Tapani Tarvainen