Tapani,

You are correct that Unicode does not define a diacritic solely by decomposability.

However, canonical decomposition is the only Unicode mechanism that encodes a relationship between a precomposed character and a base-plus-mark sequence, and the word "relationship" is central here. In Unicode terms, characters with canonical decomposition mappings in 'UnicodeData.txt' have 'Decomposition_Type=Canonical', and normalization exposes that relationship through NFD.
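
For anyone who wants to see this locally, here is a minimal sketch in Python. It assumes the standard library 'unicodedata' module, whose data follows whichever Unicode version ships with the interpreter. The helper names are mine and purely illustrative, not part of any WG tooling.

```python
import unicodedata


def nfd_codepoints(ch: str) -> list[str]:
    """Return the code points of the NFD form of ch as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]


def has_canonical_decomposition(ch: str) -> bool:
    """True if UnicodeData.txt lists a canonical decomposition mapping for ch."""
    # decomposition() returns the raw mapping field. Compatibility mappings
    # start with a "<tag>" such as "<compat>"; canonical mappings do not.
    mapping = unicodedata.decomposition(ch)
    return bool(mapping) and not mapping.startswith("<")


# U+00E9 (e with acute) has a canonical mapping; U+00F8 (o with stroke) has none.
for ch in ("\u00E9", "\u00F8"):
    print(f"U+{ord(ch):04X}", has_canonical_decomposition(ch), nfd_codepoints(ch))
```

Note that the raw mapping field does not cover algorithmic decompositions such as Hangul syllables, which is not a concern for the Latin letters under discussion.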

By contrast, the 'Diacritic' property is informative, and Unicode expressly notes that some diacritics are not combining characters and some combining characters are not diacritics. So it is not a derivational test for whether a precomposed Latin character is structurally equivalent to a base letter plus combining mark(s). 'General_Category' has the same limitation in a different way: it classifies each character on its own (a combining mark as a mark, a letter as a letter), but it does not encode whether a given code point is derived from a base-plus-mark sequence.
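
A small illustration of the 'General_Category' point, again as a sketch using Python's 'unicodedata' module. As far as I know that module does not expose the 'Diacritic' property at all, so only the category and the canonical combining class are shown. U+00B4 is included because, as far as I can tell, 'PropList.txt' lists it with Diacritic=Yes even though it is not a combining character.

```python
import unicodedata

# Sample characters: a combining mark, a spacing accent, and two
# lowercase letters (one decomposable, one not).
samples = {
    "COMBINING ACUTE ACCENT": "\u0301",
    "ACUTE ACCENT": "\u00B4",
    "E WITH ACUTE": "\u00E9",
    "O WITH STROKE": "\u00F8",
}

# General_Category and canonical combining class for each sample.
categories = {label: unicodedata.category(ch) for label, ch in samples.items()}
combining = {label: unicodedata.combining(ch) for label, ch in samples.items()}

for label in samples:
    print(f"{label}: gc={categories[label]}, ccc={combining[label]}")
```

Both letters come back as 'Ll', so the category alone cannot tell a decomposable letter from a non-decomposable one.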

Going back to the main point of my letter: we arrived at the only normative, machine-testable Unicode mechanism that proves such a relationship exists.
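
To show what "machine-testable" means in practice, the sketch below runs that test over the 15 letters from your list, quoted further down. It assumes Python's 'unicodedata' module; the variable names are illustrative only.

```python
import unicodedata

# The 15 code points from the quoted list, written as escapes to avoid
# any encoding trouble in transit.
LETTERS = (
    "\u00F8\u0111\u0142\u0167\u0188\u0199\u01A5\u01AD"
    "\u01B4\u0253\u0256\u0257\u0260\u0272\u0268"
)

# Letters whose decomposition mapping field in UnicodeData.txt is empty.
undecomposable = [ch for ch in LETTERS if unicodedata.decomposition(ch) == ""]

for ch in LETTERS:
    mapping = unicodedata.decomposition(ch) or "no mapping"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {mapping}")
```

Every one of the 15 reports no mapping, which is exactly why the decomposition test leaves them out, and why the direction the WG chooses matters for them.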

That said, I do like your "Small Latin Letter +", and I'll keep providing tools so the WG can decide on the direction that everyone finds best.

Regards,


On March 29, 2026 4:32:42 AM GMT-03:00, Tapani Tarvainen via Gnso-latin-diacritics <gnso-latin-diacritics@icann.org> wrote:
On Wed, Mar 25, 2026 at 11:41:56AM -0300, Mark W. Datysgeld (mark@governanceprimer.com) wrote:

Following up, based on feedback during the WG call, please find a v2 of the
document with a bounded 15-item list. Our GitHub has also been updated.

Thank you, Mark. That is most helpful.

I did a quick review of the 15 characters in your
"Thesis Section: LATIN SMALL LETTER [A-Z] WITH ..."

But first I'd like to note a couple of Unicode references:

"Diacritic: Characters that linguistically modify the meaning of another character to which they apply. Some diacritics are not combining characters, and some combining characters are not diacritics."

https://www.unicode.org/reports/tr44/#Diacritic

"Diacritic. (1) A mark applied or attached to a symbol to create a new symbol that represents a modified or new value. (2) A mark applied to a symbol irrespective of whether it changes the value of that symbol. In the latter case, the diacritic usually represents an independent value (for example, an accent, tone, or some other linguistic information). Also called diacritical mark or diacritical. (See also combining character and nonspacing mark.)"

https://www.unicode.org/glossary/#diacritic

Unfortunately those descriptions, or definitions if you like, don't include or reference a list of diacritics or a tool for determining them, but they make it clear enough that decomposability is not a definitive characteristic.

I don't recall any mention of these, or of much else in Unicode apart from the table of decomposable diacritics, in our discussion at the time. So I would not agree that the issue was thoroughly discussed.


Then to those 15 letters. Almost all of them are listed here:

https://en.wikipedia.org/wiki/Category:Latin_letters_with_diacritics

but obviously Wikipedia is not a definitive source either.

They all have their individual Wikipedia pages, too, with some (incomplete) information on the languages and standards using them (I confess to not having checked the references, this is just cut'n'paste from Wikipedia):


ø (LATIN SMALL LETTER O WITH STROKE, U+00F8) Danish, Norwegian, Faroese, Southern Sámi, several languages of Africa like Lendu

đ (LATIN SMALL LETTER D WITH STROKE, U+0111) African reference alphabet, Kven, Latin, South Slavic languages, Sámi, Vietnamese, old Spanish

ł (LATIN SMALL LETTER L WITH STROKE, U+0142) Polish, Kashubian, Sorbian, Silesian, Belarusian Latin, Ukrainian Latin, Kurdish, Wymysorys, Navajo, Dëne Sųłıné, Iñupiaq, Zuni, Hupa, Sm'algya̱x, Nisga'a, Dogrib

ŧ (LATIN SMALL LETTER T WITH STROKE, U+0167) Sámi, Hualapai, several orthographies for African languages

ƈ (LATIN SMALL LETTER C WITH HOOK, U+0188) African languages such as Serer

ƙ (LATIN SMALL LETTER K WITH HOOK, U+0199) Hausa and Karai-karai

ƥ (LATIN SMALL LETTER P WITH HOOK, U+01A5) African languages such as Serer

ƭ (LATIN SMALL LETTER T WITH HOOK, U+01AD) Serer

ƴ (LATIN SMALL LETTER Y WITH HOOK, U+01B4) African languages, such as Fula, Hausa and Noon

ɓ (LATIN SMALL LETTER B WITH HOOK, U+0253) International African Alphabet, various languages, notably Fula, Hausa, and Giziga

ɖ (LATIN SMALL LETTER D WITH TAIL, U+0256) African reference alphabet, Ewe, Fon, Aja, Bassa

ɗ (LATIN SMALL LETTER D WITH HOOK, U+0257) African reference alphabet, Fula, Hausa

ɠ (LATIN SMALL LETTER G WITH HOOK, U+0260) African reference alphabet

ɲ (LATIN SMALL LETTER N WITH LEFT HOOK, U+0272) some African languages, e.g. Bambara and Fula

ɨ (LATIN SMALL LETTER I WITH STROKE, U+0268) IPA

You will notice that, like Emmanuel said, most of them are used in various African languages.

Omitting them on a technicality might give the impression that ICANN doesn't care so much about Africa, even though that obviously isn't our intent.

I hope this will help in our discussion of the subject.

Regards,

Mark W. Datysgeld
Governance Primer & ICANNWiki