Tapani, Mark,


From a Council liaison perspective, I just wanted to offer a small observation that might help as you continue this discussion.

It seems like this exchange is surfacing two different, but equally valid, questions: one around how to define a boundary that is technically testable and implementable, and another around how to ensure that boundary appropriately reflects real-world language use and expectations around inclusion.

The current discussion is engaging both at once, which may be making it harder to converge.


It might be helpful to separate those threads slightly and consider:

– what constitutes a clear, objective, and machine-testable baseline;

– whether that baseline is sufficient for the WG’s goals; and

– if not, what characteristics any expansion would need to have in order to remain finite, predictable, and implementable.


One way to think about this, just as a framing device, is that the group may be defining a “floor” (what can be clearly and objectively established) and considering whether there is also a need for a “ceiling” (what the group ultimately wants to include). The key question then becomes how to connect the two without losing those properties.

It may also be useful to have a short, focused discussion with ICANN org to clarify what any approach the WG is considering would require from an implementation standpoint.

Just offering this in the spirit of supporting leadership as you work toward a path that balances clarity, inclusivity, and implementability.


All the best,


Christian Dawson
Executive Director
c:  703 623 2612

On Mar 29, 2026, at 6:57 AM, Mark W. Datysgeld via Gnso-latin-diacritics <gnso-latin-diacritics@icann.org> wrote:

Tapani,

You are correct that Unicode does not define a diacritic solely by decomposability.

However, canonical decomposition is the only Unicode mechanism that encodes a relationship (and note that the word relationship is central here) between a precomposed character and a base-plus-mark sequence. In Unicode terms, characters with canonical decomposition mappings in 'UnicodeData.txt' have 'Decomposition_Type=Canonical', and normalization exposes that relationship through NFD.

By contrast, the 'Diacritic' property is informative, and Unicode expressly notes that some diacritics are not combining characters and some combining characters are not diacritics. So the 'Diacritic' property is not a derivational test for whether a precomposed Latin character is structurally equivalent to a base letter plus combining mark(s). 'General_Category' has the same limitation in a different way: it classifies marks as characters, but it does not encode whether another code point is derived from a base-plus-mark sequence.

Going back to the main point of my letter: we arrived at the only normative, machine-testable Unicode mechanism that proves such a relationship exists.
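
To make the mechanism above concrete, here is a minimal sketch of how such a test could look, assuming Python's standard 'unicodedata' module as the data source. The helper names 'canonical_decomposition' and 'has_canonical_decomposition' are hypothetical and only for illustration; this is not the WG's tooling.

```python
import unicodedata


def canonical_decomposition(ch: str) -> str:
    """Return the canonical decomposition mapping of ch, or '' if it has none."""
    mapping = unicodedata.decomposition(ch)
    # Compatibility mappings start with a tag such as '<compat>';
    # canonical mappings have no tag.
    if not mapping or mapping.startswith("<"):
        return ""
    return mapping


def has_canonical_decomposition(ch: str) -> bool:
    """True if Unicode records a base-plus-mark relationship for ch."""
    return bool(canonical_decomposition(ch))


# U+00E9 (e with acute) versus three stroke/hook letters from the thread.
for ch in ["\u00e9", "\u00f8", "\u0142", "\u0253"]:
    nfd = unicodedata.normalize("NFD", ch)
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
        f"mapping={canonical_decomposition(ch)!r}, "
        f"NFD code points={[f'U+{ord(c):04X}' for c in nfd]}"
    )
```

Under this test, U+00E9 maps to a base letter plus a combining mark, while letters such as U+00F8 and U+0142 have no mapping and are left unchanged by NFD.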

Nevertheless, I like your "Small Latin Letter +". And I'll keep providing tools for the WG to decide the direction that everyone finds best.

Regards,


On March 29, 2026 4:32:42 AM GMT-03:00, Tapani Tarvainen via Gnso-latin-diacritics <gnso-latin-diacritics@icann.org> wrote:
On Wed, Mar 25, 2026 at 11:41:56AM -0300, Mark W. Datysgeld (mark@governanceprimer.com) wrote:

Following up, based on feedback during the WG call, please find a v2 of the
document with a bounded 15-item list. Our GitHub has also been updated.

Thank you, Mark. That is most helpful.

I did a quick review of the 15 characters in your
"Thesis Section: LATIN SMALL LETTER [A-Z] WITH ..."

But first I'd like to note a couple of Unicode references:

"Diacritic: Characters that linguistically modify the meaning of another character to which they apply. Some diacritics are not combining characters, and some combining characters are not diacritics."

https://www.unicode.org/reports/tr44/#Diacritic

"Diacritic. (1) A mark applied or attached to a symbol to create a new symbol that represents a modified or new value. (2) A mark applied to a symbol irrespective of whether it changes the value of that symbol. In the latter case, the diacritic usually represents an independent value (for example, an accent, tone, or some other linguistic information). Also called diacritical mark or diacritical. (See also combining character and nonspacing mark.)"

https://www.unicode.org/glossary/#diacritic

Unfortunately those descriptions, or definitions if you like, don't include or reference a list of diacritics or a tool for determining them, but they make it clear enough that decomposability is not a definitive characteristic.

I don't recall any mention of these, or of much else in Unicode apart from the table of decomposable diacritics, in our discussion at the time. So I would not agree that the issue was thoroughly discussed.


Then to those 15 letters. Almost all of them are listed here:

https://en.wikipedia.org/wiki/Category:Latin_letters_with_diacritics

but obviously Wikipedia is not a definitive source either.

They all have their individual Wikipedia pages, too, with some (incomplete) information on the languages and standards using them (I confess to not having checked the references, this is just cut'n'paste from Wikipedia):


ø (LATIN SMALL LETTER O WITH STROKE, U+00F8) Danish, Norwegian, Faroese, Southern Sámi, several languages of Africa like Lendu

đ (LATIN SMALL LETTER D WITH STROKE, U+0111) African reference alphabet, Kven, Latin, South Slavic languages, Sámi, Vietnamese, old Spanish

ł (LATIN SMALL LETTER L WITH STROKE, U+0142) Polish, Kashubian, Sorbian, Silesian, Belarusian Latin, Ukrainian Latin, Kurdish, Wymysorys, Navajo, Dëne Sųłıné, Iñupiaq, Zuni, Hupa, Sm'algya̱x, Nisga'a, Dogrib

ŧ (LATIN SMALL LETTER T WITH STROKE, U+0167) Sámi, Hualapai, several orthographies for African languages

ƈ (LATIN SMALL LETTER C WITH HOOK, U+0188) African languages such as Serer

ƙ (LATIN SMALL LETTER K WITH HOOK, U+0199) Hausa and Karai-karai

ƥ (LATIN SMALL LETTER P WITH HOOK, U+01A5) African languages such as Serer

ƭ (LATIN SMALL LETTER T WITH HOOK, U+01AD) Serer

ƴ (LATIN SMALL LETTER Y WITH HOOK, U+01B4) African languages, such as Fula, Hausa and Noon

ɓ (LATIN SMALL LETTER B WITH HOOK, U+0253) International African Alphabet, various languages, notably Fula, Hausa, and Giziga

ɖ (LATIN SMALL LETTER D WITH TAIL, U+0256) African reference alphabet, Ewe, Fon, Aja, Bassa

ɗ (LATIN SMALL LETTER D WITH HOOK, U+0257) African reference alphabet, Fula, Hausa

ɠ (LATIN SMALL LETTER G WITH HOOK, U+0260) African reference alphabet

ɲ (LATIN SMALL LETTER N WITH LEFT HOOK, U+0272) some African languages, e.g. Bambara and Fula

ɨ (LATIN SMALL LETTER I WITH STROKE, U+0268) IPA
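
The code points and names in the list above can be cross-checked mechanically. What follows is only a rough sketch, assuming Python's standard 'unicodedata' module; the regular expression is one possible reading of the "LATIN SMALL LETTER [A-Z] WITH ..." heading, and the names 'LISTED', 'NAME_PATTERN' and 'matches_name_pattern' are made up for illustration.

```python
import re
import unicodedata

# Code points copied from the list above, in the same order.
LISTED = [
    0x00F8, 0x0111, 0x0142, 0x0167, 0x0188,
    0x0199, 0x01A5, 0x01AD, 0x01B4, 0x0253,
    0x0256, 0x0257, 0x0260, 0x0272, 0x0268,
]

# Assumption: the heading is read as a pattern over Unicode character names.
NAME_PATTERN = re.compile(r"LATIN SMALL LETTER [A-Z] WITH .+")


def matches_name_pattern(cp: int) -> bool:
    """True if the character name has the form LATIN SMALL LETTER X WITH ..."""
    name = unicodedata.name(chr(cp), "")
    return NAME_PATTERN.fullmatch(name) is not None


for cp in LISTED:
    ch = chr(cp)
    decomposition = unicodedata.decomposition(ch) or "none"
    print(
        f"U+{cp:04X} {unicodedata.name(ch)}: "
        f"name pattern={matches_name_pattern(cp)}, "
        f"decomposition={decomposition}"
    )
```

A name pattern like this also matches decomposable letters such as U+00E9, so on its own it does not bound the set; it only shows that the 15 listed characters share a naming convention.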

You will notice that, like Emmanuel said, most of them are used in various African languages.

Omitting them on a technicality might give the impression that ICANN doesn't care so much about Africa, even though that obviously isn't our intent.

I hope this will help in our discussion of the subject.

Regards,

Mark W. Datysgeld
Governance Primer & ICANNWiki