Tapani, Mark,
From a Council liaison perspective, I just wanted to offer a small observation that might help as you continue this discussion.
It seems like this exchange is surfacing two different, but equally valid, questions: one around how to define a boundary that is technically testable and implementable, and another around how to ensure that boundary appropriately reflects real-world language use and expectations around inclusion.
The current discussion is engaging both at once, which may be making it harder to converge.
It might be helpful to separate those threads slightly and consider:
– what constitutes a clear, objective, and machine-testable baseline;
– whether that baseline is sufficient for the WG’s goals; and
– if not, what characteristics any expansion would need to have in order to remain finite, predictable, and implementable.
As a framing device, the group may be defining a “floor” (what can be clearly and objectively established) and considering whether it also needs a “ceiling” (what the group ultimately wants to include). The key question then becomes how to connect the two in a way that remains finite, predictable, and implementable.
It may also be useful to have a short, focused discussion with ICANN org to clarify what any approach the WG is considering would require from an implementation standpoint.
Just offering this in the spirit of supporting leadership as you work toward a path that balances clarity, inclusivity, and implementability.
All the best,
On Mar 29, 2026, at 6:57 AM, Mark W. Datysgeld via Gnso-latin-diacritics <gnso-latin-diacritics@icann.org> wrote:
Tapani,
You are correct that Unicode does not define a diacritic solely by decomposability.
However, canonical decomposition is the only Unicode mechanism that encodes a relationship (and note that the word relationship is central here) between a precomposed character and a base-plus-mark sequence. In Unicode terms, characters with canonical decomposition mappings in 'UnicodeData.txt' have 'Decomposition_Type=Canonical', and normalization exposes that relationship through NFD.
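As a rough illustration of the test described above, the following Python sketch uses the standard `unicodedata` module to check whether a code point has a canonical decomposition mapping and whether its NFD form is a base character followed by combining marks. The function names `canonical_mapping` and `is_base_plus_marks` are hypothetical, chosen only for this example; they are not taken from any WG tool, and the sketch reflects whatever Unicode version ships with the Python interpreter in use.

```python
import unicodedata
from typing import Optional


def canonical_mapping(ch: str) -> Optional[str]:
    """Return the one-level canonical decomposition of ch, or None.

    unicodedata.decomposition() returns the raw mapping field from
    UnicodeData.txt. Compatibility mappings carry a tag such as
    "<compat>"; canonical mappings have no tag.
    """
    mapping = unicodedata.decomposition(ch)
    if not mapping or mapping.startswith("<"):
        return None
    return "".join(chr(int(cp, 16)) for cp in mapping.split())


def is_base_plus_marks(ch: str) -> bool:
    """True if NFD turns ch into a non-combining base plus one or more marks."""
    nfd = unicodedata.normalize("NFD", ch)
    if len(nfd) < 2:
        return False
    base, marks = nfd[0], nfd[1:]
    return (
        unicodedata.combining(base) == 0
        and all(unicodedata.category(m).startswith("M") for m in marks)
    )


# U+00E9 (e with acute) decomposes; U+00F8 (o with stroke) does not.
for ch in ("\u00e9", "\u00f8"):
    print(f"U+{ord(ch):04X}", canonical_mapping(ch) is not None, is_base_plus_marks(ch))
```

Under this test, a letter such as U+00F8 falls outside the set even though many readers would describe it as carrying a diacritic, which is the kind of gap the thread is discussing.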
By contrast, the 'Diacritic' property is informative, and Unicode expressly notes that some diacritics are not combining characters and some combining characters are not diacritics. So it's not a derivational test for whether a precomposed Latin character is structurally equivalent to a base letter plus combining mark(s). 'General_Category' has the same limitation in a different way: it classifies marks as characters, but it does not encode whether another code point is derived from a base-plus-mark sequence.
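A small sketch of the 'General_Category' point, again using Python's `unicodedata` module. Note that this module does not expose the 'Diacritic' property, so only General_Category and the decomposition mapping are shown; the helper name `has_canonical_mapping` is an assumption made for this example.

```python
import unicodedata


def has_canonical_mapping(ch: str) -> bool:
    """True if ch has an untagged (canonical) decomposition mapping."""
    mapping = unicodedata.decomposition(ch)
    return bool(mapping) and not mapping.startswith("<")


samples = {
    "U+00E9 e with acute": "\u00e9",
    "U+00F8 o with stroke": "\u00f8",
    "U+0301 combining acute": "\u0301",
    "U+00B4 spacing acute accent": "\u00b4",
}

for label, ch in samples.items():
    # General_Category describes the character itself, not how it was derived.
    print(label, unicodedata.category(ch), has_canonical_mapping(ch))
```

Both U+00E9 and U+00F8 are classified as lowercase letters (Ll), so General_Category alone cannot tell them apart; only the decomposition mapping does. U+0301 is a nonspacing mark (Mn), while U+00B4 is a spacing modifier symbol (Sk) rather than a combining character.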
Going back to the main point of my letter: we arrived at the only normative, machine-testable Unicode mechanism that proves such a relationship exists.
I like your "Small Latin Letter +", nevertheless. And I'll keep providing tools for the WG to decide the direction that everyone finds best.
Regards,
On March 29, 2026 4:32:42 AM GMT-03:00, Tapani Tarvainen via Gnso-latin-diacritics <gnso-latin-diacritics@icann.org> wrote:
Mark W. Datysgeld
Governance Primer & ICANNWiki