[Gnso-latin-diacritics] Re: Diacritics definition

March 25, 2026

Hi Tapani and all,
Thanks for bringing this topic back to the table, and for the approach taken in this email. I confess I have both sympathy and skepticism regarding this issue of the small o with stroke and some other characters that we have left out because they do not fit the definition we are using.

Starting with my reservations about moving in that direction: we need to provide a clear definition of what is acceptable for these Latin diacritic bundles. We can opt for either a list or a definition with one or more criteria. The problem with the list is how it can evolve. Would we need a new PDP if, say, Unicode changes something? Or if it appears we have made a mistake by picking or not picking a concrete character? The problem with the definition is that I have not seen one so far that really addresses the issue. We simply cannot say “all characters” or “the widest possible set”, or use terms such as “obvious”, “similar” or “reasonable”. If “do what is reasonable” would suffice, this whole PDP would not be here ;-)

We also have to bear in mind that we are dealing with an exception process, which by nature must be restrictive. Open-ended exceptions are not exceptions, they are overrulings ;-)

Now, let me go to the other side for a moment: I cannot see any material reason to exclude, say, Latin small letter o with stroke, except, as said, that it does not fit the working definition we have. To give a practical example in a different but closely related scenario: when establishing the **second-level** IDN tables for the LATN script for the TLDs we manage, we considered all “Latin letter x with y” characters to be “pseudo variants” of the base Latin small letter. So, for instance, Latin small letter o and o with acute, o with grave, o with tilde… and o with stroke, yes, are all part of the same bundle, and the “same entity” principle applies.
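As a rough illustration of what such a bundle looks like, the sketch below groups characters by the base letter found in their Unicode name. It is only an assumption-laden example, not the actual IDN tables: the helper names (`base_letter`, `build_bundles`), the code point range U+00E0 to U+024F, and the rule that the base must be a single A-Z letter are all choices made here for illustration.

```python
import unicodedata
from collections import defaultdict

PREFIX = "LATIN SMALL LETTER "

def base_letter(ch):
    """Return the ASCII base letter taken from the Unicode name, or None.

    Hypothetical rule: the name must read "LATIN SMALL LETTER <X> WITH ...",
    where <X> is a single letter A-Z.
    """
    name = unicodedata.name(ch, "")  # "" for unassigned code points
    if not name.startswith(PREFIX) or " WITH " not in name:
        return None
    letter = name[len(PREFIX):].split(" WITH ", 1)[0]
    if len(letter) == 1 and "A" <= letter <= "Z":
        return letter.lower()
    return None

def build_bundles(first=0x00E0, last=0x024F):
    """Group the code points in [first, last] by their base letter."""
    bundles = defaultdict(list)
    for cp in range(first, last + 1):
        ch = chr(cp)
        base = base_letter(ch)
        if base is not None:
            bundles[base].append(ch)
    return dict(bundles)

bundles = build_bundles()
# Print the bundle for "o" with code points, for manual inspection.
for ch in bundles["o"]:
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")
```

Under these assumptions, o with acute, grave, tilde and stroke all land in the bundle for "o", while a character such as æ (whose name has no "WITH") is in no bundle at all.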

We did so precisely by the method Tapani describes in his last email. Now, the difference is that for a second-level IDN table for a given TLD, if something needs to be changed, the Registry may submit a new table to IANA and apply the changes to the Registry. The list has a clear amendment process, with all the required caveats for existing domains, if you want. So here we still need a formal definition, simple or complex (simple is always preferable, not always possible), that does not require a PDP every now and then for a single character to be added or removed…

I am not sure whether the method Tapani proposes solves all problems or, more importantly, creates some additional ones. I have not checked the whole list, not since we prepared the IDN tables I mentioned above, over 10 years ago in most cases. But it is an area to explore.

Otherwise, if we cannot find a definition that is clear and does not create additional problems, I would be reluctant to spend six additional months on the o with stroke, quite frankly…

For your info, here are the TLDs that use the second-level IDN tables for Latin script with these “diacritic bundles”, including the famous U+00F8, o with stroke:

.barcelona; .bcn; .erni; .eurovision; .eus; .gal; .lacaixa; .madrid; .radio; .scot; .seat; .sport; .swiss

Amadeu
...
Message sent by Tapani Tarvainen via Gnso-latin-diacritics <gnso-latin-diacritics@icann.org> on 25 March 2026 at 8:56:
Dear all,
I would like to simplify the issue a little, skip all the
difficult cases and consider just these two competing
definitions of diacritical character (simplified here):
(1) Unicode-decomposable
(2) LATIN SMALL LETTER ... WITH
* Both are well-defined and objective, they could be mechanically
derived from Unicode tables.
* Both make it unambiguous what is the matching ASCII letter.
There is no need to evaluate confusability in any subjective way.
* Eligibility change would be equally simple to do mechanically
in either case: character name is no harder to get from Unicode
tables than decomposability is.
* Both are stable in the sense that they're unlikely to change
in future Unicode versions (apart from possible new additions).
Although I think the former is actually more likely to change
(that is, Unicode could specify a decomposition for a character
that doesn't have it now), that's unlikely enough not to matter.
* The latter includes more letters and supports more languages.
In that sense it would also be fairer.
* The latter is more closely aligned with linguistics.
* The latter is easier for non-specialists to understand.
* The former is what we currently have and we'd save ourselves
some time and work by sticking with it.
Am I missing something important?
Having read Mark's scoping exercise, I agree that including all
similarly confusable characters like ŋ or æ would be more difficult
and require extra work, so leaving them for later makes sense.
But it's not a binary choice, narrowest possible or widest possible,
and I think option (2) above would be a good compromise.
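To make the comparison concrete, here is a minimal sketch of how the two definitions could be derived mechanically with Python's standard `unicodedata` module. The function names and the exact rules are assumptions made for illustration: for (1), the canonical (NFD) decomposition must be an ASCII letter a-z followed only by combining marks; for (2), the name must read "LATIN SMALL LETTER <X> WITH ..." with <X> a single letter A-Z. A real eligibility rule might draw these lines differently.

```python
import unicodedata

PREFIX = "LATIN SMALL LETTER "

def base_by_decomposition(ch):
    """Definition (1), as assumed here: NFD gives a-z plus combining marks."""
    nfd = unicodedata.normalize("NFD", ch)
    if len(nfd) < 2:
        return None  # no canonical decomposition
    base, marks = nfd[0], nfd[1:]
    # combining() returns the canonical combining class, 0 for non-marks
    if "a" <= base <= "z" and all(unicodedata.combining(m) for m in marks):
        return base
    return None

def base_by_name(ch):
    """Definition (2), as assumed here: LATIN SMALL LETTER <X> WITH ..."""
    name = unicodedata.name(ch, "")
    if not name.startswith(PREFIX) or " WITH " not in name:
        return None
    letter = name[len(PREFIX):].split(" WITH ", 1)[0]
    if len(letter) == 1 and "A" <= letter <= "Z":
        return letter.lower()
    return None

# e acute, o stroke, l stroke, ae, eng
for ch in ["\u00e9", "\u00f8", "\u0142", "\u00e6", "\u014b"]:
    print(f"U+{ord(ch):04X}", ch,
          "decomposition:", base_by_decomposition(ch),
          "name:", base_by_name(ch))
```

Both functions return the matching ASCII letter directly, so neither needs a subjective confusability judgment. They differ on characters such as ø, which has no canonical decomposition but is named LATIN SMALL LETTER O WITH STROKE.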
--
Tapani Tarvainen

Amadeu Abril i Abril (CORE)