[Gnso-latin-diacritics] Diacritics definition

April 22, 2026

On Wed, Apr 08, 2026 at 02:57:06PM +0000, John Emery via Gnso-latin-diacritics (gnso-latin-diacritics@icann.org) wrote:
...
The inclusion principle and the conservatism principle work well in
general. The inclusion principle discusses two ways of managing
characters: start with everything and exclude, or start with an
empty set and include only necessary characters.
Unfortunately the latter approach does not really help us here,
because it leaves us with the question: necessary *for what*?

We haven't been given any criteria for necessity. If we go that route,
we'll have to come up with our own. I don't think that would make our
task any easier. At least to me it seems exclusion rules would be
easier to formulate and justify in a way that'd be easy to understand.

If the conservatism principle implies we should aim for the smallest
possible set of characters, I'll note that first, the set of
decomposable diacritics is bigger than the set of precomposed ones
(latin letter with ...), and second, we could get an even smaller set
from their intersection, that is, by allowing only characters that are
BOTH decomposable AND precomposed. Even smaller sets would be possible.

Our charter doesn't say "all diacritics" but neither does it say
"some diacritics". I don't see how that would require minimizing
the set of acceptable characters.

In any case we should justify and explain our decision.

Here's a tentative attempt:

* Only characters that are in RZ-LGR.

  This is not controversial, nobody's suggested anything to the
  contrary.

* Only characters that are unambiguously formed from a base Latin
  letter by adding a glyph or two.

  This excludes characters like æ, þ, ð, ŋ. This can be justified,
  without needing to refer to any diacritic definition, by noting that
  allowing them would require extra work to resolve possible issues
  arising from multiple potentially confusable characters.

  We could note and maybe recommend that a future PDP could deal with
  such cases.

* Only characters that are based on ASCII letters (a-z).

  This is implied by our restriction that the base TLD must be ASCII.
  Besides Greek diacritics etc., this also excludes Latin extensions
  like ǯ (latin letter ezh with caron).

  If a subsequent PDP takes on cases without a base ASCII TLD, our rules
  would probably work almost unchanged for dealing with, say, someone
  wanting .vuaʒʒ, .vuâǯǯ and .vuäǯǯ.

I don't see a need for anything else.

All of those are satisfied equally well with both decomposable
and precomposed (latin letter with ...) diacritics.
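
As a rough illustration only (not an agreed definition), the second and
third rules above could be sketched in Python with the standard
unicodedata module. Two readings are shown: "decomposable" taken as
"canonically decomposes to an ASCII letter plus combining marks", and
"precomposed" taken as "Unicode name is LATIN SMALL LETTER <A-Z> WITH
...". Both readings, the function names, and the rz_lgr_repertoire
parameter are assumptions made for this sketch; the first rule needs the
actual RZ-LGR repertoire, which is not reproduced here.

```python
import re
import unicodedata

ASCII_LETTERS = set("abcdefghijklmnopqrstuvwxyz")

# Assumed reading of "precomposed": the name is a single ASCII base letter "WITH" something.
_NAME_RE = re.compile(r"^LATIN SMALL LETTER [A-Z] WITH ")


def is_decomposable_ascii_based(ch: str) -> bool:
    """True if ch canonically decomposes (NFD) to a-z plus combining marks only."""
    nfd = unicodedata.normalize("NFD", ch)
    if len(nfd) < 2 or nfd[0] not in ASCII_LETTERS:
        return False
    # Category "Mn" is a nonspacing (combining) mark.
    return all(unicodedata.category(c) == "Mn" for c in nfd[1:])


def is_named_ascii_with(ch: str) -> bool:
    """True if the Unicode name looks like 'LATIN SMALL LETTER X WITH ...'."""
    return bool(_NAME_RE.match(unicodedata.name(ch, "")))


def passes_rules(ch: str, rz_lgr_repertoire: set) -> bool:
    """Hypothetical combination of the three rules, using the decomposable reading.

    rz_lgr_repertoire is a caller-supplied placeholder set, not real RZ-LGR data.
    """
    return ch in rz_lgr_repertoire and is_decomposable_ascii_based(ch)


if __name__ == "__main__":
    for ch in "éäâæþðŋǯø":
        print(ch, is_decomposable_ascii_based(ch), is_named_ascii_with(ch))
```

Under these assumptions æ, þ, ð, ŋ and ǯ fail both checks, é, ä and â
pass both, and ø (o with stroke) passes only the name-based check
because it has no canonical decomposition.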

-- 
Tapani Tarvainen