On Wed, Apr 08, 2026 at 02:57:06PM +0000, John Emery via Gnso-latin-diacritics (gnso-latin-diacritics@icann.org) wrote:
> Inclusion principle and conservatism principle generally work overall. Inclusion principle discusses two ways of managing characters: start with everything and exclude, or start with an empty set and include only necessary characters.
Unfortunately the latter approach does not really help us here, because it leaves us with the question: necessary *for what*? We haven't been given any criterion for necessity. If we go that route, we'll have to come up with our own. I don't think that would make our task any easier. At least to me it seems exclusion rules would be easier to formulate and to justify in a way that's easy to understand.

If the conservatism principle implies we should aim for the smallest possible set of characters, I'll note that first, the set of decomposable diacritics is bigger than the set of precomposed ones (latin letter with ...), and second, we could get an even smaller set from their intersection, that is, by allowing only characters that are BOTH decomposable AND precomposed. Even smaller sets would be possible.

Our charter doesn't say "all diacritics" but neither does it say "some diacritics". I don't see how that would require minimizing the set of acceptable characters.

In any case we should justify and explain our decision. Here's a tentative attempt:

* Only characters that are in RZ-LGR. This is not controversial; nobody has suggested anything to the contrary.

* Only characters that are unambiguously formed from a base Latin letter by adding a glyph or two. This excludes characters like æ, þ, ð, ŋ. This can be justified, without needing to refer to any definition of a diacritic, by noting that allowing them would require extra work to resolve possible issues arising from multiple potentially confusable characters. We could note, and maybe recommend, that a future PDP could deal with such cases.

* Only characters that are based on ASCII letters (a-z). This is implied by our restriction that the base TLD must be ASCII. Besides Greek diacritics etc., this also excludes Latin extensions like ǯ (latin letter ezh with caron). If a subsequent PDP takes on cases without a base ASCII TLD, our rules would probably work almost unchanged for dealing with, say, someone wanting .vuaʒʒ, .vuâǯǯ and .vuäǯǯ.
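To make the three rules above a bit more concrete, here is a rough Python sketch of how they might be checked for a single character. It is only an illustration under some assumptions of mine: SAMPLE_RZ_LGR is a made-up handful of characters standing in for the real RZ-LGR repertoire, the function names are hypothetical, "a glyph or two" is read as one or two combining marks, and Unicode canonical decomposition (NFD) is used as a proxy for "formed from a base letter", which corresponds to the decomposable reading rather than the name-based (latin letter with ...) one.

```python
import unicodedata

# Hypothetical stand-in for the RZ-LGR Latin repertoire. The real list must
# come from the published RZ-LGR tables, not from this sample.
SAMPLE_RZ_LGR = set(unicodedata.normalize("NFC", "áâäçéñöüš"))

def base_and_marks(ch):
    """Split a character into its NFD base and the characters that follow it."""
    decomposed = unicodedata.normalize("NFD", ch)
    return decomposed[0], decomposed[1:]

def passes_rules(ch, repertoire=SAMPLE_RZ_LGR):
    """Apply the three tentative rules to one character."""
    ch = unicodedata.normalize("NFC", ch)
    base, marks = base_and_marks(ch)
    # Rule 1: the character is in the (here: sample) repertoire.
    in_repertoire = ch in repertoire
    # Rule 2: base letter plus one or two combining marks (assumed reading).
    formed_by_marks = 1 <= len(marks) <= 2 and all(
        unicodedata.combining(m) != 0 for m in marks
    )
    # Rule 3: the base letter is ASCII a-z.
    ascii_base = "a" <= base <= "z"
    return in_repertoire and formed_by_marks and ascii_base
```

With this reading, ä passes, æ fails the second rule because it has no canonical decomposition, and ǯ fails the third because its base is ʒ rather than an ASCII letter, even when the repertoire argument is widened to include them.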
I don't see a need for anything else. All of those are satisfied equally well with both decomposable and precomposed (latin letter with ...) diacritics.

-- 
Tapani Tarvainen