Hi all,

For considerations (1) and (2), I'd like to point to the Latin script RZ-LGR, which may help scope the discussion, since a Latin gTLD string must be valid under the Latin script RZ-LGR. Please see the code point repertoire at https://www.icann.org/sites/default/files/packages/lgr/rz-lgr-6-latin-script....

There are two types of base character and diacritic encoding.

Type 1: The base character and the diacritic are encoded as a single code point.
Examples:
U+00E0 à LATIN SMALL LETTER A WITH GRAVE
U+00E1 á LATIN SMALL LETTER A WITH ACUTE

Type 2: The base character and the diacritic(s) are encoded as a sequence of separate code points. This is because there is no single code point for the character with its diacritic, yet the character is required for some languages, e.g. Neur in the examples below.
Examples:
U+0061 U+0331 a̱ LATIN SMALL LETTER A + COMBINING MACRON BELOW
U+0065 U+0331 e̱ LATIN SMALL LETTER E + COMBINING MACRON BELOW
U+025B U+0331 U+0308 ɛ̱̈ LATIN SMALL LETTER OPEN E + COMBINING MACRON BELOW + COMBINING DIAERESIS

Please also note that the combining diacritics are not included in the repertoire on their own; they only appear as part of a sequence with a base character.

There are 20 cases of Type 2 encoding in the Latin RZ-LGR, supporting the following languages: Neur, Guarani, Marshallese, Raga, Malagasy, Dinka, Cubeo, Yoruba.

Regards,
Pitinan

On 1/4/26, 14:50, "Michael Bauland via Gnso-latin-diacritics" <gnso-latin-diacritics@icann.org> wrote:

Hi all,

thanks to everybody who provided input on this topic. Really great discussions with interesting arguments. I won't join the discussion with my own opinion, but would just like to mention two facts:

On 31.03.2026 at 18:37, Tapani Tarvainen via Gnso-latin-diacritics wrote:
Dear all,
Let's consider three alternatives:
(1) Decomposable diacritics only.
This is technically easy: a single, straightforward rule. The downside is that it excludes several diacritical letters.
(2) Also include "LATIN SMALL LETTER [A-Z] WITH ..."
Also technically easy (how long did it take Mark to add this to his tool?), only a bit more complicated in that we'd need two different rules.
We would not necessarily need to implement two rules. The rule for (2) would already include all cases of (1). All decomposable diacritics already have a corresponding name, e.g., "LATIN SMALL LETTER E WITH ACUTE", which is part of (1).
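To make the relationship between the two rules concrete, here is a minimal Python sketch that uses only the standard `unicodedata` module. It is not taken from Mark's tool. The function names are made up for illustration, and it assumes that rule (1) means "canonically decomposes into an ASCII letter plus combining marks" and that rule (2) means "the Unicode name has the form LATIN SMALL LETTER [A-Z] WITH ...".

```python
import re
import unicodedata
from typing import Optional

# Assumed reading of rule (2): the Unicode character name has the form
# "LATIN SMALL LETTER <A-Z> WITH ...".
NAME_RULE = re.compile(r"^LATIN SMALL LETTER ([A-Z]) WITH ")


def is_decomposable(ch: str) -> bool:
    """Assumed reading of rule (1): NFD yields an ASCII a-z letter
    followed only by combining marks."""
    nfd = unicodedata.normalize("NFD", ch)
    return (
        len(nfd) > 1
        and "a" <= nfd[0] <= "z"
        and all(unicodedata.combining(c) != 0 for c in nfd[1:])
    )


def base_letter_by_name(ch: str) -> Optional[str]:
    """Rule (2): take the ASCII base letter from the Unicode name, or None."""
    match = NAME_RULE.match(unicodedata.name(ch, ""))
    return match.group(1).lower() if match else None


# U+00E9 (e with acute) satisfies both rules;
# U+00F8 (o with stroke) has no decomposition and is only caught by the name rule.
for ch in ("\u00e9", "\u00f8"):
    print(f"U+{ord(ch):04X}", is_decomposable(ch), base_letter_by_name(ch))
```

Under these assumptions, a letter such as U+00F8 is picked up only by the name-based rule, while U+00E9 is picked up by both, which matches the observation that implementing the check for (2) may be enough on its own.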
It provides an unambiguous, machine-testable set of letters and relationships between the diacritics and their ASCII counterparts, and it adds 15 letters that are used in at least 35 languages.
(3) Add also letters that aren't real diacritics, like ŋ, æ, ð, þ etc, but that could still be handled the same way. These would need to be evaluated individually and automated processing would have to be based on tables. It would, however, cover even more languages.
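As a small illustration of why these letters would need table-based handling, the Python sketch below (standard `unicodedata` module only; the helper name `describe` is made up for this example) prints the Unicode name and canonical decomposition of the four letters mentioned. It is only meant to show what the character database offers, not to propose any mapping.

```python
import unicodedata

# The four letters named in option (3), written as escapes so the exact
# code points are unambiguous: eng, ae, eth, thorn.
CANDIDATES = ["\u014b", "\u00e6", "\u00f0", "\u00fe"]


def describe(ch):
    """Return (code point, Unicode name, canonical decomposition) for ch."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.decomposition(ch),  # empty string if there is none
    )


for ch in CANDIDATES:
    print(describe(ch))
```

For each of these letters the decomposition field is empty and the name contains no "WITH" part, so neither a decomposition-based nor a name-based rule yields a base letter; any ASCII counterpart would have to come from an explicitly maintained table.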
The problem with these is not only that they aren't real diacritics (and we are restricted to diacritics); some of them also don't map clearly to an ASCII letter. Take for example þ (U+00FE, LATIN SMALL LETTER THORN): would the base ASCII letter be "p" or "b"? Both choices are equally viable.

Let's continue the discussion during our call.

Cheers,

Michael

--
Dr. Michael Bauland, Dipl.-Informatiker
Software Development
Knipp Medien und Kommunikation GmbH
Technologiepark, Martin-Schmeisser-Weg 9, 44227 Dortmund, Germany
Fon: +49 231 9703-0
Fax: +49 231 9703-200
SIP: Michael.Bauland@knipp.de
E-mail: Michael.Bauland@knipp.de
Register Court: Amtsgericht Dortmund, HRB 13728
Chief Executive Officers: Dietmar Knipp, Elmar Knipp
Certified according DIN ISO/IEC 27001:2017