On Wed, Apr 01, 2026 at 11:44:33AM +0200, Michael Bauland via Gnso-latin-diacritics (gnso-latin-diacritics@icann.org) wrote:
> I see that I have possibly oversimplified this. It's a bit more complex:
>
> If we just use "LATIN SMALL LETTER [ASCII] WITH [diacritic]", this would leave out the previously included letters with two diacritics. So it would need to be "LATIN SMALL LETTER [ASCII] WITH [diacritic] (AND [diacritic])".
>
> But even then, another group of characters that would be missed (and have previously been included) are the ones that are not pre-composed. These would be the ones classified as Type 2 in Pitinan's e-mail, e.g., U+0061 U+0331 a̱ LATIN SMALL LETTER A + COMBINING MACRON BELOW
>
> So, it appears the rule might be slightly more complex, in the end.

Yes. There are in effect two partially overlapping sets of diacritical letters:

* precomposed ones that have a single code point form; these would be
  included by LATIN SMALL LETTER [A-Z] WITH DIACRITIC(s)
* those that have a multi-character (decomposed) form, like
  LATIN SMALL LETTER [A-Z] + COMBINING DIACRITIC(s)

Most diacriticals actually belong to both of those sets, but some belong only to one or the other. So to cover all bases, two rules would be needed.

In theory some character might be a combination of those, like LATIN SMALL LETTER A WITH ACUTE + COMBINING CARON, but I think there are none that exist only in such a form (that is, all such would also have one of the above forms).

-- 
Tapani Tarvainen
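
As a rough illustration of the two rules above, here is a minimal Python sketch using only the standard unicodedata module. The function names (is_precomposed, is_decomposed, forms) are hypothetical, and treating general category "Mn" as "combining diacritic" is an assumption, not an agreed definition. Note also that a purely name-based rule matches names such as "... WITH STROKE", which may or may not be intended.

```python
import re
import unicodedata

# Rule 1 (assumed wording): a single code point whose Unicode name is
# "LATIN SMALL LETTER <one letter> WITH <something>".
PRECOMPOSED_NAME = re.compile(r"LATIN SMALL LETTER [A-Z] WITH .+")


def is_precomposed(s: str) -> bool:
    """True if s is one code point matched by the name-based rule."""
    return len(s) == 1 and PRECOMPOSED_NAME.fullmatch(unicodedata.name(s, "")) is not None


def is_decomposed(s: str) -> bool:
    """True if s is an ASCII letter a-z followed by one or more
    nonspacing marks (category Mn, used here as a stand-in for
    'combining diacritic')."""
    return (
        len(s) >= 2
        and "a" <= s[0] <= "z"
        and all(unicodedata.category(c) == "Mn" for c in s[1:])
    )


def forms(letter: str) -> dict:
    """Report which of the two sets a letter belongs to, after
    normalizing it to its composed (NFC) and decomposed (NFD) forms."""
    nfc = unicodedata.normalize("NFC", letter)
    nfd = unicodedata.normalize("NFD", letter)
    return {"precomposed": is_precomposed(nfc), "decomposed": is_decomposed(nfd)}


if __name__ == "__main__":
    samples = ["\u00e1", "a\u0331", "\u00f8", "\u1ea5", "\u00e1\u030c"]
    for s in samples:
        print(" ".join(f"U+{ord(c):04X}" for c in s), forms(s))
```

In this sketch the mixed form (U+00E1 followed by U+030C) is caught by the second rule once it is fully decomposed, which is consistent with the expectation that no letter exists only in the mixed form.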