On Sun, Mar 29, 2026 at 07:57:19AM -0300, Mark W. Datysgeld (mark@governanceprimer.com) wrote:
> You are correct that Unicode does not define a diacritic solely by decomposability.
> Unfortunately I misunderstood that point when we discussed this last year.
> However, canonical decomposition is the only Unicode mechanism that encodes a relationship (and note that the word relationship is central here) between a precomposed character and a base-plus-mark sequence.
I beg to differ. Or maybe I misunderstand you. But the "LATIN SMALL LETTER ... WITH" pattern in character names makes the relationship between the named letter and its ASCII counterpart unambiguous and easy to determine automatically. It encodes the relationship in the name of the letter. That may sound, er, less technical, but character names are just as explicit and immutable in Unicode as decomposability, if not more so. The only apparent downside is that we'd need two criteria instead of one, but that would still be simple enough.

--
Tapani Tarvainen
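The two criteria can be sketched with Python's standard `unicodedata` module. This is a minimal illustration under stated assumptions, not a proposed rule: the helper names (`base_by_decomposition`, `base_by_name`, `ascii_base`) are hypothetical, and the name test assumes only the `LATIN SMALL|CAPITAL LETTER <X> WITH ...` pattern. A letter such as U+00F8 (ø, LATIN SMALL LETTER O WITH STROKE) has no canonical decomposition, so only the name test links it to "o".

```python
import re
import unicodedata
from typing import Optional

# Hypothetical pattern: a single ASCII letter followed by " WITH " in the name.
NAME_RE = re.compile(r"LATIN (SMALL|CAPITAL) LETTER ([A-Z]) WITH ")


def base_by_decomposition(ch: str) -> Optional[str]:
    """Criterion 1: ch canonically decomposes to a base plus combining marks."""
    nfd = unicodedata.normalize("NFD", ch)
    if len(nfd) > 1 and all(unicodedata.category(c).startswith("M") for c in nfd[1:]):
        return nfd[0]
    return None


def base_by_name(ch: str) -> Optional[str]:
    """Criterion 2: the character name has the form 'LATIN ... LETTER X WITH ...'."""
    m = NAME_RE.match(unicodedata.name(ch, ""))
    if not m:
        return None
    letter = m.group(2)
    return letter.lower() if m.group(1) == "SMALL" else letter


def ascii_base(ch: str) -> Optional[str]:
    """Accept a character if either criterion identifies its base letter."""
    return base_by_decomposition(ch) or base_by_name(ch)


if __name__ == "__main__":
    for ch in ["é", "ø", "Ł", "ß", "a"]:
        print(ch, base_by_decomposition(ch), base_by_name(ch), ascii_base(ch))
```

Here "é" passes both tests, "ø" and "Ł" pass only the name test, and "ß" (LATIN SMALL LETTER SHARP S) and plain "a" pass neither, since their names contain no "WITH".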