Dear all,
I would like to simplify the issue a little, skip all the
difficult cases and consider just these two competing
definitions of diacritical character (simplified here):
(1) Unicode-decomposable
(2) LATIN SMALL LETTER ... WITH
* Both are well-defined and objective; either could be mechanically
derived from Unicode tables.
* Both make it unambiguous what the matching ASCII letter is.
There is no need to evaluate confusability in any subjective way.
* Changing eligibility would be equally simple to do mechanically
in either case: a character's name is no harder to get from the
Unicode tables than its decomposition is.
* Both are stable in the sense that they're unlikely to change
in future Unicode versions (apart from possible new additions).
Although I think the former is actually more likely to change
(that is, Unicode could specify a decomposition for a character
that doesn't have one now), that's unlikely enough not to matter.
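* The latter includes more letters and supports more languages:
for example, ł, ø and đ are named LATIN SMALL LETTER ... WITH STROKE
but have no Unicode decomposition. In that sense it would also be
fairer.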
* The latter is more closely aligned with linguistics.
* The latter is easier for non-specialists to understand.
* The former is what we currently have and we'd save ourselves
some time and work by sticking with it.
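To illustrate the claim that both definitions can be derived
mechanically, here is a minimal sketch using Python's standard
`unicodedata` module (which reflects whatever Unicode version the
Python build ships with). The function names are hypothetical, and
some details are my own simplifying assumptions, not part of either
proposal: (1) is read as canonical (NFD) decomposition into a single
ASCII lowercase letter followed only by combining marks, and (2)
requires the part of the name before " WITH " to be a single letter
A-Z (so ǽ, LATIN SMALL LETTER AE WITH ACUTE, is excluded).

```python
import unicodedata
from typing import List, Optional

ASCII_LOWER = "abcdefghijklmnopqrstuvwxyz"
NAME_PREFIX = "LATIN SMALL LETTER "


def base_by_decomposition(ch: str) -> Optional[str]:
    """Definition (1), as assumed here: ch canonically decomposes into
    an ASCII lowercase letter followed only by combining marks."""
    nfd = unicodedata.normalize("NFD", ch)
    if len(nfd) < 2 or nfd[0] not in ASCII_LOWER:
        return None
    # Every remaining code point must be a mark (general category M*).
    if all(unicodedata.category(c).startswith("M") for c in nfd[1:]):
        return nfd[0]
    return None


def base_by_name(ch: str) -> Optional[str]:
    """Definition (2), as assumed here: the character name has the form
    'LATIN SMALL LETTER <X> WITH ...' where <X> is one letter A-Z."""
    name = unicodedata.name(ch, "")  # "" for unnamed code points
    if not name.startswith(NAME_PREFIX) or " WITH " not in name:
        return None
    base = name[len(NAME_PREFIX):].split(" WITH ", 1)[0]
    if len(base) == 1 and base.lower() in ASCII_LOWER:
        return base.lower()
    return None


def only_in_definition_2(limit: int = 0x10000) -> List[str]:
    """Characters below `limit` accepted by (2) but not by (1)."""
    return [chr(cp) for cp in range(limit)
            if base_by_name(chr(cp)) and not base_by_decomposition(chr(cp))]


if __name__ == "__main__":
    for ch in "éñłøđŋæ":
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
              f"(1)={base_by_decomposition(ch)} (2)={base_by_name(ch)}")
```

With this sketch, é and ñ get the same base letter under both
definitions; ł, ø and đ get one only under (2); ŋ and æ fall outside
both, which matches leaving them for later.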
Am I missing something important?
Having read Mark's scoping exercise, I agree that including all
similarly confusable characters like ŋ or æ would be more difficult
and require extra work, so leaving them for later makes sense.
But it's not a binary choice between the narrowest possible and the
widest possible definition, and I think option (2) above would be a
good compromise.