Dear all,

I would like to simplify the issue a little, skip all the difficult cases, and consider just these two competing definitions of a diacritical character (simplified here):

(1) Unicode-decomposable
(2) LATIN SMALL LETTER ... WITH

* Both are well-defined and objective: they could be derived mechanically from the Unicode tables.
* Both make it unambiguous what the matching ASCII letter is. There is no need to evaluate confusability in any subjective way.
* The eligibility change would be equally simple to do mechanically in either case: a character's name is no harder to get from the Unicode tables than its decomposability is.
* Both are stable in the sense that they're unlikely to change in future Unicode versions (apart from possible new additions). Although I think the former is actually more likely to change (that is, Unicode could specify a decomposition for a character that doesn't have one now), that's unlikely enough not to matter.
* The latter includes more letters and supports more languages. In that sense it would also be fairer.
* The latter is more closely aligned with linguistics.
* The latter is easier for non-specialists to understand.
* The former is what we currently have, and we'd save ourselves some time and work by sticking with it.

Am I missing something important?

Having read Mark's scoping exercise, I agree that including all similarly confusable characters like ŋ or æ would be more difficult and require extra work, so leaving them for later makes sense. But it's not a binary choice between the narrowest possible and the widest possible, and I think option (2) above would be a good compromise.

--
Tapani Tarvainen
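
As an illustration of how definitions (1) and (2) could each be derived mechanically, here is a minimal Python sketch using the standard unicodedata module. The function names are hypothetical, and reading the "..." in (2) as a single letter A-Z is an assumption made for this example, not an agreed specification.

```python
import re
import unicodedata

# Assumption: the "..." in definition (2) stands for a single letter A-Z.
_NAME_RE = re.compile(r"LATIN SMALL LETTER ([A-Z]) WITH .+")


def base_by_decomposition(ch):
    """Definition (1): return the ASCII base letter if ch canonically
    decomposes into a lowercase ASCII letter plus combining marks, else None."""
    decomposed = unicodedata.normalize("NFD", ch)
    if len(decomposed) < 2:
        return None  # no canonical decomposition
    base, marks = decomposed[0], decomposed[1:]
    only_marks = all(unicodedata.category(m).startswith("M") for m in marks)
    if base.isascii() and base.islower() and only_marks:
        return base
    return None


def base_by_name(ch):
    """Definition (2): return the ASCII base letter if the Unicode name of ch
    has the form LATIN SMALL LETTER <X> WITH <something>, else None."""
    match = _NAME_RE.fullmatch(unicodedata.name(ch, ""))
    return match.group(1).lower() if match else None


if __name__ == "__main__":
    # e-acute, o-stroke, l-stroke, eng, ae
    for ch in ["\u00e9", "\u00f8", "\u0142", "\u014b", "\u00e6"]:
        print(ch, base_by_decomposition(ch), base_by_name(ch))
```

In this sketch é is matched under both definitions, ø and ł only under (2), and ŋ and æ under neither, which is consistent with leaving the latter two for later.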