Dear all,
I support Tapani’s point.
The current formulation is too restrictive and insufficiently justified. From a Francophone and African perspective, many characters in the Latin Extended sets, such as ø, are effectively treated as diacritic variations of base letters in real usage. What matters here is confusability and linguistic function, not whether a character is decomposable in Unicode. Unicode does not impose such a limitation. This is therefore a policy choice, and it should be clearly explained in the report. The rationale for excluding non-decomposable forms needs to be made explicit. Tapani’s suggestion to consider the “LATIN SMALL LETTER <x> WITH <y>” pattern offers a more practical and linguistically grounded approach.
Best regards,
Emmanuel
Dear all,
Recommendation 1 of the initial report states the chosen
definition of diacritics as a fact, without explaining or justifying
it in any way.
It should, so that anyone who wonders why ø or ŋ are treated
differently can look at the report and find an
explanation. I also note the ISOC Norway comment, which asks for a
technical reason for the exclusion of characters used in Norwegian
and Sami.
I tried to think of how to explain it.
As I recall, the argument was that the charter specifies diacritics
only and Unicode defines them that way and if we want to include
something else we'd have to go back to the council and there's no time
for that.
But does Unicode really define them that way?
I went to the source to see what Unicode actually says about the
matter, and importantly also what it does *not* say.
Not an easy task - there's *a lot* of stuff in there.
But try as I might, I could not find a general definition
of "diacritic" anywhere. Perhaps I've missed it - if so,
feel free to point it out.
There's lots of talk about combining diacritic marks and
decomposable diacritics - but no claim that those are the only kind
of diacritics. It's simply that for Unicode's purposes the
non-decomposable diacritics don't need any special treatment, so
there's no need to even define them.
For Unicode, decomposable diacritics are special because they need
special rules for canonicalization. But for our purposes that doesn't
matter.
In effect we took the Unicode table of decomposable diacritics
and decided to treat it as if those were the only diacritics.
This decision was not given to us by the charter nor by
Unicode, it was our choice.
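To make the distinction concrete, here is a minimal sketch using
Python's standard unicodedata module. The helper name
has_canonical_decomposition is only an illustration, not anything
taken from the report or from Unicode. It asks whether a character
has a canonical decomposition, which is the property our current
definition in effect relies on.

```python
import unicodedata


def has_canonical_decomposition(ch: str) -> bool:
    """Return True if ch has a canonical (not compatibility) decomposition."""
    decomp = unicodedata.decomposition(ch)
    # Compatibility decompositions carry a tag such as "<compat>";
    # canonical ones are a bare list of code points.
    return bool(decomp) and not decomp.startswith("<")


# e with acute decomposes into "e" plus a combining mark;
# o with stroke and eng do not decompose at all.
for ch in ("\u00e9", "\u00f8", "\u014b"):
    print(ch, unicodedata.name(ch), has_canonical_decomposition(ch))
```

The check says nothing about whether a character is a diacritic in
the linguistic sense. It only reports how Unicode chose to encode it.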
I see no good reason not to allow all Unicode characters described in
Latin-1 Supplement or any of the Latin Extended tables as
"LATIN SMALL LETTER <x> WITH <y>"
where <x> is an ASCII letter and <y> is any glyph that modifies the
base letter (which is the general meaning of "diacritic mark") in a
way that makes them confusable, regardless of the existence of a
decomposed canonical equivalent, as long as they are in the RZ-LGR.
My favorite example here is of course "LATIN SMALL LETTER O WITH STROKE", ø.
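As a rough illustration of how the name pattern could be applied,
here is a sketch in Python. The block ranges and the names PATTERN,
BLOCKS, match_letter_with and candidates are my own assumptions for
the example. It only tests the character name; it does not check
confusability or membership in the RZ-LGR, both of which would still
have to be applied on top.

```python
import re
import unicodedata

# Character names of the form "LATIN SMALL LETTER <x> WITH <y>",
# where <x> is a single ASCII letter.
PATTERN = re.compile(r"^LATIN SMALL LETTER ([A-Z]) WITH (.+)$")

# Assumed scope for this sketch: Latin-1 Supplement, Latin Extended-A,
# Latin Extended-B and Latin Extended Additional.
BLOCKS = [(0x0080, 0x00FF), (0x0100, 0x017F), (0x0180, 0x024F), (0x1E00, 0x1EFF)]


def match_letter_with(ch: str):
    """Return (base letter, modifier) if the name fits the pattern, else None."""
    m = PATTERN.match(unicodedata.name(ch, ""))
    return (m.group(1), m.group(2)) if m else None


def candidates():
    """Yield every character in BLOCKS whose name fits the pattern."""
    for start, end in BLOCKS:
        for cp in range(start, end + 1):
            ch = chr(cp)
            if match_letter_with(ch) is not None:
                yield ch


# o with stroke and e with acute both match; eng, eth, thorn and ae do not.
for ch in ("\u00f8", "\u00e9", "\u014b", "\u00f0", "\u00fe", "\u00e6"):
    print(ch, match_letter_with(ch))
```

Note that the pattern treats decomposable and non-decomposable
characters alike, which is the point of the proposal.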
If we conclude that there is a good reason for excluding such
non-decomposable diacritical characters, we should spell it out.
I don't see any good *technical* reason to exclude even characters
like ð, þ, ŋ, æ etc, but they are not diacritical and thus outside of
the charter, and in some cases might require additional considerations,
so I guess we'd have to go back to the council for those, or wait for
another PDP ("Extended Latin Characters PDP" perhaps?).
(Side note: during a coffee break discussion in Mumbai someone
said old issues shouldn't be relitigated *unless* there're new
facts or new viewpoints to consider. Here I think there are.)
--
Tapani Tarvainen