Hi Bill,

On 03.07.2019 21:59, Bill Jouris wrote:
Dear colleagues,
When we were developing our repertoire, our go-to reference for what glyphs are used in any given language was Omniglot.
At the ICANN meeting in Marrakech last week, I was talking to a group of people about diacritics and such. And I mentioned in passing that (as shown in Omniglot https://www.omniglot.com/writing/spanish.htm ) the only diacritic used in Spanish is the tilde over an N. A couple of native speakers of Spanish immediately corrected me, saying that the acute and diaeresis are also used. (A quick search with Google confirms this.)
The good news is, all of those glyphs are already in our repertoire. So no immediate problem there.
The bad news, it seems to me, is this: in how many /other/ languages does Omniglot fail to capture all of the diacritics or diacritic/letter combinations actually used? And how many of those result in glyphs which are not in our repertoire currently? (Which might resolve the mystery of why Unicode has so many pre-composed combinations which we didn't find.)
I realize that answering that question necessarily involves going back through the repertoire research process again, presumably using other sources. But I wonder whether we can, in good conscience, fail to do so.
I agree that there may well be further errors in other Omniglot languages; I wouldn't be surprised if more turned up. The question is: what is the alternative?

I can only speak for the languages I know (English, German, Finnish). For these I can decide with a high degree of confidence whether all glyphs have been included, but not for the rest. So even IF (and that's not a given) we find a better source for our list of languages, who is to say that its lists of glyphs are complete and correct? Those lists could just as well contain too many or wrong glyphs. Unless we find a native speaker for each of our languages who can list all of its glyphs for us (and even then he or she could be mistaken, so we would probably need at least three independent native speakers per language to get a reasonable degree of confidence), we will always face the problem that whatever source we use may be incorrect.

My suggestion, therefore, is to go with the list we have created and wait for the public (or IP) comments. If someone tells us we missed a certain glyph, we will of course add it.

I fear we have to reach a conclusion in the near future. It's like writing a book: whenever you re-read it, you will most likely find another problem or something to improve. It's almost impossible to get it perfect. At some point you have to decide whether to publish the book (even if it is not 100% perfect) or to keep improving it until the end of days/ICANN. ;-) Considering that we're all volunteering our time here, I'd rather come to a conclusion sooner than later.

This does not mean that we shouldn't fix an actual error if we find one. I want to submit something that, as far as we know, is correct. However, we shouldn't spend too much time searching for further potential errors at this point. But that's of course only my personal opinion. We can talk about this later today and get other opinions.
Michael

-- 
Knipp Medien und Kommunikation GmbH
Technologiepark
Martin-Schmeisser-Weg 9
44227 Dortmund
Germany

Dr. Michael Bauland, Dipl.-Informatiker
Software Development
Fon: +49 231 9703-0
Fax: +49 231 9703-200
SIP: Michael.Bauland@knipp.de
E-mail: Michael.Bauland@knipp.de

Register Court: Amtsgericht Dortmund, HRB 13728
Chief Executive Officers: Dietmar Knipp, Elmar Knipp