I added to the draft a reference that points to Obolo’s EGIDS designation of 5-6a. The source I found aggregates levels into blocks, e.g. 0-4, 5-6a, etc. If someone can find a reference that gives the exact level, we can replace it. I agree with Bill: what we have found is that each language is in one group. But this source seems to represent the data in a slightly different way, hence the grouping.
Dennis
From: Latingp <latingp-bounces@icann.org> on behalf of Bill Jouris via Latingp <latingp@icann.org>
Reply-To: Bill Jouris <b_jouris@yahoo.com>
Date: Sunday, December 5, 2021 at 9:02 PM
To: Latin GP <latingp@icann.org>
Subject: [EXTERNAL] [Latingp] Repertoire Expansion
Dear colleagues,
During our meeting last week, concerns were expressed about expanding the number of languages we consider for our repertoire. In particular, the number of additional languages was thought to be large (Dennis’ draft response says “hundreds, if not thousands, of languages around the world”) and the time required to deal with them was thought to amount to weeks. Good news:
· There aren’t
· It didn’t
There may be as many as a thousand languages which, at one time or another, were written using the Latin script. But the IDN project has been clear from early on that it would only consider scripts from “living languages.” It does not seem unreasonable, therefore, that only living languages should be considered when analyzing a given script. And the number of living languages using the Latin script is approximately 450. (Of which, we have already done over 200.) Furthermore, to be worth including in our analysis a language would need to have what the EGIDS 5 definition calls “literature in a standardized form”. (Dennis, in his comments in the meeting and his draft response, appears to suggest conflating EGIDS 5 and EGIDS 6a. However, it seems to me that this confuses the issue. The languages in EGIDS 6a do not (yet) have a standard orthography; that’s why they aren’t EGIDS 5.) What the comments are therefore suggesting, it seems to me, is eliminating the 1,000,000 native speakers threshold and including every language which is EGIDS 5. That is an expansion of our work but not, as we shall see, an enormous one.
Mirjana noted during the meeting that it had taken her 3 months to compile the list of languages which we analyzed initially, the implication being that it would take as long to do the same this time. Fortunately, the fruits of her labors then are still available to us now. In particular, we have her compilation of languages using the Latin script which are EGIDS 5. See
https://docs.google.com/document/d/1PwUa4Tkqpp2GGz8-hYDbKz357BSlMG6vkbrAmUqBcTQ/edit#
The total number of languages which are EGIDS 5? 110. Better yet, some 30 of those are already included in our work. Further, 4 are no longer using the Latin script (although they did in the past) and one appears to no longer have living native speakers. So we are left with 75 additional languages. Not thousands. Not hundreds. 75.
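The count can be checked with a minimal Python sketch. The variable names are only illustrative, and it assumes "some 30" means exactly 30:

```python
# Figures quoted in the paragraph above; "some 30" is assumed to be exactly 30.
egids5_total = 110        # Latin-script languages at EGIDS 5 in Mirjana's list
already_included = 30     # already covered by our existing work
no_longer_latin = 4       # used the Latin script in the past, but not now
no_living_speakers = 1    # appears to have no living native speakers

remaining = egids5_total - already_included - no_longer_latin - no_living_speakers
print(remaining)
```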
I’ve created a spreadsheet (attached), the first tab of which builds on Mirjana’s list, but includes columns for the new code points, if any, which appear in each language (as well as the Unicode and name). The first column is a flag. Languages which are already included have a green flag, languages which are no longer EGIDS 5 have a pink flag, and the languages to be added have a yellow flag.
How many new code points are there? 26. (Or possibly 28, if someone smarter than me can contrive a way to produce Latin Small Letter E with
Breve and Combining Circumflex, or Latin Small Letter O with Breve and Combining Circumflex, that look like the images given in Omniglot for the Jarai language.) Happily, the analysis didn’t take weeks. More like 2 half days.
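One way to experiment with those two letters is as combining sequences: the precomposed breve letter followed by U+0302 COMBINING CIRCUMFLEX ACCENT. The Python sketch below only builds the sequences and lists their code points. It assumes the circumflex stacks above the breve, which has not been checked against the Omniglot images, and how the result looks depends entirely on the font.

```python
import unicodedata

COMBINING_CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT

# Assumed sequences: precomposed breve letter + combining circumflex.
candidates = {
    "e": "\u0115" + COMBINING_CIRCUMFLEX,  # LATIN SMALL LETTER E WITH BREVE + U+0302
    "o": "\u014F" + COMBINING_CIRCUMFLEX,  # LATIN SMALL LETTER O WITH BREVE + U+0302
}

for base, seq in candidates.items():
    # NFC shows whether Unicode has a single precomposed character for the sequence.
    nfc = unicodedata.normalize("NFC", seq)
    points = [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in nfc]
    print(base, seq, points)
```

If the normalized form still has two code points, there is no single precomposed character, so the repertoire entry would have to be a sequence rather than one code point.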
The second tab of the spreadsheet gives the new code points, in the same format used in our repertoire tables, ready to be folded into our existing tables. Of course, that requires having the References available. So a second attachment gives all the references, in proper form, ready to be tacked on to the end of our Reference section.
Of course, we can still refuse, on principle, to include any EGIDS 5 language with fewer than a million native speakers, assuming that someone can come up with such a principle. But we need to be clear that the time and effort required to analyze the additional languages cannot be our excuse to do so.
Bill