Dear colleagues,

During our meeting last week, there were concerns expressed about expanding the number of languages we consider for our repertoire. In particular, the number of additional languages was thought to be large (Dennis’ draft response says “hundreds, if not thousands, of languages around the world”) and the time required to deal with them was thought to amount to weeks.

Good news:
- There aren’t
- It didn’t

There may be as many as a thousand languages which, at one time or another, were written using the Latin script. But the IDN project has been clear from early on that it would only consider scripts from “living languages.” It does not seem unreasonable, therefore, that only living languages should be considered when analyzing a given script. And the number of living languages using the Latin script is approximately 450. (Of which, we have already done over 200.)

Furthermore, to be worth including in our analysis a language would need to have what the EGIDS 5 definition calls “literature in a standardized form”. (Dennis, in his comments in the meeting and his draft response, appears to suggest conflating EGIDS 5 and EGIDS 6a. However, it seems to me that this confuses the issue. The languages in EGIDS 6a do not (yet) have a standard orthography; that’s why they aren’t EGIDS 5.) What the comments are therefore suggesting, it seems to me, is eliminating the 1,000,000 native speakers threshold, and including every language which is EGIDS 5. An expansion of our work; but not, as we shall see, an enormous one.

Mirjana noted during the meeting that it had taken her 3 months to compile the list of languages which we analyzed initially. The implication being that it would take as long to do the same this time. Fortunately, the fruits of her labors then are still available to us now. In particular, her compilation of languages using the Latin script which are EGIDS 5. See https://docs.google.com/document/d/1PwUa4Tkqpp2GGz8-hYDbKz357BSlMG6vkbrAmUqBcTQ/edit

The total number of languages which are EGIDS 5? 110. Better yet, some 30 of those are already included in our work. Further, 4 are no longer using the Latin script (although they did in the past) and one appears to no longer have living native speakers. So we are left with 75 additional languages. Not thousands. Not hundreds. 75.

I’ve created a spreadsheet (attached), the first tab of which builds on Mirjana’s list, but includes columns for the new code points, if any, which appear in each language (as well as the Unicode and name). The first column is a flag. Languages which are already included have a green flag, languages which are no longer EGIDS 5 have a pink flag, and the languages to be added have a yellow flag.

How many new code points are there? 26. (Or possibly 28, if someone smarter than me can contrive a way to produce Latin Small Letter E with Breve and Combining Circumflex, or Latin Small Letter O with Breve and Combining Circumflex, that look like the images given in Omniglot for the Jarai language.)

Happily, the analysis didn’t take weeks. More like 2 half days.

The second tab of the spreadsheet gives the new code points, in the same format used in our repertoire tables, ready to be folded into our existing tables. Of course, that requires having the References available. So a second attachment gives all the references, in proper form, ready to be tacked on to the end of our Reference section.

Of course, we can still refuse, on principle, to include any EGIDS 5 language with fewer than a million native speakers. Assuming that someone can come up with such a principle. But we need to be clear that the time and effort required to analyze the additional languages cannot be our excuse to do so.

Bill
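On the Jarai question above, a minimal Python sketch (an illustration only, not part of the attached spreadsheet) may show why those two letters are hard to produce as single code points. It assumes the letters would be written as the precomposed breve letter followed by U+0302, as described above, and it also tries the reverse stacking order (circumflex letter followed by U+0306) as a further assumption. The helper name nfc_code_points is made up for this example. Which ordering, if either, matches the Omniglot images is not something the code can settle.

```python
import unicodedata


def nfc_code_points(sequence: str) -> list[str]:
    """Normalize to NFC and list each resulting code point with its Unicode name."""
    normalized = unicodedata.normalize("NFC", sequence)
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in normalized]


# Candidate sequences (assumptions for illustration, not taken from the spreadsheet).
candidates = {
    "e: breve letter + combining circumflex": "\u0115\u0302",
    "o: breve letter + combining circumflex": "\u014F\u0302",
    "e: circumflex letter + combining breve": "\u00EA\u0306",
    "o: circumflex letter + combining breve": "\u00F4\u0306",
}

for label, sequence in candidates.items():
    # Two entries per candidate means NFC found no single precomposed code point.
    print(label, nfc_code_points(sequence))
```

Each candidate remains two code points after NFC normalization, so if these letters were wanted they would presumably have to be treated as sequences rather than as single repertoire code points.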
Dear All,

Thank you Bill for the effort. I would like to further clarify what may be needed to consider additional languages.

As you know, the Integration Panel (IP) has recommended that GPs align with the common and widespread use requirement of the LGR Procedure <https://www.icann.org/en/system/files/files/lgr-procedure-20mar13-en.pdf> (e.g. see the discussion on pp. 38-39). The IP had suggested the EGIDS scale as a possible measure to meet this requirement, where languages with EGIDS values 1-4 are clear for inclusion and those with EGIDS value 5 are borderline and may require additional evidence for inclusion. The latter means that an argument has to be made for the languages with EGIDS level 5, because these are not automatically included. The Latin GP made its argument for EGIDS level 5 languages on the basis of a population of 1 million people using the language, and the IP has not objected to date.

The LGR Procedure says: “This would be an area for judgment by the integration panel. For scripts where there is doubt that they meet the criteria of eligibility, the default action under the Conservatism and Inclusion principles would be provisional exclusion until positive evidence is brought forward that establishes widespread use for a living language.”

Please note that if the Latin GP changes the criteria for including languages in the Latin LGR proposal, this will not be automatically accepted by the IP. The IP would do the evaluations and may request evidence of “widespread use for a living language” for the languages being included with EGIDS level 5 to meet the requirements of the LGR Procedure.

Thus, in case the GP decides to expand the existing set of languages for inclusion, the additional effort by the Latin GP may not be limited to the evaluation of the code points by the GP. The work may also require finding and documenting concrete evidence of widespread usage for each of the languages being proposed from EGIDS level 5.

Regards,
Sarmad
Dear Bill,

I find it commendable that you are putting time and effort into the project to extend the number of languages. In general I'm all in favour of supporting minorities. In the case of TLDs, however, I wonder if this effort is really making any difference in practice.

If a language only has 100,000 speakers, how many of them would likely register a domain name under a non-ASCII TLD? Already now the number of domain registrations in IDN TLDs is much lower than for ASCII TLDs. And even for ASCII TLDs the number of people registering domains is not very high. I compared some figures and found that the number of domains relative to the population is 0.1% (for some arbitrary TLDs I checked; this is of course not representative research). For IDN TLDs the number is likely to be much lower. But even with 0.1% and a population of 100,000, this would lead to 100 domain registrations ... for an ASCII TLD.

I honestly doubt that interest in applying for such TLDs would be high if the expected number of domains remains below 100.

Finally, speakers of such a language are not per se excluded from applying for a TLD in their language. Just a small percentage of words would be excluded (namely those having a letter that is not already in our repertoire).

Together with the fact (as Sarmad just said) that for each of those additional languages we need to find positive evidence for their inclusion, I would prefer to keep our threshold at 1,000,000.

Cheers,

Michael

--
Dr. Michael Bauland, Dipl.-Informatiker
Software Development
Knipp Medien und Kommunikation GmbH
Technologiepark, Martin-Schmeisser-Weg 9, 44227 Dortmund, Germany
Fon: +49 231 9703-0
Fax: +49 231 9703-200
SIP: Michael.Bauland@knipp.de
E-mail: Michael.Bauland@knipp.de
Register Court: Amtsgericht Dortmund, HRB 13728
Chief Executive Officers: Dietmar Knipp, Elmar Knipp
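For reference, the back-of-the-envelope estimate in the message above can be written out as a short Python sketch. The function name and its default ratio are only illustrative; the 0.1% figure is the informal, admittedly non-representative ratio quoted in the message, not a measured value.

```python
def expected_registrations(speakers: int, domains_per_person: float = 0.001) -> int:
    """Rough number of domains for a community, given domains per head of population."""
    return round(speakers * domains_per_person)


# A language community of 100,000 speakers at the quoted 0.1% ratio.
print(expected_registrations(100_000))
```

The result is only as good as the ratio fed into it, which the message itself says is likely to be lower for IDN TLDs.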
Dear Michael,

I am trying, without much success I confess, to understand what harm would be done by including letters from languages which have fewer speakers. Who is harmed? How are they harmed? I'm just not seeing it.

Bill
Hi Bill,

On 06.12.2021 09:20, Bill Jouris wrote:
> Dear Michael,
> I am trying, without much success I confess, to understand what /harm/ would be done by including letters from languages which have fewer speakers. Who is harmed? How are they harmed? I'm just not seeing it.

I don't know. I agree with you that I also do not see any harm being done.

Cheers,

Michael
I'm with Michael here. While I appreciate your initiative and drive, Bill, I could not justify the effort required to achieve what you are seeking. We have a good product that we can deliver now.

Dennis
What effort??? We're basically talking about a half hour to edit the repertoire section.
- Remove the statement about the 1 million threshold
- Add the various code point rows in tab 2 to the two tables of code points
- Add the References (a straight Copy and Paste)

If we can't justify that much effort, we can't justify the effort required to talk about the comments at length. Just say "Thank you for your input. We disagree." Then publish and be done with it. We can then kick back and relax for the holidays.

After all, it's not like the Panel has the least intention of responding otherwise (with the possible exception of the request to alphabetize the list of languages by language name).

Bill
Icelandic is a level 2 language with only some 300,000 speakers. Icelandic was never questioned.

Mats

--
Mats Dufberg
mats.dufberg@internetstiftelsen.se
Technical Expert
Internetstiftelsen (The Swedish Internet Foundation)
Mobile: +46 73 065 3899
https://internetstiftelsen.se/
Dear colleagues,

Why don't we confront the IP with the fact that we already include languages with far fewer speakers, that the IP's choice of the EGIDS scale as the cut-off for inclusion is proving to be a problem for our user community and for the range of languages written in the Latin script, and that we have received public comments pointing out this issue?

We could point out that we did not provide evidence for the relevance of the Icelandic community to the specific context of IDNs either, and ask them what kind of evidence they would then need for those other 75 languages. I'm confident they will give us a workable answer.

Best,
Meikal
I added to the draft the reference that points to Obolo’s EGDIS scale 5-6a designation. The source I found aggregates groups into blocks, e.g. 0-4, 5-6a, etc. If someone could find a reference with the exact specific scale we can replace it — I agree with Bill, what we have found is that languages are in one group. But this source seems to represent data in a slightly different way, hence the grouping. Dennis From: Latingp <latingp-bounces@icann.org> on behalf of Bill Jouris via Latingp <latingp@icann.org> Reply-To: Bill Jouris <b_jouris@yahoo.com> Date: Sunday, December 5, 2021 at 9:02 PM To: Latin GP <latingp@icann.org> Subject: [EXTERNAL] [Latingp] Repertoire Expansion Caution: This email originated from outside the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe. Dear colleagues, During our meeting last week, there were concerns expressed about expanding the number of languages we consider for our repertoire. In particular the number of additional languages was thought to be large (Dennis’ draft response says “hundreds, if not thousands, of languages around the world”) and the time required to deal with them amounting to weeks. Good news: · There aren’t · It didn’t There may be as many as a thousand languages which, at one time or another, were written using the Latin script. But the IDN project has been clear from early on that it would only consider scripts from “living languages.” It does not seem unreasonable, therefore, that only living languages should be considered when analyzing a given script. And the number of living languages using the Latin script is approximately 450. (Of which, we have already done over 200.) Furthermore, to be worth including in our analysis a language would need to have what the EGIDS 5 definition calls “literature in a standardized form”. (Dennis, in his comments in the meeting and his draft response, appears to suggest conflating EGDIS 5 and EGIDS 6a. 
However, it seems to me that this confuses the issue. The languages in EGIDS 6a do not (yet) have a standard orthography; that’s why they aren’t EGIDS 5.)

What the comments are therefore suggesting, it seems to me, is eliminating the 1,000,000 native speakers threshold and including every language which is EGIDS 5. An expansion of our work; but not, as we shall see, an enormous one.

Mirjana noted during the meeting that it had taken her 3 months to compile the list of languages which we analyzed initially. The implication being that it would take as long to do the same this time. Fortunately, the fruits of her labors then are still available to us now. In particular, her compilation of languages using the Latin script which are EGIDS 5. See https://docs.google.com/document/d/1PwUa4Tkqpp2GGz8-hYDbKz357BSlMG6vkbrAmUqBcTQ/edit

The total number of languages which are EGIDS 5? 110. Better yet, some 30 of those are already included in our work. Further, 4 no longer use the Latin script (although they did in the past) and one appears to no longer have living native speakers. So we are left with 75 additional languages. Not thousands. Not hundreds. 75.

I’ve created a spreadsheet (attached), the first tab of which builds on Mirjana’s list, but includes columns for the new code points, if any, which appear in each language (as well as their Unicode values and names). The first column is a flag. Languages which are already included have a green flag, languages which are no longer EGIDS 5 have a pink flag, and the languages to be added have a yellow flag.

How many new code points are there? 26. (Or possibly 28, if someone smarter than me can contrive a way to produce Latin Small Letter E with Breve and Combining Circumflex, or Latin Small Letter O with Breve and Combining Circumflex, that look like the images given in Omniglot for the Jarai language.)
Happily, the analysis didn’t take weeks. More like 2 half days.

The second tab of the spreadsheet gives the new code points, in the same format used in our repertoire tables, ready to be folded into our existing tables. Of course, that requires having the References available. So a second attachment gives all the references, in proper form, ready to be tacked on to the end of our Reference section.

Of course, we can still refuse, on principle, to include any EGIDS 5 language with fewer than a million native speakers. Assuming that someone can come up with such a principle. But we need to be clear that the time and effort required to analyze the additional languages cannot be our excuse for doing so.

Bill
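
On the Jarai question in the message above, here is a minimal Python sketch of one way the two letters might be encoded. It assumes, as the wording "E with Breve and Combining Circumflex" suggests, that each is the precomposed breve letter followed by U+0302. That is a guess at the encoding, not something confirmed against the Omniglot images, and whether the result looks like those images depends on the font. The names CANDIDATES, describe and has_precomposed_form are illustrative only.

```python
import unicodedata

# Hypothetical candidate encodings (an assumption, not taken from the spreadsheet):
# the precomposed letter with breve, followed by U+0302 COMBINING CIRCUMFLEX ACCENT.
CANDIDATES = {
    "e": "\u0115\u0302",  # U+0115 followed by U+0302
    "o": "\u014F\u0302",  # U+014F followed by U+0302
}


def describe(sequence):
    """Return (code point, Unicode name) pairs for each character in the sequence."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in sequence]


def has_precomposed_form(sequence):
    """True if NFC normalization collapses the sequence to a single code point."""
    return len(unicodedata.normalize("NFC", sequence)) == 1


for base, seq in CANDIDATES.items():
    print(base, seq, describe(seq), "precomposed:", has_precomposed_form(seq))
```

If the assumption holds, neither sequence collapses to a single code point under NFC, so each would have to be treated as a two-code-point sequence rather than as one new code point.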
participants (6)
- Bill Jouris
- Mats Dufberg
- Meikal Mumin
- Michael Bauland
- Sarmad Hussain
- Tan Tanaka, Dennis