character-based analysis

Textual Solutions

May 25, 2017

7:21 a.m.

Dear All, Each member of the Rep. group may be invited to look at one character only across the languages listed. What do you think? Pls see sample attached and comment. Thanks. NPK

Attachments:

attachment.html (text/html — 225 bytes)
repertoireg-workinprogress.xlsx (application/vnd.openxmlformats-officedocument.spreadsheetml.sheet — 10.2 KB)


1. If a character is not found, we still do not know whether it should be included or not.

2. We have to return to all languages for characters that we have not found elsewhere.

3. We have to investigate all characters in every language anyway, to see whether each has any combination of base character and combining mark.

For every character (or combination) that we want to include, we should find evidence that it is used according to the principles. To stand on firm ground we should not register evidence for just one language but for several, in case some language is excluded at a later stage or the evidence is found to be invalid.

Mats

---
Mats Dufberg
DNS Specialist, IIS
Mobile: +46 73 065 3899
https://www.iis.se/en/
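A minimal sketch of the check described in point 3, assuming Python's standard `unicodedata` module is an acceptable tool for the job. The helper name `base_and_marks` is hypothetical and was not proposed in the thread. It splits a character into its base letter and combining marks using canonical decomposition (NFD).

```python
import unicodedata


def base_and_marks(char: str) -> tuple[str, list[str]]:
    """Split one character into its base letter and combining marks (via NFD)."""
    decomposed = unicodedata.normalize("NFD", char)
    base = decomposed[0]
    # Everything after the base with a non-zero combining class is a mark.
    marks = [c for c in decomposed[1:] if unicodedata.combining(c)]
    return base, marks


# Sample characters chosen for illustration only.
for ch in ["a", "\u00e9", "\u00f8", "\u01d6"]:
    base, marks = base_and_marks(ch)
    mark_names = [unicodedata.name(m) for m in marks]
    print(f"U+{ord(ch):04X}: base={base!r}, marks={mark_names}")
```

A character such as U+00F8 (o with stroke) has no canonical decomposition, so it comes back as its own base with no marks, while U+00E9 splits into "e" plus a combining acute accent.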

Mirjana Tasić

2:49 p.m.

Dear Nebiye,

I am trying to understand the idea behind your proposal. What is the purpose of looking for specific characters across all languages? Are you trying to develop the repertoire of all characters used in languages with Latin script for future processing?

Regards,
Mirjana

Ahmed Bakhat

June 2017

8:20 p.m.

Dear Mirjana and all members of the Repertoire subgroup,

I think we first have to focus on the characters available in the Unicode charts for the Latin script, and then devise principles / rules for inclusion / exclusion / deferral on the basis of usage in different languages. Once we have a table, we have to look for the usage in each language.

I am attaching a first draft of principles for the Latin script, the available Unicode charts and the MSR-2 documents, to start the group's discussion, though the first chart (0000 to 007F) does not need any discussion as it is already in use as ASCII.

Best Regards,
Ahmed Bakht

_______________________________________________
Latingp mailing list
Latingp@icann.org
https://mm.icann.org/mailman/listinfo/latingp
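As a rough illustration of starting from the Unicode charts, the sketch below lists the letters in a chart range. It rests on two assumptions: Python's `unicodedata` does not expose the Script property, so a character name beginning with "LATIN" is used as a stand-in heuristic, and the function name `latin_letters` is hypothetical.

```python
import unicodedata


def latin_letters(first: int, last: int) -> list[str]:
    """Return letters in [first, last] whose Unicode name starts with 'LATIN'.

    Heuristic only: the name prefix approximates the Script property.
    """
    found = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, "")  # "" for unnamed code points
        if name.startswith("LATIN") and unicodedata.category(ch).startswith("L"):
            found.append(ch)
    return found


# First chart mentioned in the message: 0000 to 007F (ASCII).
basic_latin = latin_letters(0x0000, 0x007F)
print(len(basic_latin), "".join(basic_latin))
```

The same call with another range, for example `latin_letters(0x0080, 0x00FF)`, would list candidates from the next chart for the group to review against the principles.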

Mats Dufberg

10:31 a.m.

Ahmed,

Your inclusion principles read:

2.1. Letter code point which is a letter and has established contemporary use in a language <<

This is straightforward.

2.2. Mark code point which represents a required mark, where at least one of the letters it forms has established contemporary use in a language <<

Firstly, I do not think that we want to include mark code points without contextual limitation, i.e. the combination of letter code point and mark or marks is what we want to include. Secondly, the principle should be that the combination has an established use in a language, shouldn't it?
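One way to picture the contextual limitation is to allow whole base-plus-mark sequences rather than free-standing marks. The sketch below is only an illustration under stated assumptions: the allow-list `ALLOWED_SEQUENCES` and both function names are hypothetical, and the two sample sequences are not a proposal for the repertoire.

```python
import unicodedata

# Hypothetical allow-list: each entry is a complete base + mark(s) sequence.
ALLOWED_SEQUENCES = {
    "e\u0301",  # e + COMBINING ACUTE ACCENT
    "n\u0303",  # n + COMBINING TILDE
}


def split_clusters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it."""
    clusters: list[str] = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and clusters:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def marks_in_allowed_context(text: str) -> bool:
    """True if every cluster that contains a mark is an allowed sequence."""
    for cluster in split_clusters(text):
        has_mark = any(unicodedata.combining(c) for c in cluster)
        if has_mark and cluster not in ALLOWED_SEQUENCES:
            return False
    return True


print(marks_in_allowed_context("caf\u00e9"))  # e + acute is on the list
print(marks_in_allowed_context("caf\u00e8"))  # e + grave is not
```

With this shape, a mark never becomes valid on its own: a combining acute accent attached to any base other than "e" is rejected, as is a mark with no base at all.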

2.3. Code point which represents a combination of letters in a language which has established contemporary use, where at least one of the constituent letters cannot be represented by a combination of letter code points and mark code points. <<

Can you give an example of what you mean?

2.4. Code point which represents a lexical word or phrase in a language, which has established contemporary use and cannot be decomposed into a sequence of code points representing letter code points and mark code points. <<

Can you give an example of what you mean?

You have a third type of principles, "deferral principles". Deferred to when and what?

You refer to "Language Table submitted by ccTLD in the context of IDNA2008 in the IANA repository". My experience is that the ccTLD IDN tables are mostly country based (i.e. supporting multiple languages in a country) rather than language based.

Yours,
Mats

Textual Solutions

May 2017

3:17 p.m.

Thanks, Mirjana and Mats, for your respective messages and comments. Since there are 180 languages at the start, and that is a big number to look at even for a linguist, I suggest each member of our group should look at one character across all 180 languages and obtain, create or establish the context for it.
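The per-character view and the per-language view hold the same data, just indexed differently. The sketch below inverts a language-to-characters table into a character-to-languages index. The three-language sample is illustrative only and is not taken from the attached spreadsheet, and the names `by_character` and `single_language_only` are hypothetical.

```python
from collections import defaultdict

# Illustrative sample only, not the group's working table.
LANGUAGE_CHARS = {
    "Swedish": set("åäö"),
    "Turkish": set("çğıöşü"),
    "Serbian (Latin)": set("čćđšž"),
}


def by_character(language_chars: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert language -> characters into character -> languages."""
    index: dict[str, set[str]] = defaultdict(set)
    for language, chars in language_chars.items():
        for ch in chars:
            index[ch].add(language)
    return dict(index)


def single_language_only(index: dict[str, set[str]]) -> list[str]:
    """Characters whose evidence rests on exactly one language."""
    return sorted(ch for ch, langs in index.items() if len(langs) == 1)


index = by_character(LANGUAGE_CHARS)
print(sorted(index["ö"]))
print(single_language_only(index))
```

Listing the characters attested in only one language also speaks to the earlier concern about relying on a single language as evidence.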

Mirjana Tasić

6:34 p.m.

Dear Nebiye,

I need some more clarification about the new process you are proposing. Suppose we start working as you proposed: what shall we have at the end of the whole process, which will last a few months? We shall have a table containing all 180 languages with all letters from these languages, without any connection to MSR-2 <https://www.icann.org/news/announcement-2-2015-04-27-en>. How can this table help us to finish our job?

MSR-2 was produced by the Integration Panel. The purpose of our panel is to check whether all characters used by Latin-script languages are included in MSR-2, and also to find which characters from MSR-2 should be excluded because they are not in everyday use. In our Proposal we promised that we shall do this as described in our Work Plan. We also decided to process 180 languages in the first round.

The link to MSR-2 is:
Maximal Starting Repertoire Version 2 (MSR-2) for the Development of Label Generation Rules for the Root Zone <https://www.icann.org/news/announcement-2-2015-04-27-en>

The Latin script is defined in the following part of MSR-2:
MSR-2-Annotated-non-CJK-Tables-20150413 <https://www.icann.org/en/system/files/files/msr-2-non-cjk-13apr15-en.pdf>

Regards,
Mirjana
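The two checks described here, characters in use but missing from MSR-2 and characters in MSR-2 but not in everyday use, amount to set differences. The sketch below assumes both repertoires are available as sets of characters. The sample sets are placeholders and say nothing about the real contents of MSR-2, and the function name `compare_with_msr` is hypothetical.

```python
def compare_with_msr(used: set[str], msr: set[str]) -> dict[str, set[str]]:
    """Compare characters attested in languages with a starting repertoire."""
    return {
        # Attested in some language but absent from the repertoire.
        "missing_from_msr": used - msr,
        # In the repertoire but with no attested use: candidates for exclusion.
        "in_msr_but_unattested": msr - used,
        # Attested and present in the repertoire.
        "in_both": used & msr,
    }


# Placeholder data only; NOT the real MSR-2 or any real language survey.
MSR_SAMPLE = {"a", "b", "c", "d"}
USED_SAMPLE = {"a", "b", "x"}

report = compare_with_msr(USED_SAMPLE, MSR_SAMPLE)
for label, chars in report.items():
    print(label, sorted(chars))
```

A per-language or per-character table feeds this comparison once its characters are collected into the `used` set, which is one way such a table could connect back to MSR-2.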
