Re: [Cyrillicgp] F2F meeting - Istanbul, 25 Nov - further info
On 22 Nov 2016, at 16:04, Sarmad Hussain <sarmad.hussain@icann.org> wrote:
Dear Dmitry, All,
The work we are undertaking is guided by general principles in RFC 6912 and more specifically by the Procedure (pp. 41-43).
These include (among others) the Inclusion principle: “… every … code point is excluded until reviewed and then explicitly included” and the Conservatism principle: “…reject anything … not positively … safe, [and] in the event of disagreement … the proposed rule … is automatically excluded from the root label generation rules.” It will be useful to review all the principles.
I know this document. My name is at the end of it, in the list of LGR working group members. The Conservatism principle refers to the integration panel, not to our generation panel. And inclusion of code points in the rules would be done on the basis of panel consensus. So there is no need for other guidelines. I don't want to discuss this procedure here.
The root zone will be more conservative than the second level, meaning that not everything in a second-level IDN table will necessarily be in the root zone. Thus, inclusion of a code point at the second level does not presume inclusion in the root zone. The case for inclusion of each code point in the root zone has to be made in its own right.
Other generation panels have used a mix of the following sources to identify characters used in a language:
a. National and/or international standards for different languages using the script
b. Primary school (e.g. Grade-1 or Grade-2) books published by the Education Ministry in a country
c. Grammar books for a language listing characters used to write a language
d. Use of characters in contemporary published materials for general public, e.g. online newspapers
e. Scholarly articles commenting on use of script for a particular language
f. More general online sources, e.g. see http://omniglot.com/writing/index.htm
g. Second level IDN tables
Well, I don't know if anybody is bringing scholarly books or grammar books, but I will have some Ukrainian literary works with me, just to be on the safe side, as proof that we use our alphabet completely.
Hope this helps.
Regards, Sarmad
From: Dmitry Kohmanyuk [mailto:dk@hostmaster.ua]
Sent: Tuesday, November 22, 2016 5:51 PM
To: Sarmad Hussain <sarmad.hussain@icann.org>
Cc: cyrillicgp@icann.org
Subject: [EXTERNAL] Re: [Cyrillicgp] F2F meeting - Istanbul, 25 Nov - further info
I have a problem understanding 1a. What evidence do we need to provide that letters are worth including or not, besides their being used in IDN tables? Surely, some letters may not be part of MSR, but apart from that, languages are here with their alphabets, and censoring letters out without "independent evidence" (or what?) is a bad idea. It goes against the "be liberal in what you accept" principle.
I can see the case for adding letters beyond those used in IDN tables: for example, some languages are not used for ccTLD domains yet are widely used, and their alphabets are included in consideration.
So please clarify this position before we start. How was this handled in other panels?
On 22 Nov 2016, at 13:36, Sarmad Hussain <sarmad.hussain@icann.org> wrote:
Dear All,
Let’s use the attached consolidated Excel sheet as a starting point. It lists all the code points allowed by MSR-2.
I have added columns which we eventually need to fill in. This requires the following tasks:
1. For each code point, list the language (including EGIDS #) which requires the code point to be included.
   a. This will also require giving an argument and a reference in support of the code point. For example, if the code point is used in a second level IDN table, that is a possible argument in support. However, please note that having a code point included at the second level may not be a sufficient argument in itself. For cases which are unclear, additional independent evidence may be needed.
2. For each code point, determine if it is a variant of another code point within Cyrillic script. For this purpose, we will need to have a set of principles which clarify what the community considers as such variants.
3. For each code point, determine if there are variants for related scripts, as listed (Latin, Greek, Armenian, Georgian?). Cross-script variants are generally based on homoglyph relations (i.e. visually identical). However, we need to discuss how we will finalize these.
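As a rough illustration of task 1, one row of the sheet could be modelled as below. This is only a sketch: the field names, the source labels such as "idn_table", and the three status strings are assumptions made for this example and are not taken from the actual sheet. It starts every code point as excluded and flags rows whose only support is a second level IDN table.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    source: str     # hypothetical label, e.g. "idn_table", "national_standard"
    reference: str  # citation or URL supporting the code point


@dataclass
class CodePointEntry:
    code_point: int
    language: str = ""   # language that requires the code point
    egids: str = ""      # EGIDS level of that language
    evidence: list = field(default_factory=list)


def review_status(entry: CodePointEntry) -> str:
    """Return a review status; a code point is excluded until supported."""
    if not entry.language or not entry.evidence:
        return "excluded"
    sources = {item.source for item in entry.evidence}
    # Assumption: IDN-table-only support is treated as an unclear case.
    if sources == {"idn_table"}:
        return "needs-more-evidence"
    return "candidate-for-inclusion"
```

Whether IDN-table-only rows should really be flagged this way is for the panel to decide; the sketch only shows that the default state is "excluded".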
Following the agenda circulated by Dusan (see the trailing email), for the meeting before lunch, I would suggest that we do:
i. a quick pass on 1. above to list code points which should be included (and mark those code points currently excluded), then
ii. use the code points included to do the analysis for 3. above (columns H-K in the Excel sheet attached). We do this in the following order (simple to complex cases):
a. Georgian?
b. Armenian (also see analysis done by Armenian script GP)
c. Greek
d. Latin
Again, we should limit analysis to exact match at this time (though we can identify close to exact match cases separately for any further consideration).
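The exact-match pass described above could be sketched as follows. The CANDIDATES table is a hypothetical, hand-filled illustration using a few commonly cited look-alike pairs; it is not the panel's data. Deriving the script from the first word of the Unicode character name is a heuristic chosen for this example, since the Python standard library has no script property.

```python
import unicodedata

# Hypothetical look-alike candidates: Cyrillic code point -> other code points.
CANDIDATES = {
    0x0430: [0x0061],          # Cyrillic a vs Latin a
    0x043E: [0x006F, 0x03BF],  # Cyrillic o vs Latin o and Greek omicron
    0x0440: [0x0070],          # Cyrillic er vs Latin p
}

# Order of analysis from the message above (simple to complex cases).
SCRIPT_ORDER = ["GEORGIAN", "ARMENIAN", "GREEK", "LATIN"]


def script_of(code_point: int) -> str:
    """Heuristic: take the script from the first word of the Unicode name."""
    return unicodedata.name(chr(code_point)).split()[0]


def cross_script_variants(included: set) -> dict:
    """Group candidate pairs by script, for included code points only."""
    result = {script: [] for script in SCRIPT_ORDER}
    for cp in sorted(included):
        for other in CANDIDATES.get(cp, []):
            script = script_of(other)
            if script in result:
                result[script].append((cp, other))
    return result
```

Close-to-exact matches would need a separate table so that they can be reviewed apart from the exact matches.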
We will take up 1. and 2. above in detail during the session after lunch.
Please see the data in the Dropbox folder --> F2F meeting Istanbul 25 Nov 16.
Regards, Sarmad
From: cyrillicgp-bounces@icann.org [mailto:cyrillicgp-bounces@icann.org] On Behalf Of Sarmad Hussain
Sent: Tuesday, November 22, 2016 3:11 PM
To: cyrillicgp@icann.org
Cc: Eric Evrard <eric.evrard@icann.org>
Subject: Re: [Cyrillicgp] F2F meeting - Istanbul, 25 Nov - further info
Dear All,
Please connect using the following link and information for the meeting on 25 November. We plan to start at 9am Istanbul time.
How to connect:
Link to Adobe Connect: https://participate.icann.org/idn/
Upon logging into Adobe Connect, a pop-up window will give you the option to dial out to your phone. Enter your phone number (remember to change the country code if needed).
After joining the call, as a courtesy to others and the presenters, please MUTE your phone. This can be done by selecting *6 on your keypad. To UNMUTE select *6 again.
If you are unable to log into Adobe Connect and can only join via phone: https://www.myrcplus.com/cnums.asp?bwebid=8369444&ppc=1064158692&num=1-719-457-1414
Participant Passcode: 1064158692
Regards, Sarmad
From: cyrillicgp-bounces@icann.org [mailto:cyrillicgp-bounces@icann.org] On Behalf Of Dusan Stojicevic
Sent: Wednesday, November 16, 2016 5:12 PM
To: cyrillicgp@icann.org
Subject: [Cyrillicgp] F2F meeting - Istanbul, 25 Nov - further info
Dear all,
The F2F meeting of the Cyrillic GP will be held in Istanbul on 25th November.
Venue: ICANN hub in Istanbul
AGENDA:
09.00 – 13.00 Work on items 0.2, 0.3 and 0.4 from our working plan defined in our Proposal...
This includes: analysis of the national Cyrillic alphabets to identify cases of confusable code points relative to the Latin, Greek and other scripts (Georgian, Armenian). We have prepared a document (by Dmitry K) with the set of code points which we will use. Also, a lot of work has been done by Dmitry B., and we will include his work.
Output: Preparation of consolidated confusion tables in the form of a summary table for the Cyrillic script.
Working method: we will decide how to work on site, on the basis of work already done and the number of in situ participants. One coffee break included. Sarmad will chair this session.
13.00 – 14.00 Lunch
14.00 – 20.00 Work on items 1.1, 1.2, 2.1, 2.2, and 3.1 from our working plan defined in our Proposal...
This includes: based on the confusion tables, building the two documents listed below in Output, and analysis for Whole Label Rules (if any).
Output:
- summary document on code points of the Cyrillic script which should be included, excluded or deferred (as the basis for the Label Generation Ruleset - LGR) for TLD labels
- summary document on variants
Working method: we will decide how to work on site, on the basis of work already done and the number of in situ participants. One coffee break included. I will chair this session.
20.00 - … Dinner.
The backup day is 26th November, in the morning. Outcome: the goal is to finish the final report, which will be sent to the list for further discussion. Online participation is possible via Adobe Connect (ICANN people will send us the details in time).
Other Instructions: In the newly created Dropbox folder (F2F meeting) you can find tables, the repertoire and some of the work that has been done in the past. Explanations of the files added to this folder are in README.TXT. Please add any documents of importance that you think I have missed. If you have problems connecting to Dropbox, please write to the list.
Online resources:
MSR-2 - https://www.icann.org/en/system/files/files/msr-2-non-cjk-13apr15-en.pdf
Proposal... - https://www.icann.org/en/system/files/files/cyrillic-script-lgr-proposal-10dec15-en.pdf
About CGP - https://community.icann.org/display/croscomlgrprocedure/Cyrillic+Script+GP
Russian languages - http://web-corpora.net/wsgi3/minorlangs/view/overall
Also feel free to add/share the links.
Cheers, Dusan
<template.xlsx>
_______________________________________________
Cyrillicgp mailing list
Cyrillicgp@icann.org
https://mm.icann.org/mailman/listinfo/cyrillicgp
participants (2): Dmitry Kohmanyuk, Sarmad Hussain