Minutes from the call on May 30, 2017
Dear colleagues,

Here are some notes from the conference call held on May 30, 2017.
If I missed any discussions, please correct me.
Comments, corrections, and suggestions are much appreciated.

Regards,
Mirjana

---------------------------------------------------------------------------------------------------

Agenda

I. Roll call
II. Short report from the first meeting of the Repertoire group
III. Short report from the first meeting of the Variant group
IV. Discussion about the F2F meeting agenda
V. AOB

Notes:

1. Roll call

The following GP members were present on the call:
Bill Jouris
Dennis Tan
Fiammetta Caccavale
Sarmad Hussain
Mats Dufberg
Mirjana Tasić
Ahmed Bakhat Masood

Excused: Meikal Mumin, Michael Bauland, Ousmane Moussa Tessa, Hazem Hezzah

2. Short report from the Repertoire Group call

Nebiye Petek Kurtböke sent the notes from the Repertoire Group meeting as follows:

1. Sarmad shared the Arabic GP's work as an example.
2. Highlighted the issue of 'combined marks' and how they should be treated.
3. They should be treated as sequences rather than code points.
4. Mirjana thinks diacritics by themselves should be excluded.
5. Sarmad underlined 'contextual motivation'.
6. In the course of our work new code points are expected to be discovered, and we will move on from MSR2 to MSR3.
7. What to do with letters not existing in MSR2?
8. Are the principles used by the other script groups (e.g. Arabic) applicable to Latin?
9. How to draft new principles?
10. First round of EGIDS 1-4 languages to be circulated (180).

After the Repertoire Group meeting, Mats Dufberg sent a proposal for how the group could proceed:

1. Accept the Arabic GP principles as preliminary principles.
2. Do the following in parallel:
   * Start working on languages with EGIDS code 0-3.
   * Revise and update the principles as needed.
3. Set the principles for the Latin GP.
4. Continue the work with the languages.
5. We should not hesitate to update the principles again after they are set, if needed.

Ahmed Bakhat Masood commented on the Repertoire Group meeting:
“Repertoire group should convene its meetings to identify code points and a presentation be prepared for discussion during Face to Face meeting”

3. Short report from the Variant Group call

Dennis Tan informed GP members about the first meeting of the Variant Group. After the call Dennis sent the following report to the GP mailing list:

NOTES:

* Reviewed summary of body of knowledge of Latin Variants (for TLDs)
  https://docs.google.com/document/d/1w76ObHKBuhlMh4ajX07M1x9eNSUiCuhHTq9sNzaBycg/edit
* Discussed criteria for variants in the Latin script:
  * IP's expectation clearly limits the scope of variants for the Latin script. Their views are supported by other past work products (e.g. Considerations in the Use of the Latin Script in Variant Internationalized Top-Level Domains).
  * Latin GP may recommend that strong similarity review mechanisms are in place when processing gTLD applications in future rounds, to minimize delegation of confusable TLDs.
  * Draft principle 1: Variants in Latin script will be limited to homoglyphs (i.e. code points designed by Unicode to look identical).
* Variant WG will be looking at in-script and cross-script variants.
* We need to review code point sequences that involve more than one diacritic.
* For cross-script variants we will be looking at Cyrillic, Armenian and Greek work, as well as Unicode's intentional and confusable character lists. Bill will also share a work of his own.

4. Discussion about the F2F meeting agenda

Sarmad proposed a template for the organization of the F2F meeting in Istanbul:
https://docs.google.com/spreadsheets/d/1myKxtzvvdgdK3Va4SvB5TjVG3rL6nJJRgp1A... - gid=0

GP members are asked to consider the proposal, to add further topics for discussion, and to volunteer for chairing different topics.

Ahmed Bakhat added a topic:
“It was proposed to start with a presentation from Mirjana to briefly communicate how the panel was formed: Chris started chairing the group and submitted the 1st draft proposal to ICANN; later on, Mirjana took the lead, worked to sort out the issues raised by ICANN, and managed to submit the final proposal, which was finally accepted by ICANN. She may also communicate the way forward for the group.”
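Repertoire notes 2-4 above (combined marks treated as sequences rather than code points, and diacritics not admitted by themselves) can be illustrated with a minimal Python sketch. It is not part of the minutes and not the panel's tooling; the function names are hypothetical, and it assumes NFC normalization and the Unicode general category Mn (non-spacing mark) as the test for a combining mark.

```python
import unicodedata


def split_sequences(text):
    """Group each base character with the non-spacing marks that follow it.

    The text is NFC-normalized first, so a base + mark pair that has a
    precomposed form collapses to a single code point.
    """
    normalized = unicodedata.normalize("NFC", text)
    sequences = []
    for ch in normalized:
        if unicodedata.category(ch) == "Mn" and sequences:
            sequences[-1] += ch      # attach the mark to the preceding base
        else:
            sequences.append(ch)     # new base (or a mark with no base before it)
    return sequences


def needs_sequence(seq):
    """True when the unit cannot be expressed as one code point after NFC."""
    return len(seq) > 1


def is_lone_mark(seq):
    """True for a diacritic standing by itself, with no base letter."""
    return len(seq) == 1 and unicodedata.category(seq) == "Mn"


# n + combining tilde has a precomposed form; open e + combining tilde does not.
units = split_sequences("n\u0303\u025b\u0303")
flags = [needs_sequence(u) for u in units]
```

In this sketch the first unit collapses to the single code point U+00F1, while the second stays a two-code-point sequence, which is the kind of case that would have to be listed in a repertoire as a sequence.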
Mirjana,

thanks. I had sent apologies to Sarmad just before the meeting; sorry for not being able to attend on the 30th.

Ahmed Bakhat Masood commented on the Repertoire Group meeting:
“Repertoire group should convene its meetings to identify code points and a presentation be prepared for discussion during Face to Face meeting”

Does Ahmed want to present on how to build a repertoire from zero? As I understand it, he is one of the 4 members with this experience.

Regards,
NP

_______________________________________________
Latingp mailing list
Latingp@icann.org
https://mm.icann.org/mailman/listinfo/latingp
I raised the issue during the meeting so that work on the Repertoire can begin: as this group has not yet started its meetings, it should devise principles for inclusion / exclusion, so that we have solid grounds to include code points. Furthermore, someone has to present on behalf of the group what we have done and what the way forward is.

Regarding my email containing draft principles, I wanted to communicate that before going for any strategy (inclusion of code points either on the basis of language or on the basis of MSR) we should have principles for it. In my perception it would be much easier to go by MSR than by languages, as it would take years to finish 180 languages. It doesn't mean at all that we will start from zero; Mirjana has already done most of the work, so we can quickly go through it and work on the rest of the code points.

Best Regards,
Ahmed Bakht
Thanks Ahmed,

I don't have the group's 4 years of experience and I won't comment on a process I'm not familiar with. I only met David Olive at a recent event and he asked me to join the Latin panel, and that's what I did. I respond to Procedures mostly in the form of questions.

Regards,
NP
Dear Nebiye,

You are most welcome in the group and it will be my pleasure to respond to your questions.
Ahmed,

If you start with MSR (or actually MSR2) and try to find languages that support the inclusion of its code points, you would never be able to confirm that no code points outside MSR2 are needed to support the languages that the Latin GP wants to support. I do not say that such code points will be included, but we should be aware of any limitation in the support of the languages that are claimed to be supported.

If there is any code point in MSR2 that is not used by any language, we would have to investigate every language anyway to confirm that the code point can be excluded.

Besides the Latin code points, there are non-spacing marks that are used in combination with Latin code points. Those combinations can have different status in a language, being considered either a character in its own right or a modified character. In the repertoire that the Latin GP proposes, such non-spacing marks are to be limited to just those combinations that are really used in the languages that the group wants to support. To find those combinations we have to investigate all languages.

Another aspect is that the method of going code point by code point through MSR2 requires that we already know where to find what we are looking for. And when you start studying the material for a language, the hardest step can be to find sources and understand what they say. After that it can be more straightforward to extract the characters.

I do not claim that the task is simple. In my work for ICANN Pre-Delegation Testing, I have already done that. There are many grey areas, but it is our task to dig into them. There is no other way than going through all the languages.

Yours,
Mats

---
Mats Dufberg
DNS Specialist, IIS
Mobile: +46 73 065 3899
https://www.iis.se/en/
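The point in Mats's message, that only a language-by-language survey shows both which needed code points fall outside the starting repertoire and which repertoire code points no language uses, can be sketched in a few lines of Python. This is an illustration only: the function name, the language labels, and both character sets are made-up toy data, not the contents of MSR2 or of any real alphabet survey.

```python
def repertoire_gaps(language_chars, repertoire):
    """Compare what surveyed languages need against a starting repertoire.

    language_chars: dict mapping a language label to the characters it uses.
    repertoire: set of characters in the starting repertoire.
    Returns (missing, unused).
    """
    required = set()
    for chars in language_chars.values():
        required |= set(chars)
    missing = required - repertoire   # needed by some language, not in the repertoire
    unused = repertoire - required    # in the repertoire, attested in no surveyed language
    return missing, unused


# Toy data (hypothetical): two surveyed "languages" and a five-character repertoire.
toy_languages = {
    "lang_a": "abc\u00e5",   # a, b, c, a with ring above
    "lang_b": "abc\u025b",   # a, b, c, open e
}
toy_repertoire = set("abc\u00e5\u00fe")  # a, b, c, a with ring above, thorn

missing, unused = repertoire_gaps(toy_languages, toy_repertoire)
```

Both result sets depend on having surveyed every language of interest: `unused` can only be trusted as grounds for exclusion once no remaining language could still require the code point.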
Given the enormous number of languages involved, perhaps it would be better to establish which ones will be included at this time. That is, go for inclusion, rather than exclusion. And then, separately, principles and processes for including the occasional additional codepoint, if a language which we did not get thru in this initial effort requires it.

Bill Jouris
Inside Products
bill.jouris@insidethestack.com
831-659-8360
925-855-9512 (direct)
Bill,

Inclusion of languages or code points?

The only code points that we can include are the code points that we have confirmed to be used by languages according to the criteria, i.e. the language must be high enough on the EGIDS scale (low number) and the usage in some language should be contemporary and established. All other code points are excluded.

The number of languages is high. That is a fact. The only way to reduce the number of languages is to move the border higher up in scale.

My suggestion is that we should start working by taking the languages highest up on the scale (0-2) and get some experience from that. When we see what we get, we can move into languages 3-4.

https://www.ethnologue.com/about/language-status

Yours,
Mats
Pardon my ignorance -- being with the Variant group, I didn't realize that you folks in Repetoire were considering any other approach. Starting with the most used languages, and then working thru as many of the less used ones as time allows, seems like the obvious approach. Were you guys actually considering a different one? Bill Jouris Inside Products bill.jouris@insidethestack.com 831-659-8360 925-855-9512 (direct) From: Mats Dufberg <mats.dufberg@iis.se> To: Bill Jouris <bill.jouris@insidethestack.com> Cc: Latin GP <latingp@icann.org>; "ahmedbakhat@yahoo.com" <ahmedbakhat@yahoo.com> Sent: Monday, June 5, 2017 9:11 AM Subject: Re: [Latingp] Minutes from the call on May 30, 2017 #yiv8179538059 #yiv8179538059 -- _filtered #yiv8179538059 {font-family:Arial;panose-1:2 11 6 4 2 2 2 2 2 4;} _filtered #yiv8179538059 {panose-1:2 7 3 9 2 2 5 2 4 4;} _filtered #yiv8179538059 {panose-1:2 4 5 3 5 4 6 3 2 4;} _filtered #yiv8179538059 {font-family:Calibri;panose-1:2 15 5 2 2 2 4 3 2 4;} _filtered #yiv8179538059 {font-family:PMingLiU;panose-1:2 2 5 0 0 0 0 0 0 0;} _filtered #yiv8179538059 {panose-1:2 0 5 3 0 0 0 2 0 4;}#yiv8179538059 #yiv8179538059 p.yiv8179538059MsoNormal, #yiv8179538059 li.yiv8179538059MsoNormal, #yiv8179538059 div.yiv8179538059MsoNormal {margin:0cm;margin-bottom:.0001pt;font-size:12.0pt;}#yiv8179538059 a:link, #yiv8179538059 span.yiv8179538059MsoHyperlink {color:blue;text-decoration:underline;}#yiv8179538059 a:visited, #yiv8179538059 span.yiv8179538059MsoHyperlinkFollowed {color:purple;text-decoration:underline;}#yiv8179538059 p.yiv8179538059msonormal, #yiv8179538059 li.yiv8179538059msonormal, #yiv8179538059 div.yiv8179538059msonormal {margin-right:0cm;margin-left:0cm;font-size:12.0pt;}#yiv8179538059 p.yiv8179538059msochpdefault, #yiv8179538059 li.yiv8179538059msochpdefault, #yiv8179538059 div.yiv8179538059msochpdefault {margin-right:0cm;margin-left:0cm;font-size:12.0pt;}#yiv8179538059 span.yiv8179538059msohyperlink {}#yiv8179538059 
span.yiv8179538059msohyperlinkfollowed {}#yiv8179538059 span.yiv8179538059emailstyle17 {}#yiv8179538059 span.yiv8179538059msoins {}#yiv8179538059 p.yiv8179538059msonormal1, #yiv8179538059 li.yiv8179538059msonormal1, #yiv8179538059 div.yiv8179538059msonormal1 {margin:0cm;margin-bottom:.0001pt;font-size:12.0pt;}#yiv8179538059 span.yiv8179538059msohyperlink1 {color:blue;text-decoration:underline;}#yiv8179538059 span.yiv8179538059msohyperlinkfollowed1 {color:purple;text-decoration:underline;}#yiv8179538059 span.yiv8179538059emailstyle171 {color:windowtext;}#yiv8179538059 span.yiv8179538059msoins1 {color:teal;text-decoration:underline;}#yiv8179538059 p.yiv8179538059msochpdefault1, #yiv8179538059 li.yiv8179538059msochpdefault1, #yiv8179538059 div.yiv8179538059msochpdefault1 {margin-right:0cm;margin-left:0cm;font-size:10.0pt;}#yiv8179538059 span.yiv8179538059EmailStyle29 {color:windowtext;}#yiv8179538059 span.yiv8179538059msoIns {text-decoration:underline;color:teal;}#yiv8179538059 .yiv8179538059MsoChpDefault {font-size:10.0pt;} _filtered #yiv8179538059 {margin:70.85pt 70.85pt 70.85pt 70.85pt;}#yiv8179538059 div.yiv8179538059WordSection1 {}#yiv8179538059 Bill, Inclusion of languages or code points? The only code points that we can include are the code points that we have confirmed to be used by languages according to the criteria, i.e. the language must be high enough on the EGIDS scale (low number) and the usage in some language should be contemporary and established. All other code points are excluded. The number of languages is high. That is a fact. The only way to reduce the number of languages is to move the border higher up in scale. My suggestion is that we should start working by taking the languages highest up on the scale (0-2) and get some experience from that. When we see what we get, we can move into languages 3-4. 
https://www.ethnologue.com/about/language-status
Yours,
Mats
---
Mats Dufberg
DNS Specialist, IIS
Mobile: +46 73 065 3899
https://www.iis.se/en/
From: Bill Jouris <bill.jouris@insidethestack.com>
Reply-To: Bill Jouris <bill.jouris@insidethestack.com>
Date: Monday 5 June 2017 at 17:38
To: Mats Dufberg <mats.dufberg@iis.se>, "ahmedbakhat@yahoo.com" <ahmedbakhat@yahoo.com>
Cc: Latin GP <latingp@icann.org>
Subject: Re: [Latingp] Minutes from the call on May 30, 2017
Given the enormous number of languages involved, perhaps it would be better to establish which ones will be included at this time. That is, go for inclusion, rather than exclusion.
And then, separately, principles and processes for including the occasional additional codepoint, if a language which we did not get thru in this initial effort requires it.
Bill Jouris
Inside Products
bill.jouris@insidethestack.com
831-659-8360
925-855-9512 (direct)
From: Mats Dufberg <mats.dufberg@iis.se>
To: "ahmedbakhat@yahoo.com" <ahmedbakhat@yahoo.com>
Cc: Latin GP <latingp@icann.org>
Sent: Monday, June 5, 2017 3:09 AM
Subject: Re: [Latingp] Minutes from the call on May 30, 2017
Ahmed,
If you start with MSR -- or actually MSR2 -- and try to find languages that support the inclusion of its code points, you would never be able to confirm that no code points outside of MSR2 are needed to support the languages that the Latin GP wants to support. I do not say that such code points will be included, but we should be aware of any limitation in the support of the languages that are claimed to be supported.
If there is any code point in MSR2 not used by any language, we would have to investigate every language anyway to confirm that the code point can be excluded.
Besides the Latin code points there are non-spacing marks that are used in combination with Latin code points. Those combinations could have different status in a language, either being considered to be a character on its own or being a modified character. In the repertoire that the Latin GP suggests, such non-spacing marks are limited to just those combinations that are really used in the languages that the group wants to support. To find those combinations we have to investigate all languages.
Another aspect is that the method of going code point by code point in MSR2 requires that we already know where to find what we are looking for. And when you start studying the material for a language, the hardest step can be to find sources and understand what they say. After that it could be more straightforward to extract the characters. -- I do not claim that the task is simple. In my work for ICANN Pre-Delegation Testing, I have already done that. There are many grey areas, but that is our task to dig into.
There is no other way than going through all the languages.
Yours,
Mats
---
Mats Dufberg
DNS Specialist, IIS
Mobile: +46 73 065 3899
https://www.iis.se/en/
From: <latingp-bounces@icann.org> on behalf of Ahmed Bakhat via Latingp <latingp@icann.org>
Reply-To: "ahmedbakhat@yahoo.com" <ahmedbakhat@yahoo.com>
Date: Sunday 4 June 2017 at 15:35
To: "textualsolutions@gmail.com" <textualsolutions@gmail.com>, Mirjana Tasić <Mirjana.Tasic@rnids.rs>
Cc: Latin GP <latingp@icann.org>
Subject: Re: [Latingp] Minutes from the call on May 30, 2017
During the meeting I raised the issue of the work on Repertoire, as this group has not yet started its meetings, in order to devise principles for inclusion / exclusion, so that we have solid grounds to include code points. Furthermore, someone has to present on behalf of the group what we have done and what the way forward is.
Regarding my email containing draft principles, I wanted to communicate that before going for any strategy (either inclusion of code points on the basis of language or on the basis of MSR) we should have principles for it. In my view, going by MSR is much easier than going by languages, as it would take years to finish 180 languages.
It doesn't mean at all that we will start from zero. Mirjana has already done most of the work, so we can quickly go through it and work on the rest of the code points.
Best Regards,
Ahmed Bakht
Sent from Yahoo Mail on Android
On Sun, 4 Jun 2017 at 3:52 pm, Textual Solutions <textualsolutions@gmail.com> wrote:
_______________________________________________ Latingp mailing list Latingp@icann.org https://mm.icann.org/mailman/listinfo/latingp
Dear Mats and Bill,
Many thanks for the valuable comments. I am still very much in favour of the MSR2 approach, in view of the large number of languages, which may take years to complete. Furthermore, if some code point is not available in Unicode / MSR, our exercise will not give the desired results. However, I agree with the proposal to start with 2-3 large languages (excluding English) and, based on that experience, to proceed further. In this case our principles will also change, as we have to look for all available code points in a particular language.
Sent from Yahoo Mail on Android
On Mon, 5 Jun 2017 at 9:33 pm, Bill Jouris <bill.jouris@insidethestack.com> wrote:
Dear all,
After going through this thread of messages, let me suggest the following approach as a start, with the possibility to refine it as we go on.
1. Decide which languages we will include in the first phase of repertoire building. According to the EGIDS scale, either 1-2 (93 languages), 1-3 (134 languages), or 1-4 (180 languages). I would suggest finishing 1-2 and then going on to 3-4. The 3 Latin EGIDS 0 languages (English, French, Spanish) are included in the 1 list.
2. Taking the MSR-2 tables as our starting pool, I see that there are already a number of code points ineligible for use in the root zone (white and pink background), so our pool will be only characters with white background. (recitation needed if pink background characters are to be included)
3. Distribute the languages among members for analysis. I would suggest that each member goes through the characters of the language in hand, then makes a mark on each code point found in the MSR.
4. If any character is not found in the MSR, make a note to look into it later.
5. For every subsequent language analyzed, if a code point found already has a mark, go on to the next character.
6. At the end, by combining all marks of all languages inspected, I think we can consider that the repertoire consists of all marked code points.
7. We should then decide what to do with the missing code points that were noted.
Any suggestions for handling combinations?
Welcoming your opinions.
Regards,
Hazem Hezzah
From: latingp-bounces@icann.org [mailto:latingp-bounces@icann.org] On Behalf Of Bill Jouris
Sent: Monday, 05 June, 2017 18:19
To: Mats Dufberg
Cc: Latin GP
Subject: Re: [Latingp] Minutes from the call on May 30, 2017
Pardon my ignorance -- being with the Variant group, I didn't realize that you folks in Repertoire were considering any other approach.
Starting with the most used languages, and then working thru as many of the less used ones as time allows, seems like the obvious approach. Were you guys actually considering a different one?
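Hazem's seven-step marking procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_repertoire`, the toy `msr_pool`, and the per-language character strings are hypothetical and are not the real MSR-2 tables or a real alphabet survey. Unlike step 5, which simply skips a code point that is already marked, the sketch records every language that uses a code point, so the evidence behind each mark is kept.

```python
from collections import defaultdict


def build_repertoire(msr_pool, language_chars):
    """Mark MSR code points attested in some language (steps 3, 5, 6) and
    collect characters that are not in the MSR for later review (steps 4, 7)."""
    marks = defaultdict(set)    # code point in the MSR pool -> languages using it
    missing = defaultdict(set)  # code point outside the pool -> languages needing it
    for language, chars in language_chars.items():
        for ch in chars:
            cp = ord(ch)
            target = marks if cp in msr_pool else missing
            target[cp].add(language)
    return dict(marks), dict(missing)


# Hypothetical toy inputs, NOT the real MSR-2 tables or complete alphabets.
msr_pool = {ord(c) for c in "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"}
language_chars = {
    "Swedish": "abc\u00e5\u00e4\u00f6",
    "English": "abc",
    "ExampleLang": "ab\u0253",  # made-up language with a letter outside the toy pool
}

marks, missing = build_repertoire(msr_pool, language_chars)
repertoire = sorted(marks)  # step 6: the repertoire is the set of marked code points
```

Here `missing` plays the role of the notes taken in step 4: it lists the code points that some language needs but the starting pool does not contain, which is the case Mats warns cannot be detected when working from the MSR alone.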
Thanks Hazem,
The repertoire building section of the Procedures mentions a number of code points that are used in multiple scripts. Could you give examples? Also, some code points apparently have ambiguous status. What does it mean for a code point to have such a status, and how do you resolve it?
On Tue, Jun 6, 2017 at 4:12 PM, Hazem Hezzah <hhezzah.las@gmail.com> wrote:
5. For every other language analyzing, if code point found already has a mark, go on to next character. 6. At the end by combining all marks of all languages inspected, I think we can consider that the repertoire consists of all marked code points.
I think we should keep record of the combination of Letter code point and Mark code point (or code points). I think we should assume that we should restrict the usage of Marks to those contexts that we can motivate from the language material.
Mats
---
Mats Dufberg
DNS Specialist, IIS
Mobile: +46 73 065 3899
https://www.iis.se/en/
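Mats's suggestion to keep a record of each Letter plus Mark combination can be illustrated with Python's standard `unicodedata` module. This is a minimal sketch under stated assumptions, not the group's agreed method: the function name `letter_mark_sequences` is hypothetical, the text is decomposed with NFD so that precomposed letters are split into a base letter and its marks, and only marks of general category `Mn` (non-spacing mark) are recorded.

```python
import unicodedata


def letter_mark_sequences(text):
    """Return the set of (base letter, tuple of non-spacing marks) pairs in text."""
    sequences = set()
    base, marks = None, []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch) == "Mn":
            # A non-spacing mark attaches to the preceding base letter, if any.
            if base is not None:
                marks.append(ch)
        else:
            # A new base character starts: store the finished sequence first.
            if base is not None and marks:
                sequences.add((base, tuple(marks)))
            base, marks = (ch if ch.isalpha() else None), []
    if base is not None and marks:
        sequences.add((base, tuple(marks)))
    return sequences


# Sample text only; real input would be the language material being studied.
found = letter_mark_sequences("caf\u00e9 \u00e5")
```

Collecting such pairs per language gives the list of contexts in which a mark is actually attested, which is what would be needed to restrict marks to the combinations that the language material supports.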
Hi Mats,
By marks I mean marking the code point as present in a language, like making a check mark √ for that code point.
Regards,
Hazem Hezzah
From: Mats Dufberg
Sent: Tuesday, June 06, 2017 6:57 PM
To: Hazem Hezzah; 'Latin GP'
Subject: Re: [Latingp] Minutes from the call on May 30, 2017
5. For every other language analyzing, if code point found already has a mark, go on to next character.
6. At the end by combining all marks of all languages inspected, I think we can consider that the repertoire consists of all marked code points.
I think we should keep record of the combination of Letter code point and Mark code point (or code points). I think we should assume that we should restrict the usage of Marks to those contexts that we can motivate from the language material. Mats --- Mats Dufberg DNS Specialist, IIS Mobile: +46 73 065 3899 https://www.iis.se/en/ From: <latingp-bounces@icann.org> on behalf of Hazem Hezzah <hhezzah.las@gmail.com> Organization: LAS Date: Tuesday 6 June 2017 at 15:12 To: 'Latin GP' <latingp@icann.org> Subject: Re: [Latingp] Minutes from the call on May 30, 2017 Dear all, After going through this thread of messages, let me suggest the following approach as a start, with possibility to refine as we go on. 1. Decide which languages we will include in the first phase of repertoire building. According to EGIDS scale either 1-2 (93 languages), 1-3 (134 languages), or 1-4 (180 languages). I would suggest to finish 1-2 then go on to 3- 4. The 3 Latin EGIDS 0 languages (English, French, Spanish) are included in the 1 list 2. Taking the MSR-2 tables as our starting pool, I see that there are already a number of code points ineligible from being used in the root zone (white and pink background), so our pool will be only characters with white background. (recitation needed if pink background characters are to be included) 3. Distributing the languages on members for analysis, I would suggest that each one goes through the characters of the language in hand, then makes a mark on each code point found in the MSR. 4. If any character is not found in the MSR, take a note to look after it later. 5. For every other language analyzing, if code point found already has a mark, go on to next character. 6. At the end by combining all marks of all languages inspected, I think we can consider that the repertoire consists of all marked code points. 7. Missing code points taken notes with should be decided what to do towards. Any suggestions for handling combinations? Welcoming your opinions. 
Regards, Hazem Hezzah From: latingp-bounces@icann.org [mailto:latingp-bounces@icann.org] On Behalf Of Bill Jouris Sent: Monday, 05 June, 2017 18:19 To: Mats Dufberg Cc: Latin GP Subject: Re: [Latingp] Minutes from the call on May 30, 2017 Pardon my ignorance -- being with the Variant group, I didn't realize that you folks in Repetoire were considering any other approach. Starting with the most used languages, and then working thru as many of the less used ones as time allows, seems like the obvious approach. Were you guys actually considering a different one? Bill Jouris Inside Products bill.jouris@insidethestack.com 831-659-8360 925-855-9512 (direct) -------------------------------------------------------------------------------- From: Mats Dufberg <mats.dufberg@iis.se> To: Bill Jouris <bill.jouris@insidethestack.com> Cc: Latin GP <latingp@icann.org>; "ahmedbakhat@yahoo.com" <ahmedbakhat@yahoo.com> Sent: Monday, June 5, 2017 9:11 AM Subject: Re: [Latingp] Minutes from the call on May 30, 2017 Bill, Inclusion of languages or code points? The only code points that we can include are the code points that we have confirmed to be used by languages according to the criteria, i.e. the language must be high enough on the EGIDS scale (low number) and the usage in some language should be contemporary and established. All other code points are excluded. The number of languages is high. That is a fact. The only way to reduce the number of languages is to move the border higher up in scale. My suggestion is that we should start working by taking the languages highest up on the scale (0-2) and get some experience from that. When we see what we get, we can move into languages 3-4. 
https://www.ethnologue.com/about/language-status
Yours,
Mats
--- Mats Dufberg DNS Specialist, IIS Mobile: +46 73 065 3899 https://www.iis.se/en/
From: Bill Jouris <bill.jouris@insidethestack.com> Reply-To: Bill Jouris <bill.jouris@insidethestack.com> Date: Monday 5 June 2017 at 17:38 To: Mats Dufberg <mats.dufberg@iis.se>, "ahmedbakhat@yahoo.com" <ahmedbakhat@yahoo.com> Cc: Latin GP <latingp@icann.org> Subject: Re: [Latingp] Minutes from the call on May 30, 2017
Given the enormous number of languages involved, perhaps it would be better to establish which ones will be included at this time. That is, go for inclusion, rather than exclusion.
And then, separately, principles and processes for including the occasional additional codepoint, if a language which we did not get thru in this initial effort requires it.
Bill Jouris Inside Products bill.jouris@insidethestack.com 831-659-8360 925-855-9512 (direct)
--------------------------------------------------------------------------------
From: Mats Dufberg <mats.dufberg@iis.se> To: "ahmedbakhat@yahoo.com" <ahmedbakhat@yahoo.com> Cc: Latin GP <latingp@icann.org> Sent: Monday, June 5, 2017 3:09 AM Subject: Re: [Latingp] Minutes from the call on May 30, 2017
Ahmed,
If you start with MSR -- or actually MSR2 -- and try to find languages that support the inclusion of its code points, you would never be able to confirm that no code points outside of MSR2 are needed to support the languages that the Latin GP wants to support. I do not say that such code points will be included, but we should be aware of any limitation in the support of the languages that are claimed to be supported.
If there is any code point in MSR2 not used by any language, we would have to investigate every language anyway to confirm that the code point can be excluded.
Besides the Latin code points there are non-spacing marks that are used in combination with Latin code points.
Those combinations could have different status in a language, either being considered to be a character on its own or being a modified character. In the repertoire that the Latin GP suggests, such non-spacing marks should be limited to just those combinations that are really used in the languages that the group wants to support. To find those combinations we have to investigate all languages.
Another aspect is that the method of going code point by code point in MSR2 requires that we already know where to find what we are looking for. And when you start studying the material for a language, the hardest step can be to find sources and understand what they say. After that it could be more straightforward to extract the characters.
I do not claim that the task is simple. In my work for ICANN Pre-Delegation Testing, I have already done that. There are many grey areas, but that is our task to dig into. There is no other way than going through all the languages.
Yours,
Mats
--- Mats Dufberg DNS Specialist, IIS Mobile: +46 73 065 3899 https://www.iis.se/en/
From: <latingp-bounces@icann.org> on behalf of Ahmed Bakhat via Latingp <latingp@icann.org> Reply-To: "ahmedbakhat@yahoo.com" <ahmedbakhat@yahoo.com> Date: Sunday 4 June 2017 at 15:35 To: "textualsolutions@gmail.com" <textualsolutions@gmail.com>, Mirjana Tasić <Mirjana.Tasic@rnids.rs> Cc: Latin GP <latingp@icann.org> Subject: Re: [Latingp] Minutes from the call on May 30, 2017
I raised the issue during the meeting to work on Repertoire, as this group has not yet started its meetings, to devise principles for inclusion / exclusion, so that we should have solid grounds to include code points. Furthermore, someone has to present on behalf of the group what we have done and what is the way forward.
Regarding my email containing draft principles, I wanted to communicate that before going for any strategy (either inclusion of code points on the basis of language or on the basis of MSR) we should have principles for it.
In my perception, going by the MSR is much easier than going by languages, as it would take years to finish 180 languages. It doesn't mean at all that we will start from zero: Mirjana has already done most of the work, so we can quickly go through it and work on the rest of the code points.
Best Regards,
Ahmed Bakht
On Sun, 4 Jun 2017 at 3:52 pm, Textual Solutions <textualsolutions@gmail.com> wrote:
_______________________________________________
Latingp mailing list Latingp@icann.org https://mm.icann.org/mailman/listinfo/latingp
Are we planning on keeping track of which code points exist for which languages?
I ask because it seems to me that one of the requirements for a name should be that all of the code points in it should exist in some *single* language. That is, just because all of the code points exist in the total repertoire that doesn't mean the name should be acceptable.
After all, the whole point of this exercise is to allow names in other languages. Not merely to expand the possible names as much as possible.
Bill Jouris Inside Products bill.jouris@insidethestack.com 831-659-8360 925-855-9512 (direct)
Hi Bill.
Yes, in the repertoire, for each code point or code point sequence included, the GP has to list at least one language (or a few) that uses it. This has to be documented as part of the proposal submitted by the GP to the Integration Panel. I am attaching the template for the document which Latin GP has to produce. Please see the table in Section 5.
Regards, Sarmad
Thanks for the template
citing evidence of the letters’ use or membership in an alphabet
Do you have to appeal to native speaker judgements for this?
On Tue, Jun 6, 2017 at 9:50 PM, Sarmad Hussain <sarmad.hussain@icann.org> wrote:
Hi Bill.
Yes, in the repertoire, for each code point or code point sequence included, the GP has to list at least one language (or a few) that uses it. This has to be documented as part of the proposal submitted by the GP to the Integration Panel. I am attaching the template for the document which Latin GP has to produce. Please see the table in Section 5.
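The bookkeeping described here (each included code point or code point sequence listed with the languages that use it) could be kept as a simple inverted index. The following is only a minimal sketch, not the template's format: the names `LANGUAGE_ENTRIES`, `format_entry` and `languages_by_entry` are assumptions made for illustration, the sample data is a tiny illustrative subset, and "ExampleLang" is a hypothetical language included only to show how a sequence would be stored.

```python
from collections import defaultdict

# Illustrative input only (not GP data): each language maps to the repertoire
# entries it uses. An entry is a tuple of code points, so a single code point
# and a code point sequence (letter plus combining mark) share one shape.
LANGUAGE_ENTRIES = {
    "Swedish": [(0x00E5,), (0x00E4,), (0x00F6,)],            # a-ring, a-umlaut, o-umlaut
    "German": [(0x00E4,), (0x00F6,), (0x00FC,), (0x00DF,)],  # a/o/u-umlaut, sharp s
    "ExampleLang": [(0x0065, 0x0323)],  # hypothetical: e + combining dot below
}


def format_entry(entry):
    """Render an entry as 'U+XXXX' values separated by spaces."""
    return " ".join(f"U+{cp:04X}" for cp in entry)


def languages_by_entry(language_entries):
    """Invert language -> entries into entry -> sorted list of languages."""
    index = defaultdict(set)
    for language, entries in language_entries.items():
        for entry in entries:
            index[tuple(entry)].add(language)
    return {entry: sorted(langs) for entry, langs in index.items()}


if __name__ == "__main__":
    # One row per repertoire entry, followed by the languages that motivate it.
    for entry, langs in sorted(languages_by_entry(LANGUAGE_ENTRIES).items()):
        print(f"{format_entry(entry):<16} {', '.join(langs)}")
```

Storing every entry as a tuple means a Letter plus Mark combination needs no special handling compared with a single code point.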
Regards, Sarmad
*From:* latingp-bounces@icann.org [mailto:latingp-bounces@icann.org] *On Behalf Of *Bill Jouris *Sent:* Tuesday, June 06, 2017 11:12 PM *To:* Hazem Hezzah <hhezzah.las@gmail.com>; 'Latin GP' <latingp@icann.org> *Subject:* Re: [Latingp] Minutes from the call on May 30, 2017
Are we planning on keeping track of which code points exist for which languages?
I ask because it seems to me that one of the requirements for a name should be that all of the code points in it should exist in some *single* language. That is, just because all of the code points exist in the total repertoire that doesn't mean the name should be acceptable.
After all, the whole point of this exercise is to allow names in other languages. Not merely to expand the possible names as much as possible.
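The difference between "all code points are in the total repertoire" and "all code points exist in some single language" can be made concrete with a small check. This is a sketch under stated assumptions, not a proposed rule: `LANGUAGE_CODE_POINTS`, `covering_languages` and `in_total_repertoire` are hypothetical names, the two alphabets are a toy sample, and each character of the label is assumed to be a single code point (no combining sequences).

```python
import string

# a-z as code points; shared by both sample languages below.
BASIC = {ord(ch) for ch in string.ascii_lowercase}

# Toy per-language repertoires (illustrative subset, not GP data).
LANGUAGE_CODE_POINTS = {
    "Swedish": BASIC | {0x00E5, 0x00E4, 0x00F6},         # a-ring, a-umlaut, o-umlaut
    "German": BASIC | {0x00E4, 0x00F6, 0x00FC, 0x00DF},  # a/o/u-umlaut, sharp s
}


def covering_languages(label, language_code_points):
    """Return the languages whose repertoire contains every code point of the label."""
    needed = {ord(ch) for ch in label}
    return sorted(
        language
        for language, code_points in language_code_points.items()
        if needed <= code_points
    )


def in_total_repertoire(label, language_code_points):
    """Weaker test: every code point appears in the union of all languages."""
    union = set().union(*language_code_points.values())
    return all(ord(ch) in union for ch in label)


if __name__ == "__main__":
    for label in ["b\u00e4r", "bl\u00e5b\u00e4r", "\u00e5\u00df"]:
        print(
            label,
            covering_languages(label, LANGUAGE_CODE_POINTS),
            in_total_repertoire(label, LANGUAGE_CODE_POINTS),
        )
```

The last label mixes a Swedish-only letter with a German-only letter: it passes the union test but no single language covers it, which is exactly the case the question is about.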
Bill Jouris Inside Products bill.jouris@insidethestack.com 831-659-8360 925-855-9512 (direct)
------------------------------
*From:* Hazem Hezzah <hhezzah.las@gmail.com> *To:* 'Latin GP' <latingp@icann.org> *Sent:* Tuesday, June 6, 2017 6:13 AM *Subject:* Re: [Latingp] Minutes from the call on May 30, 2017
Dear all,
After going through this thread of messages, let me suggest the following approach as a start, with the possibility to refine it as we go on.
1. Decide which languages we will include in the first phase of repertoire building. According to the EGIDS scale, either 1-2 (93 languages), 1-3 (134 languages), or 1-4 (180 languages). I would suggest finishing 1-2 and then going on to 3-4. The three Latin-script EGIDS 0 languages (English, French, Spanish) are included in the level 1 list.
2. Taking the MSR-2 tables as our starting pool, I see that a number of code points are already ineligible for use in the root zone (white and pink background), so our pool will be only the characters with white background. (*recitation needed if pink background characters are to be included*)
3. Distribute the languages among members for analysis. I would suggest that each member goes through the characters of the language in hand and makes a mark on each code point found in the MSR.
4. If any character is not found in the MSR, make a note to look into it later.
5. For every further language analyzed, if a code point already has a mark, go on to the next character.
6. At the end, by combining the marks from all languages inspected, I think we can consider that the repertoire consists of all marked code points.
7. For the missing code points that were noted, we should decide what to do with them.
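The marking procedure in steps 3 to 7 can be sketched as a small function. This is a minimal sketch under stated assumptions: `msr_pool` and `language_code_points` are hypothetical inputs that the group would have to compile by hand, and, unlike step 5, the sketch records every language that uses a code point instead of skipping already-marked ones, so that the evidence per code point stays available.

```python
def build_repertoire(msr_pool, language_code_points):
    """Mark code points used by the analyzed languages.

    msr_pool: set of code points (ints) eligible in the MSR (hypothetical input).
    language_code_points: dict mapping a language name to the set of
        code points (ints) found for that language (hypothetical input).
    Returns (marks, missing): two dicts mapping a code point to the list of
    languages using it; `missing` holds the code points not found in the MSR.
    """
    marks = {}
    missing = {}
    for language, code_points in language_code_points.items():
        for cp in sorted(code_points):
            target = marks if cp in msr_pool else missing
            target.setdefault(cp, []).append(language)
    return marks, missing
```

The keys of `marks` then form the candidate repertoire of step 6, and `missing` is the list of notes from steps 4 and 7.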
Any suggestions for handling combinations?
Welcoming your opinions.
Regards,
*Hazem Hezzah*
*From:* latingp-bounces@icann.org [mailto:latingp-bounces@icann.org <latingp-bounces@icann.org>] *On Behalf Of *Bill Jouris *Sent:* Monday, 05 June, 2017 18:19 *To:* Mats Dufberg *Cc:* Latin GP *Subject:* Re: [Latingp] Minutes from the call on May 30, 2017
Pardon my ignorance -- being with the Variant group, I didn't realize that you folks in Repertoire were considering any other approach.
Starting with the most used languages, and then working thru as many of the less used ones as time allows, seems like the obvious approach. Were you guys actually considering a different one?
Bill Jouris Inside Products bill.jouris@insidethestack.com 831-659-8360 925-855-9512 (direct)
------------------------------
*From:* Mats Dufberg <mats.dufberg@iis.se> *To:* Bill Jouris <bill.jouris@insidethestack.com> *Cc:* Latin GP <latingp@icann.org>; "ahmedbakhat@yahoo.com" < ahmedbakhat@yahoo.com> *Sent:* Monday, June 5, 2017 9:11 AM *Subject:* Re: [Latingp] Minutes from the call on May 30, 2017
Bill,
Inclusion of languages or code points?
The only code points that we can include are the code points that we have confirmed to be used by languages according to the criteria, i.e. the language must be high enough on the EGIDS scale (low number) and the usage in some language should be contemporary and established. All other code points are excluded.
The number of languages is high. That is a fact. The only way to reduce the number of languages is to move the border higher up in scale.
My suggestion is that we should start working by taking the languages highest up on the scale (0-2) and get some experience from that. When we see what we get, we can move into languages 3-4.
https://www.ethnologue.com/about/language-status
Yours,
Mats
---
Mats Dufberg
DNS Specialist, IIS
Mobile: +46 73 065 3899
https://www.iis.se/en/
*From: *Bill Jouris <bill.jouris@insidethestack.com> *Reply-To: *Bill Jouris <bill.jouris@insidethestack.com> *Date: *Monday 5 June 2017 at 17:38 *To: *Mats Dufberg <mats.dufberg@iis.se>, "ahmedbakhat@yahoo.com" < ahmedbakhat@yahoo.com> *Cc: *Latin GP <latingp@icann.org> *Subject: *Re: [Latingp] Minutes from the call on May 30, 2017
Given the enormous number of languages involved, perhaps it would be better to establish which ones will be *included* at this time. That is, go for inclusion, rather than exclusion.
And then, separately, principles and processes for including the occasional additional codepoint, if a language which we did not get thru in this initial effort requires it.
Bill Jouris Inside Products bill.jouris@insidethestack.com 831-659-8360 925-855-9512 (direct)
------------------------------
*From:* Mats Dufberg <mats.dufberg@iis.se> *To:* "ahmedbakhat@yahoo.com" <ahmedbakhat@yahoo.com> *Cc:* Latin GP <latingp@icann.org> *Sent:* Monday, June 5, 2017 3:09 AM *Subject:* Re: [Latingp] Minutes from the call on May 30, 2017
Ahmed,
If you start with MSR -- or actually MSR2 -- and try to find languages that support the inclusion of its code points, you would never be able to confirm that no code points outside of MSR2 are needed to support the languages that the Latin GP wants to support. I do not say that such code points will be included, but we should be aware of any limitation in the support of the languages that are claimed to be supported.
If there is any code point in MSR2 not used by any language we would have to investigate every language anyway to confirm that the code point can be excluded.
Besides the Latin code points there are non-spacing marks that are used in combination with Latin code points. Those combinations could have different status in a language, either being considered to be a character on its own or being a modified character. In the repertoire that the Latin GP proposes, such non-spacing marks should be limited to just those combinations that are really used in the languages that the group wants to support. To find those combinations we have to investigate all languages.
Another aspect is that the method of going code point by code point in MSR2 requires that we already know where to find what we are looking for. And when you start studying the material for a language, the hardest step can be to find sources and understand what they say. After that it could be more straightforward to extract the characters. -- I do not claim that the task is simple. In my work for ICANN Pre-Delegation Testing, I have already done that. There are many grey areas, but that is our task to dig into.
There is no other way than going through all the languages.
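One way to spot the combinations of a Latin base letter and a non-spacing mark that need to be listed as sequences is to normalize sample text to NFC and collect whatever is still written as base plus mark. The sketch below uses Python's standard `unicodedata` module; the function name and the idea of feeding it language samples are assumptions for illustration, not an agreed method of the group.

```python
import unicodedata


def combining_sequences(text):
    """Return the base+mark sequences that remain after NFC normalization.

    Combinations with a precomposed code point collapse into that single
    code point under NFC, so only true sequences are reported.
    """
    nfc = unicodedata.normalize("NFC", text)
    sequences = set()
    current = ""
    for ch in nfc:
        if unicodedata.category(ch) == "Mn" and current:
            current += ch          # non-spacing mark: extend the current sequence
        else:
            if len(current) > 1:   # previous base carried at least one mark
                sequences.add(current)
            current = ch
    if len(current) > 1:
        sequences.add(current)
    return sequences
```

For example, "e" followed by U+0301 composes to the single code point U+00E9 and is not reported, while "g" followed by U+0303 (as used in Guarani) has no precomposed form and is returned as a sequence.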
Yours,
Mats
---
Mats Dufberg
DNS Specialist, IIS
Mobile: +46 73 065 3899
https://www.iis.se/en/
*From: *<latingp-bounces@icann.org> on behalf of Ahmed Bakhat via Latingp <latingp@icann.org> *Reply-To: *"ahmedbakhat@yahoo.com" <ahmedbakhat@yahoo.com> *Date: *Sunday 4 June 2017 at 15:35 *To: *"textualsolutions@gmail.com" <textualsolutions@gmail.com>, Mirjana Tasić <Mirjana.Tasic@rnids.rs> *Cc: *Latin GP <latingp@icann.org> *Subject: *Re: [Latingp] Minutes from the call on May 30, 2017
During the meeting I raised the issue of the work on the Repertoire, as this group has not yet started its meetings, in order to devise principles for inclusion / exclusion, so that we have solid grounds to include code points. Furthermore, someone has to present on behalf of the group what we have done and what the way forward is.
Regarding my email containing draft principles, I wanted to communicate that before going for any strategy (either inclusion of code points on the basis of language or on the basis of MSR) we should have principles for it. In my view, going by MSR would be much easier than going by languages, as it would take years to finish 180 languages.
This does not mean at all that we will start from zero; Mirjana has already done most of the work, so we can quickly go through it and work on the rest of the code points.
Best Regards,
Ahmed Bakht
On Sun, 4 Jun 2017 at 3:52 pm, Textual Solutions
<textualsolutions@gmail.com> wrote:
_______________________________________________ Latingp mailing list Latingp@icann.org https://mm.icann.org/mailman/listinfo/latingp
Hi Nebiye,
It is generally preferred to cite a published source. However, in exceptional cases where such sources are either not available or an alternate point of view needs to be presented, communication with an expert may be cited. These are reviewed by the Integration Panel.
Regards,
Sarmad
From: Textual Solutions [mailto:textualsolutions@gmail.com] Sent: Wednesday, June 07, 2017 1:23 AM To: Sarmad Hussain <sarmad.hussain@icann.org> Cc: Bill Jouris <bill.jouris@insidethestack.com>; Hazem Hezzah <hhezzah.las@gmail.com>; Latin GP <latingp@icann.org> Subject: [Ext] Re: [Latingp] Minutes from the call on May 30, 2017
Thanks for the template.
"citing evidence of the letters’ use or membership in an alphabet"
Do you have to appeal to native speaker judgements for this?
On Tue, Jun 6, 2017 at 9:50 PM, Sarmad Hussain <sarmad.hussain@icann.org> wrote:
Hi Bill.
Yes, in the repertoire, for each code point or code point sequence included, the GP has to list at least one (or a few) languages which use it. This has to be documented as part of the proposal submitted by the GP to the Integration Panel. I am attaching the template for the document which the Latin GP has to produce. Please see the table in Section 5.
Regards,
Sarmad
Bill,
No, the goal is not, as far as I understand, to restrict the combination of characters to match some language. I.e. it will be permitted to create a label with character A and B, where A is only found in language a and B is only found in language b.
The IDN table that we are to propose will be script based, but with evidence that the code points (or sequence of code points) are in contemporary, everyday use of at least one language in the list of chosen languages.
Yours,
Mats
---
Mats Dufberg
DNS Specialist, IIS
Mobile: +46 73 065 3899
https://www.iis.se/en/
Yes, as Mats suggests,
The IDN table that we are to propose will be script based, but with evidence that the code points (or sequence of code points) are in contemporary, everyday use of at least one language in the list of chosen languages.
To add to Hazem’s point, in case of Arabic script, some letters are encoded in “duplicate”, and languages use either one or the other code of the letter (e.g. 0643 and 06A9 for the letter Kaf). In such cases, Arabic script GP considered that the labels which mix the two versions of the same letter are not usable and therefore wrote rules to make such variant labels “blocked”. Regards, Sarmad From: latingp-bounces@icann.org [mailto:latingp-bounces@icann.org] On Behalf Of Hazem Hezzah Sent: Wednesday, June 07, 2017 12:42 PM To: Mats Dufberg <mats.dufberg@iis.se>; Bill Jouris <bill.jouris@insidethestack.com> Cc: 'Latin GP' <latingp@icann.org> Subject: Re: [Latingp] Minutes from the call on May 30, 2017 As far as I remember, in the Arabic script LGR we added a condition to have all codepoints in the label from the same language. Some way to avoid phishing. Regards, Hazem Hezzah - Sent from my mobile device - -------- Original Message -------- Subject: Re: [Latingp] Minutes from the call on May 30, 2017 From: Mats Dufberg To: Bill Jouris CC: 'Latin GP' Bill, No, the goal is not, as far as I understand, to restrict the combination of characters to match some language. I.e. it will be permitted to create a label with character A and B, where A is only found in language a and B is only found in language b. The IDN table that we are to propose will be script based, but with evidence that the code points (or sequence of code points) are in contemporary, everyday use of at least one language in the list of chosen languages. 

Yours,
Mats
---
Mats Dufberg
DNS Specialist, IIS
Mobile: +46 73 065 3899
https://www.iis.se/en/

From: <latingp-bounces@icann.org> on behalf of Bill Jouris <bill.jouris@insidethestack.com>
Reply-To: Bill Jouris <bill.jouris@insidethestack.com>
Date: Tuesday 6 June 2017 at 20:11
To: Hazem Hezzah <hhezzah.las@gmail.com>, 'Latin GP' <latingp@icann.org>
Subject: Re: [Latingp] Minutes from the call on May 30, 2017

Are we planning on keeping track of which code points exist for which languages? I ask because it seems to me that one of the requirements for a name should be that all of the code points in it should exist in some *single* language. That is, just because all of the code points exist in the total repertoire, that doesn't mean the name should be acceptable. After all, the whole point of this exercise is to allow names in other languages, not merely to expand the possible names as much as possible.

Bill Jouris
Inside Products
bill.jouris@insidethestack.com
831-659-8360
925-855-9512 (direct)

________________________________
From: Hazem Hezzah <hhezzah.las@gmail.com>
To: 'Latin GP' <latingp@icann.org>
Sent: Tuesday, June 6, 2017 6:13 AM
Subject: Re: [Latingp] Minutes from the call on May 30, 2017

Dear all,

After going through this thread of messages, let me suggest the following approach as a start, with the possibility to refine it as we go on.

1. Decide which languages we will include in the first phase of repertoire building. According to the EGIDS scale, either 1-2 (93 languages), 1-3 (134 languages), or 1-4 (180 languages). I would suggest finishing 1-2 and then going on to 3-4. The 3 Latin EGIDS 0 languages (English, French, Spanish) are included in the 1 list.
2. Taking the MSR-2 tables as our starting pool, I see that there are already a number of code points ineligible for use in the root zone (white and pink background), so our pool will be only characters with white background. (recitation needed if pink background characters are to be included)
3. Distributing the languages among members for analysis, I would suggest that each one goes through the characters of the language in hand, then makes a mark on each code point found in the MSR.
4. If any character is not found in the MSR, take a note to look into it later.
5. For every further language analyzed, if a code point found already has a mark, go on to the next character.
6. At the end, by combining all marks of all languages inspected, I think we can consider that the repertoire consists of all marked code points.
7. For the missing code points that were noted, we should decide what to do.

Any suggestions for handling combinations? Welcoming your opinions.

Regards,
Hazem Hezzah

From: latingp-bounces@icann.org [mailto:latingp-bounces@icann.org] On Behalf Of Bill Jouris
Sent: Monday, 05 June, 2017 18:19
To: Mats Dufberg
Cc: Latin GP
Subject: Re: [Latingp] Minutes from the call on May 30, 2017

Pardon my ignorance -- being with the Variant group, I didn't realize that you folks in Repertoire were considering any other approach. Starting with the most used languages, and then working thru as many of the less used ones as time allows, seems like the obvious approach. Were you guys actually considering a different one?

Bill Jouris
Inside Products
bill.jouris@insidethestack.com
831-659-8360
925-855-9512 (direct)

________________________________
From: Mats Dufberg <mats.dufberg@iis.se>
To: Bill Jouris <bill.jouris@insidethestack.com>
Cc: Latin GP <latingp@icann.org>; "ahmedbakhat@yahoo.com" <ahmedbakhat@yahoo.com>
Sent: Monday, June 5, 2017 9:11 AM
Subject: Re: [Latingp] Minutes from the call on May 30, 2017

Bill,

Inclusion of languages or code points?

The only code points that we can include are the code points that we have confirmed to be used by languages according to the criteria, i.e. the language must be high enough on the EGIDS scale (low number) and the usage in some language should be contemporary and established. All other code points are excluded.

The number of languages is high. That is a fact. The only way to reduce the number of languages is to move the border higher up in scale.

My suggestion is that we should start working by taking the languages highest up on the scale (0-2) and get some experience from that. When we see what we get, we can move into languages 3-4.

https://www.ethnologue.com/about/language-status

Yours,
Mats
---
Mats Dufberg
DNS Specialist, IIS
Mobile: +46 73 065 3899
https://www.iis.se/en/

From: Bill Jouris <bill.jouris@insidethestack.com>
Reply-To: Bill Jouris <bill.jouris@insidethestack.com>
Date: Monday 5 June 2017 at 17:38
To: Mats Dufberg <mats.dufberg@iis.se>, "ahmedbakhat@yahoo.com" <ahmedbakhat@yahoo.com>
Cc: Latin GP <latingp@icann.org>
Subject: Re: [Latingp] Minutes from the call on May 30, 2017

Given the enormous number of languages involved, perhaps it would be better to establish which ones will be included at this time. That is, go for inclusion, rather than exclusion. And then, separately, principles and processes for including the occasional additional codepoint, if a language which we did not get thru in this initial effort requires it.

Bill Jouris
Inside Products
bill.jouris@insidethestack.com
831-659-8360
925-855-9512 (direct)

________________________________
From: Mats Dufberg <mats.dufberg@iis.se>
To: "ahmedbakhat@yahoo.com" <ahmedbakhat@yahoo.com>
Cc: Latin GP <latingp@icann.org>
Sent: Monday, June 5, 2017 3:09 AM
Subject: Re: [Latingp] Minutes from the call on May 30, 2017

Ahmed,

If you start with MSR -- or actually MSR2 -- and try to find languages that support the inclusion of its code points, you would never be able to confirm that no code points outside of MSR2 are needed to support the languages that the Latin GP wants to support. I do not say that such code points will be included, but we should be aware of any limitation in the support of the languages that are claimed to be supported.

If there is any code point in MSR2 not used by any language, we would have to investigate every language anyway to confirm that the code point can be excluded.

Besides the Latin code points there are non-spacing marks that are used in combination with Latin code points. Those combinations could have different status in a language, either being considered to be a character on its own or being a modified character. In the repertoire that the Latin GP suggests, such non-spacing marks are limited to just those combinations that are really used in the languages that the group wants to support. To find those combinations we have to investigate all languages.

Another aspect is that the method of going code point by code point in MSR2 requires that we already know where to find what we are looking for. And when you start studying the material for a language, the hardest step can be finding sources and understanding what they say. After that it could be more straightforward to extract the characters.

I do not claim that the task is simple.
In my work for ICANN Pre-Delegation Testing, I have already done that. There are many grey areas, but that is our task to dig into. There is no other way than going through all the languages.

Yours,
Mats
---
Mats Dufberg
DNS Specialist, IIS
Mobile: +46 73 065 3899
https://www.iis.se/en/

From: <latingp-bounces@icann.org> on behalf of Ahmed Bakhat via Latingp <latingp@icann.org>
Reply-To: "ahmedbakhat@yahoo.com" <ahmedbakhat@yahoo.com>
Date: Sunday 4 June 2017 at 15:35
To: "textualsolutions@gmail.com" <textualsolutions@gmail.com>, Mirjana Tasić <Mirjana.Tasic@rnids.rs>
Cc: Latin GP <latingp@icann.org>
Subject: Re: [Latingp] Minutes from the call on May 30, 2017

I raised the issue during the meeting to work on Repertoire, as this group has not yet started its meetings, to devise principles for inclusion / exclusion, so that we have solid grounds to include code points. Furthermore, someone has to present on behalf of the group what we have done and what the way forward is.

Regarding my email containing draft principles, I wanted to communicate that before going for any strategy (either inclusion of code points on the basis of language or on the basis of MSR) we should have principles for it. In my perception, going by MSR is much easier than going by languages, as it would take years to finish 180 languages.
It doesn't mean at all that we will start from zero; Mirjana has already done most of the work, so we can quickly go through it and work on the rest of the code points.

Best Regards,
Ahmed Bakht
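
The marking procedure that Hazem outlines in the quoted message above (steps 3 to 7) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `build_repertoire`, the toy MSR set and the language labels `lang-a` and `lang-b` are hypothetical, and the data does not reflect the actual MSR-2 content or any tooling used by the GP.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_repertoire(
    msr: Set[int], language_chars: Dict[str, Iterable[int]]
) -> Tuple[Dict[int, str], Dict[int, List[str]]]:
    """Mark MSR code points evidenced by at least one language.

    Returns (marked, missing):
      marked  maps a code point to the first language that evidenced it;
      missing maps a code point not in the MSR to the languages using it.
    """
    marked: Dict[int, str] = {}
    missing: Dict[int, List[str]] = {}
    for lang, code_points in language_chars.items():
        for cp in code_points:
            if cp in msr:
                # Step 3 and step 5: mark once; an existing mark is kept as is.
                marked.setdefault(cp, lang)
            else:
                # Step 4: note the code point for a later decision (step 7).
                missing.setdefault(cp, []).append(lang)
    return marked, missing


# Hypothetical toy data, NOT the real MSR-2: a-z plus two accented letters.
TOY_MSR = set(range(0x61, 0x7B)) | {0xE7, 0xF1}
LANGUAGES = {
    "lang-a": [0x61, 0x62, 0xF1],
    "lang-b": [0x61, 0xE7, 0x259],  # 0x259 is outside the toy MSR
}

marked, missing = build_repertoire(TOY_MSR, LANGUAGES)
repertoire = sorted(marked)  # step 6: the repertoire is every marked code point
```

Keeping the first evidencing language per code point is a design choice for this sketch only; the thread itself leaves open whether per-language usage should be tracked at all.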
Dear All,

With reference to the discussion within the group and the clarification from Dr. Sarmad, it seems that we now have agreement to work on languages, depending on the availability of their relevant code points in MSR-2.

I propose to start with EGIDS 1 languages, which means languages with the status of national language of some country (though some of them are international languages like English, French and Spanish). There are 63 languages which have the status of EGIDS 1, and we can select the 10 top-most languages out of these 63. Out of these 10 languages, two or three, like English and French, will not require any additional work, so we will have around 8 languages to work with. This will simplify the task for the group.

Following is the list of the 10 top languages, based on the population using them and extracted from the list communicated by Mirjana:

| S.No. | Language | Population | EGIDS | Language map |
| 1. | Spanish. Castellano. Castilian. Español | 398931840 | 1 | Andorra and France; Belize; Portugal and Spain |
| 2. | English | 341000000 | 1 | over 140 countries |
| 3. | Portuguese | 203352100 | 1 | Portugal and Spain |
| 4. | French. Français | 87000000 | 1 | Andorra and France; Belgium, Luxembourg and Netherlands; Liechtenstein and Switzerland |
| 5. | Vietnamese. Annamese. Ching. Gin. Jing. Kinh. Viet | 73600000 | 1 | Viet Nam |
| 6. | Turkish. Anatolian. Türkçe. Türkisch | 70890130 | 1 | Turkey |
| 7. | German. Deutsch. Tedesco | 69800000 | 1 | Austria; Belgium, Luxembourg and Netherlands; Denmark, Finland, Norway and Sweden |
| 8. | Italian. Italiano | 63738247 | 1 | Italy; Switzerland |
| 9. | Malay | 60475586 | 1 | Malaysia - Peninsular; Brunei and Malaysia - Sarawak; Indonesia, Java and Bali; Indonesia, Kalimantan; Indonesia, Sulawesi; Indonesia, Sumatra |
| 10. | Filipino | 45000000 | 1 | Philippines |

Best Regards,
Ahmed Bakht
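
The label policies debated in this thread (the script-based table described by Mats, the single-language requirement raised by Bill, and the Arabic GP rule that blocks labels mixing the duplicate encodings 0643 and 06A9) can be contrasted in a minimal sketch. The function names, the language labels `lang-a` and `lang-b`, and the character sets below are hypothetical and chosen only for illustration; they are not the rules of any actual LGR.

```python
from typing import Dict, Iterable, Set, Tuple


def allowed_script_based(label: str, repertoire: Set[str]) -> bool:
    """Script-based policy: every character is in the combined repertoire."""
    return all(ch in repertoire for ch in label)


def allowed_single_language(label: str, language_sets: Dict[str, Set[str]]) -> bool:
    """Single-language policy: one language must cover the whole label."""
    return any(all(ch in chars for ch in label) for chars in language_sets.values())


def mixes_duplicates(label: str, duplicate_pairs: Iterable[Tuple[str, str]]) -> bool:
    """True if the label contains both members of any duplicate pair."""
    present = set(label)
    return any(a in present and b in present for a, b in duplicate_pairs)


# Hypothetical per-language character sets (illustrative only).
LANGUAGE_SETS = {
    "lang-a": set("ab\u00f1"),  # a, b, n with tilde
    "lang-b": set("ab\u00e7"),  # a, b, c with cedilla
}
REPERTOIRE = set().union(*LANGUAGE_SETS.values())

mixed_label = "\u00f1a\u00e7"  # draws on characters from both languages
script_ok = allowed_script_based(mixed_label, REPERTOIRE)
single_ok = allowed_single_language(mixed_label, LANGUAGE_SETS)

# The Kaf example from the thread: U+0643 and U+06A9 in the same label.
KAF_PAIR = [("\u0643", "\u06a9")]
kaf_mixed = mixes_duplicates("\u0643\u06a9", KAF_PAIR)
kaf_same = mixes_duplicates("\u0643\u0643", KAF_PAIR)
```

Under these assumptions the mixed label passes the script-based check but fails the single-language check, which is exactly the difference Mats and Bill discuss; the duplicate-pair check is a narrower rule that only blocks specific mixes.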
participants (7)
- Ahmed Bakhat
- Bill Jouris
- Hazem Hezzah
- Mats Dufberg
- Mirjana Tasić
- Sarmad Hussain
- Textual Solutions