May 2020 - Latingp - lists.icann.org

Summary of Latin GP Meeting on 7 May 2020
by Pitinan Kooarmornpatana May 12, 2020

Dear All,

Please find attached the summary of the Latin GP meeting on 7 May 2020. Please let us know if you would like to suggest any edits or additions.

Action Items:

S. No.  Action Items                                                                                                              Owner
1       Investigate the usage of relevant locale for Small Latin Letter I (i, 0069) and Small Latin Letter Dotless I (ı, 0131)   DT
2       Propose the pending items for the next meeting agenda                                                                     PK

The recording for this meeting and the note are posted at Latin GP wiki page at https://community.icann.org/display/croscomlgrprocedure/Latin+GP.

Regards,
Pitinan

Add'l research on dotless I issue
by Tan Tanaka, Dennis May 11, 2020

All,

Here is the additional research on the stability issue with case folding of dotless I and dotted I:
https://docs.google.com/document/d/11aads2CRTE7C_gpSejL8ltW2Z8fMNZXT2cWdT0y…

Happy to discuss during our next call.

Best,
Dennis

AGENDA for the GP call on May the 7th 2020, 16:00UTC
by Mirjana Tasić May 8, 2020

Dear GP members,

Please find enclosed the proposed agenda for our call. Any comments, corrections and suggestions are welcome.

Regards
Mirjana
_______________________________________________________________________________________________

AGENDA for the GP call on May the 7th 2020, 16:00 UTC

1. Latin Small Letter I (i, U+0069) and Latin Small Letter Dotless I (ı, U+0131): the analysis is already documented; it only requires the final conclusion. Please find the relevant section of the proposal below.
2. Next tasks, discuss the priorities:
   * Thorough reading and finalization of In-Script variant analysis
   * Review proposed confusables for Similarity review
   * Report finalization
   * XML production and correction of possible errors
   * Test data production
3. Next meeting: May the 14th 2020, 16:00 UTC
4. AOB

D.5.2 Latin Small Letter Dotless I (ı) 0131

There are four Latin code points that have a special case (upper case/lower case) relationship:

· U+0069 Latin Small Letter I ("i")
· U+0049 Latin Capital Letter I ("I")
· U+0131 Latin Small Letter Dotless I ("ı")
· U+0130 Latin Capital Letter I with Dot Above ("İ")

In most locales, Latin Small Letter I (U+0069) is the lower case of Latin Capital Letter I (U+0049) and, conversely, Latin Capital Letter I (U+0049) is the upper case of Latin Small Letter I (U+0069). In those locales, Latin Capital Letter I (U+0049) is also the upper case of Latin Small Letter Dotless I (U+0131). This can be described as in the following chart:

Table D.8. Case Relationships for 0069, 0049, 0130, and 0131

Character                                       Process      Resulting Character             Process      Resulting Character
----------------------------------------------  -----------  ------------------------------  -----------  ------------------------------
Latin Small Letter I U+0069                     up case →    Latin Capital Letter I U+0049   down case →  Latin Small Letter I U+0069
Latin Small Letter Dotless I U+0131             up case →    Latin Capital Letter I U+0049   down case →  Latin Small Letter I U+0069
Latin Capital Letter I with Dot Above U+0130    down case →  Latin Small Letter I U+0069     up case →    Latin Capital Letter I U+0049

In two locales, Turkish and Azeri, the case relationship is different.
In those two, Latin Small Letter I and Latin Capital Letter I with Dot Above are in a mutual upcase/downcase relationship, as are Latin Small Letter Dotless I and Latin Capital Letter I. This can be described as in the following chart:

Table D.9. Case Relationships in Turkish and Azeri Locales

Character                      Process    Resulting Character                     Process      Resulting Character
-----------------------------  ---------  --------------------------------------  -----------  -----------------------------
Latin Small Letter I           up case →  Latin Capital Letter I with Dot Above   down case →  Latin Small Letter I
Latin Small Letter Dotless I   up case →  Latin Capital Letter I                  down case →  Latin Small Letter Dotless I

If we look at the repertoire of Latin code points for the root zone, as proposed by the Latin Generation Panel, Latin Small Letter I and Latin Small Letter Dotless I are included, whereas the capital letters are excluded. Capital letters are not even valid in IDNA2008, so the question is whether the case relationship described here is a problem, or even relevant.

Before IDNA2008, there was IDNA2003. Even though IDNA2003 has been replaced by IDNA2008, it is still implemented. For example, the web browser Google Chrome to date remains IDNA2003 compliant but not fully IDNA2008 compliant. In IDNA2003 there is a pre-processing step, normalization, that is applied to domain names before conversion to Punycode. That normalization includes down casing of Latin characters. For ASCII labels there is already an equivalence between upper case and lower case letters, and this is what users, based on decades of experience, expect to happen.

In an IDNA2003-compliant web browser it is expected that "EXÄMPEL" and "EXAMPLE" are equivalent to "exämpel" and "example", respectively. In an IDNA2008 browser "EXAMPLE" must be accepted, but "EXÄMPEL" could be rejected since "Ä" is not valid. That is not, however, how e.g. Mozilla Firefox and Apple Safari have been designed to handle the problem: they also down case before the formal IDNA2008 process.
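The case relationships in Tables D.8 and D.9, and the IDNA2003-style down casing described above, can be illustrated with a minimal Python sketch. It is an illustration only and not part of the proposal. Python's built-in `str.upper()` and `str.lower()` stand in for the default (locale-independent) mappings, and the built-in `idna` codec, which implements IDNA2003, stands in for an IDNA2003 application. The helpers `turkic_lower`, `turkic_upper`, `idna2003` and the two translation tables are hypothetical names introduced here to mimic the Turkish and Azeri tailoring, since Python has no locale-aware case conversion. Note that Python applies the full Unicode case mapping, so lower-casing U+0130 yields U+0069 followed by U+0307 (combining dot above) rather than the bare U+0069 shown in Table D.8.

```python
# Default (locale-independent) case mappings, compare Table D.8.
DEFAULT_UPPER_OF_I = "\u0069".upper()          # U+0069 -> U+0049
DEFAULT_UPPER_OF_DOTLESS = "\u0131".upper()    # U+0131 -> U+0049
DEFAULT_LOWER_OF_CAPITAL_I = "\u0049".lower()  # U+0049 -> U+0069
# Full Unicode mapping in Python: U+0130 -> U+0069 U+0307.
DEFAULT_LOWER_OF_DOTTED_CAPITAL = "\u0130".lower()

# Hypothetical tailoring tables mimicking Table D.9 (Turkish and Azeri).
TURKIC_LOWER = {0x0049: 0x0131, 0x0130: 0x0069}  # I -> dotless i, dotted I -> i
TURKIC_UPPER = {0x0069: 0x0130, 0x0131: 0x0049}  # i -> dotted I, dotless i -> I


def turkic_lower(text: str) -> str:
    """Down case text, applying the Turkic I mappings before the default ones."""
    return text.translate(TURKIC_LOWER).lower()


def turkic_upper(text: str) -> str:
    """Up case text, applying the Turkic I mappings before the default ones."""
    return text.translate(TURKIC_UPPER).upper()


def idna2003(label: str) -> bytes:
    """Convert one label with Python's built-in IDNA2003 codec.

    Non-ASCII labels are normalized (including down casing) before the
    Punycode conversion; pure ASCII labels are returned unchanged.
    """
    return label.encode("idna")


if __name__ == "__main__":
    # ascii() keeps the output printable on any terminal encoding.
    print(ascii(turkic_lower("TIT")), ascii("TIT".lower()))
    print(ascii(turkic_upper("ice")), ascii("ice".upper()))
    print(idna2003("EX\u00c4MPEL") == idna2003("ex\u00e4mpel"))
```

With the default mappings "TIT" is down cased with a dotted i, whereas the Turkic tailoring produces a dotless ı, which is the ambiguity the section is concerned with.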
Even though down casing is not part of the formal IDNA2008 process, one of the IDNA2008 documents, RFC 5894, states that the user interface of an application can do normalization before IDNA2008 processing. The down casing in IDNA2008 browsers should probably be seen in that light. While "TÄT" will probably be down cased to "tät" in the browser, what should the browser do with "TIT"? Depending on the locale that the browser is running in, it may be down cased to either "tit" or "tıt" (with or without the dot).

The casing, in an application, is expected to go in one direction, from upper case to lower case. When domain names are presented in text, however, it is common that they are presented in upper or mixed case. So "ice" might become "Ice" or "İce".

It is quite obvious from the text above that a case shift of dotted or dotless I could create an erroneous lookup, but the question is how large a threat it would be to the users. Since applications are expected to go from upper case to lower case when they handle domain names, we should consider the situation where down casing could result in different lower case letters, i.e. when Latin Capital Letter I is down cased.

With a non-Turkish and non-Azeri locale, a Latin Capital Letter I in a domain name is either down cased to Latin Small Letter I (IDN label) or equivalent to Latin Small Letter I (ASCII label).

With a Turkish or Azeri locale, a Latin Capital Letter I is expected to be down cased to Latin Small Letter Dotless I, but in an ASCII label in a domain name it is still expected to be equivalent to Latin Small Letter I, because that is what the DNS standards say.

There is an obvious risk that, in a Turkish or Azeri locale, the two letters are confused or mistreated due to case folding, and that this confusion could be misused. To be on the safe side, Latin Small Letter I and Latin Small Letter Dotless I should be variants. Accordingly, the following variant set could be the optimal solution:

Table D.10. Possible Variant Relationships for 0069 and 0131

Group: Dotless i vs. i

Target                                      Source                                      Variant Candidate  Disposition            Rationale
Code Point  Glyph  Name                     Code Point  Glyph  Name                     [Yes/No]           [Allocatable/Blocked]
----------  -----  -----------------------  ----------  -----  -----------------------  -----------------  ---------------------  ---------------------------------------------------
0069        i      Latin Small Letter I     0131        ı      Latin Small Letter       YES                Blocked                Risk of confusion due to inconsistent case folding
                                                               Dotless I
0131        ı      Latin Small Letter       0069        i      Latin Small Letter I     YES                Blocked                Risk of confusion due to inconsistent case folding
                   Dotless I
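One way to picture the effect of the blocked variant relationship in Table D.10 is the short Python sketch below. It is an illustration under stated assumptions, not the XML representation of the LGR or any registry's implementation: `VARIANT_SETS`, `variant_key`, `is_blocked` and `turkic_lower` are hypothetical names, and the set of delegated labels is made-up example data. The idea is that two labels differing only in i versus ı reduce to the same key, so once one of them is delegated the other is blocked.

```python
from typing import Iterable

# Hypothetical variant set mirroring Table D.10: U+0069 and U+0131.
VARIANT_SETS = [frozenset({"\u0069", "\u0131"})]

# Map every member of a variant set to one representative (lowest code point).
REPRESENTATIVE = {cp: min(vs) for vs in VARIANT_SETS for cp in vs}


def variant_key(label: str) -> str:
    """Reduce a label to a key shared by all of its variant labels."""
    return "".join(REPRESENTATIVE.get(ch, ch) for ch in label)


def is_blocked(candidate: str, delegated: Iterable[str]) -> bool:
    """Return True if a different, already delegated label is a variant."""
    key = variant_key(candidate)
    return any(other != candidate and variant_key(other) == key
               for other in delegated)


def turkic_lower(text: str) -> str:
    """Down case with the Turkic I mappings (illustrative tailoring only)."""
    return text.translate({0x0049: 0x0131, 0x0130: 0x0069}).lower()


if __name__ == "__main__":
    delegated = ["tit"]                  # made-up example data
    default_form = "TIT".lower()         # dotted i
    turkic_form = turkic_lower("TIT")    # dotless i
    # Both down-cased forms share one variant key, so the second is blocked.
    print(variant_key(default_form) == variant_key(turkic_form))
    print(is_blocked(turkic_form, delegated))
```

Under this sketch, whichever way an application down cases "TIT", the two possible results can never be delegated to different parties.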

Re: [Latingp] Microsoft account
by Mats Dufberg May 7, 2020

My outlook/microsoft account is mats.dufberg(a)outlook.com

Mats

---
Mats Dufberg
mats.dufberg(a)internetstiftelsen.se
Technical Expert
Internetstiftelsen (The Swedish Internet Foundation)
Mobile: +46 73 065 3899
https://internetstiftelsen.se/

From: Latingp <latingp-bounces(a)icann.org> on behalf of Mirjana Tasić <Mirjana.Tasic(a)rnids.rs>
Date: Monday, 4 May 2020 at 12:13
To: ICANN Latin GP <latingp(a)icann.org>
Subject: [Latingp] Microsoft account

Dear Pitinan,

I shall use following account Mirjana.Tasic(a)rnids.rs

Regards
Mirjana

Latin GP Meeting
by Pitinan Kooarmornpatana May 7, 2020

Dear Latin GP members,

This is a gentle reminder for the Latin GP meeting on 7 May 2020, 16:00 UTC. Please join the meeting at the updated link:
https://icann.zoom.us/j/95395839226?pwd=ZGxpYmRqV2E2NjloWTFHZWNGbkJWZz09
Password: 096517

--------------
To join via phone, please find your local number at https://icann.zoom.us/u/aw6COk4xD
Meeting ID: 953 9583 9226
Password: 096517

Please find the Agenda and the note from the chair below.

Regards,
Pitinan

Dear GP members,

Please find enclosed the proposal of Agenda for our call. Any comments, corrections and suggestions are welcome.

Regards
Mirjana
_______________________________________________________________________________________________

AGENDA for the GP call on May the 7th 2020, 16:00 UTC

1. Latin Small Letter I (i, U+0069) and Latin Small Letter Dotless I (ı, U+0131): the analysis is already documented; it only requires the final conclusion.
2. Next tasks, discuss the priorities:
   * Thorough reading and finalization of In-Script variant analysis
   * Review proposed confusables for Similarity review
   * Report finalization
   * XML production and correction of possible errors
   * Test data production
3. Next meeting: May the 14th 2020, 16:00 UTC
4. AOB

Microsoft account
by Mirjana Tasić May 4, 2020

Dear Pitinan,

I shall use following account Mirjana.Tasic(a)rnids.rs

Regards
Mirjana

Summary of Latin GP Meeting on 30 April 2020
by Pitinan Kooarmornpatana May 4, 2020

Dear All,

Please find attached the summary of the Latin GP meeting on 30 April 2020. Please let us know if you would like to suggest any edits or additions.

Action Items:

S. No.  Action Items                                                                      Owner
1       Incorporate the conclusion from the Generic Glyph data sheet into the proposal    BJ
2       Share a Microsoft account to Pitinan to manage the access rights                  ALL

The recording for this meeting and the note are posted at Latin GP wiki page at https://community.icann.org/display/croscomlgrprocedure/Latin+GP.

Regards,
Pitinan
