FW: [Ext] Re: Re: Handling of down casing of I-dotted in Turkish locale
Dear All,

Please find below the test results from Mert Saka, our colleague in the ICANN Istanbul office. He also asked me to extend his sincere thanks to everyone on the Latin GP for enabling the Turkish IDNs.

Regards,
Pitinan

From: Mert Saka <mert.saka@icann.org>
Date: Wednesday, May 20, 2020 at 15:48
To: Pitinan Kooarmornpatana <pitinan.koo@icann.org>
Cc: Sarmad Hussain <sarmad.hussain@icann.org>
Subject: Re: [Ext] Re: Re: [Latingp] Handling of down casing of I-dotted in Turkish locale

Hi Pitinan,

Yesterday was a public holiday in Turkey, so I had plenty of time on my hands to test. I ended up testing on 4 different devices, but with only two browsers (Chrome and Firefox). More details about my testing environment (the device details) are below, hoping they help the team evaluate the results better:

1. ICANN’s MacBook Pro on MacOS Mojave (10.14.6), locale en_US: Only Chrome, no Firefox. Safari could not be tested because it automatically forwards any unrecognized domain name to the default search engine (including the base test case, “zil.test”), and I couldn’t find how to turn this off.
2. Personal MacBook Air on MacOS Mojave (10.14.6), locale en_TR: Chrome and Firefox available, but I couldn’t test Safari (same reason as above).
3. Personal MSI GE73VR 8RF-210XTR Laptop on Windows 10, locale tr; Türkçe: Chrome installed, no Firefox or Safari. Couldn’t test on Edge (same reason as Safari on MacOS above).
4. ICANN’s iPhone on iOS (13.4.1): Chrome and Safari available. No other browsers.

Here are some highlights from the test results:

In general, results were consistent across devices for IDNA conversion on https://www.punycoder.com/. Even on mobile devices, it worked consistently and delivered punycode for each word successfully.

Chrome on all my computers worked well, except for its different display behavior for “zıl.test” and “ZİL.test”.
You can see them below in the images I added inline, but to summarize: “zıl.test” (DOTLESS LOWERCASE I) is displayed in the address bar as-is, but “ZİL.test” (CAPITAL I WITH DOT ON TOP) is displayed as punycode (xn--zil-9dc.test) in the address bar. Both are copy/pasted as punycode, so I see no odd browser behavior, but I thought the inconsistency in display would be something to take into consideration.

Firefox on my personal MacBook has a problem with “ZİL.test” (with CAPITAL DOTTED I). “İ” turns into a LOWERCASE I WITH TWO DOTS ON TOP. I found this odd, so Firefox seems to fail the browser behavior test when using UPPERCASE I WITH DOT ON TOP.

Safari on my company iPhone displays “ZİL.test” (with CAPITAL I WITH DOT ON TOP) as-is (“ZİL.test”) in the browser address bar, but when copy/pasted into the iPhone Notes app, it is displayed as a LOWERCASE I WITH TWO DOTS ON TOP. You can find the result attached as a screenshot. I elaborate more on this in the note about the testing methodology below.

A note about the testing methodology: When we “copy” text, we actually use another application (the Operating System, a.k.a. OS), which may have an impact on the test results. Taking a screenshot would be a better option because when copying, we do not know how the OS stores the text in memory (as punycode, ISO, UTF or something else). When “pasting” into an application (in my case Outlook Mail, iOS Mail or the iOS Notes app), results may not be consistent (the examples below also point to this difference between copy/pasting and screenshots). You can see the result when I copy/pasted it into the iOS Notes app on my iPhone in the attached screenshot. To get rid of copy/paste issues in Outlook Mail on my Mac, I also attached a UTF-8 encoded plain-text TXT file (Test Results.txt), which can help to observe results as they actually appear on my devices.

Now, the test result details:

1.
ICANN’s MacBook Pro on MacOS Mojave (10.14.6)

IDNA conversion (https://www.punycoder.com/)
User Input: copy and paste into “Text” box, then “Convert to Punycode”. Output: register output string in “Punycode” box.

    User Input    Output
    zil           zil
    zıl           xn--zl-hpa
    ZİL           xn--zil-9dc
    ZIL           ZIL

Browser Behavior (Chrome)
User Input: copy and paste in URL address bar. Output URL Bar: register returned label in address bar.

    User Input    Output URL Bar
    zil.test      http://zil.test/
    zıl.test*     http://xn--zl-hpa.test/
    ZİL.test      http://xn--zil-9dc.test/
    ZIL.test*     http://zil.test/

* Below are the images that show the different display behavior of Chrome for the two Turkish characters:
[Image: “zıl.test” (LOWERCASE I WITHOUT DOT)]
[Image: “ZİL.test” (CAPITAL I WITH DOT ON TOP)]

2. Personal MacBook Air on MacOS Mojave (10.14.6)

IDNA conversion (https://www.punycoder.com/)
User Input: copy and paste into “Text” box, then “Convert to Punycode”. Output: register output string in “Punycode” box.

    User Input    Output
    zil           zil
    zıl           xn--zl-hpa
    ZİL           xn--zil-9dc
    ZIL           ZIL

Browser Behavior (Chrome)
User Input: copy and paste in URL address bar. Output URL Bar: register returned label in address bar.

    User Input    Output URL Bar
    zil.test      http://zil.test/
    zıl.test*     http://xn--zl-hpa.test/
    ZİL.test*     http://xn--zil-9dc.test/
    ZIL.test      http://zil.test/

* Same results as ICANN’s MBPro above. The different display behavior of Chrome for the two Turkish characters was observed as in the above screenshots.

Browser Behavior (Firefox)
User Input: copy and paste in URL address bar. Output URL Bar: register returned label in address bar.

    User Input    Output URL Bar
    zil.test      http://www.zil.test/
    zıl.test      http://www.zıl.test/
    ZİL.test*     http://www.zi̇l.test/
    ZIL.test      http://www.zil.test/

* Please note that the result above may not be displayed properly (due to Outlook Email). Please refer to the results in the image below and the attached TXT file.
[Image: “ZİL.test” (CAPITAL I WITH DOT ON TOP)]

3.
Personal MSI GE73VR 8RF-210XTR Laptop on Windows 10

IDNA conversion (https://www.punycoder.com/)
User Input: copy and paste into “Text” box, then “Convert to Punycode”. Output: register output string in “Punycode” box.

    User Input    Output
    zil           zil
    zıl           xn--zl-hpa
    ZİL           xn--zil-9dc
    ZIL           ZIL

Browser Behavior (Chrome)
User Input: copy and paste in URL address bar. Output URL Bar: register returned label in address bar.

    User Input    Output URL Bar
    zil.test      http://zil.test/
    zıl.test*     http://xn--zl-hpa.test/
    ZİL.test*     http://xn--zil-9dc.test/
    ZIL.test      http://zil.te...

* Same results as ICANN’s MBPro above. The different display behavior of Chrome for the two Turkish characters was observed as in the above screenshots.

4. ICANN’s iPhone on iOS (13.4.1)

IDNA conversion (https://www.punycoder.com/)
User Input: copy and paste into “Text” box, then “Convert to Punycode”. Output: register output string in “Punycode” box.

    User Input    Output
    zil           zil
    zıl           xn--zl-hpa
    ZİL           xn--zil-9dc
    ZIL           ZIL

Browser Behavior (Chrome)
User Input: copy and paste in URL address bar. Output URL Bar: register returned label in address bar.

    User Input    Output URL Bar
    zil.test      http://zil.test/
    zıl.test      http://xn--zl-hpa.test/
    ZİL.test      http://xn--zil-9dc.test/
    ZIL.test      http://zil.te...

* Same results as ICANN’s MBPro above. The different display behavior of Chrome for the two Turkish characters was observed as seen in the screenshots below:
[Image: “zıl.test” (LOWERCASE I WITHOUT DOT)]
[Image: “ZİL.test” (CAPITAL I WITH DOT ON TOP)]

Browser Behavior (Safari)
User Input: copy and paste in URL address bar. Output URL Bar: register returned label in address bar.

    User Input    Output URL Bar
    zil.test      http://zil.test/
    zıl.test      http://zıl.test
    ZİL.test*     http://zi̇l.test
    ZIL.test      http://zil.test/

* Please note that the result above may not be displayed properly (due to Outlook Email). Please refer to the results in the image below and the attached TXT file.
[Image: “ZİL.test” (CAPITAL I WITH DOT ON TOP)]

I hope these will be helpful to evaluate the browsers better and that the inline images are displayed properly on your computer. If not, please let me know and I can send them in a different format.
I wish you a happy COVID-free day…

Best regards,
Mert Saka
gTLD Accounts Manager
ICANN – www.icann.org

From: Pitinan Kooarmornpatana <pitinan.koo@icann.org>
Date: Monday, May 18, 2020 at 22:04
To: Mert Saka <mert.saka@icann.org>
Cc: Sarmad Hussain <sarmad.hussain@icann.org>
Subject: FW: [Ext] Re: Re: [Latingp] Handling of down casing of I-dotted in Turkish locale

Hi Mert,

I trust this email finds you well. The Latin GP is testing some behavior with the Turkish locale. I understand that ICANN laptops are set to the US locale (en_US), so I’m wondering if you could find a computer with the Turkish locale to conduct the test below. Please feel free to ping me on Slack if you have any questions.

Regards,
Pitinan

From: "Tan Tanaka, Dennis" <dtantanaka@verisign.com>
Date: Saturday, May 16, 2020 at 03:59
To: Pitinan Kooarmornpatana <pitinan.koo@icann.org>, "mats.dufberg@internetstiftelsen.se" <mats.dufberg@internetstiftelsen.se>, "Latingp@icann.org" <Latingp@icann.org>
Subject: [Ext] Re: Re: [Latingp] Handling of down casing of I-dotted in Turkish locale

Thanks Pitinan. We would need to determine the IDNA behavior in a conversion tool and the browser. To that effect, could you ask your colleague to try the examples below? (He/she needs to copy and paste the input strings into the tool and URL address bars, and register the output string for each test.)

@Mats Dufberg Provided their machines are set up with the Turkish locale, are we missing any other test case?
IDNA conversion (https://www.punycoder.com/)
    User Input (copy and paste into “Text” box, then “Convert to Punycode”): zil, zıl, ZİL, ZIL
    Output: register output string in “Punycode” box

Browser Behavior (repeat for Chrome, Firefox and Safari/Edge)
    User Input (copy and paste in URL address bar): zil.test, zıl.test, ZİL.test, ZIL.test
    Output URL Bar: register returned label in address bar

Thanks,
Dennis

From: Pitinan Kooarmornpatana <pitinan.koo@icann.org>
Date: Friday, May 15, 2020 at 4:36 PM
To: Mats Dufberg <mats.dufberg@internetstiftelsen.se>, Dennis Tan Tanaka <dtantanaka@verisign.com>, "Latingp@icann.org" <Latingp@icann.org>
Subject: [EXTERNAL] Re: [Latingp] Handling of down casing of I-dotted in Turkish locale

Dear all,

Please find attached the test case and the results.

1. test case file (test case.xlsx)
2. test result using Turkish locale (testcase-turkishlocale.xlsx)
3. test result using en_US locale (test case-en_US-locale.xlsx)

The test for the Turkish locale was done by someone in Turkey who uses the Turkish locale. The result for case folding seems to be stable both up and down.

The test for en_US was done by me, and it cannot reproduce the original dotless I after round-trip case folding. I understand that this is due to the absence of CAPITAL LETTER I WITH DOT ABOVE in the en-US locale.

Kindly let us know if there are any further queries.

Regards,
Pitinan

From: Latingp <latingp-bounces@icann.org> on behalf of Mats Dufberg <mats.dufberg@internetstiftelsen.se>
Date: Friday, May 15, 2020 at 05:13
To: "Tan Tanaka, Dennis" <dtantanaka@verisign.com>, "Latingp@icann.org" <Latingp@icann.org>
Subject: Re: [Latingp] Handling of down casing of I-dotted in Turkish locale
If I understand this correctly, when we test the IDNA behavior of the uppercase string in a Turkish setting we should see the behavior described in the second part.
That is my interpretation too. I will try to test some generic tool with Turkish locale (tr_TR.UTF-8) set.

Mats

---
Mats Dufberg
mats.dufberg@internetstiftelsen.se
Technical Expert
Internetstiftelsen (The Swedish Internet Foundation)
Mobile: +46 73 065 3899
https://internetstiftelsen.se/

From: "Tan Tanaka, Dennis" <dtantanaka@verisign.com>
Date: Thursday, 14 May 2020 at 23:19
To: Mats Dufberg <mats.dufberg@internetstiftelsen.se>, ICANN Latin GP <Latingp@icann.org>
Subject: Re: [Latingp] Handling of down casing of I-dotted in Turkish locale

There are two sets of rules, one for non-Turkish

    # Preserve canonical equivalence for I with dot. Turkic is handled below.
    0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE

And another for Turkish

    # Turkish and Azeri
    # I and i-dotless; I-dot and i are case pairs in Turkish and Azeri
    # The following rules handle those cases.
    0130; 0069; 0130; 0130; tr; # LATIN CAPITAL LETTER I WITH DOT ABOVE
    0130; 0069; 0130; 0130; az; # LATIN CAPITAL LETTER I WITH DOT ABOVE
    # When lowercasing, remove dot_above in the sequence I + dot_above, which will turn into i.
    # This matches the behavior of the canonically equivalent I-dot_above
    0307; ; 0307; 0307; tr After_I; # COMBINING DOT ABOVE
    0307; ; 0307; 0307; az After_I; # COMBINING DOT ABOVE
    # When lowercasing, unless an I is before a dot_above, it turns into a dotless i.
    0049; 0131; 0049; 0049; tr Not_Before_Dot; # LATIN CAPITAL LETTER I
    0049; 0131; 0049; 0049; az Not_Before_Dot; # LATIN CAPITAL LETTER I
    # When uppercasing, i turns into a dotted capital I
    0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I
    0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I
    # Note: the following case is already in the UnicodeData.txt file.
    # 0131; 0131; 0049; 0049; tr; # LATIN SMALL LETTER DOTLESS I

If I understand this correctly, when we test the IDNA behavior of the uppercase string in a Turkish setting we should see the behavior described in the second part.
-Dennis

From: Latingp <latingp-bounces@icann.org> on behalf of Mats Dufberg <mats.dufberg@internetstiftelsen.se>
Date: Thursday, May 14, 2020 at 4:27 PM
To: ICANN Latin GP <Latingp@icann.org>
Subject: [EXTERNAL] [Latingp] Handling of down casing of I-dotted in Turkish locale

The link below goes to the file in the Unicode database that handles the special casing rules for i etc in Turkish and Azerian (sp?). The relevant section is the last section of the file.

https://unicode.org/Public/UNIDATA/SpecialCasing.txt

---
Mats Dufberg
mats.dufberg@internetstiftelsen.se
Technical Expert
Internetstiftelsen (The Swedish Internet Foundation)
Mobile: +46 73 065 3899
https://internetstiftelsen.se/
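For readers who want to experiment with the tr/az rules quoted from SpecialCasing.txt in this thread, the following Python sketch may help. It is only an illustration under stated assumptions: the function names `lower_turkic` and `upper_turkic` are hypothetical (they are not part of any library), and the After_I and Not_Before_Dot conditions are simplified to the case where COMBINING DOT ABOVE directly follows the capital I, whereas the Unicode conditions also allow certain intervening combining marks.

```python
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE
CAP_I_DOT = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_I = "\u0131"  # LATIN SMALL LETTER DOTLESS I


def lower_turkic(text: str) -> str:
    """Lowercase using the tr/az rules quoted in the thread (simplified).

    Hypothetical helper: only a directly adjacent COMBINING DOT ABOVE is
    treated as "After_I"; intervening combining marks are not handled.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == CAP_I_DOT:
            # 0130 -> 0069 in tr/az
            out.append("i")
        elif ch == "I":
            if i + 1 < len(text) and text[i + 1] == DOT_ABOVE:
                # I + dot_above -> i, and the dot is removed (After_I rule)
                out.append("i")
                i += 1
            else:
                # 0049 -> 0131 (Not_Before_Dot rule)
                out.append(DOTLESS_I)
        else:
            out.append(ch.lower())
        i += 1
    return "".join(out)


def upper_turkic(text: str) -> str:
    """Uppercase using the tr/az rules: i -> U+0130; dotless i -> I by default."""
    return "".join(CAP_I_DOT if ch == "i" else ch.upper() for ch in text)


if __name__ == "__main__":
    for word in ["ZİL", "ZIL"]:
        print(repr(word), "turkic lower:", repr(lower_turkic(word)),
              "default lower:", repr(word.lower()))
    for word in ["zil", "zıl"]:
        print(repr(word), "turkic upper:", repr(upper_turkic(word)),
              "default upper:", repr(word.upper()))
```

With these rules, “ZİL” and “ZIL” lowercase to “zil” and “zıl” respectively, and both round-trip back to their uppercase forms. With Python's default (locale-independent) mapping, “zıl” uppercases to “ZIL” but lowercases back to “zil”, which is consistent with the en_US round-trip observation reported earlier in the thread.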
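The Punycode strings reported in the test results can be reproduced with a short Python sketch. It assumes that the label is lowercased with the default (non-Turkic) Unicode mapping before Punycode encoding; this is an assumption about what the conversion tool and the browsers do, not a description of their actual implementation, and `to_alabel` is a hypothetical helper name. Real IDNA processing involves additional mapping and validation steps that are omitted here.

```python
def to_alabel(label: str) -> str:
    """Lowercase with Python's default case mapping, then Punycode-encode.

    Hypothetical helper for illustration only; it skips the mapping and
    validation steps of a full IDNA implementation.
    """
    lowered = label.lower()
    if lowered.isascii():
        # Pure ASCII labels need no Punycode encoding.
        return lowered
    # Python's built-in "punycode" codec implements RFC 3492.
    return "xn--" + lowered.encode("punycode").decode("ascii")


if __name__ == "__main__":
    for label in ["zil", "zıl", "ZİL", "ZIL"]:
        print(label, "->", to_alabel(label))
```

Under that assumption, “zıl” yields xn--zl-hpa and “ZİL” yields xn--zil-9dc, because the default lowercasing of U+0130 is “i” followed by COMBINING DOT ABOVE (U+0307). One difference from the reported results: the conversion tool returned “ZIL” unchanged for the plain ASCII input, while this sketch lowercases it to “zil”.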