The behavior in your material in cases 4 and 8 is different from what Dennis has seen (and what I have seen). The dot-above mark is missing.
I do not think a small dotless i is expected from round-folding in a non-Turkish locale. Nor do I think it is correct to say that CAPITAL LETTER I WITH DOT ABOVE is absent in en_US.UTF-8.
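For illustration only, here is a minimal sketch of the default (non-Turkish) mapping. It assumes Python 3, whose str.lower()/str.upper() apply the Unicode default full case mappings and are not locale-sensitive. This is a stand-in, not the tool used in the attached tests. Plain I lowercases to plain i (never U+0131), and I WITH DOT ABOVE keeps its dot as a combining U+0307:

```python
# Default (non-Turkish) Unicode case mapping as implemented by Python's str methods.
# Python's str.lower()/str.upper() do not depend on the process locale.

CAPITAL_I = "I"            # U+0049 LATIN CAPITAL LETTER I
CAPITAL_I_DOT = "\u0130"   # U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_I = "\u0131"       # U+0131 LATIN SMALL LETTER DOTLESS I

def codepoints(s):
    """Return the code points of s as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in s]

lower_plain = CAPITAL_I.lower()       # plain I lowercases to plain i
lower_dotted = CAPITAL_I_DOT.lower()  # dotted I keeps its dot as U+0307

print(codepoints(lower_plain))                   # ['U+0069']
print(codepoints(lower_dotted))                  # ['U+0069', 'U+0307']
print(DOTLESS_I in lower_plain + lower_dotted)   # False
```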
Mats
---
Mats Dufberg
mats.dufberg@internetstiftelsen.se
Technical Expert
Internetstiftelsen (The Swedish Internet Foundation)
Mobile: +46 73 065 3899
https://internetstiftelsen.se/
From: Pitinan Kooarmornpatana <pitinan.koo@icann.org>
Date: Friday, 15 May 2020 at 22:36
To: Mats Dufberg <mats.dufberg@internetstiftelsen.se>, "Tan Tanaka, Dennis" <dtantanaka@verisign.com>, ICANN Latin GP <Latingp@icann.org>
Subject: Re: [Latingp] Handling of down casing of I-dotted in Turkish locale
Dear all,
Please find attached test case and the results.
The test result for the Turkish locale was produced by someone in Turkey who uses the Turkish locale. Case folding appears to be stable in both directions, up and down. The test result for en_US was produced by me, and it does not reproduce the original dotless I after round-folding. I understand that this is due to the absence of CAPITAL LETTER I WITH DOT ABOVE in the en-US locale.
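A minimal round-folding check, again assuming Python 3's locale-independent default mappings as a stand-in for an en_US tool (the helper name round_fold is hypothetical). It suggests that the dot is not lost in a non-Turkish setting: U+0130 comes back as I followed by U+0307 rather than as the single precomposed code point, and NFC normalization recomposes the pair:

```python
import unicodedata

def round_fold(s):
    """Lowercase, then uppercase, s using Python's default (non-Turkish) mappings."""
    return s.lower().upper()

for s in ["I", "\u0130", "D\u0130YARBAKIR"]:
    folded = round_fold(s)
    recomposed = unicodedata.normalize("NFC", folded)
    # ascii() makes combining marks such as U+0307 visible as escapes
    print(ascii(s), "->", ascii(folded), "| NFC equals original:", recomposed == s)
```

A byte-for-byte comparison of the raw result will therefore report a difference for U+0130, even though the result is canonically equivalent to the input.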
Kindly let us know if there are any further queries.
Regards,
Pitinan
From: Latingp <latingp-bounces@icann.org> on behalf of Mats Dufberg <mats.dufberg@internetstiftelsen.se>
Date: Friday, May 15, 2020 at 05:13
To: "Tan Tanaka, Dennis" <dtantanaka@verisign.com>, "Latingp@icann.org" <Latingp@icann.org>
Subject: Re: [Latingp] Handling of down casing of I-dotted in Turkish locale
> If I understand this correctly, when we test the IDNA behavior of the uppercase string in a Turkish setting we should see the behavior described in the second part.
That is my interpretation too. I will try testing a generic tool with the Turkish locale (tr_TR.UTF-8) set.
Mats
From: "Tan Tanaka, Dennis" <dtantanaka@verisign.com>
Date: Thursday, 14 May 2020 at 23:19
To: Mats Dufberg <mats.dufberg@internetstiftelsen.se>, ICANN Latin GP <Latingp@icann.org>
Subject: Re: [Latingp] Handling of down casing of I-dotted in Turkish locale
There are two sets of rules, one for non-Turkish locales:
# Preserve canonical equivalence for I with dot. Turkic is handled below.
0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE
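The comment "Preserve canonical equivalence" can be checked directly: U+0130 is canonically equivalent to U+0049 U+0307, and the rule above lowercases it to U+0069 U+0307, which is what the decomposed form lowercases to anyway. A small sketch, assuming Python 3's default (non-Turkish) mappings; the function name is hypothetical:

```python
import unicodedata

def lower_preserves_equivalence(s):
    """True if lowercasing s and its NFD form give canonically equivalent results."""
    nfd = unicodedata.normalize("NFD", s)
    a = unicodedata.normalize("NFD", s.lower())
    b = unicodedata.normalize("NFD", nfd.lower())
    return a == b

print(lower_preserves_equivalence("\u0130"))          # True
print(ascii(unicodedata.normalize("NFD", "\u0130")))  # 'I\u0307'
```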
and another for Turkish (and Azeri):
# Turkish and Azeri
# I and i-dotless; I-dot and i are case pairs in Turkish and Azeri
# The following rules handle those cases.
0130; 0069; 0130; 0130; tr; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0130; 0069; 0130; 0130; az; # LATIN CAPITAL LETTER I WITH DOT ABOVE
# When lowercasing, remove dot_above in the sequence I + dot_above, which will turn into i.
# This matches the behavior of the canonically equivalent I-dot_above
0307; ; 0307; 0307; tr After_I; # COMBINING DOT ABOVE
0307; ; 0307; 0307; az After_I; # COMBINING DOT ABOVE
# When lowercasing, unless an I is before a dot_above, it turns into a dotless i.
0049; 0131; 0049; 0049; tr Not_Before_Dot; # LATIN CAPITAL LETTER I
0049; 0131; 0049; 0049; az Not_Before_Dot; # LATIN CAPITAL LETTER I
# When uppercasing, i turns into a dotted capital I
0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I
0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I
# Note: the following case is already in the UnicodeData.txt file.
# 0131; 0131; 0049; 0049; tr; # LATIN SMALL LETTER DOTLESS I
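To make the conditional rules concrete, here is a minimal sketch that applies only the tr/az entries quoted above: 0130 to 0069, removal of 0307 After_I, 0049 to 0131 when Not_Before_Dot, and 0069 to 0130 on uppercasing. The After_I and Before_Dot contexts follow their definitions in the Unicode Standard (Table 3-17). The names tr_lower and tr_upper are hypothetical. All other characters fall back to Python's per-character default mapping, so context rules such as Final_Sigma are not handled. This is not a substitute for a locale-aware library such as ICU:

```python
import unicodedata

# Hypothetical helpers applying only the tr/az SpecialCasing.txt rules quoted above.

def _skippable(ch):
    """Marks allowed to intervene: combining class neither 0 nor 230 (Above)."""
    return unicodedata.combining(ch) not in (0, 230)

def _after_i(s, i):
    """After_I: an uppercase I precedes s[i], with only skippable marks between."""
    j = i - 1
    while j >= 0 and _skippable(s[j]):
        j -= 1
    return j >= 0 and s[j] == "I"

def _before_dot(s, i):
    """Before_Dot: U+0307 follows s[i], with only skippable marks between."""
    j = i + 1
    while j < len(s) and _skippable(s[j]):
        j += 1
    return j < len(s) and s[j] == "\u0307"

def tr_lower(s):
    out = []
    for i, ch in enumerate(s):
        if ch == "\u0130":
            out.append("i")                  # 0130 -> 0069 (tr/az)
        elif ch == "\u0307" and _after_i(s, i):
            continue                         # 0307 After_I -> removed
        elif ch == "I":
            # 0049 Not_Before_Dot -> 0131; before a dot, default 0049 -> 0069
            out.append("i" if _before_dot(s, i) else "\u0131")
        else:
            out.append(ch.lower())           # default per-character mapping
    return "".join(out)

def tr_upper(s):
    # 0069 -> 0130 (tr/az); 0131 -> 0049 comes from the default mapping
    return "".join("\u0130" if ch == "i" else ch.upper() for ch in s)

word = "D\u0130YARBAKIR"                  # DİYARBAKIR
print(tr_lower(word))                     # diyarbakır
print(tr_upper(tr_lower(word)) == word)   # True
```

Under these rules the pairs I/ı and İ/i round-trip in both directions, and the decomposed sequence I + U+0307 lowercases to plain i, matching its precomposed equivalent.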
If I understand this correctly, when we test the IDNA behavior of the uppercase string in a Turkish setting we should see the behavior described in the second part.
-Dennis
From: Latingp <latingp-bounces@icann.org> on behalf of Mats Dufberg <mats.dufberg@internetstiftelsen.se>
Date: Thursday, May 14, 2020 at 4:27 PM
To: ICANN Latin GP <Latingp@icann.org>
Subject: [EXTERNAL] [Latingp] Handling of down casing of I-dotted in Turkish locale
The link below goes to the file in the Unicode Character Database that defines the special casing rules for i, etc., in Turkish and Azeri. The relevant section is the last section of the file.
https://unicode.org/Public/UNIDATA/SpecialCasing.txt