Subject: [EXTERNAL] [Latingp] Principles for repertoire and variants -
 Feedback from IP
X-CrossPremisesHeadersFilteredBySendConnector: 
 PMBX112-W1-CA-2.pexch112.icann.org
Received: from PMBX112-W1-CA-1.pexch112.icann.org (64.78.40.21) by
	PMBX112-W1-CA-2.pexch112.icann.org (64.78.40.23) with Microsoft SMTP	Server
 (TLS) id 15.0.1178.4; Wed, 18 Oct 2017 02:13:36 -0700
Received: from PMBX112-W1-CA-1.pexch112.icann.org ([64.78.40.21]) by
	PMBX112-W1-CA-1.PEXCH112.ICANN.ORG ([64.78.40.21]) with mapi id
	15.00.1178.000; Wed, 18 Oct 2017 02:13:36 -0700
From: Sarmad Hussain <sarmad.hussain@icann.org>
To: Latin GP <latingp@icann.org>
Thread-Topic: Principles for repertoire and variants - Feedback from IP
Thread-Index: AdNH8UZsQjIJcXvvSWmce8OQNlG5HA==
Date: Wed, 18 Oct 2017 09:13:35 +0000
Message-ID: 
 <9e049feb2efd41f99315564ad084a551@PMBX112-W1-CA-1.PEXCH112.ICANN.ORG>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-ms-exchange-transport-fromentityheader: Hosted
x-originating-ip: [192.0.47.234]
X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.5.16
	(pechora2.lax.icann.org [192.0.33.72]);
	Wed, 18 Oct 2017 09:13:38 +0000 (UTC)
X-OrganizationHeadersPreserved: PMBX112-W1-CA-2.pexch112.icann.org
X-BeenThere: latingp@icann.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Communication platform - Label Generation Rules <latingp.icann.org>
List-Unsubscribe: <https://mm.icann.org/mailman/options/latingp>,
	<mailto:latingp-request@icann.org?subject=unsubscribe>
List-Archive: <http://mm.icann.org/pipermail/latingp/>
List-Post: <mailto:latingp@icann.org>
List-Help: <mailto:latingp-request@icann.org?subject=help>
List-Subscribe: <https://mm.icann.org/mailman/listinfo/latingp>,
	<mailto:latingp-request@icann.org?subject=subscribe>
Sender: <latingp-bounces@icann.org>
Errors-To: latingp-bounces@icann.org
Return-Path: latingp-bounces@icann.org
X-MS-Exchange-Organization-AuthSource: brn1wnexcas01.vcorp.ad.vrsn.com
X-MS-Exchange-Organization-AuthAs: Anonymous
Content-Type: multipart/mixed; boundary="B_3597993043_2077910385"
MIME-Version: 1.0

--B_3597993043_2077910385
Content-Type: multipart/alternative; boundary="B_3597993043_1373650010"

--B_3597993043_1373650010
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Dear All,

=20

Please find input from the Integration Panel in response to the call for co=
mments on the principles documents.

=20

Regards,
Sarmad

The Integration Panel (IP) has reviewed "Principles for Inclusion and Exclu=
sion of Code Points in Latin Script for the Root Zone (Latin LGR)" and has =
the following comments:

The IP congratulates the Latin GP on the formulation of its "Principles =
for Inclusion and Exclusion of Code Points in Latin Script for the Root =
Zone (Latin LGR)". They appear to cover the important considerations and =
will likely serve the GP well in arriving at a list of proposed candidate =
code points. The IP would like to caution that the final decision on =
whether to include or exclude a code point may not be possible by rote =
application of these (or any other set of) principles, and that =
additional factors may have to be considered in individual cases.

The IP is looking forward to the next stage of the Latin GP's work and to r=
eviewing actual examples of draft code points.

Additional notes:

The IP would like to note that all entries in an LGR need to be in Unicode =
Normalization Form C (see RFC 7940) and, further, that IDNA requires NFC =
even if that does not agree with the native typing order or with =
conventions regarding precomposed, decomposed or mixed composed usage. =
RFC 5890 states: "A "U-label" is an IDNA-valid string of Unicode =
characters, in Normalization Form C (NFC)". Because entries are =
normalized, dual encoding cannot exist.

In creating the repertoire, each combining sequence needs to be =
individually justified and separately enumerated; combining marks should =
not be members of the repertoire on their own.

In applying these principles, attention must be paid to the foundational =
documents for this work as summarized in the "Guidelines for Developing =
Script-Specific Label Generation Rules for Integration into the Root Zone =
LGR".

Further, the exclusion principles should mention explicitly that the LGR =
repertoire is constrained by the MSR: =C2=AB A code point not in the =
latest version of the MSR is excluded. If there is a clear need to add =
one, the GP will contact the Integration Panel to assess the possibility =
of adding it to the MSR =C2=BB.

The IP has reviewed "Analysis of Variants in the Latin Script for the Root =
Zone" and has the following comments:

The actual guiding principle (contained in the second paragraph of the =
document) appears to cover the important considerations and will likely =
serve the GP well in arriving at a list of proposed candidate variants. =
The IP would like to caution that the final decision on whether to =
include or exclude a variant may not be possible by rote application of =
this (or any other) principle, and that additional factors may have to be =
considered in individual cases.

The IP is looking forward to the next stage of the Latin GP's work and to r=
eviewing actual examples of draft variants.

Additional notes:

The IP has some concerns about the remainder of the document.

The Procedure sets a very narrow limit on the kinds of cases that can be =
considered variants for the Root Zone; this is the basis of the statement =
by the IP that is quoted in a footnote. It might be better if this =
statement were incorporated into the definition of "scope".

In that section, the opening remark about script mixing seems unconnected =
to the discussion that follows. A straight listing of which related =
scripts the GP will consider would be more useful.

The IP would like to point out that the example given in the document of =
Latin =C3=A8 (U+00E8) and Cyrillic =D1=90 (U+0450) may be moot because the =
final Cyrillic repertoire does not contain U+0450. In general, it is =
expected that the analysis of cross-script repertoires will remain limited =
to code points that are in the respective scripts' LGRs or draft LGRs.

The general discussion of "classes of variants" may be "of interest to the =
reader", but it isn't helpful in understanding which principles the Latin =
GP will follow in deciding whether something is a variant or not -- most =
of the items discussed are not applicable in the context of the Root Zone =
LGR.

In the context of the Root Zone, the Procedure is quite clear in that it =
considers simple similarity of appearance to be outside the scope of the =
Root Zone LGR. In admitting exact homoglyphs, the IP has been making the =
argument that =E2=80=98e=E2=80=99 in Latin (U+0065) and =
=E2=80=98=D0=B5=E2=80=99 in Cyrillic (U+0435) are not just visually =
indistinguishable, but that their distinct code points effectively =
represent a disunification by script property. (A disunification not =
unlike that of U+01DD and U+0259, which are disunified based on case, or =
the two sets of Arabic digits disunified largely on directional =
properties.)
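
The point about distinct code points can be illustrated with a small
Python sketch. It relies on the character names from the standard
unicodedata module as a rough stand-in for the script property, which is
an assumption made for illustration only (that module does not report
the Script property itself), and the helper name describe is
hypothetical:

```python
import unicodedata


def describe(ch):
    """Return the code point and Unicode character name of ch."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))


# The two letters render alike but are separate code points.
print(describe("\u0065"))
print(describe("\u0435"))
# Normalization does not merge them: NFC leaves U+0435 as it is.
print(describe(unicodedata.normalize("NFC", "\u0435")))
```

The names differ only in their script prefix (LATIN versus CYRILLIC),
and no normalization form maps one letter onto the other.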

In the context of other script LGRs for the Root Zone, the IP has argued =
strongly against embodying rules intended to deal with spelling issues. =
Therefore, any orthographic variation (spelling differences) would require =
a very compelling case to be made; the examples given may not rise to that =
level. For instance, =E2=80=98ss=E2=80=99 (U+0073 U+0073) and =
=E2=80=98=C3=9F=E2=80=99 (U+00DF) are separately available on the second =
level, in the .de ccTLD (and presumably others). This would strongly argue =
against the claim that German usage would require them to be variants - in =
fact the opposite might be concluded.

Consideration of established practice in existing Latin-based IDNs ought =
to be an important principle. The Procedure makes reference to the "Least =
Astonishment Principle". This principle argues against solutions that =
produce unexpected or surprising behavior. Having the Root Zone exhibit =
fundamentally different design decisions with respect to variants than =
those found on the second level would have to be justified by strong =
arguments based on factors special to the Root Zone. Absent such factors, =
the expectation would be that the various levels are more or less =
compatible in their treatment of IDN labels for a given script.

Finally, the claimed normalization exceptions appear to be based on a =
misunderstanding of the normalization algorithm. In normalizing to =
precomposed form (Normalization Form C), the first step is to fully =
decompose the input string and then to reorder all combining marks into =
canonical order. Because of that, the two examples of e with grave and dot =
below become identical at that stage of normalization. In the final stage =
of the algorithm, as much of the sequence as possible is composed. But =
because both inputs have the same fully decomposed and reordered form, =
their final NFC form is identical.

Or, put differently, only one of the two forms is in NFC; the other is =
unnormalized and as such not admissible in the LGR.
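
The normalization steps described above can be sketched in Python with
the standard unicodedata module. The two input spellings are assumptions
chosen for illustration (precomposed e with grave plus combining dot
below, and precomposed e with dot below plus combining grave); the
examples in the Latin GP document may be spelled differently. The helper
names nfc and code_points are hypothetical:

```python
import unicodedata


def nfc(text):
    """Return text in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def code_points(text):
    """List the code points of text in U+XXXX notation."""
    return ["U+%04X" % ord(ch) for ch in text]


# U+00E8 (e with grave) followed by U+0323 (combining dot below)
print(code_points(nfc("\u00e8\u0323")))
# U+1EB9 (e with dot below) followed by U+0300 (combining grave)
print(code_points(nfc("\u1eb9\u0300")))
# Both spellings collapse to a single NFC form, so the set has size 1.
print(len({nfc("\u00e8\u0323"), nfc("\u1eb9\u0300")}))
```

Under these assumptions both inputs normalize to U+1EB9 U+0300: the dot
below has the lower combining class, is ordered first and composes with
the base letter, while the grave remains a combining mark.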

=20


--B_3597993043_1373650010--

--B_3597993043_2077910385
Content-Type: text/plain; name="ATT00001.txt"
Content-ID: <BBCEE2CC24C4584B885C1E18F42334EE@verisign.com>
Content-Disposition: attachment; filename="ATT00001.txt"
Content-Transfer-Encoding: base64

X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18NCkxhdGluZ3Ag
bWFpbGluZyBsaXN0DQpMYXRpbmdwQGljYW5uLm9yZw0KaHR0cHM6Ly9tbS5pY2Fubi5vcmcvbWFp
bG1hbi9saXN0aW5mby9sYXRpbmdwDQo=

--B_3597993043_2077910385--