Correction: Discussion of the Armenian GP request
[[ As Eric Brunner-Williams pointed out on our first call, I had a pretty serious typo in the message I sent out yesterday. This is the corrected version. ]]
It is not clear to me that the Latin GP should be addressing the issues that were sent to us by the Armenian GP. Here are a few thoughts on the subject.
1) The Integration Panel believes that the Armenian script is separable from the Latin script, according to Section 3.8.3 of the MSR2.
2) Section B.3.4.2 of the LGR Procedures document says "Finally, in investigating the possible variant relations, Generation Panels should ignore cases where the relation is based exclusively on aspects of visual similarity."
3) Even ignoring the previous two points, the request seems to be for cross-script variants, such as for an Armenian letter that looks like a Latin letter. The "Guidelines for LGR" document indicates that cross-script variants might be created, but gives no hints about why we should do that given that the root zone will consist of labels of a single script (according to Section A.3.1).
This feels to me like the Armenian GP's submission is not related to ours, so we should not be commenting on it.
--Paul Hoffman
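For readers following point 3, here is a minimal sketch of what "labels of a single script" means in practice. It is not taken from the LGR Procedures or the MSR; the function names are hypothetical, and the script of each character is only approximated by the first word of its Unicode character name, because Python's standard library does not expose the Unicode Script property directly. The heuristic is meant for letters only.

```python
import unicodedata


def script_guess(ch):
    """Rough script guess for a letter: the first word of its Unicode name.

    This is an approximation (an assumption of this sketch), not the real
    Unicode Script property.
    """
    return unicodedata.name(ch).split()[0]


def scripts_in_label(label):
    """Return the set of guessed scripts used by the characters of a label."""
    return {script_guess(ch) for ch in label}


def is_single_script(label):
    """True if every character in the label maps to the same guessed script."""
    return len(scripts_in_label(label)) == 1


latin_label = "nono"
# U+0578 ARMENIAN SMALL LETTER VO and U+0585 ARMENIAN SMALL LETTER OH
# resemble Latin "n" and "o" in many fonts.
armenian_label = "\u0578\u0585\u0578\u0585"
mixed_label = "n\u0585no"

for label in (latin_label, armenian_label, mixed_label):
    print(ascii(label), sorted(scripts_in_label(label)), is_single_script(label))
```

Under this sketch the all-Latin and all-Armenian labels each pass the single-script check even though they may look alike, while the mixed label fails it. That is the distinction the discussion turns on: a single-script rule excludes mixed-script labels but does not by itself say anything about two whole labels in different scripts that resemble each other.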
Paul,
Thank you for the clarifications, on today's conference call and on the mailing list.
We have the general question of what, in addition to the base character set specified in RFC 1034/1035, drawing on earlier RFCs (letters, digits, and hyphen), is necessary for constructing labels for users of Latin script.
Our work product will be of the form of some rules for the formation of identifiers, constrained by the limitations on labels arising from the IDNA work of 2003 and 2010.
There may be context-specific rules, perhaps for labels which originate, or terminate, a sequence of labels, e.g., those labels that are published as part of the IANA root zone and are composed of characters of a single script as defined in the current version of Unicode.
Whatever those context-specific rules may be, ours is the general problem of identifiers expressed in the Latin script, used to associate resources at public addresses by the protocol defined in RFC 1034/1035 and their successors. If a label is terminal, there may be terminal-specific rules.
My understanding is that our peers in the Armenian GP have informed us (via the "similar scripts" question in our common boilerplate initial document) that there are one or more glyphs common to the Armenian script which are similar to one or more glyphs common to the Latin script. In general this is probably not "news", as whatever the final form of general rules we issue as our work product, our rules are likely to "be aware" that homoglyphs exist, etc.
Eric Brunner-Williams
Eugene, Oregon
(In reply to Paul Hoffman's message of 9/22/15, 11:02 AM.)
_______________________________________________ Latingp mailing list Latingp@icann.org https://mm.icann.org/mailman/listinfo/latingp
[[ Finally following up after a long break. ]]
> Thank you for the clarifications, on today's conference call and on the mailing list.
> We have the general question of what, in addition to the base character set specified in RFC 1034/1035, drawing on earlier RFCs (letters, digits, and hyphen), is necessary for constructing labels for users of Latin script.
> Our work product will be of the form of some rules for the formation of identifiers, constrained by the limitations on labels arising from the IDNA work of 2003 and 2010.
Aren't we supposed to be looking only at IDNA2008 (which was finished in 2010)? All the documents seem to list that version.
> There may be context-specific rules, perhaps for labels which originate, or terminate, a sequence of labels, e.g., those labels that are published as part of the IANA root zone and are composed of characters of a single script as defined in the current version of Unicode.
The Generation Panel's output is for labels in the root zone only, not for labels in the second level and below. Our output goes to the Integration Panel to put into the Root Zone LGR. All of the documents I've seen so far talk about the Root Zone LGR.
> Whatever those context-specific rules may be, ours is the general problem of identifiers expressed in the Latin script, used to associate resources at public addresses by the protocol defined in RFC 1034/1035 and their successors. If a label is terminal, there may be terminal-specific rules.
> My understanding is that our peers in the Armenian GP have informed us (via the "similar scripts" question in our common boilerplate initial document) that there are one or more glyphs common to the Armenian script which are similar to one or more glyphs common to the Latin script. In general this is probably not "news", as whatever the final form of general rules we issue as our work product, our rules are likely to "be aware" that homoglyphs exist, etc.
Have we heard back from the Integration Panel on this?
--Paul Hoffman
Paul,
Taking your interlinear comments in sequence, your first comment goes to the work completed in 2010, which updated and significantly changed the prior IETF work product, published in 2003. However, I should have mentioned the prior "IDN Testbed" of 2000, memorialized by the Corporation Board, as my intent was to suggest that work restricted to domain names has been ongoing for some time, though obviously not as long as internationalization and localization work, and the publication of standards for character repertoires.
Your second comment goes to the scope of the Latin panel's work. Is it limited to a single label, the initial label to the left of the terminating dot character? I've no doubt that there exists a consumer of the Latin panel's work product that has both (a) a scope of interest restricted to the initial label to the left of the terminating dot character, and (b) additional rules generally unknown to the members of the Latin panel, not arising from the general properties of Latin script.
More generally, we can't know that an adjacent label can only exist at one, but not both, terminators of a label, as that is not a property of Latin script, nor a property of the sets of processing rules we assume valid, where not contradicting, which are the encoding rules of 2010. If we did "know that", then we would, of necessity, also "know that" our work product is only true for one label in any sequence of labels, even were all labels in a sequence composed of characters from the Latin script repertoire. I suggest it is better that we do not know, and assume any label is neither initial nor terminal, and that where initial and/or terminal rules exist, they are the responsibility of those who make positional rules rather than organic script rules.
To your third comment: I've not heard from the Integration Panel, nor do I expect to.
I should point out that as Staff, you are free to direct the volunteer contributors in any way you see fit, and I may lack the qualifications necessary to assist the Corporation in this area.
Eric
(In reply to Paul Hoffman's message of 10/12/15, 1:29 PM.)