Re: [Translationtransliterationirt] T/T Update

Jan. 19, 2017

This is an important point, Amr, and I really don’t know what the best
answer is here.

On the one hand, the recommendation is specific, and what you are saying 
is exactly correct.  I’ll even go so far as to say that I have argued 
this position quite strongly, up until recently.

On the other hand, my thinking has evolved.  I’m much more circumspect
now about the practical realities of implementing this, which is to say
there’s probably an 80% solution (or something close to it), but with
the emphasis on accuracy these days I’m extremely concerned that the
remaining 20% is neither practical nor, in some cases, even
possible.

Therefore I’m much more inclined towards a nuanced reading of the
“spirit” of the recommendation.  However, I am sensitive to process,
too, so perhaps this is a problem.

In any case, this is worthy of some discussion.  I’m expecting Roger 
to have some things to say about the implementation of all this.

Jim

On 17 Jan 2017, at 8:13, Amr Elsadr wrote:
...
Hi,
Thanks for the responses, Jim. I found them to be very helpful, and 
pretty much agree except for one nuance. I do appreciate the 
challenges you’ve identified regarding identification of languages 
(as opposed to scripts). I can also think of different scenarios where 
this could be complicated. For example, there are common names shared by
Arabic and Urdu that are spelled identically in the same script, using
the same Unicode ranges. If the only basis for identifying the language
of one of those names were its entry in the RDS, it would be a
challenge, to say the least.
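To make the Arabic/Urdu point concrete, here is a minimal Python sketch
of the kind of string-only heuristic one might try. The letter set and
the function name `language_hint` are illustrative assumptions, not
anything from the policy: a few letters occur in Urdu (and some related
languages) but not in standard Arabic, so their presence hints at Urdu,
while their absence tells us nothing.

```python
# Hypothetical heuristic, for illustration only: letters used in Urdu
# orthography (and some related languages) but not in standard Arabic.
URDU_HINT_LETTERS = {
    "\u0679",  # ARABIC LETTER TTEH
    "\u0688",  # ARABIC LETTER DDAL
    "\u0691",  # ARABIC LETTER RREH
    "\u06BA",  # ARABIC LETTER NOON GHUNNA
    "\u06BE",  # ARABIC LETTER HEH DOACHASHMEE
    "\u06D2",  # ARABIC LETTER YEH BARREE
}


def language_hint(name):
    """Return "ur" if the string contains an Urdu-specific letter, else None.

    None means "undetermined": a name spelled with the same code points
    in Arabic and Urdu cannot be assigned a language from the string alone.
    """
    if any(ch in URDU_HINT_LETTERS for ch in name):
        return "ur"
    return None
```

A shared name such as Muhammad contains none of these letters, so the
sketch can only answer "undetermined", which is exactly the gap
described above.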
Having said that, it doesn’t really change the fact that 
identification of the language is part of the current policy the IRT 
is tasked with implementing, does it? Speaking for myself, I had hoped 
that the T/T WG’s recommendation #3 would somehow mitigate this 
issue:
...
Recommendation #3 The Working Group recommends that the language(s) 
and script(s) supported for registrants to submit their contact 
information data may be chosen in accordance with gTLD-provider 
business models.
So (hypothetically), a registrar operating out of Dubai (or perhaps 
even a reseller) offering registration services (including submission 
of contact info) in Arabic should have some practical way of 
differentiating between Arabic and Urdu. I certainly don’t have any 
suggestions on implementation details for this, and look to contracted 
parties to indicate what is and isn’t feasible. My standing
impression of the T/T policy recommendations in their entirety is
that the policy is meant to make the provision of registration
services using IRD less complex, not more. If automated language
detection will not be a mandatory implementation measure, then my hope 
is that gTLD providers can come up with innovative ways to get 
language detection done as reliably as possible. I don’t imagine 
that there is one solution that will work across all the different 
languages out there. I’d be very glad to be corrected on this.
Thanks.
Amr
...
On Jan 16, 2017, at 7:14 PM, James Galvin <jgalvin@afilias.info> 
wrote:
On 13 Jan 2017, at 13:19, Amr Elsadr wrote:
...
Hi again,
I realize now that some of what I said in the email below may seem
confusing and may conflict with what I said in my first email on this
thread. To be clear on my understanding of the policy (and I would
welcome being corrected on this) — the language/script of the data 
submitted by the registrant needs to be identified regardless of 
whether or not transformation is done.
Agree regarding script.  Language is more problematic; see earlier 
response.
...
If the “gTLD Provider” chooses to transform the contact 
information, then the language/script of the transformed contact 
info also needs to be identified, as well as who provided the 
transformed info.
Did I get this right?
Agree.
Jim
...
Thanks again, and apologies for any confusion.
Amr
...
On Jan 13, 2017, at 7:33 PM, Amr Elsadr <aelsadr@egyptig.org> 
wrote:
Hi Jim,
Some thoughts in-line below:
...
On Jan 12, 2017, at 10:37 PM, James Galvin <jgalvin@afilias.info> 
wrote:
On 5 Jan 2017, at 14:31, Brian Aitchison wrote:
...
Dear T/T IRT members,
Happy New Year! I hope you all had a relaxing holiday.
I’ve had a chance to review our last call, and there are a number of
nuanced issues we need to sort out regarding the T/T WG Recommendations,
i.e.:
* How to approach language and script tags. There was some discussion
on whether we need these at all, which leads to
I would nuance this in a slightly different way.
The scripts that support a given code point are known as part of 
the Unicode table.  So the principle in the recommendation that 
the script must always be known could be “implemented” by 
accepting that it’s already present.
That sounds reasonable to me.
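As a rough illustration of script being “already present”, the Python
sketch below derives an approximate script label for each letter.
Python’s standard `unicodedata` module does not expose the Unicode
Script property directly, so this assumes the first word of the
character name (e.g. “ARABIC” in “ARABIC LETTER MEEM”) is a usable
stand-in; a real implementation would use the actual Script property,
for example via the third-party `regex` module’s `\p{Script=...}`
classes. The function names are hypothetical.

```python
import unicodedata
from collections import Counter


def guess_scripts(text):
    """Count the letters in `text` by an approximate script label.

    Approximation (assumption): the first word of a letter's Unicode name,
    e.g. "ARABIC LETTER MEEM" -> "ARABIC", "LATIN SMALL LETTER A" -> "LATIN".
    This is a stand-in for, not the same as, the Unicode Script property.
    """
    counts = Counter()
    for ch in text:
        # Keep only letters (general category L*); this drops spaces,
        # digits, punctuation and combining marks.
        if not unicodedata.category(ch).startswith("L"):
            continue
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    return counts


def dominant_script(text):
    """Return the most frequent script label, or None if there are no letters."""
    counts = guess_scripts(text)
    return counts.most_common(1)[0][0] if counts else None
```

Note that an Arabic entry and an Urdu entry of the same name would both
come back as ARABIC: the script is recoverable from the code points, the
language is not.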
...
The principle that the language tag must be present is the
problematic one.  My thinking has evolved since the IRD working group
and the T&T working group completed their work.
I believe it is consistent with the various recommendations from 
both groups to suggest the following.
It must be possible to store a language tag with a data element 
(we’ll have to talk about what is meant by an “element” 
here, e.g., postal address in total or separately for each part of 
a postal address) if that value is known.
When the T/T PDP was being chartered (when the working group
charter was being drafted), the charter drafting team considered
adding to the charter the question of which data elements should be
covered by transformation of contact information. The response from
staff at the time was that this should remain out of scope of this
PDP, as it would be considered more thoroughly in another PDP in the
future. This was presumably the next generation RDS PDP, which is
indeed considering data elements, and should do so taking into
account this PDP as well as the work done by the IRD WG.
My understanding of this policy is that the language/script used to 
enter registration data by a registrant must be easily 
identifiable, but I’m not sure that a tag needs to be attached to 
every data element. I have no technical expertise on this, so I
can’t say what is more technically feasible, but all that is
really required is that the language/script be identified. This 
could be interpreted as an independent data element that is 
included with the rest of the registration data, couldn’t it? I 
am presuming that the registrant will use only one language/script 
to enter the contact information, not multiple ones.
...
A registrar should provide it if known and the registry should 
store it.  If it is present it should be displayed whenever that 
element is output as part of a directory service.
I don’t see why a registrar wouldn’t know what language/script 
is being used. It is the business model of the registrar and 
registry that will dictate what languages/scripts are permitted to 
be used, no? Surely, it is one that the registrar is familiar with, 
and is offering services to customers using it?
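A minimal sketch of the storage model described above (a language tag
that may be stored with a data element and is displayed whenever it is
present) might look like this in Python. The class name, the
`[lang=...]` display format, and the use of BCP 47 tags such as “ar” or
“ur” are assumptions for illustration; what counts as an “element” is
still an open question in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataElement:
    """One contact-data element, e.g. a street line or a full postal address.

    The granularity of an "element" is undecided; this sketch attaches
    the tag to whatever unit the caller chooses.
    """
    value: str
    language: Optional[str] = None  # assumed BCP 47 tag, e.g. "ar", if known

    def render(self) -> str:
        # Display the tag whenever it is present; omit it otherwise.
        if self.language:
            return f"{self.value} [lang={self.language}]"
        return self.value
```

Keeping the tag optional matches “provide it if known”; the open
question is only how often it will actually be known.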
...
If a data element is transformed, then both the origin language 
and the destination language must be known and both must be 
displayed on output.  If the originating language tag is not 
specified, the requestor of the transformation must determine the 
origin language through a means that is outside the scope of these 
recommendations.
I think we may be reading this policy a little differently. My 
understanding is that regardless of whether or not the 
authoritative original data is transformed, the language/script 
used must be identified, not only if it is transformed. If it is 
transformed, then yes, both languages/scripts need to be
identified, as well as the source of the transformation (who 
actually did it). There shouldn’t be any need for the requestor 
of the transformation to have to determine the original language.
Have I gotten this wrong?
...
Implications:
1. If a third party (i.e., not the registrar or the registry) is 
doing the transformation, no additional requirements apply.
If the registrant is voluntarily doing the transformation, then
there should be a field that indicates that the data is
transformed, as well as an indication of who did it. The same would
apply for the registrar or registry.
...
2. If a registry or a registrar is doing the transformation, then 
upon display or storage: a) both forms must always be shown; b) 
the language tag must always be included.
Yes. And again, the source of the transformation needs to be
included as well, keeping in mind, of course, that the transformed
data is not authoritative, and the accuracy of
translation/transliteration cannot be guaranteed.
Thanks.
Amr
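For the transformation case, a hypothetical record carrying both forms,
both language tags, and the source of the transformation could be
sketched as follows. The field names, the display format, and the use of
“und” (the BCP 47 tag for an undetermined language) when the
registrant’s language is unknown are assumptions; the check in
`__post_init__` encodes implication 2 above, that a registrar or
registry transformation must always carry the language tag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TransformedElement:
    original: str                     # authoritative data as submitted
    original_language: Optional[str]  # None if the registrant's language is unknown
    transformed: str                  # translated/transliterated form
    transformed_language: str
    transformed_by: str               # e.g. "registrant", "registrar", "registry"

    def __post_init__(self):
        # Implication 2: a registrar or registry transformation must
        # always include the language tag of the original.
        if self.transformed_by in ("registrar", "registry") and not self.original_language:
            raise ValueError("original language tag required")

    def render(self) -> List[str]:
        # Both forms are always shown; only the original is authoritative.
        orig_tag = self.original_language or "und"
        return [
            f"{self.original} [lang={orig_tag}] (authoritative)",
            f"{self.transformed} [lang={self.transformed_language}] "
            f"(transformed by {self.transformed_by}; not authoritative)",
        ]
```

Recording `transformed_by` alongside the tags is what lets a reader see
who produced the non-authoritative form.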
...
...
* Requirements for gathering language data, if any. The IRT seems to be
gravitating toward automated methods for detecting script. Some discussion
was had on whether parties requesting transformation should bear the burden
of inferring what language was entered by a registrant based on the country
he/she entered into the RDS system, script used, and any other method the
requesting party deems appropriate to make their transformation.
Please see my comment above.
Jim
...
* Reconciling the optional provisions contained within the T/T WG
recommendations (especially Rec. 1) with any requirements we identify for
contracted parties to gather language and script data to enable
transformations.
Some of our team is still out on holiday, so we won’t have a chance to
brainstorm possible approaches and solutions to these issues until next
week. Once we’ve had a chance to do so, I’ll send out a Doodle poll and
invite as usual. I expect our next call to take place during the week of
the 16th.
Thanks all very much for your thoughtful contributions to this project. I
look forward to our discussions this year.
All best,
Brian
Brian Aitchison, MRes, PhD
Lead Researcher
Operations & Policy Research
Internet Corporation for Assigned Names and Numbers (ICANN)
12025 Waterfront Drive, Suite 300
Los Angeles, CA 90094-2536
Direct Line: +1 310 578 8688
Mobile: +1 424 353 9041
Email: brian.aitchison@icann.org
Skype: brian.aitchison.icann
Twitter: @BrianAitch
LinkedIn: linkedin.com/in/baitchison <http://linkedin.com/in/baitchison>
www.icann.org <http://www.icann.org>
CONFIDENTIALITY NOTICE:  This email and any attachments are for the sole use
of the intended recipient(s) and contain information that may be
confidential and/or legally privileged.  If you have received this email in
error, please notify the sender by reply email and delete the message.  Any
disclosure, copying, distribution or use of this communication by someone
other than the intended recipient is prohibited.
...
_______________________________________________
Translationtransliterationirt mailing list
Translationtransliterationirt@icann.org
https://mm.icann.org/mailman/listinfo/translationtransliterationirt
