This is an important point, Amr, and I really don’t know what the best answer is here. On the one hand, the recommendation is specific, and what you are saying is exactly correct. I’ll even go so far as to say that I argued this position quite strongly until recently. On the other hand, my thinking has evolved. I’m much more circumspect now about the practical realities of implementing this. There’s probably an 80% solution (or something close to it), but with the emphasis on accuracy these days I’m extremely concerned that the remaining 20% is neither practical nor, in some cases, even possible. Therefore I’m much more inclined towards a nuanced “spirit” of the recommendation. However, I am sensitive to process, too, so perhaps this is a problem. In any case, this is worthy of some discussion. I’m expecting Roger to have some things to say about the implementation of all this.
Jim
On 17 Jan 2017, at 8:13, Amr Elsadr wrote:
Hi,
Thanks for the responses, Jim. I found them to be very helpful, and pretty much agree except for one nuance. I do appreciate the challenges you’ve identified regarding identification of languages (as opposed to scripts). I can also think of different scenarios where this could be complicated. For example, there are common names between Arabic and Urdu, which are spelled identically using the same script, as well as common Unicode ranges. If all one had to go on to identify the language of one of those names was having it as an entry in the RDS, then it’d be a challenge to say the least.
Having said that, it doesn’t really change the fact that identification of the language is part of the current policy the IRT is tasked with implementing, does it? Speaking for myself, I had hoped that the T/T WG’s recommendation #3 would somehow mitigate this issue:
Recommendation #3: The Working Group recommends that the language(s) and script(s) supported for registrants to submit their contact information data may be chosen in accordance with gTLD-provider business models.
So (hypothetically), a registrar operating out of Dubai (or perhaps even a reseller) offering registration services (including submission of contact info) in Arabic should have some practical way of differentiating between Arabic and Urdu. I certainly don’t have any suggestions on implementation details for this, and look to contracted parties to indicate what is and isn’t feasible. My standing impression of the T/T policy recommendations in their entirety is that the policy is meant to make the provision of registration services using IRD less complex, not more. If automated language detection will not be a mandatory implementation measure, then my hope is that gTLD providers can come up with innovative ways to get language detection done as reliably as possible. I don’t imagine that there is one solution that will work across all the different languages out there. I’d be very glad to be corrected on this.
Thanks.
Amr
On Jan 16, 2017, at 7:14 PM, James Galvin <jgalvin@afilias.info> wrote:
On 13 Jan 2017, at 13:19, Amr Elsadr wrote:
Hi again,
I realize now that some of what I said in the email below may seem confusing and conflicting with what I said in my first email on this thread. To be clear on my understanding of the policy (and I would welcome being corrected on this) — the language/script of the data submitted by the registrant needs to be identified regardless of whether or not transformation is done.
Agree regarding script. Language is more problematic; see earlier response.
If the “gTLD Provider” chooses to transform the contact information, then the language/script of the transformed contact info also needs to be identified, as well as who provided the transformed info.
Did I get this right?
Agree.
Jim
Thanks again, and apologies for any confusion.
Amr
On Jan 13, 2017, at 7:33 PM, Amr Elsadr <aelsadr@egyptig.org> wrote:
Hi Jim,
Some thoughts in-line below:
On Jan 12, 2017, at 10:37 PM, James Galvin <jgalvin@afilias.info> wrote:
On 5 Jan 2017, at 14:31, Brian Aitchison wrote:
Dear T/T IRT members,
Happy New Year! I hope you all had a relaxing holiday.
I’ve had a chance to review our last call, and there’s a number of nuanced issues we need to sort out regarding the T/T WG Recommendations, i.e.:
* How to approach language and script tags. Some discussion was had on whether we need these at all. Which leads to
I would nuance this in a slightly different way.
The scripts that support a given code point are known as part of the Unicode table. So the principle in the recommendation that the script must always be known could be “implemented” by accepting that it’s already present.
That sounds reasonable to me.
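The point above, that the script follows from the code points themselves, can be sketched in a few lines. This is only an illustration under stated assumptions: Python’s standard unicodedata module does not expose the Unicode Script property directly, so the hypothetical helper script_hints below approximates it with the first word of each character’s Unicode name. A real implementation would read the Script property from the Unicode Character Database instead.

```python
import unicodedata

def script_hints(text):
    """Return rough script hints for the letters in text.

    Assumption: the first word of a character's Unicode name
    (e.g. "ARABIC LETTER MEEM") is used as a stand-in for the
    Unicode Script property, which unicodedata does not provide.
    """
    hints = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, digits and punctuation
        name = unicodedata.name(ch, "")
        if name:
            hints.add(name.split()[0])
    return hints

# A name written identically in Arabic and Urdu (MEEM, HAH, MEEM, DAL).
shared_name = "\u0645\u062d\u0645\u062f"
print(script_hints(shared_name))  # the only hint is ARABIC
```

Note what this does and does not give us: the script falls out of the data with no extra tag, but nothing in the code points says whether the name was entered as Arabic or as Urdu. That is the gap a language tag would have to fill.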
The principle that the language tag must be present is the problematic one. My thinking has evolved since both the IRD working group and the T&T working group have completed their work.
I believe it is consistent with the various recommendations from both groups to suggest the following.
It must be possible to store a language tag with a data element (we’ll have to talk about what is meant by an “element” here, e.g., postal address in total or separately for each part of a postal address) if that value is known.
When the T/T PDP was being chartered (when the working group charter was being drafted), the charter drafting team considered specifying which data elements should be in scope for transformation of contact information. The response from staff at the time was that this should remain out of scope of this PDP, as it would be considered more thoroughly in another PDP in the future. This was presumably the next generation RDS PDP, which is indeed considering data elements, and should do so taking into account this PDP as well as the work done by the IRD WG.
My understanding of this policy is that the language/script used to enter registration data by a registrant must be easily identifiable, but I’m not sure that a tag needs to be attached to every data element. I have no technical expertise on this, so can’t say what is more technically feasible, but all that is really required is that the language/script be identified. This could be interpreted as an independent data element that is included with the rest of the registration data, couldn’t it? I am presuming that the registrant will use only one language/script to enter the contact information, not multiple ones.
A registrar should provide it if known and the registry should store it. If it is present it should be displayed whenever that element is output as part of a directory service.
I don’t see why a registrar wouldn’t know what language/script is being used. It is the business model of the registrar and registry that will dictate what languages/scripts are permitted to be used, no? Surely, it is one that the registrar is familiar with, and is offering services to customers using it?
If a data element is transformed, then both the origin language and the destination language must be known and both must be displayed on output. If the originating language tag is not specified, the requestor of the transformation must determine the origin language through a means that is outside the scope of these recommendations.
I think we may be reading this policy a little differently. My understanding is that regardless of whether or not the authoritative original data is transformed, the language/script used must be identified, not only if it is transformed. If it is transformed, then yes…, both languages/scripts need to be identified, as well as the source of the transformation (who actually did it). There shouldn’t be any need for the requestor of the transformation to have to determine the original language.
Have I gotten this wrong?
Implications:
1. If a third party (i.e., not the registrar or the registry) is doing the transformation, no additional requirements apply.
If the registrant is voluntarily doing the transformation, then there should be a field that indicates that the data is transformed, as well as an indication of who did it. Same would apply for the registrar or registry.
2. If a registry or a registrar is doing the transformation, then upon display or storage: a) both forms must always be shown; b) the language tag must always be included.
Yes. And again…, the source of the transformation needs to be included as well. Keeping in mind, of course, that the transformed data is not authoritative, and the accuracy of translation/transliteration cannot be guaranteed.
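To make the storage and display rules discussed above concrete, here is a minimal sketch of one possible reading. Everything in it is an assumption for illustration only: the class names ContactElement and Transformed, the render function, the output layout, and the use of BCP 47 style tags such as "ar" and "ar-Latn". It is not a proposed RDS schema. It stores a language tag only if known, shows both forms when a transformation exists, and records who did the transformation.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Transformed:
    value: str      # translated or transliterated form (non-authoritative)
    language: str   # language tag of the transformed form
    source: str     # who did the transformation, e.g. "registrar"

@dataclass
class ContactElement:
    name: str                                  # e.g. "registrant-name"
    value: str                                 # authoritative data as submitted
    language: Optional[str] = None             # stored only if known
    transformed: Optional[Transformed] = None

def render(element: ContactElement) -> List[str]:
    """Return directory-service output lines for one element."""
    tag = f" [lang={element.language}]" if element.language else ""
    lines = [f"{element.name}{tag}: {element.value}"]
    t = element.transformed
    if t is not None:
        # One reading of the rule: the origin language must be known
        # before a transformed form is stored or displayed.
        if element.language is None:
            raise ValueError("origin language must be known for transformed data")
        lines.append(
            f"{element.name} [lang={t.language}; "
            f"transformed-by={t.source}; non-authoritative]: {t.value}"
        )
    return lines

element = ContactElement(
    name="registrant-name",
    value="\u0645\u062d\u0645\u062f",
    language="ar",
    transformed=Transformed("Muhammad", "ar-Latn", "registrar"),
)
for line in render(element):
    print(line)
```

Whether the tag attaches to each element, as here, or to the record as a whole is exactly the open question raised earlier in this thread; the sketch picks the per-element form only because it is the more general of the two.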
Thanks.
Amr
* Requirements for gathering language data, if any. The IRT seems to be gravitating toward automated methods for detecting script. Some discussion was had on whether parties requesting transformation should bear the burden of inferring what language was entered by a registrant based on the country he/she entered into the RDS system, script used, and any other method the requesting party deems appropriate to make their transformation.
Please see my comment above.
Jim
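The idea in the item above, that a requesting party might infer language from the script plus the country entered, could look roughly like the following. This is a hedged sketch: the function candidate_languages and the small hint table are hypothetical and purely illustrative, and the result is a list of candidates rather than a determination. A registrant in any country can of course submit data in any language.

```python
# Illustrative hint table (an assumption, not an authoritative mapping):
# (script hint, ISO 3166 country code) -> candidate language tags.
CANDIDATES = {
    ("ARABIC", "AE"): ["ar"],
    ("ARABIC", "EG"): ["ar"],
    ("ARABIC", "PK"): ["ur"],
    ("ARABIC", "IR"): ["fa"],
}

def candidate_languages(script, country):
    """Return candidate language tags; an empty list means 'unknown'."""
    return list(CANDIDATES.get((script, country.upper()), []))

print(candidate_languages("ARABIC", "pk"))
print(candidate_languages("ARABIC", "US"))
```

A lookup like this is the sort of “80% solution” mentioned elsewhere in this thread: it narrows the options in common cases and returns nothing useful in the rest.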
* Reconciling the optional provisions contained within the T/T WG recommendations, especially Rec. 1, with any requirements we identify for contracted parties to gather language and script data to enable transformations.
Some of our team is still out on holiday, so we won’t have a chance to brainstorm possible approaches and solutions to these issues until next week. Once we’ve had a chance to do so, I’ll send out a doodle poll and invite as usual. I expect our next call to take place during the week of the 16th.
Thanks all very much for your thoughtful contributions to this project. I look forward to our discussions this year.
All best,
Brian
Brian Aitchison, MRes, PhD Lead Researcher Operations & Policy Research Internet Corporation for Assigned Names and Numbers (ICANN) 12025 Waterfront Drive, Suite 300 Los Angeles, CA 90094-2536
Direct Line: +1 310 578 8688 Mobile: +1 424 353 9041 Email: brian.aitchison@icann.org Skype: brian.aitchison.icann Twitter: @BrianAitch LinkedIn: linkedin.com/in/baitchison <http://linkedin.com/in/baitchison> www.icann.org <http://www.icann.org>
CONFIDENTIALITY NOTICE: This email and any attachments are for the sole use of the intended recipient(s) and contain information that may be confidential and/or legally privileged. If you have received this email in error, please notify the sender by reply email and delete the message. Any disclosure, copying, distribution or use of this communication by someone other than the intended recipient is prohibited.
_______________________________________________ Translationtransliterationirt mailing list Translationtransliterationirt@icann.org https://mm.icann.org/mailman/listinfo/translationtransliterationirt