Dear T/T IRT members,

Happy New Year! I hope you all had a relaxing holiday.

I've had a chance to review our last call, and there's a number of nuanced issues we need to sort out regarding the T/T WG Recommendations, ie:

* How to approach language and script tags. Some discussion was had on whether we need these at all. Which leads to...
* Requirements for gathering language data, if any. The IRT seems to be gravitating toward automated methods for detecting script. Some discussion was had on whether parties requesting transformation should bear the burden of inferring what language was entered by a registrant based on the country he/she entered into the RDS system, script used, and any other method the requesting party deems appropriate to make their transformation.
* Reconciling the optional provisions contained within the T/T WG recommendations, especially Rec. 1, with any requirements we identify for contracted parties to gather language and script data to enable transformations.

Some of our team is still out on holiday, so we won't have a chance to brainstorm possible approaches and solutions to these issues until next week. Once we've had a chance to do so, I'll send out a doodle poll and invite as usual. I expect our next call to take place during the week of the 16th.

Thanks all very much for your thoughtful contributions to this project. I look forward to our discussions this year.
All best, Brian Brian Aitchison, MRes, PhD Lead Researcher Operations & Policy Research Internet Corporation for Assigned Names and Numbers (ICANN) 12025 Waterfront Drive, Suite 300 Los Angeles, CA 90094-2536 Direct Line: +1 310 578 8688 Mobile: +1 424 353 9041 Email: brian.aitchison@icann.org Skype: brian.aitchison.icann Twitter: @BrianAitch LinkedIn: linkedin.com/in/baitchison <http://linkedin.com/in/baitchison> www.icann.org <http://www.icann.org> CONFIDENTIALITY NOTICE: This email and any attachments are for the sole use of the intended recipient(s) and contain information that may be confidential and/or legally privileged. If you have received this email in error, please notify the sender by reply email and delete the message. Any disclosure, copying, distribution or use of this communication by someone other than the intended recipient is prohibited.
Good Afternoon,

Thanks for the update Brian, looking forward to getting back to this work.

In response (mostly so I don't forget :) ).

Not that you were suggesting this, but I think the group should not mandate "automated" methods for detecting. I think we should suggest "automated" but leave it up to the implementers as to how they want to detect.

In regards to the last point, and as I have stated before in different ways, my expectation coming out of the T/T PDP was that transformation (in its entirety) was not mandatory. And to me this meant none of the recommendations were mandatory but that if you did translate and/or transliterate then these recommendations would apply. My expectation for the T/T implementation group was to come up with an implementation for those that choose to translate and/or transliterate.

Happy New Year everyone!!

Thanks
Roger

From: translationtransliterationirt-bounces@icann.org [mailto:translationtransliterationirt-bounces@icann.org] On Behalf Of Brian Aitchison Sent: Thursday, January 05, 2017 1:32 PM To: translationtransliterationirt@icann.org Subject: [Translationtransliterationirt] T/T Update

[snip]
Hi,

Apologies for not attending the call (if there is an archived mp3 recording, I’d appreciate a link), so forgive me if my comments are a bit out of context.

Whether or not this group should mandate automated methods for language detection is not something I have a ready thought on, but I imagine that whether or not the contact data language detection is automated, it needs to be accurate in its detection. Is that something that is possible? There were studies done a couple of years ago on the accuracy of automated T/T, and those did not prove fruitful. Too many identified languages/scripts with large error margins (more on this can be found on the PDP WG wiki page here: https://community.icann.org/display/tatcipdp/7+Studies+and+Background+Docume...).

Language/script detection may be a different matter, but are there any indications out there that this can indeed be done reliably? That might be something worth considering before contemplating mandatory automated detection as an implementation measure.

Also on this from Roger:
On Jan 5, 2017, at 10:17 PM, Roger D Carney <rcarney@godaddy.com> wrote:
[snip]
In regards to the last point, and as I have stated before in different ways, my expectation coming out of the T/T PDP was that transformation (in its entirety) was not mandatory. And to me this meant none of the recommendations were mandatory but that if you did translate and/or transliterate then these recommendations would apply. My expectation for the T/T implementation group was to come up with an implementation for those that choose to translate and/or transliterate.
Yes, this sounds right to me. If a “gTLD Provider”, which I have always understood to be a registry and/or registrar, chooses to provide registration services in local languages and scripts with optional transformation, the rest of the consensus policy kicks in. So if a contracted party chooses to transform, it will be required to fill in language tags. It is, however, my understanding that the WHOIS replacement system needs to be able to provide language tags as a feature, so that in the event that the contracted party chooses to transform data, the ability to tag certain fields is in place. If I recall correctly, the PDP WG also recommended that it be indicated who actually transformed the authoritative data in its local language/script.

You may also find an exchange that took place between the ICANN Board and GNSO Council on some of this to be insightful.

Letter from the ICANN Board to the GNSO Council: https://www.icann.org/en/system/files/correspondence/crocker-to-bladel-11may...

GNSO Council response: https://gnso.icann.org/en/correspondence/gnso-council-to-icann-board-21dec16...

Thanks.

Amr
Thanks very much Roger and Amr for your responses.

I think we're in agreement regarding automated script detection as a mandatory provision. We'll discuss further with our team next week, but I do not imagine we'll want to craft any language that mandates this.

And to restate what both Roger and Amr have said, no one is required to transform, but if someone requests it, it should be done according to the standards we develop. Think we're on the same page here.

I encourage everyone to read the GNSO letters Amr linked in his email. They pertain to the relationship between the Internationalized Registration Data Final Report and the T/T Final Report. Among other things, they provide very useful context for language tagging.

All best,
Brian

On 1/5/17, 1:11 PM, "translationtransliterationirt-bounces@icann.org on behalf of Amr Elsadr" <translationtransliterationirt-bounces@icann.org on behalf of aelsadr@egyptig.org> wrote:
[snip]
Hi Amr, Brian and Roger,

I just wanted to support Amr's points, and also thank you Brian for your reassurance on the issues.

Best
Emily

On 5 January 2017 at 23:33, Brian Aitchison <brian.aitchison@icann.org> wrote:
[snip]
-- Emily Taylor *MA(Cantab), MBA* Director *Netistrar Ltd *- Domain Names at Trade Prices W: http://www.netistrar.com | M: 07540 049322 | T: 01283 617808 Repton House, Bretby Business Park, Bretby, Derbyshire, DE15 0YZ Registered in England and Wales No. 08735583. VAT No. 190062332
On 5 Jan 2017, at 16:11, Amr Elsadr wrote:
You may also find an exchange that took place between the ICANN Board and GNSO Council on some of this to be insightful.
Letter from the ICANN Board to the GNSO Council: https://www.icann.org/en/system/files/correspondence/crocker-to-bladel-11may...
GNSO Council response: https://gnso.icann.org/en/correspondence/gnso-council-to-icann-board-21dec16...
Yes, well, I guess I’ll admit to crafting most of the text in the GNSO Council response, so if anyone has any issues with it you get to blame me and we can talk about it during our next call. :-) Jim
Hi, No issues on my part, except gratitude to you for getting this done so well. :) Thanks Jim. Amr
On Jan 12, 2017, at 10:22 PM, James Galvin <jgalvin@afilias.info> wrote:
[snip]
+1 On 5 Jan 2017, at 15:17, Roger D Carney wrote:
Good Afternoon,
Thanks for the update Brian, looking forward to getting back to this work.
In response (mostly so I don't forget :) ).
Not that you were suggesting this, but I think the group should not mandate "automated" methods for detecting. I think we should suggest "automated" but leave it up to the implementers as to how they want to detect.
In regards to the last point, and as I have stated before in different ways, my expectation coming out of the T/T PDP was that transformation (in its entirety) was not mandatory. And to me this meant none of the recommendations were mandatory but that if you did translate and/or transliterate then these recommendations would apply. My expectation for the T/T implementation group was to come up with an implementation for those that choose to translate and/or transliterate.
Happy New Year everyone!!
Thanks Roger
On 5 Jan 2017, at 14:31, Brian Aitchison wrote:
Dear T/T IRT members,
Happy New Year! I hope you all had a relaxing holiday.
I've had a chance to review our last call, and there's a number of nuanced issues we need to sort out regarding the T/T WG Recommendations, ie:

* How to approach language and script tags. Some discussion was had on whether we need these at all. Which leads to...
I would nuance this in a slightly different way.

The scripts that support a given code point are known as part of the Unicode table. So the principle in the recommendation that the script must always be known could be “implemented” by accepting that it’s already present.

The principle that the language tag must be present is the problematic one. My thinking has evolved since both the IRD working group and the T&T working group have completed their work.

I believe it is consistent with the various recommendations from both groups to suggest the following.

It must be possible to store a language tag with a data element (we’ll have to talk about what is meant by an “element” here, e.g., postal address in total or separately for each part of a postal address) if that value is known.

A registrar should provide it if known and the registry should store it. If it is present it should be displayed whenever that element is output as part of a directory service.

If a data element is transformed, then both the origin language and the destination language must be known and both must be displayed on output. If the originating language tag is not specified, the requestor of the transformation must determine the origin language through a means that is outside the scope of these recommendations.

Implications:

1. If a third party (i.e., not the registrar or the registry) is doing the transformation, no additional requirements apply.

2. If a registry or a registrar is doing the transformation, then upon display or storage: a) both forms must always be shown; b) the language tag must always be included.
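To make the rules above a little more concrete, here is a minimal Python sketch of how a data element with an optional language tag, and the check for a transformed element, might look. Everything in it is an assumption for illustration only: the class and function names (TaggedValue, Transformation, check_output) are hypothetical, the example tag values assume BCP 47 style tags (which this thread has not settled on), and the question of what counts as an "element" is left open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaggedValue:
    """One data element plus its language tag, if the tag is known."""
    value: str
    language: Optional[str] = None  # assumed BCP 47 style, e.g. "ru"; None if unknown


@dataclass
class Transformation:
    """A transformed (translated or transliterated) form of an element."""
    result: TaggedValue
    performed_by: str  # hypothetical labels: "registrar", "registry", "third-party"


def check_output(original: TaggedValue,
                 transformed: Optional[Transformation]) -> list:
    """Return a list of problems under the rules sketched in the message above."""
    problems = []
    if transformed is None:
        # No transformation: the language tag is stored and shown only if known.
        return problems
    if transformed.performed_by in ("registrar", "registry"):
        # Implication 2: both language tags must be present on display or storage.
        if original.language is None:
            problems.append("origin language tag missing")
        if transformed.result.language is None:
            problems.append("destination language tag missing")
    # Implication 1: a third party doing the transformation adds no requirements.
    return problems
```

For example, a registrar transliterating TaggedValue("Москва") into TaggedValue("Moskva", "ru-Latn") would be flagged for the missing origin tag, while the same transformation done by a third party would not.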
* Requirements for gathering language data, if any. The IRT seems to be gravitating toward automated methods for detecting script. Some discussion was had on whether parties requesting transformation should bear the burden of inferring what language was entered by a registrant based on the country he/she entered into the RDS system, script used, and any other method the requesting party deems appropriate to make their transformation.
Please see my comment above. Jim
Hi Jim, Some thoughts in-line below:
On Jan 12, 2017, at 10:37 PM, James Galvin <jgalvin@afilias.info> wrote:
On 5 Jan 2017, at 14:31, Brian Aitchison wrote:
Dear T/T IRT members,
Happy New Year! I hope you all had a relaxing holiday.
I've had a chance to review our last call, and there's a number of nuanced issues we need to sort out regarding the T/T WG Recommendations, ie:

* How to approach language and script tags. Some discussion was had on whether we need these at all. Which leads to...
I would nuance this in a slightly different way.
The scripts that support a given code point are known as part of the Unicode table. So the principle in the recommendation that the script must always be known could be “implemented” by accepting that it’s already present.
That sounds reasonable to me.
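As a rough illustration of the point that script can be read off the code points themselves, here is a minimal Python sketch. It is an approximation only: Python's standard library does not expose the Unicode Script property, so the helper below (a hypothetical name, scripts_in) uses the first word of each character's Unicode name as a stand-in for its script.

```python
import unicodedata


def scripts_in(text):
    """Approximate the set of scripts used in `text`.

    Assumption: the first word of a character's Unicode name (e.g.
    "CYRILLIC", "ARABIC", "LATIN", "CJK") is used as a proxy for the
    Unicode Script property, which the standard library does not provide.
    """
    scripts = set()
    for ch in text:
        if not ch.isalpha():
            continue  # digits, spaces and punctuation are shared across scripts
        name = unicodedata.name(ch, "")
        if name:
            scripts.add(name.split()[0])
    return scripts


print(scripts_in("Москва"))          # {'CYRILLIC'}
print(scripts_in("Cairo 123"))       # {'LATIN'}
print(sorted(scripts_in("Tokyo 東京")))  # ['CJK', 'LATIN']
```

Note that this identifies script only. It says nothing about language, since a single script such as Latin is shared by many languages, which is why the language tag remains the harder question.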
The principle that the language tag must be present is the problematic one. My thinking has evolved since both the IRD working group and the T&T working group have completed their work.
I believe it is consistent with the various recommendations from both groups to suggest the following.
It must be possible to store a language tag with a data element (we’ll have to talk about what is meant by an “element” here, e.g., postal address in total or separately for each part of a postal address) if that value is known.
When the T/T PDP was being chartered (when the working group charter was being drafted), the charter drafting team considered adding which data elements should be considered in the context of transformation of contact information. The response from staff at the time was that this should remain out of scope of this PDP, as it would be considered more thoroughly in another PDP in the future. This was presumably the next generation RDS PDP, which is indeed considering data elements, and should do so taking into account this PDP as well as the work done by the IRD WG.

My understanding of this policy is that the language/script used to enter registration data by a registrant must be easily identifiable, but I’m not sure that a tag needs to be attached to every data element. I have no technical expertise on this, so can’t say what is more technically feasible, but all that is really required is that the language/script be identified. This could be interpreted as an independent data element that is included with the rest of the registration data, couldn’t it? I am presuming that the registrant will use only one language/script to enter the contact information, not multiple ones.
A registrar should provide it if known and the registry should store it. If it is present it should be displayed whenever that element is output as part of a directory service.
I don’t see why a registrar wouldn’t know what language/script is being used. It is the business model of the registrar and registry that will dictate what languages/scripts are permitted to be used, no? Surely, it is one that the registrar is familiar with, and is offering services to customers using it?
If a data element is transformed, then both the origin language and the destination language must be known and both must be displayed on output. If the originating language tag is not specified, the requestor of the transformation must determine the origin language through a means that is outside the scope of these recommendations.
I think we may be reading this policy a little differently. My understanding is that the language/script used must be identified regardless of whether or not the authoritative original data is transformed, not only if it is transformed. If it is transformed, then yes, both languages/scripts need to be identified, as well as the source of the transformation (who actually did it). There shouldn’t be any need for the requestor of the transformation to have to determine the original language. Have I gotten this wrong?
Implications:
1. If a third party (i.e., not the registrar or the registry) is doing the transformation, no additional requirements apply.
If the registrant is voluntarily doing the transformation, then there should be a field that indicates that the data is transformed, as well as an indication of who did it. The same would apply for the registrar or registry.
2. If a registry or a registrar is doing the transformation, then upon display or storage: a) both forms must always be shown; b) the language tag must always be included.
Yes. And again, the source of the transformation needs to be included as well, keeping in mind, of course, that the transformed data is not authoritative, and the accuracy of translation/transliteration cannot be guaranteed.

Thanks.

Amr
* Requirements for gathering language data, if any. The IRT seems to be gravitating toward automated methods for detecting script. Some discussion was had on whether parties requesting transformation should bear the burden of inferring what language was entered by a registrant based on the country he/she entered into the RDS system, script used, and any other method the requesting party deems appropriate to make their transformation.
Please see my comment above.
Jim
* Reconciling the optional provisions contained within the T/T WG recommendationsespecially Rec. 1with any requirements we identify for contracted parties to gather language and script data to enable transformations. Some of our team is still out on holiday, so we won¹t have a chance to brainstorm possible approaches and solutions to these issues until next week. Once we¹ve had a chance to do so, I¹ll send out a doodle poll and invite as usual. I expect our next call to take place during the week of the 16th.
Thanks all very much for your thoughtful contributions to this project. I look forward to our discussions this year.
All best,
Brian
_______________________________________________ Translationtransliterationirt mailing list Translationtransliterationirt@icann.org https://mm.icann.org/mailman/listinfo/translationtransliterationirt
Hi again,
I realize now that some of what I said in the email below may seem confusing and conflicting with what I said in my first email on this thread. To be clear on my understanding of the policy (and I would welcome being corrected on this): the language/script of the data submitted by the registrant needs to be identified regardless of whether or not transformation is done.
If the “gTLD Provider” chooses to transform the contact information, then the language/script of the transformed contact info also needs to be identified, as well as who provided the transformed info.
Did I get this right?
Thanks again, and apologies for any confusion.
Amr
On Jan 13, 2017, at 7:33 PM, Amr Elsadr <aelsadr@egyptig.org> wrote:
Hi Jim,
Some thoughts in-line below:
On Jan 12, 2017, at 10:37 PM, James Galvin <jgalvin@afilias.info> wrote:
On 5 Jan 2017, at 14:31, Brian Aitchison wrote:
Dear T/T IRT members,
Happy New Year! I hope you all had a relaxing holiday.
I've had a chance to review our last call, and there are a number of nuanced issues we need to sort out regarding the T/T WG Recommendations, i.e.:
* How to approach language and script tags. Some discussion was had on whether we need these at all. Which leads to
I would nuance this in a slightly different way.
The scripts that support a given code point are known as part of the Unicode table. So the principle in the recommendation that the script must always be known could be “implemented” by accepting that it’s already present.
That sounds reasonable to me.
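As a rough illustration of the point that the script is already implied by the code points themselves, the sketch below derives script labels from a string using only Python's standard library. Python's `unicodedata` module does not expose the Unicode Script property directly, so the helper (`scripts_in`, a hypothetical name) approximates it with the first word of each letter's Unicode character name. That is an assumption made for brevity; a real implementation would more likely read the Unicode script data itself.

```python
import unicodedata


def scripts_in(text: str) -> set[str]:
    """Return approximate script labels for the letters in `text`.

    Assumption: the first word of a letter's Unicode character name
    (e.g. "ARABIC LETTER MEEM" -> "ARABIC") stands in for its script.
    This is a heuristic, not the real Unicode Script property.
    """
    labels = set()
    for ch in text:
        # Only letters are considered; digits, spaces, punctuation
        # and combining marks are skipped.
        if not unicodedata.category(ch).startswith("L"):
            continue
        name = unicodedata.name(ch, "")
        if name:
            labels.add(name.split()[0])
    return labels


print(scripts_in("Cairo"))          # Latin letters only
print(scripts_in("القاهرة"))        # Arabic letters only
print(scripts_in("Cairo القاهرة"))  # a mixed entry yields two labels
```

Note that this identifies the script only. It says nothing about which language the text is written in, which is the harder problem raised in the next paragraph.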
The principle that the language tag must be present is the problematic one. My thinking has evolved since both the IRD working group and the T&T working group have completed their work.
I believe it is consistent with the various recommendations from both groups to suggest the following.
It must be possible to store a language tag with a data element (we’ll have to talk about what is meant by an “element” here, e.g., postal address in total or separately for each part of a postal address) if that value is known.
When the T/T PDP was being chartered (when the working group charter was being drafted), the charter drafting team considered adding what data elements should be considered in the context of transformation of contact information. The response from staff at the time was that this should remain out of scope of this PDP, as it would be considered more thoroughly in another PDP in the future. This was presumably the next generation RDS PDP, which is indeed considering data elements, and should do so taking into account this PDP as well as the work done by the IRD WG.
My understanding of this policy is that the language/script used to enter registration data by a registrant must be easily identifiable, but I’m not sure that a tag needs to be attached to every data element. I have no technical expertise on this, so can’t say what is more technically feasible, but all that is really required is that the language/script be identified. This could be interpreted as an independent data element that is included with the rest of the registration data, couldn’t it? I am presuming that the registrant will use only one language/script to enter the contact information, not multiple ones.
A registrar should provide it if known and the registry should store it. If it is present it should be displayed whenever that element is output as part of a directory service.
I don’t see why a registrar wouldn’t know what language/script is being used. It is the business model of the registrar and registry that will dictate what languages/scripts are permitted to be used, no? Surely, it is one that the registrar is familiar with, and is offering services to customers using it?
If a data element is transformed, then both the origin language and the destination language must be known and both must be displayed on output. If the originating language tag is not specified, the requestor of the transformation must determine the origin language through a means that is outside the scope of these recommendations.
I think we may be reading this policy a little differently. My understanding is that the language/script used must be identified regardless of whether or not the authoritative original data is transformed, not only if it is transformed. If it is transformed, then yes, the languages/scripts of both forms need to be identified, as well as the source of the transformation (who actually did it). There shouldn’t be any need for the requestor of the transformation to have to determine the original language.
Have I gotten this wrong?
Implications:
1. If a third party (i.e., not the registrar or the registry) is doing the transformation, no additional requirements apply.
If the registrant is voluntarily doing the transformation, then there should be a field that indicates that the data is transformed, as well as an indication of who did it. The same would apply for the registrar or registry.
2. If a registry or a registrar is doing the transformation, then upon display or storage: a) both forms must always be shown; b) the language tag must always be included.
Yes. And again, the source of the transformation needs to be included as well, keeping in mind, of course, that the transformed data is not authoritative and that the accuracy of translation/transliteration cannot be guaranteed.
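To make the requirements discussed above a little more concrete, here is a minimal sketch of how a record might carry the original element, an optional language tag, and a transformed form labelled with its source. All class names, field names, and the output format are hypothetical and chosen only for illustration; they are not taken from the recommendations or from any RDS, WHOIS, or RDAP specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContactElement:
    """One contact data element as submitted by the registrant (authoritative)."""
    value: str
    script: str                     # e.g. an ISO 15924 code such as "Arab"
    language: Optional[str] = None  # language tag, stored only if known


@dataclass
class Transformation:
    """A translated or transliterated form; never authoritative."""
    value: str
    script: str
    language: str
    source: str  # who did it, e.g. "registrant", "registrar", "registry"


def render(original: ContactElement,
           transformed: Optional[Transformation] = None) -> list[str]:
    """Format an element for output, showing both forms when a transformation exists."""
    lang = original.language or "unknown"
    lines = [f"{original.value} [lang={lang}, script={original.script}]"]
    if transformed is not None:
        lines.append(
            f"{transformed.value} [lang={transformed.language}, "
            f"script={transformed.script}, "
            f"transformed-by={transformed.source}, non-authoritative]"
        )
    return lines


original = ContactElement("القاهرة", script="Arab", language="ar")
transformed = Transformation("Cairo", script="Latn", language="en",
                             source="registrar")
for line in render(original, transformed):
    print(line)
```

Whether the language tag belongs on each element, as here, or on the record as a whole is exactly the open question in this thread; the sketch picks the per-element option only because it is the more general of the two.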
Thanks.
Amr
* Requirements for gathering language data, if any. The IRT seems to be gravitating toward automated methods for detecting script. Some discussion was had on whether parties requesting transformation should bear the burden of inferring what language was entered by a registrant based on the country he/she entered into the RDS system, script used, and any other method the requesting party deems appropriate to make their transformation.
Please see my comment above.
Jim
* Reconciling the optional provisions contained within the T/T WG recommendations, especially Rec. 1, with any requirements we identify for contracted parties to gather language and script data to enable transformations.
Some of our team is still out on holiday, so we won't have a chance to brainstorm possible approaches and solutions to these issues until next week. Once we've had a chance to do so, I'll send out a doodle poll and invite as usual. I expect our next call to take place during the week of the 16th.
Thanks all very much for your thoughtful contributions to this project. I look forward to our discussions this year.
All best,
Brian
On 13 Jan 2017, at 13:19, Amr Elsadr wrote:
Hi again,
I realize now that some of what I said in the email below may seem confusing and conflicting with what I said in my first email on this thread. To be clear on my understanding of the policy (and I would welcome being corrected on this) — the language/script of the data submitted by the registrant needs to be identified regardless of whether or not transformation is done.
Agree regarding script. Language is more problematic; see earlier response.
If the “gTLD Provider” chooses to transform the contact information, then the language/script of the transformed contact info also needs to be identified, as well as who provided the transformed info.
Did I get this right?
Agree.
Jim
Thanks again, and apologies for any confusion.
Amr
Hi,
Thanks for the responses, Jim. I found them to be very helpful, and pretty much agree except for one nuance. I do appreciate the challenges you’ve identified regarding identification of languages (as opposed to scripts). I can also think of different scenarios where this could be complicated. For example, there are common names between Arabic and Urdu, which are spelled identically using the same script, as well as common Unicode ranges. If all one had to go on to identify the language of one of those names was having it as an entry in the RDS, then it’d be a challenge to say the least.
Having said that, it doesn’t really change the fact that identification of the language is part of the current policy the IRT is tasked with implementing, does it? Speaking for myself, I had hoped that the T/T WG’s recommendation #3 would somehow mitigate this issue:
Recommendation #3 The Working Group recommends that the language(s) and script(s) supported for registrants to submit their contact information data may be chosen in accordance with gTLD-provider business models.
So (hypothetically), a registrar operating out of Dubai (or perhaps even a reseller) offering registration services (including submission of contact info) in Arabic should have some practical way of differentiating between Arabic and Urdu. I certainly don’t have any suggestions on implementation details for this, and look to contracted parties to indicate what is and isn’t feasible. My standing impression of the T/T policy recommendations in their entirety is that the policy is meant to make the provision of registration services using IRD less, not more, complex. If automated language detection will not be a mandatory implementation measure, then my hope is that gTLD providers can come up with innovative ways to get language detection done as reliably as possible. I don’t imagine that there is one solution that will work across all the different languages out there. I’d be very glad to be corrected on this.
Thanks.
Amr
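One way to picture how Recommendation #3 might help with the Arabic/Urdu case is sketched below: if the set of languages a provider offers is known, the detected script sometimes narrows the choice to a single language and sometimes does not. The function name `infer_language` and the small language-to-script table are assumptions made for this illustration, not part of any policy text or a complete mapping.

```python
from typing import Optional

# Illustrative mapping from language tag to the script it is usually
# written in (ISO 15924 codes). An assumption for this sketch, not a
# complete or authoritative table.
USUAL_SCRIPT = {
    "ar": "Arab",
    "ur": "Arab",
    "fa": "Arab",
    "en": "Latn",
    "ru": "Cyrl",
}


def infer_language(script: str, offered_languages: list[str]) -> Optional[str]:
    """Return the only offered language written in `script`, else None."""
    candidates = [lang for lang in offered_languages
                  if USUAL_SCRIPT.get(lang) == script]
    # Zero or several candidates means the script alone cannot decide.
    return candidates[0] if len(candidates) == 1 else None


print(infer_language("Arab", ["ar", "en"]))        # ar
print(infer_language("Arab", ["ar", "ur", "en"]))  # None
```

A provider offering only Arabic and English gets an unambiguous answer, while one offering both Arabic and Urdu would still need another signal, such as asking the registrant.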
This is an important point, Amr, and I really don’t know what the best answer is here.
On the one hand, the recommendation is specific, and what you are saying is exactly correct. I’ll even go so far as to say that I have argued this position quite strongly, up until recently.
On the other hand, my thinking has evolved. I’m much more circumspect now about the practical realities of implementing this, which is to say there’s probably an 80% solution (or something close to it), but with the emphasis on accuracy these days I’m extremely concerned that a solution for the remaining 20% is neither practical nor, in some cases, even possible. Therefore I’m much more inclined towards a nuanced “spirit” of the recommendation. However, I am sensitive to process, too, so perhaps this is a problem.
In any case, this is worthy of some discussion. I’m expecting Roger to have some things to say about the implementation of all this.
Jim
On 17 Jan 2017, at 8:13, Amr Elsadr wrote:
Hi,
Thanks for the responses, Jim. I found them to be very helpful, and pretty much agree except for one nuance. I do appreciate the challenges you’ve identified regarding identification of languages (as opposed to scripts). I can also think of different scenarios where this could be complicated. For example, there are common names between Arabic and Urdu, which are spelled identically using the same script, as well as common Unicode ranges. If all one had to go on to identify the language of one of those names was having it as an entry in the RDS, then it’d be a challenge to say the least.
Having said that, it doesn’t really change the fact that identification of the language is part of the current policy the IRT is tasked with implementing, does it? Speaking for myself, I had hoped that the T/T WG’s recommendation #3 would somehow mitigate this issue:
Recommendation #3 The Working Group recommends that the language(s) and script(s) supported for registrants to submit their contact information data may be chosen in accordance with gTLD-provider business models.
So (hypothetically), a registrar operating out of Dubai (or perhaps even a reseller) offering registration services (including submission of contact info) in Arabic should have some practical way of differentiating between Arabic and Urdu. I certainly don’t have any suggestions on implementation details for this, and look to contracted parties to indicate what is and isn’t feasible. My standing impression of the T/T policy recommendations in their entirety, is that the policy is meant to make the provision of registration services using IRD less, not more complex. If automated language detection will not be a mandatory implementation measure, then my hope is that gTLD providers can come up with innovative ways to get language detection done as reliably as possible. I don’t imagine that there is one solution that will work across all the different languages out there. I’d be very glad to be corrected on this.
Thanks.
Amr
On Jan 16, 2017, at 7:14 PM, James Galvin <jgalvin@afilias.info> wrote:
On 13 Jan 2017, at 13:19, Amr Elsadr wrote:
Hi again,
I realize now that some of what I said in the email below may seem confusing and conflicting with what I said in my first email on this thread. To be clear on my understanding of the policy (and I would welcome being corrected on this) — the language/script of the data submitted by the registrant needs to be identified regardless of whether or not transformation is done.
Agree regarding script. Language is more problematic; see earlier response.
If the “gTLD Provider” chooses to transform the contact information, then the language/script of the transformed contact info also needs to be identified, as well as who provided the transformed info.
Did I get this right?
Agree.
Jim
Thanks again, and apologies for any confusion.
Amr
On Jan 13, 2017, at 7:33 PM, Amr Elsadr <aelsadr@egyptig.org> wrote:
Hi Jim,
Some thoughts in-line below:
On Jan 12, 2017, at 10:37 PM, James Galvin <jgalvin@afilias.info> wrote:
On 5 Jan 2017, at 14:31, Brian Aitchison wrote:
Dear T/T IRT members,
Happy New Year! I hope you all had a relaxing holiday.
I¹ve had a chance to review our last call, and there¹s a number of nuanced issues we need to sort out regarding the T/T WG Recommendations, ie: * How to approach language and script tags. Some discussion was had on whether we need these at all. Which leads to
I would nuance this in a slightly different way.
The scripts that support a given code point are known as part of the Unicode table. So the principle in the recommendation that the script must always be known could be “implemented” by accepting that it’s already present.
That sounds reasonable to me.
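A minimal sketch of how script could be derived from the code points alone. Python's standard library does not expose the Unicode Script property directly, so this example approximates it with the first word of each character's Unicode name (for example "ARABIC" or "LATIN"). That shortcut is an assumption of this sketch, and a production system would more likely read the Script property from the Unicode Character Database. The function name `scripts_in` is hypothetical.

```python
import unicodedata


def scripts_in(text):
    """Approximate the set of scripts used in `text` from Unicode character names."""
    found = set()
    for ch in text:
        # Only letters carry a useful script signal; skip digits, spaces and punctuation.
        if not unicodedata.category(ch).startswith("L"):
            continue
        name = unicodedata.name(ch, "")
        if name:
            # The first word of the name is used as a stand-in for the script.
            found.add(name.split()[0])
    return found
```

A contact field that yields more than one entry mixes scripts, which is one case where a single script label per record would not be enough.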
The principle that the language tag must be present is the problematic one. My thinking has evolved since both the IRD working group and the T&T working group have completed their work.
I believe it is consistent with the various recommendations from both groups to suggest the following.
It must be possible to store a language tag with a data element (we’ll have to talk about what is meant by an “element” here, e.g., postal address in total or separately for each part of a postal address) if that value is known.
When the T/T PDP was being chartered (when the working group charter was being drafted), the charter drafting team considered specifying which data elements should be covered in the context of transformation of contact information. The response from staff at the time was that this should remain out of scope of this PDP, as it would be considered more thoroughly in another PDP in the future. This was presumably the next generation RDS PDP, which is indeed considering data elements, and should do so taking into account this PDP as well as the work done by the IRD WG.
My understanding of this policy is that the language/script used to enter registration data by a registrant must be easily identifiable, but I’m not sure that a tag needs to be attached to every data element. I have no technical expertise on this, so can’t say what is more technically feasible, but all that is really required is that the language/script be identified. This could be interpreted as an independent data element that is included with the rest of the registration data, couldn’t it? I am presuming that the registrant will use only one language/script to enter the contact information, not multiple ones.
A registrar should provide it if known and the registry should store it. If it is present it should be displayed whenever that element is output as part of a directory service.
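The "store it if known, display it if present" rule can be sketched as a data element with an optional language tag. The class name, field names and output format below are hypothetical, and the use of a BCP 47-style tag such as "ar" is an assumption, since the thread does not settle on a tag format or on what counts as an element.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContactElement:
    name: str                       # e.g. "postal_address" (hypothetical element name)
    value: str
    language: Optional[str] = None  # language tag if the registrar knows it, else None


def render(element):
    """Format one element for directory-service output, showing the tag only if present."""
    if element.language:
        return f"{element.name} [{element.language}]: {element.value}"
    return f"{element.name}: {element.value}"
```

Attaching the tag per element, as here, also covers the single record-level tag Amr describes: that is simply the case where every element carries the same value.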
I don’t see why a registrar wouldn’t know what language/script is being used. It is the business model of the registrar and registry that will dictate what languages/scripts are permitted to be used, no? Surely, it is one that the registrar is familiar with, and is offering services to customers using it?
If a data element is transformed, then both the origin language and the destination language must be known and both must be displayed on output. If the originating language tag is not specified, the requestor of the transformation must determine the origin language through a means that is outside the scope of these recommendations.
I think we may be reading this policy a little differently. My understanding is that regardless of whether or not the authoritative original data is transformed, the language/script used must be identified, not only if it is transformed. If it is transformed, then yes…, but languages/scripts need to be identified, as well as the source of the transformation (who actually did it). There shouldn’t be any need for the requestor of the transformation to have to determine the original language.
Have I gotten this wrong?
Implications:
1. If a third party (i.e., not the registrar or the registry) is doing the transformation, no additional requirements apply.
If the registrant is voluntarily doing the transformation, then there should be a field that indicates that the data is transformed, as well as an indication of who did it. Same would apply for the registrar or registry.
2. If a registry or a registrar is doing the transformation, then upon display or storage: a) both forms must always be shown; b) the language tag must always be included.
Yes. And again…, the source of the transformation needs to be included as well. Keeping in mind of course, that the transformed data is not authoritative, and the accuracy of translation/transliteration cannot be guaranteed.
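The points agreed above (both forms shown, both language tags present, the source of the transformation recorded, and the transformed form marked as non-authoritative) could be captured in a record like the one sketched here. All names and the output layout are hypothetical and are meant only to make the requirements concrete.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transformation:
    original: str
    original_language: str
    transformed: str
    transformed_language: str
    source: str  # who performed the transformation, e.g. "registrant", "registrar", "registry"


def display_lines(t: Transformation) -> List[str]:
    """Return both forms for output, refusing records that lack a tag or a source."""
    if not (t.original_language and t.transformed_language and t.source):
        raise ValueError("both language tags and the transformation source are required")
    return [
        f"original [{t.original_language}]: {t.original}",
        f"transformed [{t.transformed_language}] by {t.source} (non-authoritative): {t.transformed}",
    ]
```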
Thanks.
Amr
* Requirements for gathering language data, if any. The IRT seems to be gravitating toward automated methods for detecting script. Some discussion was had on whether parties requesting transformation should bear the burden of inferring what language was entered by a registrant based on the country he/she entered into the RDS system, script used, and any other method the requesting party deems appropriate to make their transformation.
Please see my comment above.
Jim
* Reconciling the optional provisions contained within the T/T WG recommendations, especially Rec. 1, with any requirements we identify for contracted parties to gather language and script data to enable transformations. Some of our team is still out on holiday, so we won’t have a chance to brainstorm possible approaches and solutions to these issues until next week. Once we’ve had a chance to do so, I’ll send out a doodle poll and invite as usual. I expect our next call to take place during the week of the 16th.
Thanks all very much for your thoughtful contributions to this project. I look forward to our discussions this year.
All best,
Brian
_______________________________________________ Translationtransliterationirt mailing list Translationtransliterationirt@icann.org https://mm.icann.org/mailman/listinfo/translationtransliterationirt
Thanks very much Amr and Jim for these thoughts. We’re having similar discussions on our end, and are crafting a set of decision trees that should help us navigate through the issues we’re bringing up.
For our next two calls, rather than discuss policy language, we’ll go through these decision trees to gauge sentiment on the following:
1. Language and script data provisioning: if and how to populate language and script data (i.e., tags as we’ve been discussing) to enable transformations
2. Transformed data display: how to define standards for showing transformations based on the T/T policy recommendations once we have language and script data populated
Amr, Jim: it doesn’t look like either of you will be available for the most popular preferred call time next week. Given your expertise and experience with the T/T PDP, I think it’s important to have your voices in the discussion. And as I think about it, it would be good to give the IRT more than just a few days to consider the first decision tree I was planning to send out on Tuesday (Monday’s a holiday in the US). It’s a small set of decision points, but they will have implications for how we draft the policy. Thus it will be important to have as many voices from the IRT as possible.
Bear in mind these decision points will serve as a preliminary gauge of sentiment on which direction to pursue, and not a final determination of anything. Once we have an idea of sentiment, we’ll go back to the policy language to update and discuss further.
I will send out another doodle poll presently for the week of the 23rd with more choices for call timing. And expect the first decision tree on Tuesday.
Thanks again everyone,
Brian
This message is getting a little thick but I’ve added some comments inline while keeping most of the history. On 13 Jan 2017, at 12:33, Amr Elsadr wrote:
Hi Jim,
Some thoughts in-line below:
On Jan 12, 2017, at 10:37 PM, James Galvin <jgalvin@afilias.info> wrote:
On 5 Jan 2017, at 14:31, Brian Aitchison wrote:
The principle that the language tag must be present is the problematic one. My thinking has evolved since both the IRD working group and the T&T working group have completed their work.
I believe it is consistent with the various recommendations from both groups to suggest the following.
It must be possible to store a language tag with a data element (we’ll have to talk about what is meant by an “element” here, e.g., postal address in total or separately for each part of a postal address) if that value is known.
When the T/T PDP was being chartered (when the working group charter was being drafted), the charter drafting team considered specifying which data elements should be covered in the context of transformation of contact information. The response from staff at the time was that this should remain out of scope of this PDP, as it would be considered more thoroughly in another PDP in the future. This was presumably the next generation RDS PDP, which is indeed considering data elements, and should do so taking into account this PDP as well as the work done by the IRD WG.
My understanding of this policy is that the language/script used to enter registration data by a registrant must be easily identifiable, but I’m not sure that a tag needs to be attached to every data element. I have no technical expertise on this, so can’t say what is more technically feasible, but all that is really required is that the language/script be identified. This could be interpreted as an independent data element that is included with the rest of the registration data, couldn’t it? I am presuming that the registrant will use only one language/script to enter the contact information, not multiple ones.
A registrar should provide it if known and the registry should store it. If it is present it should be displayed whenever that element is output as part of a directory service.
I don’t see why a registrar wouldn’t know what language/script is being used. It is the business model of the registrar and registry that will dictate what languages/scripts are permitted to be used, no? Surely, it is one that the registrar is familiar with, and is offering services to customers using it?
I called the language tag problematic in part because, as you suggest, there is this perception that it ought to be straightforward but, in fact, it’s not. Just a couple of example issues to consider. Should there be a language tag for the “name”? Well, people may change their name for a number of different reasons and each part of the name may have a different language tag (now there’s a technical challenge for you). Consider though, is a name likely to be transformed anyway? Isn’t it best to just leave that alone? The postal address itself probably needs a tag separate from any other element. How about a country like India, with 21 official languages, and scripts that are used in multiple languages? You can probably guess the language by intersecting all the possible languages for each of the code points, but what do you do when you still end up with two or more languages to choose from? Registrars have argued that asking a registrant to tell them what language is being used is just asking for trouble. Geolocation for IP address source is pretty good, but what if I’m traveling and I want to buy something while away from my home but use my home address? What language do I default to? Give me some time and I may think of a few more things.
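The intersection idea, and the way it can still leave several candidates, can be sketched as follows. The script-to-language table is a small illustrative assumption and is far from complete, the script of each letter is approximated from the first word of its Unicode name, and `candidate_languages` is a hypothetical name.

```python
import unicodedata

# Illustrative, incomplete mapping from script to languages commonly written in it.
SCRIPT_LANGUAGES = {
    "DEVANAGARI": {"hi", "mr", "ne"},
    "BENGALI": {"bn", "as"},
    "ARABIC": {"ar", "ur", "fa"},
    "TAMIL": {"ta"},
}


def candidate_languages(text):
    """Intersect the possible languages of every letter in `text`."""
    candidates = None
    for ch in text:
        if not unicodedata.category(ch).startswith("L"):
            continue  # ignore digits, spaces, punctuation and combining marks
        name = unicodedata.name(ch, "")
        script = name.split()[0] if name else ""
        langs = SCRIPT_LANGUAGES.get(script, set())
        candidates = set(langs) if candidates is None else candidates & langs
    return candidates or set()
```

A single-language script such as Tamil resolves to one candidate, while a Devanagari string still returns three, which is exactly the ambiguity described above: the code points narrow the choice but cannot make it.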
If a data element is transformed, then both the origin language and the destination language must be known and both must be displayed on output. If the originating language tag is not specified, the requestor of the transformation must determine the origin language through a means that is outside the scope of these recommendations.
I think we may be reading this policy a little differently. My understanding is that regardless of whether or not the authoritative original data is transformed, the language/script used must be identified, not only if it is transformed. If it is transformed, then yes…, but languages/scripts need to be identified, as well as the source of the transformation (who actually did it). There shouldn’t be any need for the requestor of the transformation to have to determine the original language.
Have I gotten this wrong?
See above regarding language. I think we already agree regarding script.
Implications:
1. If a third party (i.e., not the registrar or the registry) is doing the transformation, no additional requirements apply.
If the registrant is voluntarily doing the transformation, then there should be a field that indicates that the data is transformed, as well as an indication of who did it. Same would apply for the registrar or registry.
Agree.
2. If a registry or a registrar is doing the transformation, then upon display or storage: a) both forms must always be shown; b) the language tag must always be included.
Yes. And again…, the source of the transformation needs to be included as well. Keeping in mind of course, that the transformed data is not authoritative, and the accuracy of translation/transliteration cannot be guaranteed.
Agree. Jim
Thanks.
Amr
* Requirements for gathering language data, if any. The IRT seems to be gravitating toward automated methods for detecting script. Some discussion was had on whether parties requesting transformation should bear the burden of inferring what language was entered by a registrant based on the country he/she entered into the RDS system, script used, and any other method the requesting party deems appropriate to make their transformation.
Please see my comment above.
Jim
* Reconciling the optional provisions contained within the T/T WG recommendations, especially Rec. 1, with any requirements we identify for contracted parties to gather language and script data to enable transformations. Some of our team is still out on holiday, so we won’t have a chance to brainstorm possible approaches and solutions to these issues until next week. Once we’ve had a chance to do so, I’ll send out a doodle poll and invite as usual. I expect our next call to take place during the week of the 16th.
Thanks all very much for your thoughtful contributions to this project. I look forward to our discussions this year.
All best,
Brian
Participants (5): Amr Elsadr, Brian Aitchison, Emily Taylor, James Galvin, Roger D Carney