Re: [gtld-tech] Specification 5 - Country names... again..
John, James, I have a few more comments on the lists that we have been collecting on the Github site. Specifically:
2. I've attached Verisign's spreadsheet for section 5.4.3 (UN Member States and translations). There are 7 languages (including both traditional and simplified Chinese). Spaces and hyphens are tricky. The issue is that translation engines (here I mostly use Google's service) will often handle spaces intelligently. For instance, "Central African Republic" translated to Russian becomes "Центрально-Африканская Республика". That is, a hyphen is introduced for the first space. I presume this is a standard for the "Central" modifier. You could choose to create a variant without any hyphens or spaces, or you could try all the permutations of hyphens and spaces. Instead of heading down that rabbit hole, we elected to yield to the translation wherever possible. But we'll respect the group decision and update our implementation accordingly.
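For anyone who does want to explore the "all permutations" route mentioned above, here is a minimal sketch of what that enumeration could look like. It is not what Verisign implemented. The function name `separator_variants` and the choice of replacements (nothing or a hyphen, since a space cannot appear in a DNS label) are assumptions made for illustration only.

```python
import re
from itertools import product


def separator_variants(name, replacements=("", "-")):
    """Return every form of `name` in which each run of spaces/hyphens
    is replaced by one of `replacements` (assumed: removed, or a hyphen)."""
    # Split the lower-cased name on any run of whitespace or hyphens.
    parts = [p for p in re.split(r"[\s\-]+", name.strip().lower()) if p]
    if len(parts) < 2:
        return set(parts)
    variants = set()
    # One replacement choice per gap between adjacent words.
    for combo in product(replacements, repeat=len(parts) - 1):
        label = parts[0]
        for sep, part in zip(combo, parts[1:]):
            label += sep + part
        variants.add(label)
    return variants


variants = separator_variants("Центрально-Африканская Республика")
for v in sorted(variants):
    print(v)
```

With two gaps and two replacement choices this yields four candidate strings; the count doubles with every additional gap, which is the rabbit hole referred to above.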
On Specification 5.4.3:
---------------------------
The agreement says "the list of United Nations member states in 6 official United Nations languages prepared by the Working Group on Country Names of the United Nations Conference on the Standardization of Geographical Names;"

I do understand that there is an official document from the UN that provides translations for those country names. The URL (at least the one I've been using) is: http://unstats.un.org/unsd/geoinfo/UNGEGN/docs/26th-gegn-docs/WP/WP54_UNGEGN...

I do understand that the above document already contains readily translated / agreed country names for the six UN languages, as required in 5.4.2. However, I do also understand that the list that's now in the Github repository was actually produced using automated translation, which potentially could yield different results compared to extracting the list of country names from the document (or the respective source file)?

Therefore, could you please clarify whether the file that's now on the Github was produced by translation engines, or taken from said document? I think that's a significant impact on whether or not that file is usable to cover ICANN's requirement.

On Specification 5.4.2:
---------------------------
The agreement requires "the United Nations Group of Experts on Geographical Names, Technical Reference Manual for the Standardization of Geographical Names, Part III Names of Countries of the World".

Looking at the respective document (http://unstats.un.org/unsd/geoinfo/ungegn/docs/pubs/UNGEGN%20tech%20ref%20ma...), the actual list (starting on page 185) contains several columns:
- ISO3166 code
- COUNTRY- English (UN)
- ISO Code; language
- Short Name
- Formal Name

It is unclear to me which of the various columns is to be considered for the reserved names list. Candidates would be the "Country-ENGLISH" (although that is likely already covered by the UN list described in 5.4.3?), the "Short Name", and the "Formal Name".
However, looking at the file that is now uploaded on Github for the respective Specification (https://github.com/centralnic/spec5-reserved-names/blob/master/S5.4.2.txt), it doesn't seem to correspond to *any* of these columns. For example, the entry (assumed) for "AF" in the file, "afganistán", does not exist in the PDF in any of the respective columns. I'm therefore slightly worried that the list that is currently uploaded does not fulfill the respective requirement (besides that, I think the original requirement is highly imprecise and hence disputable). Am I looking at the wrong files? Do people interpret the requirements differently? Any comments appreciated... thanks, Alex
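The column check described above could be automated once the table has been extracted from the PDF by whatever means. The following is only a sketch under assumptions: the dictionary keys mirror the column names listed in the message, the helper names `normalize` and `matching_columns` are hypothetical, and the "Short Name" and "Formal Name" values are placeholders, not data taken from the UN document.

```python
import unicodedata


def normalize(name):
    """Compare names case-insensitively and in one Unicode form (NFC)."""
    return unicodedata.normalize("NFC", name).casefold().strip()


def matching_columns(entry, row):
    """Return the column names in `row` whose value equals `entry`
    after normalization; an empty list means no column matches."""
    target = normalize(entry)
    return [col for col, value in row.items() if normalize(value) == target]


# Hypothetical extracted row for "AF". Only the English name is real;
# the other two values are placeholders for illustration.
row_af = {
    "COUNTRY- English (UN)": "Afghanistan",
    "Short Name": "<short name from the PDF>",
    "Formal Name": "<formal name from the PDF>",
}

print(matching_columns("afghanistan", row_af))
print(matching_columns("afganistán", row_af))
```

Running a check like this over every line of S5.4.2.txt would show whether the file follows one column consistently, a mix of columns, or none of them.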
On Dec 4, 2013, at 9:11 AM, Alexander Mayrhofer wrote:
I do understand that the above document already contains readily translated / agreed country names for the six UN languages, as required in 5.4.2. However, i do also understand that the list that's now in the Github repository was actually produced using automated translation, which potentially could yield different results compared to extracting the list of country names from the document (or the respective source file)?
Therefore, could you please clarify whether the file that's now on the Github was produced by translation engines, or taken from said document? I think that's a significant impact on whether or not that file is usable to cover ICANN's requirement.
The file I provided was obtained by querying the source organization directly through their communication channels until reaching a person able to provide the official translations in a usable format. It wasn't automatically translated by our organization. Best regards -lem
[Alexander Mayrhofer] Thanks for the clarification - sounds perfect to me - I guess I was just confused by the discussion :-) Regards, Alex
Alexander Mayrhofer wrote:
Therefore, could you please clarify whether the file that's now on the Github was produced by translation engines, or taken from said document? I think that's a significant impact on whether or not that file is usable to cover ICANN's requirement.
Thanks for your comments @Alex. The file submitted by Verisign to satisfy Specification 5.4.3 was created using automatic translation of the English-language names of the 193 UN Member States.

@Alex, if you have a less error-prone method of arriving at a solution, Verisign would invest some time in reworking the data. But cutting and pasting from the PDF seems error-prone. We would need a programmatic solution to consuming those records. And records meant to be consumed programmatically are not generally encoded in PDF. But again, we're open to ideas.

-- John
John Colosi
Senior Manager of Product Development
JColosi@Verisign.com
m: 703-967-4062 t: 703-948-3211
12061 Bluemont Way, Reston VA 20190
VerisignInc.com
Hey Guys, Veesh here from Charleston Road Registry. Big thanks all who helped curate the GitHub repository. Out of curiosity, is the current text in the Github section 4.3 based on the data pulled directly from UN contacts by Luis or the machine translated text mentioned by John? Cheers veesh
-- Vishal Jain Business Product Manager, gTLD Google NYC M: 412-418-4201 E: veesh@google.com
Alexander Mayrhofer wrote:
Therefore, could you please clarify whether the file that's now on the Github was produced by translation engines, or taken from said document? I think that's a significant impact on whether or not that file is usable to cover ICANN's requirement.
Thanks for your comments @Alex. The file submitted by Verisign to satisfy Specification 5.4.3 was created using automatic translation of the English Language names of the 193 UN Member states.
[Alexander Mayrhofer] Thanks for the clarification. As I said above, it would be great to hear from ICANN what they intended with the 5.4.3 requirement - more specifically, which components of the data in the document cited in the requirement are to be used. My point is - the requirements are vague, and hence there is no clear way to quantify the quality of a solution. Which, in turn, is bound to create interoperability issues.
@Alex, if you have a less error-prone method of arriving at a solution, Verisign would invest some time in reworking the data. But cutting and pasting from the PDF seems error-prone. We would need a programmatic solution to consuming those records. And records meant to be consumed programmatically are not generally encoded in PDF. But again, we're open to ideas.
[Alexander Mayrhofer] I totally agree that cut & paste from the PDF is extremely error-prone, even if someone understood all the scripts used in the document. Alex
(I hate replying to my own messages... but, well..)
Thanks for your comments @Alex. The file submitted by Verisign to satisfy Specification 5.4.3 was created using automatic translation of the English Language names of the 193 UN Member states.
[Alexander Mayrhofer] Thanks for the clarification. As I said above, it would be great to hear from ICANN what they intended with the 5.4.3 requirement - more specifically, which components of the data in the document cited in the requirement are to be used.
My point is - the requirements are vague, and hence there is no clear way to quantify the quality of a solution. Which, in turn, is bound to create interoperability issues.
[Alexander Mayrhofer] I did mean 5.4.2 here, which, imho, is much more vague than the 5.4.3 requirement. It seems to me as if both

https://github.com/centralnic/spec5-reserved-names/blob/master/S5.4.2.txt
https://github.com/centralnic/spec5-reserved-names/blob/master/S5.4.3.txt

aim at covering the "translation into the 6 UN languages" (which is Requirement 5.4.3), while neither of them covers the (vague) requirement of 5.4.2...? Alex
participants (4): Alexander Mayrhofer, Colosi, John; Luis Muñoz; Vishal Jain