Specification 5 - Country names... again..
I've started looking at the two PDFs of country names that ICANN mandates us to reserve in Sections 4.2 and 4.3 of Specification 5 of the Registry Agreement. I have a couple of comments on this:
a) I can only guess whether the PDFs that I use (http://unstats.un.org/unsd/geoinfo/ungegn/docs/ungegn-tech-ref-manual_M87_co... and http://unstats.un.org/unsd/geoinfo/UNGEGN/docs/26th-gegn-docs/WP/WP54_UNGEGN...) are the correct ones, since ICANN failed to include a reference to the documents in the Agreement.
b) Extracting names from those documents (particularly in scripts that I'm not familiar with) is absolutely horrible, and will likely lead to interoperability problems between registries.
c) The mapping of the strings required to be reserved to the actual A-label is undefined.
Therefore, could ICANN please look into creating a reservation list covering these two requirements, and publish that list as a recommended way to implement them?
ICANN could use a tiny fraction of the application fees to contract someone to create an authoritative list of names to be reserved. Or, as a secondary option in case there is insufficient budget, could it allocate resources from its "Language Services Department" (http://blog.icann.org/2013/10/introducing-the-language-services-team/) to create this list?
I'm sure that many members of the community would welcome this as a useful contribution to ensure a unified approach to those requirements.
thanks,
Alex Mayrhofer
Head of R&D
nic.at
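Point c) leaves the mapping from a reserved string to an A-label open. As a rough illustration of what such a mapping could look like, here is a minimal Python sketch. The function name `to_a_label` and the normalization steps (NFC, lowercasing, dropping spaces) are assumptions made for this example, not anything the Agreement or the thread defines. Note also that Python's built-in "idna" codec implements the older IDNA2003 rules (RFC 3490); a registry working to IDNA2008 would need a different library.

```python
import unicodedata


def to_a_label(name: str) -> str:
    """Map a reserved name to a candidate A-label.

    Assumed normalization (hypothetical, not mandated anywhere):
    NFC-normalize, lowercase, and drop spaces before encoding.
    """
    label = unicodedata.normalize("NFC", name).lower().replace(" ", "")
    # The stdlib "idna" codec applies IDNA2003 ToASCII (nameprep + Punycode).
    return label.encode("idna").decode("ascii")


if __name__ == "__main__":
    for name in ("Austria", "фиджи"):
        print(name, "->", to_a_label(name))
```

Pure-ASCII names pass through unchanged apart from lowercasing, while names in other scripts come out with the "xn--" prefix. Two registries that normalize differently before this step would reserve different labels, which is the interoperability risk raised in b).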
++
I went to the trouble of interacting with the primary sources and obtaining workable references because what is published is quite hard to coerce into the format we all need. In the process I received rather amusing responses from various organizations that should know better. I'm attaching a list that should be easier to work with.
Rather than simply sending the lists to this mailing list, wouldn't it be better to set up a GitHub repo or something similar so that updates can be made to the data, and we all have access to the latest version of the common work? In the meantime, comments and additions are welcome.
Best regards
-lem
I agree:
https://github.com/centralnic/spec5-reserved-names
I've added your list, plus the file I was working on. Let's see if we can work out a format that makes sense for this - JSON? XML? YAML?
G.
-- Gavin Brown Chief Technology Officer CentralNic Group plc (LSE:CNIC) Innovative, Reliable and Flexible Registry Services for ccTLD, gTLD and private domain name registries https://www.centralnic.com/
CentralNic Group plc is a company registered in England and Wales with company number 8576358. Registered Offices: 35-39 Moorgate, London, EC2R 6AR.
Gavin,
We have also done work in this area that we can contribute. It's best that we work off of one authoritative list.
Thanks,
-- JG
James Gould
Principal Software Engineer
jgould@verisign.com
703-948-3271 (Office)
12061 Bluemont Way Reston, VA 20190
VerisignInc.com
Thanks for setting up the GitHub site, Gavin. (https://github.com/centralnic/spec5-reserved-names)
Some observations:
1. There are 12 items in S5.4.2.txt that end with a hyphen. These names are not DNS-compatible. It may have been an issue in processing. I think the trailing hyphen can be removed (фиджи- for example).
2. I've attached Verisign's spreadsheet for section 5.4.3 (UN Member States and translations). There are 7 languages (including both traditional and simplified Chinese). Spaces and hyphens are tricky. The issue is that translation engines (here I mostly use Google's service) will often handle spaces intelligently. For instance, "Central African Republic" translated to Russian becomes "Центрально-Африканская Республика". That is, a hyphen is introduced for the first space. I presume this is a standard for the "Central" modifier. You could choose to create a variant without any hyphens or spaces, or you could try all the permutations of hyphens and spaces. Instead of heading down that rabbit hole, we elected to yield to the translation wherever possible. But we'll respect the group decision and update our implementation accordingly.
3. I think we can change S5.5-igos.txt to s5.6-igos.txt. Intergovernmental orgs are under bullet 6.
Again, I appreciate this effort. Thanks Gavin,
-- John
John Colosi
Senior Manager of Product Development
JColosi@Verisign.com
m: 703-967-4062 t: 703-948-3211
12061 Bluemont Way, Reston VA 20190
VerisignInc.com
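Observation 2 mentions trying "all the permutations of hyphens and spaces". To show how that alternative would behave (it is not the approach Verisign says it chose), here is a minimal Python sketch. The helper name `separator_variants` is hypothetical. Because a space cannot appear in a DNS label, each gap between words is assumed to become either nothing or a hyphen. Leading and trailing hyphens, as in observation 1, disappear as a side effect.

```python
from itertools import product


def separator_variants(name: str) -> set[str]:
    """Return every label obtained by joining the words of `name`
    with either no separator or a hyphen at each gap."""
    # Treat existing hyphens like spaces, so "фиджи-" yields just "фиджи".
    words = name.lower().replace("-", " ").split()
    if not words:
        return set()
    variants = set()
    for seps in product(("", "-"), repeat=len(words) - 1):
        label = words[0]
        for sep, word in zip(seps, words[1:]):
            label += sep + word
        variants.add(label)
    return variants


if __name__ == "__main__":
    for label in sorted(separator_variants("Центрально-Африканская Республика")):
        print(label)
```

A name with n words produces 2^(n-1) labels, so a three-word name such as the one above gives four variants. That growth is the "rabbit hole" the message refers to.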
Hi John,
Some observations: 1. There are 12 items in S5.4.2.txt that end with a hyphen. These names are not dns compatible. It may have been an issue in processing. I think the trailing hyphen can be removed. (фиджи- for example)
Thanks, I've fixed these.
2. I've attached Verisign's spreadsheet for section 5.4.3 (UN Member States and translations). There are 7 languages (including both traditional and simplified Chinese). Spaces and hyphens are tricky. The issue is that translation engines (here I mostly use Google's service) will often handle spaces intelligently. For instance, "Central African Republic" translated to Russian becomes "Центрально-Африканская Республика". That is, a hyphen is introduced for the first space. I presume this is a standard for the "Central" modifier. You could choose to create a variant without any hyphens or spaces, or you could try all the permutations of hyphens and spaces. Instead of heading down that rabbit hole, we elected to yield to the translation wherever possible. But we'll respect the group decision and update our implementation accordingly.
Much appreciated. I've converted this to text and added it to the repository.
3. I think we can change S5.5-igos.txt to s5.6-igos.txt. Intergovernmental orgs are under bullet 6.
Done.
G.
Thanks for the great work, it really looks usable. Two small things:
a) https://github.com/centralnic/spec5-reserved-names/blob/master/S5.4.3.txt seems to contain at least two empty lines.
b) I found it very useful to keep the ISO 3166 code with all the country names, specifically for scripts that are unreadable to me. Would it be an option to extend the txt files to csv files, like
Austria;AT
Autriche;AT
I think that would prove very useful.
Alex
Hi Alex,
a) https://github.com/centralnic/spec5-reserved-names/blob/master/S5.4.3.txt seems to contain at least two empty lines
This has been fixed.
b) I found it very useful to keep the ISO3166 code with all the country names, specifically for scripts that are unreadable to me. Would be an option to extend the txt files to csv files, like
Austria;AT
Autriche;AT
I think that would prove very useful..
How about using a comment?
Austria # AT
Autriche # AT
G.
[Alexander Mayrhofer] Sounds good to me too. I just want to be able to "track down" a label to its data source and country. Alex
Hi Alex,
OK - the build script will ignore anything after a "#" on each line.
G.
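The comment convention agreed above (a name, then "#", then the ISO 3166 code) could be parsed roughly as follows. This is a minimal Python sketch and not the repository's actual build script; the function name `parse_line` and the choice to return the comment text alongside the name are assumptions for illustration.

```python
from typing import Optional, Tuple


def parse_line(line: str) -> Optional[Tuple[str, Optional[str]]]:
    """Split one line of a names file into (name, comment).

    Everything after the first "#" is treated as a comment, for
    example the ISO 3166 code. Blank and comment-only lines give None.
    """
    name, _, comment = line.partition("#")
    name = name.strip()
    if not name:
        return None
    return name, (comment.strip() or None)


if __name__ == "__main__":
    for raw in ("Austria # AT", "Autriche # AT", "", "# comment only"):
        print(repr(raw), "->", parse_line(raw))
```

A build step that only needs the labels can keep the first element and drop the second, while a reviewer tracing a label back to its country can still read the code from the source file.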
Hi Gavin,
I forked the Reserved Names repo (https://github.com/centralnic/spec5-reserved-names) and added a small program to merge, normalize, and sort the various files of reserved names. The program reads all the text files in a directory and produces a single file of unique domains to be reserved. I think this complements the current file organization. The included JAR file has a main class, so double-clicking the JAR (in a Windows environment, anyway) should merge all the files in a particular directory and create the single output file.
I've issued a Pull Request. Comments welcome.
-- John
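The merge step described here (read every text file in a directory, normalize, de-duplicate, sort) can be sketched in a few lines of Python. This is not the Java program from the pull request. The function name `merge_reserved_names`, the `*.txt` file pattern, the assumption that the files are UTF-8, and the normalization used (NFC plus lowercasing, with "#" comments stripped) are all hypothetical choices for this example.

```python
import unicodedata
from pathlib import Path


def merge_reserved_names(directory: str) -> list[str]:
    """Merge all *.txt files in `directory` into one sorted list of
    unique, normalized names."""
    names = set()
    for path in sorted(Path(directory).glob("*.txt")):
        # Assumes UTF-8 input; a file in another encoding raises an error.
        for line in path.read_text(encoding="utf-8").splitlines():
            name = line.partition("#")[0].strip()  # drop trailing comment
            if name:
                names.add(unicodedata.normalize("NFC", name).lower())
    return sorted(names)


if __name__ == "__main__":
    for name in merge_reserved_names("."):
        print(name)
```

The sort here is by code point, which is stable across machines but is not a linguistic collation order.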
Hi Gavin,
It looks like most of the files in the repo are encoded as UTF-8, but S5.4.3.txt seems to be UTF-16. (It starts with a bunch of surrogate pairs.) I wonder if we can standardize on a single format. I might even suggest using A-labels, as copying, pasting and comparing them is less error-prone (for me). Maybe I just don't have the right tools. If we decide to standardize then I can help with conversions, but I wanted to get some input from folks.
-- John
On 10/01/2014 15:52, Colosi, John wrote:
Hi Gavin, it looks like most of the files in the repo are using the utf8 format. But S5.4.3.txt seems to be in utf16. (It starts with a bunch of surrogate pairs.) I wonder if we can standardize on a single format.
I'll see what I can do about converting that file to UTF-8. iconv complains for me when I try to convert that file from utf-16 to utf-8, so I wonder if there has been some mixing of encodings when the file was assembled.
I might even suggest using A Labels as copying and pasting and comparing is less error prone (for me). Maybe I just don't have the right tools. If we decide to standardize then I can help with conversions, but wanted to get some input from folks.
Using A-labels would be less error-prone, but also harder for people who speak the relevant languages. Under ideal circumstances, we'd have language experts reviewing the strings, and it would be a real pain for them to have to keep converting A-labels to U-labels and back again. The U-labels are the source code: the stuff that human beings work with.
G.
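On the encoding question raised earlier in this message, one way to bring a mixed set of files to UTF-8 is to look at the byte order mark before decoding. The sketch below is a minimal Python illustration and not what was actually run on the repository. The function name `decode_names_file` is hypothetical, and it assumes that a file without a BOM is meant to be UTF-8.

```python
import codecs


def decode_names_file(raw: bytes) -> str:
    """Decode the raw bytes of a names file to text.

    A UTF-8 or UTF-16 byte order mark selects the codec; anything
    else is assumed to be UTF-8. Decoding is strict, so a file that
    is invalid in the chosen encoding raises UnicodeDecodeError.
    """
    if raw.startswith(codecs.BOM_UTF8):
        return raw.decode("utf-8-sig")
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order.
        return raw.decode("utf-16")
    return raw.decode("utf-8")


if __name__ == "__main__":
    sample = "фиджи\n".encode("utf-16")
    print(decode_names_file(sample).encode("utf-8"))
```

Strict decoding is deliberate: a failure similar to the iconv complaint above is a useful signal that a file needs manual attention before it is re-encoded.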
Good points Gavin. Makes sense to me. Using your strategy then, we should only check in U-labels to the repo. But it might be nice to have a tool that could capture all of the files in the repo, and "compile" them into a single file of unique A-labels.
John Colosi
On 25/11/2013 17:09, Gavin Brown wrote:
I've added your list, plus the file I was working on. Let's see if we can work out a format that makes sense for this - JSON? XML? YAML?
Actually, IANA has already provided us with one: the XML file at
http://www.icann.org/sites/default/files/packages/reserved-names/ReservedNam...
There is a RelaxNG schema at
http://www.icann.org/sites/default/files/packages/reserved-names/reservedNam...
So I will try massaging the different data sources into this format.
G.
On 25/11/2013 17:09, Gavin Brown wrote:
I've added your list, plus the file I was working on. Let's see if we can work out a format that makes sense for this - JSON? XML? YAML?
[Alexander Mayrhofer] I'm attaching my version of the ISO 3166 English names list (Section 4.1). As ICANN suggested, I took the "conservative" approach and created multiple labels for names that allow multiple interpretations. There are still a couple of open issues with my list, most of them politically sensitive:
- For "bosnia and herzegovina", shall we also reserve "bosnia" as well as "herzegovina"? At least "bosnia" is commonly used here in Austria.
- Same for "bonaire-sint-eustatius-and-saba".
- "brunei-darussalam" - add "brunei" standalone as well?
- "cocos (keeling) islands" - add "keeling islands" standalone?
- "FALKLAND ISLANDS (MALVINAS)" - add "malvinas" standalone?
- "HOLY SEE (VATICAN CITY STATE)" - "vatican" just by itself as well?
- Particularly interesting: "KOREA, DEMOCRATIC PEOPLE'S REPUBLIC OF/KP" vs "KOREA, DEMOCRATIC PEOPLE'S REPUBLIC OF/KP"
- "LAO PEOPLE'S DEMOCRATIC REPUBLIC" - "laos" is better known... add that too?
I'm all in favour of creating a git repository etc. where the community of backend operators creates a unified list in an open manner. ICANN is watching this list, so at least they can't complain afterwards that we did something wrong.
Alex
Hi Alex, Thanks for your contributions. I have merged them into the lists that Luis and I have already prepared, they are available on Github now.
- For "bosnia and herzegovina", shall we also reserve "bosnia" as well as "herzegovina". At least "bosnia" is well used here in Austria. - Same for "bonaire-sint-eustatius-and-saba". - "brunei-darussalam" - add "brunei" standalone as well?
Perhaps there are people who are more familiar with these countries who can offer guidance? My personal feeling is that it would be better (from a layer 10 POV) to block too many variants than too few.
- "cocos (keeling) islands" - add "keeling islands" standalone? - "FALKLAND ISLANDS (MALVINAS)" add "malvinas" standalone? - "HOLY SEE (VATICAN CITY STATE)" - "vatican" just by itself as well?
Where there are variants on the name, both have been added, so there are entries for the Cocos Islands, the Keeling Islands, the Falkland Islands, the Malvinas, the Holy See and the Vatican.
- Particularly interesting: "KOREA, DEMOCRATIC PEOPLE'S REPUBLIC OF/KP" vs "KOREA, DEMOCRATIC PEOPLE'S REPUBLIC OF/KP"
Those strings appear identical?
- "LAO PEOPLE'S DEMOCRATIC REPUBLIC" - "laos" is more well known... add that too?
"Laos" has been added.
G.
Perhaps there are people who are more familiar with these countries who can offer guidance? My personal feeling is that it would be better (from a layer 10 POV) to block too many variants than too few.
Why are they (potentially) being blocked at all? What's the justification for denying (real examples) customers of mine such as John Laos from registering laos.(something) (the .com is up for some silly amount of money on Sedo, and he has his name in other TLDs) and Samantha Keeling from registering keeling.(something)?
It's bad enough that provisions were put in place for the IOC and Red Cross/Lion/Sun/Crescent without any real precedent or legality (Red Lion being the second most popular pub name in the UK, and Red Cross being the name of *thousands* of roads and locations throughout the country). Extending that with all this other tripe just spreads the madness.
Rob
I've already used the ICANN list below without any major hassles. I needed the A-label representation as we don't store U-label reservations (only under description).
+1 for the IANA XML formatted list
On 25 Nov 2013, at 10:57 PM, Gavin Brown <gavin.brown@centralnic.com> wrote:
On 25/11/2013 17:09, Gavin Brown wrote:
I've added your list, plus the file I was working on. Let's see if we can work out a format that makes sense for this - JSON? XML? YAML?
Actually, IANA has already provided us with one: the XML file at
http://www.icann.org/sites/default/files/packages/reserved-names/ReservedNam...
There is a RelaxNG schema at
http://www.icann.org/sites/default/files/packages/reserved-names/reservedNam...
So I will try massaging the different data sources into this format.
G.
Hi Alexander,
We are also a bit confused about the correct document to use. Regarding Section 4.3 of Specification 5 of the RA, when I visit the website for the UNGEGN, it states that the latest list is from June 27th 2007: http://unstats.un.org/unsd/geoinfo/ungegn/docs/9th-uncsgn-docs/econf/9th_UNC...
The version you referenced is more recent (from the 26th session in 2011), but the link to it on the site seems to be grayed out. So there are two plausible versions: one that seems to be official, and the other that seems more recent. We would also be very happy if ICANN would just provide the officially mandated list for everyone to use.
Thanks,
Seth Goldman
Google
participants (8)
- Alexander Mayrhofer
- Colosi, John
- Gavin Brown
- Gould, James
- Luis Muñoz
- Mike O'Connell
- rob.golding@astutium.com
- Seth Goldman