Hi,

I am a bit confused about how our IDN table should be structured with respect to variants, as I feel section 5, "A Model Table Format", of RFC 4290 (https://www.ietf.org/rfc/rfc4290.txt) is a bit ambiguous. I am therefore hoping there is somebody out there with experience of these things who could help me. I have looked at other IDN tables but none seem to help me with the issue below.

The RFC states:

    Each non-comment line in the table starts with the character that is allowed in the registry and expected to be used in registrations, which is also called the "base character".

It then goes on to say:

    If the base character has any variants, the base character is followed by a vertical bar character ("|", ASCII 0x7C) and the variant string. If the base character has more than one variant, the variants are separated by a colon (":", ASCII 0x3A).

So if I have the following characters, which are equivalent within the registry:

    e (U+0065) = è (U+00E8) = é (U+00E9) = ê (U+00EA) = ë (U+00EB)

would my table need to be

    U+0065|U+00E8:U+00E9:U+00EA:U+00EB # LATIN SMALL LETTER E (e)

which implies that everything after the | is an allowed character, or do I have to explicitly take the first statement above into account, putting each character with its variants on a separate line and therefore listing out all combinations, i.e.

    U+0065|U+00E8:U+00E9:U+00EA:U+00EB # LATIN SMALL LETTER E (e)
    U+00E8|U+0065:U+00E9:U+00EA:U+00EB # LATIN SMALL LETTER E WITH GRAVE
    U+00E9|U+0065:U+00E8:U+00EA:U+00EB # LATIN SMALL LETTER E WITH ACUTE
    U+00EA|U+0065:U+00E8:U+00E9:U+00EB # LATIN SMALL LETTER E WITH CIRCUMFLEX
    U+00EB|U+0065:U+00E8:U+00E9:U+00EA # LATIN SMALL LETTER E WITH DIAERESIS

If the second method is the correct way, does the order then become important?

Any help would be appreciated.

Regards
John

John Hollifield
GTLD Systems and Business Data Lead
Nominet
Tel: 01865 332333
Mob: 07979 696734
Email: John.Hollifield@nominet.org.uk
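
For readers following the thread, the line layout quoted from RFC 4290 section 5 (base character, then "|", then variants separated by ":", with "#" starting a comment) can be read with a few lines of Python. This is only a minimal sketch: the function names `parse_code_point` and `parse_table_line` are made up for illustration, and it assumes every base and variant is a single code point written as U+XXXX, as in the example lines above.

```python
import re

# Assumption: each entry is one code point written as U+ followed by 4-6 hex digits.
CODE_POINT = re.compile(r"^U\+([0-9A-Fa-f]{4,6})$")


def parse_code_point(token):
    """Convert a token such as 'U+00E8' to its integer code point."""
    match = CODE_POINT.match(token.strip())
    if match is None:
        raise ValueError(f"not a U+XXXX code point: {token!r}")
    return int(match.group(1), 16)


def parse_table_line(line):
    """Return (base, [variants]) for one table line, or None for blank/comment-only lines."""
    # Everything from '#' onwards is a comment.
    content = line.split("#", 1)[0].strip()
    if not content:
        return None
    # The base character comes before '|', the variants after it.
    base_part, _, variant_part = content.partition("|")
    base = parse_code_point(base_part)
    variants = []
    if variant_part.strip():
        # Multiple variants are separated by ':'.
        variants = [parse_code_point(tok) for tok in variant_part.split(":")]
    return base, variants


if __name__ == "__main__":
    line = "U+0065|U+00E8:U+00E9:U+00EA:U+00EB # LATIN SMALL LETTER E (e)"
    base, variants = parse_table_line(line)
    print(chr(base), [chr(v) for v in variants])
```

Note that a parser like this only learns that U+0065 is a base character; whether U+00E8 and the others are themselves registrable is exactly the ambiguity raised in the question.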

Hi John,

On Oct 22, 2013, at 4:49 AM, John Hollifield <John.Hollifield@nominet.org.uk> wrote:

> I am a bit confused about how our IDN table should be structured with respect to variants as I feel section 5, A Model Table Format, of RFC 4290 (https://www.ietf.org/rfc/rfc4290.txt) is a bit ambiguous. I am therefore hoping there is somebody out there with experience of these things that could help me. I have looked at other IDN tables but none seem to help me with the issue below.
>
> The RFC states: ...
>
> So if I have the following characters which are equivalent within the registry
>
>     e (U+0065) = è (U+00E8) = é (U+00E9) = ê (U+00EA) = ë (U+00EB)
>
> ...
>
>     U+0065|U+00E8:U+00E9:U+00EA:U+00EB # LATIN SMALL LETTER E (e)
>     U+00E8|U+0065:U+00E9:U+00EA:U+00EB # LATIN SMALL LETTER E WITH GRAVE
>     U+00E9|U+0065:U+00E8:U+00EA:U+00EB # LATIN SMALL LETTER E WITH ACUTE
>     U+00EA|U+0065:U+00E8:U+00E9:U+00EB # LATIN SMALL LETTER E WITH CIRCUMFLEX
>     U+00EB|U+0065:U+00E8:U+00E9:U+00EA # LATIN SMALL LETTER E WITH DIAERESIS
>
> If the second method is the correct way does the order then become important?

I believe explicitly listing them out, your second method, is more desirable to ensure a naïve implementor gets it right. While some registries do enforce symmetry in their tables, I don't think it can be assumed.

As for ordering, I don't think ordering is material within a set of variants.

This is probably a good point to raise awareness on work I and others have been doing on producing an improved table format which caters for more complex rulesets. You can find the current draft at http://tools.ietf.org/html/draft-davies-idntables-04. Our goal is to see this become a common format that all existing tables could be ported to. It will be used for the root zone label generation ruleset (i.e. the IDN table that governs variants in the root zone).

kim
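
The explicit listing recommended above can be generated mechanically from a set of equivalent characters, and a table can be checked for the symmetry that cannot be assumed. The following is a sketch under stated assumptions, not a reference implementation: the helper names `format_cp`, `expand_variant_set` and `asymmetric_pairs` are hypothetical, variants are single code points, and variants are held in sets on the view expressed above that ordering within a set of variants is not material.

```python
def format_cp(cp):
    """Format an integer code point in the U+XXXX style used in the table."""
    return f"U+{cp:04X}"


def expand_variant_set(code_points):
    """Emit one table line per member: that member as base, all others as variants."""
    members = list(code_points)
    lines = []
    for base in members:
        others = [cp for cp in members if cp != base]
        lines.append(format_cp(base) + "|" + ":".join(format_cp(cp) for cp in others))
    return lines


def asymmetric_pairs(table):
    """Find (a, b) where b is listed as a variant of a, but a is not listed as a variant of b.

    `table` maps each base code point to a set of variant code points.
    """
    return sorted(
        (a, b)
        for a, variants in table.items()
        for b in variants
        if a not in table.get(b, set())
    )


if __name__ == "__main__":
    e_set = [0x0065, 0x00E8, 0x00E9, 0x00EA, 0x00EB]
    for line in expand_variant_set(e_set):
        print(line)

    # A table holding only the single-line form is flagged as asymmetric.
    one_line_table = {0x0065: {0x00E8, 0x00E9, 0x00EA, 0x00EB}}
    for a, b in asymmetric_pairs(one_line_table):
        print(f"{format_cp(b)} is a variant of {format_cp(a)} but not the reverse")
```

Running the symmetry check on a table before publishing it is one way to catch the case where only the first, single-line form was written down.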

Thank you for your very clear response Kim, it is very much appreciated.

Regards
John

From: Kim Davies [mailto:kim.davies@icann.org]
Sent: 22 October 2013 16:18
To: John Hollifield
Cc: gtld-tech@icann.org
Subject: Re: [gtld-tech] IDN Tables