Hi,

On Wed, Mar 07, 2012 at 12:47:36PM +1100, James Mitchell wrote:
2) The set of code points (or sequences of code points) that are considered equivalent by the registry
Could we please not use "equivalent" for this? That word has caused enough trouble already.
The table should not attempt to place rules on the use of code points within a label as these rules are often non-trivial. One can easily tell whether a name is registered by performing a DNS lookup or a WHOIS query for the name. Alternatively a registrar will be able to notify a potential registrant should a name be considered "invalid".
Without rules on the use of code points within a label, the table needs to be supplemented by something else in order to create a complete policy. Where would you want to put those rules?
Further to the above, the table should not attempt to define those variants that are activated/allowed/blocked. An active variant can be determined from a query to the DNS or WHOIS, and these protocols will have to be used, considering that a variant may have been activated post-registration. Additionally, the rules for determining whether a variant can be activated are non-trivial. Consider the example below.
See above. Of course they're not trivial. But they need to be expressed somewhere so that one can unambiguously determine whether a string is a candidate to be a U-label in a zone. If you can't determine that, then the policy doesn't actually cover everything it's supposed to.
To avoid the somewhat common mistake of incorrectly defining equivalence I suggest that equivalent sequences of code points are defined in one place. For example
<char cp="0627">
  <var cp="0625"/>
</char>
<char cp="0625">
  <!-- whoops, forgot to identify 0627 as an equivalent character -->
</char>
should be expressed as
<equivalent>
  <char cp="0625"/>
  <char cp="0627"/>
</equivalent>
This won't work for cases where the alternation goes only one way. In Russian, for instance, IE (U+0435) can be used (casually, though not formally) where IO (U+0451) is used; but it is never the case that IO can be used where IE is used. Trivially, in French it is sometimes the case that one substitutes undecorated characters for decorated characters; but you don't do things the opposite way.

Now, of course, you might just say, "Do them all in order to avoid confusion." But this is an example where at least some registries have talked about wanting to ensure blocking for some cases and withholding for others; and in that case, "equivalence" is certainly the wrong concept and symmetry is not what you get.

Best,

A

--
Andrew Sullivan
ajs@anvilwalrusden.com
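
P.S. To make the one-way case concrete, here is a rough Python sketch of a directed variant table. The table layout and the names VARIANTS, is_symmetric and label_variants are made up for illustration; they are not part of any proposed table format, and real registry policy would need far more than this.

```python
# Hypothetical directed variant table: each key may be replaced by any of
# the code points in its value set, but NOT the other way around.
# Russian example from above: IO (U+0451) -> IE (U+0435), never the reverse.
VARIANTS = {
    0x0451: {0x0435},
}


def is_symmetric(table):
    """Return True only if every mapping src -> dst also has dst -> src."""
    return all(
        src in table.get(dst, set())
        for src, dsts in table.items()
        for dst in dsts
    )


def label_variants(label, table=VARIANTS):
    """Return every label reachable from `label` by following the directed
    mappings for any subset of its code points, excluding `label` itself."""
    results = {""}
    for ch in label:
        # The original code point is always an option; add its variants.
        options = {ch} | {chr(v) for v in table.get(ord(ch), set())}
        results = {prefix + opt for prefix in results for opt in options}
    results.discard(label)
    return results
```

With this table, a label containing IO yields the IE spelling as a variant, while the IE spelling yields no variants at all, and is_symmetric(VARIANTS) is false. A single <equivalent> grouping cannot express that distinction, since it forces both directions.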