New gTLD "validation" problems...
I did a little Google search for e-mail validation tools and tricks, and checked for assumptions about what TLDs look like. (Searches: "email validation javascript", "email valid javascript", "email validation asp", and so on.)

Looking at the results, there's an awful lot of sometimes bad, sometimes horrible code (in JavaScript, VBScript, Perl, and friends) and regular expressions around. For a sample dirty dozen, see the link list at the end of this e-mail. All of these are from either the Google top ten or top twenty for some of the searches I did.

The most typical mistakes include assuming that:

- TLDs are 2-N characters long, with N ranging anywhere from 3 to 6. (In fact, 3 and 6 seem to be the most common upper bounds assumed; perhaps the most absurd case had 4 and a comment explicitly referencing .info...)

- It is a good idea to have a hard-coded list of TLDs. Such lists frequently *include* the current set of new gTLDs, so these are good news for the current new gTLD operators, and really bad news for the next round. (In one case, there was at least a comment referencing ICANN and the need to update -- but, of course, these JavaScript code snippets are the kind of stuff which gets deployed and forgotten, so that comment is worthless.)

Remarkably, most of the code I looked at simply accepted any two-letter TLD, with just one (probably not so popular) exception that would only accept ".tv" and ".us".

I'd suggest that any advisory the GNSO may initiate on acceptance problems with new TLDs should take up the basic theme that the root zone is a dynamic thing, and that operators and programmers should not make unwarranted assumptions about what's in there.

Besides the kinds of programming errors mentioned above, that brings up two more dangerous practices:

1. Downloading a copy of the root zone and installing it on a resolver running BIND, practically turning that resolver into a root server. If the root zone copy isn't updated regularly, things will break -- not just when new TLDs are added, but also when existing TLDs migrate to different servers. (What's the transition plan for .org, again?) I have no idea how common this kind of setup actually is.

2. Using fake TLDs for local networks. It's not uncommon to just use a random, unused TLD for machines on an intranet; these host names aren't supposed to be seen on the Internet. Of course, it's extremely easy to screw up this kind of setup, and to inadvertently create a "local" collision with a future TLD. Fixing setups like this might get quite costly.

At the same time, all this indicates that the "visibility" problems for new gTLDs will persist for quite some time.

The dirty dozen address validators:

http://sageweb.sage.org/resources/publications/perl/perl17.html
http://www.hexillion.com/samples/#Regex
http://www.xs4all.nl/~ppk/js/mailcheck.html?email=webmaster%40nic.museum
http://insights.iwarp.com/advanced/javascript/validate/formemail.html?email=...
http://javascript.internet.com/forms/email-address-validation.html?email=roe...
http://members.blue.net.au/felgall/emailval.js
http://forums.devshed.com/archive/1/2002/09/4/44410
http://www.experts-exchange.com/Web/Web_Languages/JavaScript/Q_20572818.html
http://www.js-examples.com/example/?ex=946&mode=1
http://javascriptkit.com/script/script2/acheck.shtml?emailcheck=roessler%40d...
http://www.aspfree.com/examples/1574,1/examples.aspx
http://www.123aspx.com/resdetail.aspx?sfm=308&res=890

-- 
Thomas Roessler <roessler@does-not-exist.org>
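
To illustrate the alternative to the TLD assumptions criticized above, here is a rough sketch in Python of a purely syntactic check that neither bounds the TLD length below the DNS label limit nor consults a hard-coded TLD list. The function name plausible_email is made up for this example, and it is not taken from any of the listed validators. It assumes the domain is given in ASCII form and deliberately checks nothing about the local part beyond it being non-empty; whether the domain actually exists is left to the DNS.

```python
import re

# One DNS label: letters, digits and hyphens, 1 to 63 characters,
# not starting or ending with a hyphen. No TLD-specific length limit.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def plausible_email(address: str) -> bool:
    """Syntactic sanity check only (hypothetical helper).

    Makes no assumption about how long the TLD is or which TLDs exist.
    """
    # Split at the last "@" so the domain is everything after it.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or len(domain) > 253:
        return False

    labels = domain.split(".")
    # Require at least "something.tld"; every label must be well-formed.
    if len(labels) < 2:
        return False
    if not all(_LABEL.fullmatch(label) for label in labels):
        return False

    # Only assumption about the last label: it is not all digits,
    # which would make the domain look like a bare IP address.
    return not labels[-1].isdigit()
```

With this check, webmaster@nic.museum passes just like an address under .info or .tv, and an address under a TLD that does not exist yet would pass as well. That is the point: the root zone decides what is valid, not the form handler.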