During the UASG Workshop in Abu Dhabi there was a brief discussion about Punycode converters.
1) Is anyone aware of any punycode converters (particularly in libraries) that produce bad results?
2) Is there a test suite that can be used to test Punycode converters?
3) Would the source of input (typed, cut/paste, input from a data file) make any difference? This probably has to do with RTL scripts
Thanks.
Don
Don Hollander
Universal Acceptance Steering Group
Skype: don_hollander
Don, let me ask another question: which punnycode converter produces correct results according to the RFC? If we had that, we would get the list of punny-code converters with bad results.
Thnx
All the Best,
Abdalmonem Tharwat Galila
Deputy Manager, Dot Masr Registry, Operation Sector.
National Telecommunication Regulatory Authority
Office Tel.: +2 02 35341582 - +2 02 35341300
Mobile: +2 010 00049068
Fax: +2 02 35370537
Website: http://www.mcit.gov.eg/ : http://www.tra.gov.eg
E-mail: agalila@mcit.gov.eg : atharwat@tra.gov.eg
Skype: abdalmonem.galila
On 11/3/2017 9:59 AM, Abdalmonem Tharwat Galila wrote:
Don, let me ask another question: which punnycode converter produces correct results according to the RFC?
If we had that, we would get the list of punny-code converters with bad results.
There is no such thing as "punny"-code.
A./
On Fri, Nov 03, 2017 at 04:42:19PM +0000, Don Hollander wrote:
During the UASG Workshop in Abu Dhabi there was a brief discussion about Punycode converters.
1) Is anyone aware of any punycode converters (particularly in libraries) that produce bad results?
What do you mean by "bad" :-) One problem is that, depending on the rules you're using, you get different results. This tool illustrates the issue nicely: https://unicode.org/cldr/utility/idna.jsp
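To make the point about rules concrete, here is a minimal Python sketch, not a definitive implementation. It assumes only the standard library: the built-in `idna` codec, which follows the IDNA2003 rules, and the raw `punycode` codec, which applies no mapping at all. The two helper names are made up for illustration.

```python
def alabel_idna2003(label: str) -> str:
    """Convert one label with Python's built-in codec, which applies IDNA2003 mapping."""
    return label.encode("idna").decode("ascii")


def alabel_no_mapping(label: str) -> str:
    """Punycode-encode the label exactly as given, with no mapping step (hypothetical helper)."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


label = "fa\u00df"  # "faß", containing U+00DF LATIN SMALL LETTER SHARP S
print(alabel_idna2003(label))    # fass        (ß is mapped to "ss" before encoding)
print(alabel_no_mapping(label))  # xn--fa-hia  (ß is kept and encoded)
```

Neither output is a bug in the Punycode step itself; the two results differ because a different set of rules was applied before encoding.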
3) Would the source of input (typed, cut/paste, input from a data file) make any difference? This probably has to do with RTL scripts
It probably depends on what the intervening application did. As long as the string really is in NFC, the source should not matter, but if an intervening application did something else to the string before it was normalized to NFC, you'll have a problem.
A
--
Andrew Sullivan
ajs@anvilwalrusden.com
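The NFC point can be illustrated with a short Python sketch. It assumes the standard library's raw `punycode` codec, which does no normalization of its own, and `unicodedata.normalize`; the helper name `to_punycode` is hypothetical.

```python
import unicodedata


def to_punycode(label: str, normalize: bool = True) -> str:
    """Punycode-encode one label, optionally normalizing it to NFC first."""
    if normalize:
        label = unicodedata.normalize("NFC", label)
    return label.encode("punycode").decode("ascii")


composed = "b\u00fccher"     # ü as one code point (U+00FC)
decomposed = "bu\u0308cher"  # u followed by U+0308 COMBINING DIAERESIS

# The two inputs look identical on screen but are different code point sequences.
same_after_nfc = to_punycode(composed) == to_punycode(decomposed)
differs_without_nfc = (
    to_punycode(composed, normalize=False) != to_punycode(decomposed, normalize=False)
)
print(to_punycode(composed))                # bcher-kva
print(same_after_nfc, differs_without_nfc)  # True True
```

In this sketch, a string that reaches the encoder without having been normalized yields a different encoding, whatever its source was.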
Don:
In talking about Punycode converters that "produce bad results", we probably have to distinguish between a narrow, technical view of "bad results" and a more system-level, user view of "bad results". Which did the UASG Workshop discussion refer to?
Specifically, to your questions,
On 2017-11-03 09:42, Don Hollander wrote:
During the UASG Workshop in Abu Dhabi there was a brief discussion about Punycode converters.
1) Is anyone aware of any punycode converters (particularly in libraries) that produce bad results?
As a software engineer, I'm confident that in the narrow technical sense, many Punycode converters produce bad results. In other words, they probably have bugs. Most software does. The bugs might be rare, however.
Also, I'm confident that many apps or systems which use internationalised domain names do the conversion to and from A-Label form (punycode conversion) wrong, even if the libraries they use behave correctly. This would be due to bugs in how the app or system uses the library.
2) Is there a test suite that can be used to test Punycode converters?
In the narrow, technical sense, our UASG018 /Programming Languages Evaluation Criteria/ document is a test suite, or at least instructions on how to construct one. The obvious next step for UASG018 is to implement actual test suites, that is, runnable test code that exercises a library's Punycode conversion functionality (among other things).
In the system-level, user view, our other evaluation activities would be that "test suite". For instance, the /Evaluation of UA Readiness of Popular Websites/, the /Universal Acceptance of Popular Browser (UASG016)/, etc.
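In the narrow, technical sense, runnable test code could start out as small as the sketch below. This is an illustration only and is not taken from UASG018: the vectors are a few widely published A-label examples, the converter under test is Python's raw `punycode` codec, and the names `VECTORS`, `encode_label`, `decode_label` and `run_suite` are hypothetical.

```python
# A few widely published U-label / A-label pairs; extend from an authoritative source.
VECTORS = {
    "b\u00fccher": "xn--bcher-kva",
    "m\u00fcnchen": "xn--mnchen-3ya",
    "测试": "xn--0zwm56d",
}


def encode_label(u_label: str) -> str:
    """Converter under test: raw Punycode plus the ACE prefix."""
    return "xn--" + u_label.encode("punycode").decode("ascii")


def decode_label(a_label: str) -> str:
    """Inverse of encode_label for labels that carry the ACE prefix."""
    return a_label[len("xn--"):].encode("ascii").decode("punycode")


def run_suite(encode, decode, vectors):
    """Return a list of (direction, input) pairs for every vector that fails."""
    failures = []
    for u_label, a_label in vectors.items():
        if encode(u_label) != a_label:
            failures.append(("encode", u_label))
        if decode(a_label) != u_label:
            failures.append(("decode", a_label))
    return failures


print(run_suite(encode_label, decode_label, VECTORS))  # [] when every vector passes
```

Passing a different library's functions as `encode` and `decode` would exercise that library against the same vectors.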
3) Would the source of input (typed, cut/paste, input from a data file) make any difference? This probably has to do with RTL scripts
In the narrow, technical sense, the source of input should make no difference at all. The Punycode conversion algorithm doesn't depend on the source of input. It starts with a sequence of data, and the source of that data is not material.
In the system-level, user view, the source of input might well make a difference. I would expect that this takes the form of how the app handles the data before it calls the library. When the user selects a domain name, does the app select all the necessary characters? Does the app implement the Unicode bidi algorithm correctly, for text with both right-to-left and left-to-right components? Does the app pass the domain name correctly to the library? And so on.
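One way the app-side handling can go wrong is sketched below in Python: text that is pasted or selected from a bidirectional context may carry invisible bidi control characters, which then reach the converter as part of the label. The character set is the list of Unicode bidi controls; the function names are hypothetical, and whether an app should strip such characters or reject the input is a design decision this sketch does not settle.

```python
# Unicode bidirectional control characters (invisible when rendered).
BIDI_CONTROLS = frozenset(
    "\u061c\u200e\u200f"              # ALM, LRM, RLM
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
)


def find_bidi_controls(text: str):
    """Return (index, code point) pairs for every bidi control found in text."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if ch in BIDI_CONTROLS]


def clean_for_conversion(text: str) -> str:
    """Strip surrounding whitespace and drop bidi controls before conversion."""
    return "".join(ch for ch in text.strip() if ch not in BIDI_CONTROLS)


pasted = "\u200f\u0645\u0635\u0631\u200e "  # "مصر" wrapped in RLM and LRM, plus a trailing space
print(find_bidi_controls(pasted))            # [(0, 'U+200F'), (4, 'U+200E')]
print(clean_for_conversion(pasted) == "\u0645\u0635\u0631")  # True
```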
--
--Jim DeLaHunt, jdlh@jdlh.com http://blog.jdlh.com/ (http://jdlh.com/)
multilingual websites consultant
355-1027 Davie St, Vancouver BC V6E 4L2, Canada
Canada mobile +1-604-376-8953
On 11/3/2017 12:53 PM, Jim DeLaHunt wrote:
Don:
In talking about Punycode converters that "produce bad results", we probably have to distinguish between a narrow, technical view of "bad results" and a more system-level, user view of "bad results". Which did the UASG Workshop discussion refer to?
It's not just the converters in the browsers, it's converters in various platform libraries as well. Some of those are not even well documented, so you can't tell, without experimentation, what rules they follow.
In this context, will UASG adopt a position vis-a-vis UTS#46? That standard attempts to somehow handle both IDNA2003 and IDNA2008 labels. I haven't looked into the degree to which it rejects valid IDNA2008 labels, but it certainly handles many IDNA2003 ones. Naively, "universal acceptance" would seem to mean you'd want this kind of permissive handling, but it lands you deep in the morass of emoji labels, among other things.
A./
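A small Python sketch of why permissive handling leads toward symbol and emoji labels: the Punycode encoding step will encode any code point, so acceptability has to be decided by a separate rule set. The category check below is a crude approximation of the IDNA2008 letter/digit rule, assumed here purely for illustration; it is not the real derivation and ignores exceptions and contextual rules. Both function names are hypothetical.

```python
import unicodedata

# Assumption: general categories that IDNA2008 broadly treats as letter/digit candidates.
LETTER_DIGIT_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}


def looks_like_idna2008_candidate(label: str) -> bool:
    """Rough check only: every character is a hyphen or a lowercase letter, mark, or digit."""
    return all(
        ch == "-" or unicodedata.category(ch) in LETTER_DIGIT_CATEGORIES
        for ch in label
    )


def raw_punycode(label: str) -> str:
    """Encode without any validity check, as a permissive converter might."""
    return "xn--" + label.encode("punycode").decode("ascii")


snowman = "\u2603"  # U+2603 SNOWMAN, general category So (symbol)
print(raw_punycode(snowman))                         # xn--n3h
print(looks_like_idna2008_candidate(snowman))        # False
print(looks_like_idna2008_candidate("b\u00fccher"))  # True
```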
On 03/11/2017 16:42, Don Hollander wrote:
During the UASG Workshop in Abu Dhabi there was a brief discussion about Punycode converters.
1) Is anyone aware of any punycode converters (particularly in libraries) that produce bad results?
Andrew has already noted the problem of defining what you are testing. At this point, I guess that focusing on IDNA2008 and encouraging the world to move in that direction may make the most sense? Though pointing at a library and rating its results BAD when the authors have correctly followed an existing standard would raise some hackles.
2) Is there a test suite that can be used to test Punycode converters?
The UTS#46 test suite at https://www.unicode.org/Public/idna/latest/IdnaTest.txt is a good start.
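A hedged sketch of how such a file might be read in Python follows. It assumes only the semicolon-separated, `#`-commented layout common to Unicode data files; the meaning and order of the columns should be taken from the header of the file itself, and the sample lines here are made up for illustration rather than copied from the real file.

```python
def parse_test_lines(lines):
    """Yield a list of trimmed fields per data line, skipping blanks and '#' comments."""
    for raw in lines:
        data = raw.split("#", 1)[0].strip()
        if not data:
            continue
        yield [field.strip() for field in data.split(";")]


# Hypothetical sample in the assumed layout, not real test data.
SAMPLE = [
    "# a comment line",
    "",
    "B; example; ; example  # trailing comment",
]

for fields in parse_test_lines(SAMPLE):
    print(fields)  # ['B', 'example', '', 'example']
```

Each parsed row could then be fed to the converter under test and compared against the expected column.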
3) Would the source of input (typed, cut/paste, input from a data file) make any difference? This probably has to do with RTL scripts
For me, input from a data file should be preferred. Copy/Paste exposes you to potential bugs in the host clipboard system, and manual typed entry is obviously error-prone, especially when you consider an operator entering data in scripts with which they are unfamiliar.
--
Jim Hague - jim@sinodun.com
Never trust a computer you can't lift.
participants (6)
- Abdalmonem Tharwat Galila
- Andrew Sullivan
- Asmus Freytag
- Don Hollander
- Jim DeLaHunt
- Jim Hague