Re: [Latingp] Draft variants principles document - Dotless I - Latingp

April 6, 2018

During the Brussels workshop we talked about doing visual tests, etc. We asked the IP whether such tests would be acceptable to support the definition of variant sets. They replied no: visual confusion is out of scope. We discussed it, accepted that reality, and moved on.

About the Japanese case: what the IP told the Japanese GP is consistent with what the IP told us in Brussels regarding cross-script homoglyphs. So I see no problem here.

Dotless I case: we were requested to analyze this case due to compatibility issues vis-à-vis IDNA2003, not on grounds of visual confusion. So, all I’m asking is to focus on the ask.

-Dennis

From: Mats Dufberg <mats.dufberg@iis.se>
Date: Friday, April 6, 2018 at 5:21 AM
To: Dennis Tan Tanaka <dtantanaka@verisign.com>, Bill Jouris <bill.jouris@insidethestack.com>
Cc: Mirjana Tasić <mirjana.tasic@rnids.rs>, Latin GP <latingp@icann.org>, Michael Bauland <michael.bauland@knipp.de>
Subject: [EXTERNAL] Re: [Latingp] Draft variants principles document - Dotless I

Dennis,

I believe that a visual test of running text is relevant for Internet identifiers. Internet identifiers are often found in running text. What Bill's test shows (which is no surprise to us) is that people have a hard time noticing minute differences of no relevance.

I think we all agree that homoglyph pairs should be handled by variant rules. And that is because they are visually similar in the extreme.

At the ICANN 61 meeting, the IP actually proposed to the Japanese GP that some pairs of characters from different Unicode scripts (but from a Japanese perspective belonging to the Japanese script) be treated as variants even though those character pairs are not homoglyphs.

Yes, SMALL LETTER I and SMALL LETTER DOTLESS I are interesting because of the complexity of up-casing and down-casing in different locales. But if we are allowed to take upper case into consideration, there are other interesting cases. The upper-case forms of LATIN SMALL LETTER D WITH STROKE (U+0111), LATIN SMALL LETTER ETH (U+00F0) and LATIN SMALL LETTER D WITH TAIL (U+0256) are homoglyphs, which opens the door to injection of "false" domains.
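
For illustration only, the upper-case forms of those three letters can be listed with a short Python sketch. It assumes Python's built-in, locale-independent default case mapping (str.upper) and the standard unicodedata module; the helper name describe_upper is made up for this example. It shows only that the three capitals are distinct code points; whether they render alike depends on the font.

```python
import unicodedata

# The three lower-case letters named in the paragraph above.
LOWER = ["\u0111", "\u00f0", "\u0256"]

def describe_upper(ch):
    """Return (code point, Unicode name) of the default upper-case form of ch."""
    up = ch.upper()  # default case mapping, not locale-tailored
    return f"U+{ord(up):04X}", unicodedata.name(up)

# Print each lower-case code point next to its upper-case counterpart.
for ch in LOWER:
    code, name = describe_upper(ch)
    print(f"U+{ord(ch):04X} -> {code} {name}")
```

The three resulting capitals are separate code points, so a label that is compared after up-casing would still treat them as different strings even where a reader cannot tell them apart.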

Mats

---
Mats Dufberg
DNS Specialist, IIS
Mobile: +46 73 065 3899
https://www.iis.se/en/

From: "Tan Tanaka, Dennis" <dtantanaka@verisign.com>
Date: Thursday 5 April 2018 at 20:27
To: Bill Jouris <bill.jouris@insidethestack.com>, Mats Dufberg <mats.dufberg@iis.se>, Michael Bauland <michael.bauland@knipp.de>
Cc: Mirjana Tasić <mirjana.tasic@rnids.rs>, ICANN Latin GP <latingp@icann.org>
Subject: Re: [Latingp] Draft variants principles document - Dotless I

Bill, thanks for this.

I have to question, though, the relevance of your experiment. Is a visual test of running text relevant for Internet identifiers? And on the subject of visual similarity, I believe this has been discussed extensively and this panel has agreed that visual similarity is outside the scope of our work.

The case of the “small dotless I” and “small letter I” is interesting because of the treatment under different locale settings. The focus of our analysis should be on that, taking into account the needs and expectations of different internet users, including the Turkish community.
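
As a rough illustration of the locale difference, the sketch below contrasts default casing with Turkish-style casing. Python's str.upper() and str.lower() are not locale-aware, so the Turkish tailoring is written out by hand; upper_tr and lower_tr are hypothetical helpers for this example, not part of any library, and the sketch makes no claim about how any particular IDNA implementation behaves.

```python
DOTTED_CAPITAL_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"    # LATIN SMALL LETTER DOTLESS I

def upper_tr(s):
    """Upper-case s with the Turkish pairs i -> U+0130 and U+0131 -> I."""
    return s.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I").upper()

def lower_tr(s):
    """Lower-case s with the Turkish pairs U+0130 -> i and I -> U+0131."""
    return s.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I).lower()

# Default casing: dotless i and dotted i collapse to the same letter
# after an upper/lower round trip.
print(DOTLESS_SMALL_I.upper().lower() == "i")

# Turkish-style casing keeps the two letters apart.
print(lower_tr("TITLE") == "TITLE".lower())
```

Under default casing the round trip turns U+0131 into U+0069, while the Turkish-style helpers keep the dotted and dotless letters distinct. That divergence is the locale-dependent behaviour referred to above.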

-Dennis

From: Bill Jouris <bill.jouris@insidethestack.com>
Reply-To: Bill Jouris <bill.jouris@insidethestack.com>
Date: Thursday, April 5, 2018 at 2:05 PM
To: Dennis Tan Tanaka <dtantanaka@verisign.com>, Mats Dufberg <mats.dufberg@iis.se>, Michael Bauland <michael.bauland@knipp.de>
Cc: Mirjana Tasić <mirjana.tasic@rnids.rs>
Subject: [EXTERNAL] Re: [Latingp] Draft variants principles document - Dotless I

I've given some more thought to the Dotless I question.  It occurred to me that there are actually two approaches to the question: analysis and experiment.  So I ran an experiment.  Here are the results:

A dozen subjects were tested.  All were well-educated native speakers of English.  Approximately 1/3 are involved in IT, but none are network experts and none are involved in ICANN.

The subjects were given a paragraph to read (on the subject of variants).  In one word, the lower case I was replaced by a dotless I.  The number of subjects who noticed when reading the paragraph: Zero.

The subjects were then told that the substitution had been made, that it was in the first sentence, and shown the dotless I for information.  Half managed to locate the substitution in 1 or 2 re-reads of the sentence; half took 3 or more tries to spot the substitution – even though they knew what the substitution was and knew that it was there to find.  In short, misreading is the expected result of a substitution.

Accordingly, it is again recommended that U+0069 and U+0131 be determined to be blocked variants.

Happily, the results are the same as the analysis.  I have updated the document with this information.

Bill Jouris
Inside Products
bill.jouris@insidethestack.com
831-659-8360
925-855-9512 (direct)

________________________________
From: "Tan Tanaka, Dennis via Latingp" <latingp@icann.org>
To: "Tan Tanaka, Dennis via Latingp" <latingp@icann.org>
Sent: Thursday, April 5, 2018 9:41 AM
Subject: [Latingp] Draft variants principles document

Need assistance with developing the sections for special cases:

https://docs.google.com/document/d/1IrT_kfildf1SumYUqjkaIkMT-TYx9IRqtuPMV4Yv...

Thanks,
Dennis
_______________________________________________
Latingp mailing list
Latingp@icann.org
https://mm.icann.org/mailman/listinfo/latingp
