String Similarity Data for top-level domains, available for Public Comment

Oct. 25, 2025

Dear all,

Please see the message below on the availability of Unicode-based String
Similarity Data, to be used for identifying similar top-level strings. The
data is open for public comment until 4 Dec 2025.

With kind regards

satish

---------- Forwarded message ---------
From: Sarmad Hussain <sarmad.hussain@icann.org>
Date: Sat, Oct 25, 2025 at 11:19 AM
Subject: Public comment on String Similarity Data for top-level domains
To: Satish Babu <sbabu@ieee.org>
Cc: Heidi Ullrich <Heidi.Ullrich@icann.org>

Dear Satish,

Following up on our discussion just now, please note that ICANN has recently
developed string similarity data
<https://www.icann.org/sites/default/files/packages/lgr/sse/sse-data-overview...>
(Section 3), based on Unicode characters, to help identify similar strings
in top-level domain applications. This has been published for public comment
<https://www.icann.org/en/public-comment/proceeding/string-similarity-evaluat...>
(open until 4 December 2025).

The data covers the repertoire of the 27 scripts included in the Root Zone
Label Generation Rules
<https://www.icann.org/resources/pages/root-zone-lgr-2015-06-21-en> (RZ-LGR).
The data is based strictly on visual similarity and builds on the “same”
characters already identified in RZ-LGR through variant relationships. The
data has been collected by script experts, and each entry is tagged with a
similarity level from 1 to 5 (levels 1-3 indicate probable similarity;
levels 4-5 are not considered similar enough).

The data covers the following scripts: Arabic, Armenian, Bangla, Chinese
(Han), Cyrillic, Devanagari, Ethiopic, Georgian, Greek, Gujarati, Gurmukhi,
Hebrew, Japanese (Hiragana, Katakana, and Kanji [Han]), Kannada, Khmer,
Korean (Hangul and Hanja [Han]), Lao, Latin, Malayalam, Myanmar, Oriya,
Sinhala, Tamil, Telugu, Thaana, and Thai.

We invite the relevant ALAC team to take a look at the data and share any
feedback. We would also ask you to help share it with relevant networks to
seek further input from those interested.

Regards,
Sarmad

Satish Babu
