[At-Large] String Similarity Data for top-level domains, available for Public Comment

Oct. 25, 2025

Dear all,

Please see the message below on the availability of Unicode-based String
Similarity Data, to be used for identifying similar top-level strings. The
data has been published for public comment, which is open until 4 Dec 2025.

With kind regards

satish

---------- Forwarded message ---------
From: Sarmad Hussain <sarmad.hussain@icann.org>
Date: Sat, Oct 25, 2025 at 11:19 AM
Subject: Public comment on String Similarity Data for top-level domains
To: Satish Babu <sbabu@ieee.org>
Cc: Heidi Ullrich <Heidi.Ullrich@icann.org>

Dear Satish,

Following up on our discussion just now, please note that ICANN has recently
developed the string similarity data
<https://www.icann.org/sites/default/files/packages/lgr/sse/sse-data-overview...>
(Section 3) based on Unicode characters to help identify similar strings
in top-level domain applications. This has been published for public comment
<https://www.icann.org/en/public-comment/proceeding/string-similarity-evaluat...>
(open until 4 December 2025).

The data covers the repertoire of the 27 scripts included in Root Zone
Label Generation Rules
<https://www.icann.org/resources/pages/root-zone-lgr-2015-06-21-en> (RZ-LGR).
The data is based strictly on visual similarity and builds on the
“same” characters already identified in RZ-LGR through variant
relationships. The data was collected by script experts and tagged with a
similarity level from 1 to 5 (levels 1-3 indicate probable similarity;
levels 4-5 are not considered similar enough).

The data covers the following scripts: Arabic, Armenian, Bangla, Chinese
(Han), Cyrillic, Devanagari, Ethiopic, Georgian, Greek, Gujarati, Gurmukhi,
Hebrew, Japanese (Hiragana, Katakana, and Kanji [Han]), Kannada, Khmer,
Korean (Hangul and Hanja [Han]), Lao, Latin, Malayalam, Myanmar, Oriya,
Sinhala, Tamil, Telugu, Thaana, and Thai.

We invite the relevant ALAC team to review the data and share any
feedback. We would also ask you to help share it with the relevant
networks to seek further input from those interested.

Regards,
Sarmad