String Similarity Data for top-level domains, available for Public Comment
Dear all,

Please see the message below on the availability of Unicode-based String Similarity Data for public comment. The data is to be used for identifying similar top-level strings. The public comment period is open until 4 Dec 2025.

With kind regards,
satish

---------- Forwarded message ---------
From: Sarmad Hussain <sarmad.hussain@icann.org>
Date: Sat, Oct 25, 2025 at 11:19 AM
Subject: Public comment on String Similarity Data for top-level domains
To: Satish Babu <sbabu@ieee.org>
Cc: Heidi Ullrich <Heidi.Ullrich@icann.org>

Dear Satish,

Following up on our discussion just now, please note that ICANN has recently developed the string similarity data <https://www.icann.org/sites/default/files/packages/lgr/sse/sse-data-overview...> (Section 3) based on Unicode characters to help identify similar strings in top-level domain applications. This has been published for public comment <https://www.icann.org/en/public-comment/proceeding/string-similarity-evaluat...> (open until 4 December 2025).

The data covers the repertoire of the 27 scripts included in Root Zone Label Generation Rules <https://www.icann.org/resources/pages/root-zone-lgr-2015-06-21-en> (RZ-LGR). The data is strictly based on visual similarity and builds on top of the “same” characters already identified in RZ-LGR through variant relationships. The data has been collected by script experts, and each entry is tagged with a similarity level from 1 to 5 (levels 1-3 indicate probable similarity; levels 4-5 are not considered similar enough).

The data covers the following scripts: Arabic, Armenian, Bangla, Chinese (Han), Cyrillic, Devanagari, Ethiopic, Georgian, Greek, Gujarati, Gurmukhi, Hebrew, Japanese (Hiragana, Katakana, and Kanji [Han]), Kannada, Khmer, Korean (Hangul and Hanja [Han]), Lao, Latin, Malayalam, Myanmar, Oriya, Sinhala, Tamil, Telugu, Thaana, and Thai.

We would invite the relevant ALAC team to take a look at the data and share any feedback. We would also request that you help share it in the relevant networks to seek further input from those interested.

Regards,
Sarmad