Dear all,

Please see the message below: the Unicode-based String Similarity Data, to be used for identifying similar top-level strings, is now available for public comment. The public comment period is open until 4 December 2025.

With kind regards
satish

---------- Forwarded message ---------
From: Sarmad Hussain <sarmad.hussain@icann.org>
Date: Sat, Oct 25, 2025 at 11:19 AM
Subject: Public comment on String Similarity Data for top-level domains
To: Satish Babu <sbabu@ieee.org>
Cc: Heidi Ullrich <Heidi.Ullrich@icann.org>

Dear Satish,

Following up on our discussion just now, please note that ICANN has recently developed the string similarity data <https://www.icann.org/sites/default/files/packages/lgr/sse/sse-data-overview...> (Section 3), based on Unicode characters, to help identify similar strings in top-level domain applications. This has been published for public comment <https://www.icann.org/en/public-comment/proceeding/string-similarity-evaluat...> (open until 4 December 2025).

The data covers the repertoire of the 27 scripts included in the Root Zone Label Generation Rules <https://www.icann.org/resources/pages/root-zone-lgr-2015-06-21-en> (RZ-LGR). The data is strictly based on visual similarity and builds on the “same” characters already identified in RZ-LGR through variant relationships. The data has been collected by script experts, and each entry is tagged with a similarity level from 1 to 5 (levels 1-3 indicate probable similarity; levels 4-5 are not considered similar enough).

The data covers the following scripts: Arabic, Armenian, Bangla, Chinese (Han), Cyrillic, Devanagari, Ethiopic, Georgian, Greek, Gujarati, Gurmukhi, Hebrew, Japanese (Hiragana, Katakana, and Kanji [Han]), Kannada, Khmer, Korean (Hangul and Hanja [Han]), Lao, Latin, Malayalam, Myanmar, Oriya, Sinhala, Tamil, Telugu, Thaana, and Thai.

We invite the relevant ALAC team to take a look at the data and share any feedback.
We would also request your help in sharing it with the relevant networks to seek further input from those interested.

Regards,
Sarmad