A Compromise Proposal for String Similarity
Can we have a potential middle-ground proposal for the treatment of the now "Reserved Names"?

1. I was doing some thinking, and although we have been using .redcross and .rodcross as the example, the reality is that in 2012 those would not have been considered "confusingly similar" under the string similarity review. Why do I say this? Because the "e" and the "o" are aurally distinct and do not have the visual similarity that the evaluation was concerned with. They are not near each other on a keyboard, and they have different vowel sounds and different meanings. As a result, we should not be using that example, because it skews the entire discussion.

2. What we are really concerned about are visual homoglyphs in the same script, and the big issue is with the Latin script. We are not concerned with .redcross with a Cyrillic "d", because that would mix scripts and would not be allowed. We are concerned with things like:

a) .rédcross (Latin small e with acute)
b) .reḋcross (Latin small d with dot above)
c) .redcròss (Latin small o with grave)
d) .redcrośs (Latin small s with acute)

or:

a) .o1ympic (digit 1 for “l”)
b) .0lympic (digit 0 for “o”)
c) .olymplc (the letter “l” for the letter “i”)

3. Proposal: Treatment of Reserved Names and Similar Strings

* Homoglyphs: Applications for Reserved Names, and for any homoglyphs of those names, may be submitted only by the entities expressly eligible for those Reserved Names. Applications from any other party are ineligible.
* Confusingly Similar Strings (Non-Homoglyphs):
  * If a qualified entity applies for a Reserved Name, and another applicant applies for a string that is confusingly similar to that Reserved Name (but not a homoglyph), those applications shall not be placed in the same contention set. Both may be delegated.
  * If no qualified entity applies for the Reserved Name in the current round, and another party applies for a confusingly similar (non-homoglyph) string, that application may proceed. The delegation of such a string shall not prevent the qualified entity from applying for and obtaining the Reserved Name in a future round.
* String Confusion Objections and Blocking:
  * No String Confusion Objection may be filed or sustained by either the qualified applicant for the Reserved Name or the applicant for the confusingly similar string, in this or any future round.
  * Accordingly, those applications shall never be placed in the same contention set as a result of a string confusion proceeding.
  * The approval, delegation, or continued operation of one string shall never block or prevent the approval, delegation, or continued operation of the other string on the basis of string confusion or string similarity.

4. I understand this could be considered New Policy, but it could align with the 2014 PDP's vision of "protecting" the exact matches, while at the same time ensuring that we do not allow these homoglyphs to get through because no string similarity review is being performed. This seems to check the boxes of what we (and, in my opinion, ICANN staff) are really concerned with, and it finds a way for the strings to co-exist.

Thoughts?
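To make the examples in item 2 concrete, here is a minimal Python sketch of one way a same-script homoglyph check could work. It is only an illustration under stated assumptions, not a method defined by the proposal or by ICANN: the `LOOKALIKES` table, the function names, and the rule "strip diacritics, then fold look-alike characters" are all hypothetical, and the table covers only the substitutions listed above. A real review would presumably rely on much fuller data, such as the Unicode confusables data described in UTS #39.

```python
import unicodedata

# Hypothetical, deliberately tiny look-alike table covering only the
# substitutions named in the message: 0 for "o", 1 for "l", "l" for "i".
LOOKALIKES = {"0": "o", "1": "l", "i": "l"}


def skeleton(label: str) -> str:
    """Fold a label to a comparison form (an assumed rule, for illustration).

    Drops a leading dot, lowercases, removes combining marks (so "é"
    becomes "e"), then maps look-alike characters to a shared stand-in.
    """
    decomposed = unicodedata.normalize("NFD", label.lower().lstrip("."))
    base = (ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(LOOKALIKES.get(ch, ch) for ch in base)


def is_single_script_latin(label: str) -> bool:
    """True if every base character is an ASCII digit or a Latin letter."""
    decomposed = unicodedata.normalize("NFD", label.lstrip("."))
    for ch in decomposed:
        if unicodedata.combining(ch) or ch in "0123456789":
            continue
        if not unicodedata.name(ch, "").startswith("LATIN"):
            return False
    return True


def is_homoglyph(candidate: str, reserved: str) -> bool:
    """True if candidate is a same-script look-alike of reserved.

    An exact match is not treated as a homoglyph here; mixed-script
    labels are rejected because the message says they would not be
    allowed in the first place.
    """
    if candidate.lower().lstrip(".") == reserved.lower().lstrip("."):
        return False
    return (is_single_script_latin(candidate)
            and skeleton(candidate) == skeleton(reserved))
```

Under these assumptions, `is_homoglyph(".rédcross", ".redcross")` and `is_homoglyph(".o1ympic", ".olympic")` return `True`, while `is_homoglyph(".rodcross", ".redcross")` returns `False`, which matches the distinction drawn in item 1.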
Thanks Jeff,

The middle-ground proposal is valuable because it distinguishes between homoglyphs (which pose clear user-confusion risks and should be strictly limited to eligible entities) and non-homoglyph similar strings (which may safely coexist without blocking). For smaller not-for-profits, this distinction is critical: overly broad similarity restrictions could unintentionally exclude community-based organizations from applying for mission-aligned identifiers. It is also important to ensure that the exception and application processes are simple and accessible, so that they do not impose heavy financial or procedural burdens on smaller organizations.

JUAN MANUEL ROJAS, M.Sc.
Director - MINKA DIGITAL Colombia
NPOC Chair - NCSG/GNSO
M.Sc. Information Technology
Registered Linux User No.533108.
http://www.jmanurojas.com

In reply to: Jeffrey J. Neuman via SubPro-IRT <subpro-irt@icann.org>, Monday, 15 September 2025, 05:10:05 p.m. GMT-5
Thanks for this, Jeff.

My initial thought is that, yes, in my view, this is creating new policy. Setting that aside for a moment, I have to admit that creating a new review process and a new categorization of strings (homoglyphs and non-homoglyphs) at this late stage in the process makes me a little nervous. Nevertheless, we will take a closer look and get back to you.

While we ponder, and just for my benefit, would you explain again which strings would go through under your proposal that would NOT go through if the reserved names were simply included in the string similarity evaluation? (I agree with you that .rodcross would likely not be caught in string similarity, based on 2012.) I am not 100% clear on that.

Thank you and best wishes.
Lars

In reply to:
From: "Jeffrey J. Neuman via SubPro-IRT" <subpro-irt@icann.org>
Reply to: "Jeffrey J. Neuman" <jeff@jjnsolutions.com>
Date: Tuesday, 16 September 2025 at 00:10
To: "subpro-irt@icann.org" <subpro-irt@icann.org>
Subject: [SubPro-IRT] A Compromise Proposal for String Similarity
Lars,

First, let me start with the premise that I do not believe anyone will apply for a string that would be flagged in a string similarity review against any of the Reserved Names. But operationally, this is the way it would be done (in my view):

All applications undergo a string similarity review against the reserved names. If, and only if, there is a positive match for similarity, you then look to see whether the match is a homoglyph of the reserved name. If the answer is no, it passes and would be allowed to go through, even if for some other reason it would be considered confusingly similar.

What is an example of something that in theory could slip through? Perhaps .olyrnpic (where there is an "r" followed by an "n" in place of the "m"). But again, that would absolutely be challenged by the Olympic Committee on a host of grounds, such as a "Legal Rights Objection", not to mention by the GAC. And do we really think someone would apply for it?

The other part of the proposal, which I think is important, is that if there is a .rodcross, it would (a) allow .redcross to apply in a subsequent round and (b) prohibit a string confusion objection by .rodcross on the grounds of confusing similarity. This way they can both co-exist.

For me, it is not so much the string similarity review that worries me; it is the perception that .rodcross or .redcross could claim that the strings are confusingly similar through a string confusion objection because of the status of being a reserved name. That would be giving .redcross and the other reserved names more rights than what was intended. They can use the legal rights objection, but allowing them to use a string confusion objection based on their status as a reserved name is, for me, a step too far.

It's a different way of thinking about the problem. And determining whether one string is a homoglyph of another in the age of AI is pretty simple.

I hope that makes sense; I am trying to be creative.
Sincerely,
Jeff

In reply to: Lars Hoffmann <lars.hoffmann@icann.org>, 9/16/2025 7:43:30 AM
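The two-step operational check described in the reply above (similarity review first, homoglyph check only on a positive match) can be sketched in Python as follows. This is a rough illustration under stated assumptions, not an agreed procedure: the `Outcome` values, the `evaluate` function and its parameters, the applicant names used in examples, and the stand-in `toy_is_homoglyph` (which only strips diacritics) are all hypothetical. The result of the similarity review itself is passed in as a boolean, because the thread does not say how that review would be computed.

```python
import unicodedata
from enum import Enum


class Outcome(Enum):
    PROCEED = "proceed"
    INELIGIBLE = "ineligible"


def fold(label: str) -> str:
    """Lowercase, drop a leading dot, and strip combining marks."""
    decomposed = unicodedata.normalize("NFD", label.lower().lstrip("."))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def toy_is_homoglyph(candidate: str, reserved: str) -> bool:
    """Stand-in homoglyph test (assumption): equal once diacritics are removed."""
    return fold(candidate) == fold(reserved)


def evaluate(candidate, applicant, reserved, eligible, flagged_similar,
             is_homoglyph=toy_is_homoglyph) -> Outcome:
    """Apply the flow from the message to one application.

    eligible: set of entities allowed to apply for the reserved name.
    flagged_similar: outcome of the string similarity review (given as input).
    """
    # Eligible entities may apply for the reserved name and its homoglyphs.
    if applicant in eligible:
        return Outcome.PROCEED
    # The exact reserved name is closed to everyone else.
    if candidate.lower().lstrip(".") == reserved.lower().lstrip("."):
        return Outcome.INELIGIBLE
    # Step 1: no positive similarity match, so nothing further to check.
    if not flagged_similar:
        return Outcome.PROCEED
    # Step 2: only on a positive match, ask whether it is a homoglyph.
    if is_homoglyph(candidate, reserved):
        return Outcome.INELIGIBLE
    # Similar but not a homoglyph: passes, and the strings co-exist.
    return Outcome.PROCEED
```

With this sketch, a third-party application for .rédcross that the similarity review flags comes back `Outcome.INELIGIBLE`, while a flagged .olyrnpic comes back `Outcome.PROCEED`, which is the "could slip through" case raised in the message. The prohibition on string confusion objections is a procedural rule and is not modeled here.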
participants (4)
- Jeff Neuman
- Jeffrey J. Neuman
- Juan Manuel Rojas
- Lars Hoffmann