Scope: only two strings?
Dear all,

Many questions about our scope were made clear in the call, but not all. Notably, the limitation to a maximum of two strings was implied (see slide 21) but not explicitly decided as far as I could tell.

It sounds like a strange limitation to me, given that there are cases where more than two words, even proper names, differ only in their diacritics. My earlier examples included ø, which was declared out of scope, but there are similar cases with "real" diacritics, e.g., Sjoberg, Sjöberg and Sjóberg.

Can we consider cases where someone applies for three or more such strings at the same time, or where one already has two of them and applies for a third one, etc.?

-- Tapani Tarvainen
Hi Tapani,

I don't recall anything in our charter which would limit us to pairs. But if there is, it seems like something we should ask the GNSO Council to change. Because, as you say, it seems an artificial and counterproductive limitation.

Bill Jouris
Hi all,

Please excuse my ignorance here, as my expertise lies in Indigenous writing systems and I am still getting my head wrapped around the contexts of GNSO and TLDs. When you say “maximum of two strings”, are you referencing a multi-codepoint character, something like:

c̓ (\u0063\u0313)?

If so, just for reference, there are many situations in Indigenous languages in North America where a character is made up of 3 codepoints, a base character + 2 diacritics. A couple of examples of this:

x̣ʷ (\u0078\u0323\u02B7)
ų̄́ (\u0173\u0304\u0301)

There are also examples of using precomposed characters with additional diacritic markers:

č̓ (\u010D\u0313)

which also may have possible identical decomposed forms:

č̓ (\u0063\u030C\u0313)

Not sure if this is relevant to this particular conversation, so my apologies if I’m out of scope for what folks are discussing here. Regardless, I just thought I’d flag this as something to note for our conversations moving forward!

Cheers,
Bridget Chase (they/them)
Language Technologist
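The point about precomposed and decomposed forms of č̓ can be checked with Unicode normalization. The following is a minimal sketch in Python using only the standard `unicodedata` module; the variable and function names are illustrative assumptions, not anything defined in this thread or by ICANN.

```python
import unicodedata

# The two spellings of the same character quoted in the message above.
precomposed = "\u010D\u0313"       # LATIN SMALL LETTER C WITH CARON + COMBINING COMMA ABOVE
decomposed = "\u0063\u030C\u0313"  # c + COMBINING CARON + COMBINING COMMA ABOVE

# Compared as raw code point sequences, the two spellings differ.
raw_equal = precomposed == decomposed

# They are canonically equivalent: NFC maps both to one sequence, and so does NFD.
nfc_equal = unicodedata.normalize("NFC", precomposed) == unicodedata.normalize("NFC", decomposed)
nfd_equal = unicodedata.normalize("NFD", precomposed) == unicodedata.normalize("NFD", decomposed)


def code_points(text):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in text)


print(raw_equal, nfc_equal, nfd_equal)
print(code_points(unicodedata.normalize("NFC", decomposed)))
print(code_points(unicodedata.normalize("NFD", precomposed)))
```

Normalizing both inputs before any comparison is what makes the two spellings count as one label rather than two.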
Hi Bridget,

That's a different issue from the one I was raising: the "two strings" refers to two potential gTLDs like .sjöberg and .sjoberg. But it's fascinating, and I don't think it's out of scope for this PDP.

Indeed it may even be relevant to the two-or-more-strings issue, as multiple diacritics on a single letter make it more likely that there are multiple strings that differ only in diacritics.

Incidentally, I believe Vietnamese has the same feature: many instances of two diacritics on a single letter.

Regards,
Tapani

-- Tapani Tarvainen
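To illustrate how more than two strings can differ only in diacritics, here is a rough Python sketch that groups labels by the form left after removing combining marks. The helper names `strip_marks` and `group_by_base` are hypothetical, and this folding rule is only an assumption for illustration; it is not the working group's or the Root Zone LGR's definition of a diacritic version.

```python
import unicodedata
from collections import defaultdict


def strip_marks(label):
    """Lowercase, decompose to NFD, and drop nonspacing combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", label.lower())
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def group_by_base(labels):
    """Group labels that become identical once combining marks are removed."""
    groups = defaultdict(list)
    for label in labels:
        groups[strip_marks(label)].append(label)
    return dict(groups)


# ø (U+00F8) has no canonical decomposition, so this sketch does not fold it into "o".
groups = group_by_base(["Sjoberg", "Sj\u00F6berg", "Sj\u00F3berg", "Sj\u00F8berg"])
for base, members in groups.items():
    print(base, members)
```

With this rule the three names from the original question fall into a single group of three, which is the case a two-string limit would not cover.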
We are glad to be able to benefit from your expertise, Bridget.

Do refer to Tapani's answer about strings, and let me add that "string" is a term that in ICANN can be very roughly translated as "name". A string may occur in the TLD part, which is the scope of this working group, or in the domain name part, which is already handled in the current environment: you can register www.hêe.wiki and www.hee.wiki just fine (where "hêe" is the word for "yes" in Guaraní, an Indigenous South American language).

Our big concern is: when the next round of new TLDs happens, will an interested party be able to apply for both ".hêe" and ".hee"?

Hope this helps a bit.

--- Mark W. Datysgeld from Governance Primer
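As a small illustration of why "hêe" and "hee" are two distinct strings at the DNS level, the sketch below encodes both as ASCII labels. It relies on Python's built-in `idna` codec, which implements the older IDNA 2003 rules (RFC 3490) rather than the IDNA2008 rules that apply to current registrations, so treat it as an approximation; the variable names are illustrative.

```python
# Assumption: Python's built-in "idna" codec (IDNA 2003) is close enough for illustration.
plain_label = "hee".encode("idna")            # already ASCII, so it is left unchanged
accented_label = "h\u00EAe".encode("idna")    # "hêe" becomes an "xn--" A-label

print(plain_label)
print(accented_label)

# The A-label decodes back to the original Unicode form.
round_trip = accented_label.decode("idna")
print(round_trip)
```

The two encoded labels are different byte strings, so nothing in the DNS itself ties them together; any relationship between them has to come from policy.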
Dear Bridget,

For context and historical background, attached is the 2022 Proposal for a Latin Script Root Zone. It presents the information in much more detail than the HTML version linked below.

The HTML Latin Script Root Zone LGR is a more succinct reference for the scope and for the specific code points that are part of the Latin Script Root Zone LGR:
https://www.icann.org/sites/default/files/lgr/rz-lgr-5-latin-script-26may22-...

In both documents you will find examples of multi-codepoint characters with diacritics, for example:

ɛ̱̈ : U+025B U+0331 U+0308 (LATIN SMALL LETTER OPEN E + COMBINING MACRON BELOW + COMBINING DIAERESIS)
ɨ̃ : U+0268 U+0303 (LATIN SMALL LETTER I WITH STROKE + COMBINING TILDE)

As Tapani said, this Latin RZ LGR also includes code points with two diacritics used in Vietnamese (for example U+1EE9, U+1EED, U+1EEF, etc.).

Cheers,
Claude
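The Vietnamese code points mentioned above are single precomposed characters that carry two diacritics each. A short Python sketch, using only the standard `unicodedata` module, makes this visible by decomposing them; `decompose` and `count_marks` are hypothetical helper names introduced here for illustration.

```python
import unicodedata


def decompose(text):
    """Return (code point, Unicode name) pairs for the fully decomposed (NFD) form."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c))
            for c in unicodedata.normalize("NFD", text)]


def count_marks(text):
    """Count nonspacing combining marks (category Mn) in the NFD form."""
    return sum(1 for c in unicodedata.normalize("NFD", text)
               if unicodedata.category(c) == "Mn")


# The three Vietnamese code points cited in the message.
for ch in ("\u1EE9", "\u1EED", "\u1EEF"):
    print(ch, count_marks(ch), decompose(ch))

# The open-e sequence from the message is already decomposed and also has two marks.
print(count_marks("\u025B\u0331\u0308"))
```

Each of the three Vietnamese letters decomposes to a base `u` plus a horn and one tone mark, which is the same base-plus-two-diacritics shape as the sequences listed for the Root Zone LGR.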
Hi Bill,

My concern here comes from two sources:

First, the charter talks about "a base ASCII gTLD and *the* Latin script diacritic version", in the singular. It looks like nobody considered the possibility of multiple diacritic versions.

Second, today's slide 21 says that "This may mean that [...] the solution would be limited to a maximum of two strings." That's "may", so perhaps not, but I think we should clarify that.

The language in the charter is ambiguous enough that I think we can decide this without asking the Council, but if we do ask, it's probably an easy decision for them.

Tapani

-- Tapani Tarvainen
Participants (5): Bill Jouris, Bridget Chase, Claude Menard, Mark W. Datysgeld, Tapani Tarvainen