The Open Dot as a label delimiter in Chinese and Japanese
G’day:
The UASG has in the past indicated that good practice is to treat the Open Dot as a label delimiter, just like the traditional full stop.
The ideographic full stop (U+3002 [。]) is used in languages such as Chinese and Japanese to mark the end of a sentence. UASG004 states “We expect software to transform the ‘open dot’ to a standard ASCII dot “.”, thus making use of the already registered domain name.” We found that some browsers do this.
As we go through the Linkification review, we’re not seeing this happen in social media communications apps.
Does anyone have references, or even a perception, of how widely the Open Dot is used in Chinese, Japanese and/or other scripts?
Don
Don Hollander
Universal Acceptance Steering Group
Skype: don_hollander
If you look at any site in Japanese, such as sony.jp or the Japan Times (http://members.japantimes.co.jp/sub/index_ja.html), among others, and look at articles or any area that has full sentences as opposed to labels or headlines, you will see open dots used alongside Japanese text and no ASCII dots.
I also see open dots in Japanese tweets: https://twitter.com/JN_Japanese
So "widespread" would be an understatement. Is that what you were asking?
tex
Hi Don,
This is Zuan Zhang from China. In Chinese, we use "。" rather than "." to end a sentence. This is common knowledge. It is used everywhere, including in social media, as long as the text is written in Chinese.
Take the ICANN announcements released in Chinese as an example:
1. https://www.icann.org/news/announcement-4-2017-11-02-zh ("Marrakech Now Selected as Venue for ICANN 2019 Policy Forum"): ...摩洛哥马拉喀什市 (Marrakech, Morocco) 现被选定为 ICANN 第 65 届公共会议的会址。该会议将于 2019 年 6 月 24 日至 27 日在非洲地区举行。本届政策论坛将由地中海互联网协会联盟 (Mediterranean Federation of Internet Associations, FMAI)...
[Translation: Marrakech, Morocco has now been selected as the venue for ICANN's 65th Public Meeting. The meeting will be held in the Africa region from 24 to 27 June 2019. This Policy Forum will be ... by the Mediterranean Federation of Internet Associations (FMAI)...]
2. https://www.icann.org/news/announcement-2-2017-10-19-zh ("ICANN History Project: Exploring ICANN's Early Years"): ...ICANN 历史项目采访了 ICANN 建立之初及后续发展期间的重要人物。我们按照不同主题呈现这一项目,使您能够深入了解您最感兴趣的话题。 第一主题重点关注了 ICANN 与美国政府之间的关系。第二主题则关注 ICANN...
[Translation: The ICANN History Project interviewed key figures from ICANN's founding and its subsequent development. We present the project by theme, so that you can explore in depth the topics that interest you most. The first theme focused on the relationship between ICANN and the US government. The second theme focuses on ICANN...]
Hope this helps.
Best Regards
Zuan Zhang (Peter Green)
Thanks Zuan Zhang and Tex. Is the Open Dot widely used in domain names as a label separator? D
The problem is that, in the protocol, anything other than the ASCII dot doesn't work, so some client-side mapping needs to be done. See RFC 5895 for some suggestions about this. It's up to applications. But comparisons with running text are misleading and confused, because domain names aren't running text.
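Python's standard library gives a concrete picture of this kind of client-side mapping. Its built-in `"idna"` codec implements IDNA2003 (RFC 3490), not IDNA2008 with the RFC 5895 mappings, so treat this only as an illustration: when the codec splits a name into labels it recognizes U+002E, U+3002, U+FF0E and U+FF61 as dots, and it always joins the encoded labels with U+002E. `to_ascii_name` is a hypothetical wrapper, not an existing API.

```python
def to_ascii_name(user_input: str) -> str:
    """Hypothetical wrapper: map a user-typed name to its ASCII (A-label) form.

    Uses the standard library's IDNA2003 codec, which treats U+3002 IDEOGRAPHIC
    FULL STOP as a label separator and emits U+002E between the encoded labels.
    """
    return user_input.encode("idna").decode("ascii")


typed_with_open_dot = to_ascii_name("例子。中国")
typed_with_ascii_dot = to_ascii_name("例子.中国")
print(typed_with_open_dot == typed_with_ascii_dot)  # True
print("\u3002" in typed_with_open_dot)             # False: only ASCII dots remain
```

For IDNA2008 with UTS #46 processing, the third-party `idna` package (`idna.encode(name, uts46=True)`) is a commonly used alternative; either way the mapping happens in the application, before anything reaches the DNS.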
Hello,
In the Chinese version of IE, when the full stop is typed into IE, it is automatically turned into an ASCII dot.
A Chinese input method, which helps users enter Chinese characters into a computer, inserts the full stop right after the Chinese characters when you want to finish a sentence. The input method cannot know whether you are typing a Chinese sentence or a Chinese domain name: for a Chinese domain name it should be the ASCII dot, and for a Chinese sentence it should be the full stop. Usually, a Chinese input method chooses the full stop after Chinese characters. If users want to type an ASCII dot, they need to switch to an English input method.
To make this convenient for users, CNNIC talked with many browser vendors to push them to support the rule "the Chinese full stop should be automatically turned into an ASCII dot in Chinese domain names" in the browser's address bar. Now almost all Chinese-language versions of browsers support this function. English-language versions of browsers may not support it.
An ASCII dot between Chinese characters is only useful in Chinese domain names. Otherwise, the Chinese full stop should be used.
Best Regards.
Jiankang Yao
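The point here is that the mapping is only safe where the whole input is already a candidate domain name, such as the address bar. A minimal sketch of that decision, assuming a hypothetical `classify_address_bar_input` function and a deliberately tiny, hypothetical `KNOWN_TLDS` set standing in for the real list of top-level domains: the open dot is mapped to an ASCII dot, and the result is treated as a host only if it splits into non-empty labels ending in a known TLD; otherwise the original text is kept, for example as a search query.

```python
# Hypothetical, deliberately tiny TLD list for illustration only; a real
# browser would consult the full list of delegated top-level domains.
KNOWN_TLDS = {"cn", "com", "中国"}

# U+3002 IDEOGRAPHIC FULL STOP -> U+002E FULL STOP.
OPEN_DOT_TO_ASCII = str.maketrans({"\u3002": "."})


def classify_address_bar_input(text: str) -> tuple[str, str]:
    """Hypothetical address-bar rule: return ("host", name) or ("search", text)."""
    candidate = text.strip().translate(OPEN_DOT_TO_ASCII)
    if candidate.endswith("."):
        candidate = candidate[:-1]  # tolerate one trailing dot (root or sentence end)
    labels = candidate.split(".")
    looks_like_host = (
        not any(ch.isspace() for ch in candidate)
        and len(labels) >= 2
        and all(labels)  # no empty labels, e.g. from "a..b"
        and labels[-1].lower() in KNOWN_TLDS
    )
    return ("host", candidate) if looks_like_host else ("search", text)


print(classify_address_bar_input("例子。中国"))  # ('host', '例子.中国')
print(classify_address_bar_input("你好。"))      # ('search', '你好。')
```

The TLD check is what keeps a sentence such as "今天天气很好。我们去公园" from being mistaken for a host name; without some such signal, the input method's ambiguity described above cannot be resolved.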
Hi Don,
To add to what Jiankang Yao said:
Simply put, what we aim to achieve is that "。" should be, or even must be, equivalent to "." in domain names, just as upper case and lower case are in domain names (e.g. "abc" is equivalent to "ABC" when used in domain names), and just as Simplified and Traditional Chinese are in domain names (e.g. "中国" (China) is equivalent to "中國" in domain names).
I am not a technical guy. If anything is wrong, please do correct me. @Jiankang Yao <yaojk@cnnic.cn>
Best Regards
Zuan Zhang
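The two analogies in this message behave differently at the protocol level. A minimal sketch, using the standard library's IDNA2003 codec and a hypothetical `dns_labels_equal` helper that folds only ASCII letters, as DNS name comparison does (RFC 4343): "abc" and "ABC" match, but the Simplified and Traditional spellings of China encode to different A-labels, so the DNS itself treats them as unrelated names.

```python
def dns_labels_equal(a: bytes, b: bytes) -> bool:
    """Hypothetical helper: compare ASCII names the way DNS servers do.

    Only ASCII letters are folded (RFC 4343); bytes.lower() leaves every other
    byte, including Punycode-encoded non-ASCII text, unchanged.
    """
    return a.lower() == b.lower()


# ASCII case: matched inside the protocol.
print(dns_labels_equal(b"abc", b"ABC"))  # True

# Simplified vs Traditional Chinese: two different A-labels, so no match.
simplified = "中国".encode("idna")
traditional = "中國".encode("idna")
print(dns_labels_equal(simplified, traditional))  # False
```

Any equivalence between 中国 and 中國 therefore has to be arranged outside the lookup itself, for example by registering and provisioning both names.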
Hi,
I understand why people want these things, but it is not possible, and will never be possible, to treat label separators in quite the way people want. This is because the label separators don't actually appear in the wire protocol, but the presentation format of domain names is also a kind of protocol.
First, upper and lower case _are not_ equivalent in the DNS. The protocol makes them match, but the case is supposed to be preserved. You can observe this on the Internet.
Second, mapping actually different names (like the different spellings of China in Han characters) is quite different from the in-protocol match that the DNS unfortunately did for upper and lower case ASCII.
But most importantly, in the wire format there is no separator character. Instead, the length of the label indicates the separation. In zone files and the like, the separator is there for humans. This is also part of the protocol, so we can't arbitrarily change it. Applications can map it, however. That is not quite the same, but it is what IDNA does generally.
A
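The wire-format point can be made concrete. A minimal sketch, assuming the name has already been converted to ASCII (A-labels) by the application; `to_wire_format` is a hypothetical helper following the label encoding in RFC 1035 section 3.1: a length octet, then the label bytes, ending with the zero-length root label. No dot appears in the output, which is why any U+3002 mapping has to happen in the application, before this step.

```python
def to_wire_format(name: str) -> bytes:
    """Hypothetical helper: encode an ASCII presentation-format name as DNS labels.

    Each label becomes a length octet followed by its bytes, and the name ends
    with the zero-length root label (RFC 1035, section 3.1). Non-ASCII input
    must be mapped to A-labels first.
    """
    stripped = name[:-1] if name.endswith(".") else name  # drop trailing root dot
    labels = stripped.split(".") if stripped else []
    wire = bytearray()
    for label in labels:
        raw = label.encode("ascii")  # raises UnicodeEncodeError for U-labels or "。"
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"invalid label length in {name!r}")
        wire.append(len(raw))
        wire += raw
    wire.append(0)  # root label terminates every name
    return bytes(wire)


print(to_wire_format("www.example.com"))  # b'\x03www\x07example\x03com\x00'
# Case is carried through unchanged; servers compare it case-insensitively.
print(to_wire_format("Example.COM"))      # b'\x07Example\x03COM\x00'

# An open dot must be mapped before this step, e.g. by the IDNA2003 codec.
wire = to_wire_format("例子。中国".encode("idna").decode("ascii"))
print(b"." in wire)  # False
```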
Agreed. I think we can agree that UASG can/should recommend that linkifiers treat the open dot and the closed (ASCII) dot as equivalent when attempting to detect label separators. But linkification itself is inherently hard in many instances, and getting that recommendation adopted is app- and service-specific.
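A linkifier following this recommendation still has to decide, in running text, whether an open dot ends a sentence or separates labels. A minimal regex sketch, assuming a hypothetical `find_domains` function and a tiny, hypothetical `KNOWN_TLDS` allow-list: it accepts the four characters RFC 3490 recognizes as dots, but only when the run ends in a known TLD, so a sentence-final 。 is not swallowed. Because Chinese and Japanese text has no spaces between words, it cannot tell where a name starts unless the name is delimited (here, by spaces), which is one reason linkification is hard.

```python
import re

# U+002E FULL STOP plus the three characters RFC 3490 (section 3.1) also
# recognizes as dots: U+3002, U+FF0E and U+FF61.
SEPARATORS = "\u002e\u3002\uff0e\uff61"
SEP_RE = re.compile(f"[{SEPARATORS}]")

# Hypothetical allow-list for illustration; a real linkifier would need the
# full, current list of delegated top-level domains.
KNOWN_TLDS = ("com", "org", "cn", "中国")
TLD_ALT = "|".join(map(re.escape, KNOWN_TLDS))

LABEL = r"[\w-]+"  # rough approximation: \w also admits "_" and other characters
DOMAIN_RE = re.compile(
    rf"(?:{LABEL}[{SEPARATORS}])+(?:{TLD_ALT})(?![\w-])",  # TLD must end the token
    re.IGNORECASE,
)


def find_domains(text: str) -> list[str]:
    """Return candidate domain names with every separator mapped to U+002E."""
    return [SEP_RE.sub(".", m.group(0)) for m in DOMAIN_RE.finditer(text)]


print(find_domains("请访问 例子。中国 了解更多。"))  # ['例子.中国']
print(find_domains("see www.example.com."))          # ['www.example.com']
```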
Don:
Does anyone have reference or even perception to how widely used the Open Dot is in Chinese, Japanese and/or other script?
Sure. Starting references: <https://en.wikipedia.org/wiki/Chinese_punctuation#Punctuation_marks>, <https://en.wikipedia.org/wiki/Japanese_punctuation#Full_stop>. In my experience developing publishing software and fonts for high-end Japanese typography, I can confirm that U+3002 [。] is routine in Japanese language text. However: in English language orthography, the U+002E FULL STOP [.] is used as a delimiter between fields of structured data, as well as sentence-ending punctuation. Consider a phone number like 212.555.1212, or a date like 3.11.17. I think it is clearer to understand the U+002E between labels of a domain name as a delimiter rather than as a sentence-ending full stop. In Chinese and Japanese language orthography, what marks are conventionally used as delimiters in everyday text? And,
UASG004 states “We expect software to transform the ‘open dot’ to a standard ASCII dot “.”, thus making use of the already registered domain name.”
Do we know what the source is for this expectation? Did it come from perspectives informed about Chinese and Japanese culture? Did this perspective show that U+3002 [。] would be preferred as a delimiter between domain name labels, over U+002E [.] or other punctuation? Or did we at UASG make a guess at the Chinese and Japanese perspective? If we are not confident that U+3002 [。] is preferred as a delimiter by people in those cultures, I think UASG should consider very carefully before advocating its use.
Best regards,
--Jim DeLaHunt, Vancouver, Canada
-- --Jim DeLaHunt, jdlh@jdlh.com http://blog.jdlh.com/ (http://jdlh.com/) multilingual websites consultant 355-1027 Davie St, Vancouver BC V6E 4L2, Canada Canada mobile +1-604-376-8953
When typing in Chinese using any (probably all) of the common text input methods, hitting the period key on a standard keyboard will always insert an open dot.
I have two browsers installed on this Windows 10 PC, Chrome and Edge. In both, when typing Chinese into the URL bar, hitting the period key inserts an ASCII dot. I'd assume just about all contemporary browsers do this; otherwise it would be crazy annoying to the user.
Best,
S.
Simon Cousins | 夏明
CEO, Allegravita LLC & 北京乐微塔营销咨询有限公司
USA: 32 W 39 St 4th Floor, New York NY 10018
China: 01-A509, 3rd Floor, 55 Suzhou Street, Haidian District, Beijing
simon@allegravita.com | +1 347 850-3360 | +86 139 1010-5401
I advocated open dot equivalency based on UTS#46 when writing UASG007.
From: Simon Cousins
Sent: Friday, November 3, 2017 1:23 PM
To: Jim DeLaHunt <jfrom.uasg@jdlh.com>; ua-discuss@icann.org
Subject: Re: [UA-discuss] The Open Dot as a label delimiter in Chinese and Japanese
When typing in Chinese using any (probably all) of the common text input methods, hitting the period key on a standard keyboard will always insert an open dot.
I have two browsers installed on this Windows 10 PC, Chrome and Edge. When I type Chinese into the URL bar of either browser, hitting the period key inserts an ASCII dot. I'd assume just about all contemporary browsers do this; it would be crazy annoying to the user otherwise.
Best, S.
Simon Cousins | 夏明
CEO, Allegravita LLC & 北京乐微塔营销咨询有限公司
USA: 32 W 39 St 4th Floor, New York NY 10018
China: 北京市海淀区苏州街55号3层01-A509
simon@allegravita.com | +1 347 850-3360 | +86 139 1010-5401
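Simon's observation matches what at least one widely used library already does. As an illustration (assuming CPython 3; this is the standard-library IDNA 2003 codec, not an IDNA 2008 or UTS #46 implementation), Python's built-in `idna` codec follows RFC 3490 section 3.1, which requires U+3002, U+FF0E and U+FF61 to be recognised as dots, so an open dot typed through an IME is encoded as an ordinary label separator:

```python
# Standard-library "idna" codec (IDNA 2003 / RFC 3490); no third-party packages.
typed = "bücher。example"  # open dot, as a Chinese or Japanese IME might insert it

encoded = typed.encode("idna")
print(encoded)  # b'xn--bcher-kva.example'

# Round trip: the decoded name uses the ASCII full stop as the separator.
print(encoded.decode("idna"))  # bücher.example
```

IDNA 2008 moved this kind of mapping out of the protocol and into applications, which is the gap that RFC 5895 and UTS #46 address.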
Correction, UTS#46 and RFC5895
Mark:
Thank you for these citations. I will make a note of them.
So, RFC5895 "Mapping Characters for Internationalized Domain Names in Applications (IDNA) 2008" <https://www.rfc-editor.org/rfc/rfc5895.txt>, section 2 "The General Procedure", says,
4. If an implementation of this mapping is also performing the step of separation of the parts of a domain name into labels by using the FULL STOP character (U+002E), the IDEOGRAPHIC FULL STOP character (U+3002) can be mapped to the FULL STOP before label separation occurs. There are other characters that are used as "full stops" that one could consider mapping as label separators, but their use as such has not been investigated thoroughly. This step was chosen because some input mechanisms do not allow the user to easily enter proper label separators. Only the IDEOGRAPHIC FULL STOP character (U+3002) is added in this mapping because the authors have not fully investigated the applicability of other characters and the environments where they should and should not be considered domain name label separators.
And UTS #46 "Unicode IDNA Compatibility Processing" <http://www.unicode.org/reports/tr46/>, section 2.3 "Notation", says,
In this document, a label is a substring of a domain name. That substring is bounded on both sides by either the start or the end of the string, or any of the following characters, called label-separators:
1. U+002E ( . ) FULL STOP
2. U+FF0E ( ． ) FULLWIDTH FULL STOP
3. U+3002 ( 。 ) IDEOGRAPHIC FULL STOP
4. U+FF61 ( ｡ ) HALFWIDTH IDEOGRAPHIC FULL STOP
From my point of view as a UASG explainer, this is good and sufficient grounding for a recommendation that apps treat U+3002 as a label separator. I would go further and warn people that this list might grow; that U+FF0E and U+FF61 may be on their way. It would be good to have a footnote somewhere linking our recommendation to those documents. I see UASG007 cites RFC5895 in general (I don't see a citation to UTS #46 in UASG007).
Actually, this probably belongs more in a wiki somewhere: a list of the things UASG recommends and why we recommend them. This will help us as we bring more UA explainers up to speed.
(Interesting, I just noticed that UASG007 also recommends treating the Arabic full stop character “۔” (U+06D4) as a label separator. UTS #46 and RFC5895 don't mention that.)
Thank you for the citations, Mark.
Best regards,
—Jim DeLaHunt, Vancouver, Canada
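The two quoted rules differ in scope: RFC 5895 step 4 maps only U+3002, while UTS #46 lists four label-separators. A minimal sketch of that difference follows; `split_labels` and `normalise_separators` are hypothetical helpers, and they cover only separator handling, not the rest of UTS #46 processing (mapping, normalisation, validity checks).

```python
import re

# Label-separators listed in UTS #46 section 2.3.
UTS46_SEPARATORS = "\u002E\uFF0E\u3002\uFF61"
# RFC 5895 step 4: only U+3002 is mapped, in addition to U+002E itself.
RFC5895_SEPARATORS = "\u002E\u3002"


def split_labels(name: str, separators: str = UTS46_SEPARATORS) -> list[str]:
    """Split a domain name into labels at any of the given separator characters."""
    return re.split("[" + re.escape(separators) + "]", name)


def normalise_separators(name: str, separators: str = UTS46_SEPARATORS) -> str:
    """Rejoin the labels with U+002E FULL STOP."""
    return ".".join(split_labels(name, separators))


print(split_labels("例え\u3002テスト\uFF0Ejp"))  # ['例え', 'テスト', 'jp']
# U+FF61 is a separator under UTS #46...
print(normalise_separators("example\uFF61com"))  # example.com
# ...but not under the narrower RFC 5895 mapping, so the name is left as one label.
print(normalise_separators("example\uFF61com", RFC5895_SEPARATORS) == "example\uFF61com")  # True
```

Extending either set, for example with U+06D4 as UASG007 does, is then a one-character change to the separator string.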
Haha, we added that in after discussing open dot for so long 😉 It definitely supports your conclusion that the list might grow.
(Interesting, I just noticed that UASG007 also recommends treating the Arabic full stop character “۔” (U+06D4) as a label separator. UTS #46 and RFC5895 don't mention that.)
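Because these characters look alike and are easily mangled in email, it may help to check the code points against the Unicode Character Database. This small sketch uses Python's standard `unicodedata` module. It also shows why an explicit mapping step is needed: NFKC normalisation folds the fullwidth and halfwidth forms, but leaves U+3002 and U+06D4 unchanged.

```python
import unicodedata

# Candidate label separators discussed in this thread: the UTS #46 list
# plus U+06D4, which UASG007 also recommends.
CANDIDATES = ["\u002E", "\uFF0E", "\u3002", "\uFF61", "\u06D4"]

for ch in CANDIDATES:
    folded = unicodedata.normalize("NFKC", ch)  # compatibility folding only
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):<32} NFKC -> U+{ord(folded):04X}")
```

This reports that U+FF0E folds to U+002E and U+FF61 folds to U+3002, while U+002E, U+3002 and U+06D4 are unchanged.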
Just did a search - https://codepoints.net/search?q=full+stop and found several full stops characters used in different scripts. and whilst I am here A few years a go I did raise the issue for URLs of replacing / U+FF0F fullwidth solidus with / Solidus U+002F ...and also.. i18n is a well established numeronym. I propose a new numeronym: i15d meaning internationalized. Basically because I am getting fed up with typing internationalised😀 André Schappo On 7 Nov 2017, at 00:26, Mark Svancarek via UA-discuss <ua-discuss@icann.org<mailto:ua-discuss@icann.org>> wrote: Haha, I we added that in after discussing open dot for so long 😉 definitely supports your conclusion that the list might grow. (Interesting, I just noticed that UASG007 also recommends treating the Arabic full stop character “۔” (U+06D4) as a label separator. UTS #46 and RFC5885 don't mention that.) From: Jim DeLaHunt [mailto:jfrom.uasg@jdlh.com] Sent: Monday, November 6, 2017 4:24 PM To: Mark Svancarek <marksv@microsoft.com<mailto:marksv@microsoft.com>>; ua-discuss@icann.org<mailto:ua-discuss@icann.org> Cc: Simon Cousins <simon@allegravita.com<mailto:simon@allegravita.com>> Subject: Re: [UA-discuss] The Open Dot as a label delimiter in Chinese and Japanese Mark: Thank you for these citations. I will make a note of them. So, RFC5895 "Mapping Characters for Internationalized Domain Names in Applications (IDNA) 2008"<https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.rfc-edi...>, section 2 "The General Procedure", says, 4. If an implementation of this mapping is also performing the step of separation of the parts of a domain name into labels by using the FULL STOP character (U+002E), the IDEOGRAPHIC FULL STOP character (U+3002) can be mapped to the FULL STOP before label separation occurs. There are other characters that are used as "full stops" that one could consider mapping as label separators, but their use as such has not been investigated thoroughly. 
This step was chosen because some input mechanisms do not allow the user to easily enter proper label separators. Only the IDEOGRAPHIC FULL STOP character (U+3002) is added in this mapping because the authors have not fully investigated the applicability of other characters and the environments where they should and should not be considered domain name label separators. And UTS #46 "Unicode IDNA Compatibility Processing"<https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.unicode....>, section 2.3 "Notation", says, In this document, a label is a substring of a domain name. That substring is bounded on both sides by either the start or the end of the string, or any of the following characters, called label-separators: 1. U+002E ( . ) FULL STOP 2. U+FF0E ( . ) FULLWIDTH FULL STOP 3. U+3002 ( 。 ) IDEOGRAPHIC FULL STOP 4. U+FF61 ( 。 ) HALFWIDTH IDEOGRAPHIC FULL STOP From my point of view as a UASG explainer, this is good an sufficient grounding for a recommendation that apps treat U+3002 as a label separator. I would go further and warn people that this list might grow; that U+FF0E and U+FF61 may be on their way. It would good to have a footnote somewhere linking our recommendation to those documents. I see UASG007 cites RFC5895 in general (I don't see a citation to UTS #46 in UASG007). Actually, this probably belongs more in a wiki somewhere, a list of the things UASG recommends and why we recommend them. This will help us as we bring more UA explainers up to speed. (Interesting, I just noticed that UASG007 also recommends treating the Arabic full stop character “۔” (U+06D4) as a label separator. UTS #46 and RFC5885 don't mention that.) Thank you for the citations, Mark. 
Best regards, —Jim DeLaHunt, Vancouver, Canada On 2017-11-06 15:42, Mark Svancarek wrote: Correction, UTS#46 and RFC5895 From: Mark Svancarek Sent: Monday, November 6, 2017 3:19 PM To: 'Simon Cousins' <simon@allegravita.com>; Jim DeLaHunt <jfrom.uasg@jdlh.com>; ua-discuss@icann.org Subject: RE: [UA-discuss] The Open Dot as a label delimiter in Chinese and Japanese I advocated open dot equivalency based on UTS#46 when writing UASG007. From: ua-discuss-bounces@icann.org [mailto:ua-discuss-bounces@icann.org] On Behalf Of Simon Cousins Sent: Friday, November 3, 2017 1:23 PM To: Jim DeLaHunt <jfrom.uasg@jdlh.com>; ua-discuss@icann.org Subject: Re: [UA-discuss] The Open Dot as a label delimiter in Chinese and Japanese When typing in Chinese using any (probably all) of the common text input methods, hitting the period key on a standard keyboard will always insert an open dot. I have two browsers installed on this Windows 10 PC, Chrome and Edge. Both, when typing Chinese into the browser URL bar, insert an ascii dot when the period key is hit. I’d assume just about all contemporary browsers would do this. It’d be crazy annoying to the user, otherwise. Best, S. Simon Cousins | 夏明 CEO, Allegravita LLC & 北京乐微塔营销咨询有限公司 USA: 32 W 39 St 4th Floor, New York NY 10018 China: 北京市海淀区苏州街55号3层01-A509 simon@allegravita.com | +1 347 850-3360 | +86 139 1010-5401 From: ua-discuss-bounces@icann.org [mailto:ua-discuss-bounces@icann.org] On Behalf Of Jim DeLaHunt Sent: Friday, November 3, 2017 4:17 PM To: ua-discuss@icann.org Subject: Re: [UA-discuss] The Open Dot as a label delimiter in Chinese and Japanese Don:
Does anyone have reference or even perception to how widely used the Open Dot is in Chinese, Japanese and/or other script?
Sure. Starting references: <https://en.wikipedia.org/wiki/Chinese_punctuation#Punctuation_marks>, <https://en.wikipedia.org/wiki/Japanese_punctuation#Full_stop>. In my experience developing publishing software and fonts for high-end Japanese typography, I can confirm that U+3002 [。] is routine in Japanese language text. However, in English language orthography, the U+002E FULL STOP [.] is used as a delimiter between fields of structured data, as well as sentence-ending punctuation. Consider a phone number like 212.555.1212, or a date like 3.11.17. I think it is clearer to understand the U+002E between labels of a domain name as a delimiter rather than as a sentence-ending full stop. In Chinese and Japanese language orthography, what are the marks conventionally used as delimiters in everyday text? And,
UASG004 states “We expect software to transform the ‘open dot’ to a standard ASCII dot “.”, thus making use of the already registered domain name.”
Do we know what the source is for this expectation? Did it come from perspectives informed about Chinese and Japanese culture? Did this perspective show that U+3002 [。] would be preferred as a delimiter between domain name labels, over U+002E [.] or other punctuation? Or did we at UASG make a guess at the Chinese and Japanese perspective? If we are not confident that U+3002 [。] is preferred as a delimiter by people in those cultures, I think UASG should consider very carefully before advocating its use. Best regards, —Jim DeLaHunt, Vancouver, Canada On 2017-11-02 17:10, Don Hollander wrote: G’day: The UASG has in the past indicated that good practice is to treat the Open Dot as a label delimiter, just like the traditional full-stop. The ideographic full stop (U+3002 [。]) is used in languages such as Chinese or Japanese to mark the end of a sentence. UASG004 states “We expect software to transform the ‘open dot’ to a standard ASCII dot “.”, thus making use of the already registered domain name.” We found that some browsers do this. As we go through the Linkification review, we’re not seeing this happen for social media communications apps. Does anyone have reference or even perception to how widely used the Open Dot is in Chinese, Japanese and/or other script? 
Don Don Hollander Universal Acceptance Steering Group Skype: don_hollander -- --Jim DeLaHunt, jdlh@jdlh.com http://blog.jdlh.com/ (http://jdlh.com/) multilingual websites consultant 355-1027 Davie St, Vancouver BC V6E 4L2, Canada Canada mobile +1-604-376-8953 🌏 🌍 🌎 André Schappo https://schappo.blogspot.co.uk https://twitter.com/andreschappo https://weibo.com/andreschappo https://groups.google.com/forum/#!forum/computer-science-curriculum-internat...
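Don's linkification question, combined with Jim's observation that U+3002 is routinely the sentence-ending full stop in Japanese text, shows the difficulty for a linkifier: the same character may end a sentence or separate labels. A deliberately naive sketch, assuming Python 3.9+ and considering only ASCII LDH labels, accepts a separator only when another label follows it. `find_domain_candidates` is hypothetical, and real linkifiers must also cope with U-labels in unspaced Han or Kana text, which this sketch does not attempt.

```python
import re

# A run of ASCII LDH labels joined by any UTS #46 label-separator.
# A separator is only included when another label follows it, so a
# sentence-final 。 is left out of the match.
_CANDIDATE = re.compile(r"[A-Za-z0-9-]+(?:[.\uff0e\u3002\uff61][A-Za-z0-9-]+)+")

# Rewrite the non-ASCII separators as U+002E FULL STOP.
_TO_FULL_STOP = str.maketrans({"\uff0e": ".", "\u3002": ".", "\uff61": "."})


def find_domain_candidates(text: str) -> list[str]:
    """Return candidate domain names with separators rewritten as U+002E."""
    return [m.translate(_TO_FULL_STOP) for m in _CANDIDATE.findall(text)]


print(find_domain_candidates("詳しくはexample。comをご覧ください。"))  # ['example.com']
print(find_domain_candidates("Visit example。co。jp。"))               # ['example.co.jp']
```

The same rule also turns "2。0" into a candidate, so a linkifier built this way would still need a further check (for example against known TLDs) before creating a link.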
On Tue, Nov 07, 2017 at 10:52:53AM +0000, Andre Schappo wrote:
Just did a search - https://codepoints.net/search?q=full+stop and found several full-stop characters used in different scripts.
Many of them aren't actually full stops, however, but contain "full stop" in their description. I hope we can agree, for instance, that U+0589 would be a bad idea to start mapping. UIs are not going to be competent at managing the entire list of possible full stops and the ways those mappings can go wrong, which is why RFC 5895 doesn't recommend "every full stop should just be mapped to U+002E".
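Andrew's point can be seen by doing exactly what a name search does. The sketch below collects every code point whose Unicode name contains "FULL STOP". It assumes Python 3.9+; `chars_named_full_stop` is a hypothetical helper, and the result depends on the Unicode version bundled with the interpreter. Besides the four UTS #46 separators, it returns characters such as U+0589 ARMENIAN FULL STOP and enumerated forms like U+2488 DIGIT ONE FULL STOP, so a name search is no substitute for an explicit, reviewed list.

```python
import sys
import unicodedata

# UTS #46, section 2.3: the explicit list of label-separators.
UTS46_SEPARATORS = {"\u002e", "\uff0e", "\u3002", "\uff61"}


def chars_named_full_stop() -> list[str]:
    """Hypothetical helper: every code point whose Unicode name contains
    'FULL STOP', according to this Python's unicodedata tables."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if "FULL STOP" in unicodedata.name(chr(cp), "")
    ]


candidates = chars_named_full_stop()
outside_uts46 = [c for c in candidates if c not in UTS46_SEPARATORS]

# Print a few of the matches that UTS #46 does not list as separators.
for c in outside_uts46[:8]:
    print(f"U+{ord(c):04X} {unicodedata.name(c)}")
```

Scanning every code point takes a moment; what matters is the shape of the result, not the exact count.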
A few years ago I did raise the issue, for URLs, of replacing ／ U+FF0F FULLWIDTH SOLIDUS with / U+002F SOLIDUS
But neither of these is permitted in LDH names to begin with. If you're making an argument about UIs for URLs, then you need to join the chorus decrying the state of IRIs (which turned out to be a terrible thing). I don't think this SG is anywhere close to being competent to recommend how to fix that series of terrible, terrible problems.
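Andrew's point that neither solidus is permitted in LDH names can be checked directly. Compatibility normalization does fold U+FF0F FULLWIDTH SOLIDUS to U+002F, but the result still cannot appear in a label, so any such mapping belongs to URL or IRI handling rather than domain-name processing. A sketch assuming Python 3.9+; `is_ldh_label` is a hypothetical helper using a simplified LDH rule (1 to 63 ASCII letters, digits, or hyphens, not beginning or ending with a hyphen).

```python
import re
import unicodedata

# Simplified LDH label: 1-63 ASCII letters, digits, or hyphens,
# not starting or ending with a hyphen.
_LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_label(label: str) -> bool:
    """Hypothetical check against the simplified LDH rule above."""
    return _LDH_LABEL.fullmatch(label) is not None


folded = unicodedata.normalize("NFKC", "\uff0f")  # FULLWIDTH SOLIDUS
print(folded == "/")            # True
print(is_ldh_label(folded))     # False
print(is_ldh_label("example"))  # True
```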
i18n is a well established numeronym. I propose a new numeronym: i15d meaning internationalized. Basically because I am getting fed up with typing internationalised😀
I've seen it used before, but it's not in RFC 6365. Perhaps a note to the authors :) A -- Andrew Sullivan ajs@anvilwalrusden.com
Hi, On Mon, Nov 06, 2017 at 04:23:50PM -0800, Jim DeLaHunt wrote:
So, RFC5895 "Mapping Characters for Internationalized Domain Names in Applications (IDNA) 2008" <https://www.rfc-editor.org/rfc/rfc5895.txt>, section 2 "The General Procedure", says,
4. If an implementation of this mapping is also performing the step of separation of the parts of a domain name into labels by using the FULL STOP character (U+002E), the IDEOGRAPHIC FULL STOP character (U+3002) can be mapped to the FULL STOP before label separation occurs. There are other characters that are used as "full stops" that one could consider mapping as label separators, but their use as such has not been investigated thoroughly.
Yes.
And UTS #46 "Unicode IDNA Compatibility Processing" <http://www.unicode.org/reports/tr46/>, section 2.3 "Notation", says,
In this document, a label is a substring of a domain name. That substring is bounded on both sides by either the start or the end of the string, or any of the following characters, called label-separators:
1. U+002E ( . ) FULL STOP 2. U+FF0E ( ． ) FULLWIDTH FULL STOP 3. U+3002 ( 。 ) IDEOGRAPHIC FULL STOP 4. U+FF61 ( ｡ ) HALFWIDTH IDEOGRAPHIC FULL STOP
Note that these would be covered by RFC 5895 too, by step 2 (where the fullwidth and halfwidth characters are decomposed), but it's a more general mechanism than that outlined by UTS #46.
I think it is worth pointing out that UTS #46 is a pretty serious burr under the saddle in the relationship between the UTC and the IETF. This is partly because UTS #46 explicitly permits a number of labels that are clearly not permitted under IDNA2008 (see, e.g., "For transitional use, the Compatibility Processing also allows domain names containing symbols and punctuation that were valid in IDNA2003, such as √.com (which has an associated web page). Such domain names containing symbols will gradually disappear as registries shift to IDNA2008.") In the IETF, when we have transition mechanisms we are generally required to specify how they work, or else they are regarded as hand-waving. There is basically no mechanism for such transition in UTS#46 ("registries shift to IDNA2008" is the very same transition as "implement IDNA2008", so it's not a mapping at all). The UTC is plainly the expert in the relevant character encodings and how that all functions within applications, but it is also plainly deficient in expertise in the area of network protocols, and the gap shows.
The fact that the UTC and the IETF have so far been incapable of collaborating on this topic is IMO a problem. Part of the disagreement comes from a different stance: the IETF's general belief is that, if you're going to fail, declare failure early and then replace the bad protocol (and break stuff if you have to). UTC's approach maximises stability, which means that once something is out in the world you're more or less stuck with it (with a few limited exceptions). IDNA2008 was intended to break certain cases early on the grounds that we could already see they were a problem; the most obvious ones were nailing the protocol to a version of Unicode and the expansion of the repertoire beyond LDH analogues.
UTS#46's approach is, alas, delaying the reckoning with that damage, and may well have put it off forever (the WHATWG's approach to all of this hasn't helped).
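Andrew's observation that RFC 5895 step 2 already covers U+FF0E and U+FF61 can be checked with a sketch of steps 1 to 4 of its section 2, as we read them. Assumptions: Python 3.9+; `map_wide_narrow` and `rfc5895_style_labels` are hypothetical names; step 1 is approximated with `str.lower()` rather than the full Unicode toLowerCase algorithm; and the `<wide>`/`<narrow>` test relies on the decomposition strings returned by `unicodedata.decomposition`.

```python
import unicodedata


def map_wide_narrow(s: str) -> str:
    """Step 2: replace characters whose decomposition type is <wide> or
    <narrow> with their decomposition mapping (e.g. '<wide> 002E')."""
    out = []
    for ch in s:
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith(("<wide>", "<narrow>")):
            # Drop the tag, then turn the hex code points back into text.
            out.append("".join(chr(int(h, 16)) for h in decomp.split()[1:]))
        else:
            out.append(ch)
    return "".join(out)


def rfc5895_style_labels(domain: str) -> list[str]:
    s = domain.lower()                   # step 1 (approximation)
    s = map_wide_narrow(s)               # step 2: fullwidth/halfwidth
    s = unicodedata.normalize("NFC", s)  # step 3: NFC
    s = s.replace("\u3002", ".")         # step 4: IDEOGRAPHIC FULL STOP
    return s.split(".")


# U+FF0E decomposes straight to U+002E; U+FF61 decomposes to U+3002,
# which step 4 then maps to U+002E.
print(rfc5895_style_labels("Example\uff0eCOM"))  # ['example', 'com']
print(rfc5895_style_labels("example\uff61jp"))   # ['example', 'jp']
```

Because step 2 only looks at `<wide>` and `<narrow>` decompositions, it leaves other compatibility characters alone, which is narrower than applying NFKC to the whole string.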
From my point of view as a UASG explainer, this is good and sufficient grounding for a recommendation that apps treat U+3002 as a label separator. I would go further and warn people that this list might grow; that U+FF0E and U+FF61 may be on their way.
That's reasonable, yes, but I would not go too far. It's worth remembering that domain names are, at bottom, protocol elements. There's only so much munging one can do to protocol elements without introducing ambiguities that can be exploited by attackers.
(Interesting, I just noticed that UASG007 also recommends treating the Arabic full stop character “۔” (U+06D4) as a label separator. UTS #46 and RFC5895 don't mention that.)
Yeah, it hadn't been generally studied at the time, and I'm still not sure that the recommendation is ideal. I have heard, but am not sure, that in some Arabic-using writing systems (not the majority ones) there is some problem in the handling of that code point. I'm not clear on the details, but the population of non-Arabic languages written with Arabic characters is way larger than in the Han case. Best regards, A -- Andrew Sullivan ajs@anvilwalrusden.com
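Why U+06D4 is a separate policy decision can be seen from the character data itself: it has no decomposition, so neither the fullwidth/halfwidth step of RFC 5895 nor NFKC folds it into U+002E, and it is absent from the UTS #46 list. A sketch assuming Python 3.9+, in which the hypothetical `label_separators` function makes the UASG007 extension an explicit opt-in rather than a default.

```python
import unicodedata

ARABIC_FULL_STOP = "\u06d4"

# UTS #46, section 2.3 label-separators; U+06D4 is not among them.
_UTS46_SEPARATORS = frozenset({"\u002e", "\uff0e", "\u3002", "\uff61"})


def label_separators(include_arabic_full_stop: bool = False) -> frozenset[str]:
    """Hypothetical policy switch: the UTS #46 separators, optionally
    extended with U+06D4 as UASG007 recommends."""
    if include_arabic_full_stop:
        return _UTS46_SEPARATORS | {ARABIC_FULL_STOP}
    return _UTS46_SEPARATORS


print(unicodedata.name(ARABIC_FULL_STOP))                 # ARABIC FULL STOP
print(repr(unicodedata.decomposition(ARABIC_FULL_STOP)))  # ''
print(ARABIC_FULL_STOP in label_separators())             # False
```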
participants (10)
Andre Schappo
Andrew Sullivan
Andrew Sullivan
Don Hollander
Jiankang Yao
Jim DeLaHunt
Mark Svancarek
Peter Green
Simon Cousins
Tex