UASG010 - Quick Guide to Linkification
A quick update on Linkification
We have published an updated Quick Guide to Linkification: https://uasg.tech/wp-content/uploads/2017/06/UASG010-Quick-Guide-to-Linkific... This builds on discussions we had after the UASG meeting in Seattle in April.
We are also working on an evaluation of Linkification in major Social Media Communication applications. (Here’s the link to the Help Wanted advertisement - Help Wanted: Linkification Evaluation <https://uasg.tech/wp-content/uploads/2016/11/Help-Wanted%E2%80%A6-Linkificat...>)
This evaluation is being built on a replicable testing platform so that we can readily repeat the process in the future. While it is early days, we expect to provide a preliminary report during the ICANN60 meeting. As we go through the testing, it is raising some additional questions about our Good Practice guide and expectations. We fully expect that once the evaluation is completed we’ll again review UASG010 based on real-world experience.
Don
Don Hollander
Universal Acceptance Steering Group
Skype: don_hollander
Hi,
Some questions:
1) Do I understand correctly that the recommendation to not linkify highly restrictive strings means that tex@普遍接受-测试.世界 would not become a link? Or http:// 普遍接受-测试.世界.com?
Highly restrictive means that Latin cannot be mixed with Chinese or Japanese characters.
2) I do not understand “Linkification should be determined by the implied intent of the user's entry”. Is this intended to mean that the scheme (http, mailto, etc.) should be added to form the link? Or some other determination of intent? If the former, it should be stated more clearly.
tex
From: ua-discuss-bounces@icann.org [mailto:ua-discuss-bounces@icann.org] On Behalf Of Don Hollander Sent: Thursday, September 28, 2017 11:19 AM To: Universal Acceptance Subject: [UA-discuss] UASG010 - Quick Guide to Linkification
On 9/29/2017 2:26 PM, Tex Texin wrote:
Hi,
Some questions:
1) Do I understand correctly that the recommendation to not linkify highly restrictive strings means that tex@普遍接受-测试.世界 would not become a link? Or http:// 普遍接受-测试.世界.com?
Highly restrictive means that Latin cannot be mixed with Chinese or Japanese characters.
Some script mixing *within* a label should be restricted, as it is a security risk. Script mixing across an FQDN, or between local part and host, seems to be a rather likely scenario instead.
For certain scripts, ASCII admixture (just ASCII, not all of Latin) would be common practice in the writing system, and it may be common enough and benign enough to allow it.
However, you might also want to address European digits for those scripts where native digits exist and are widely or predominantly used, vs. scripts where the native digits are more of historic or cultural interest. (In Arabic you have both, depending on the region.)
Mixing digit sets in the same label should be a no-no, and indicates that something is not well-formed.
2) I do not understand “Linkification should be determined by the implied intent of the user's entry”. Is this intended to mean that the scheme (http, mailto, etc.) should be added to form the link? Or some other determination of intent? If the former, it should be stated more clearly.
My naive interpretation had to do with things like tables or data records where the purpose of a particular field would be a URL.
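The per-label distinction drawn in this reply (script mixing inside one label is suspect, different scripts in different labels are fine) can be illustrated with a rough sketch. It is not taken from UASG010. The function names are hypothetical, and because the Python standard library does not expose the Unicode Script property, the first word of each character name is used here as an approximate stand-in for it.

```python
import unicodedata

def letter_scripts(label: str) -> set[str]:
    """Approximate the scripts used by the letters in one label.

    Assumption: the first word of the Unicode character name ("LATIN",
    "CJK", "CYRILLIC", ...) stands in for the Script property.
    """
    scripts = set()
    for ch in label:
        # Only letters are considered; digits and hyphens are shared across scripts.
        if unicodedata.category(ch).startswith("L"):
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def mixed_script_labels(host: str) -> list[str]:
    """Return the labels of a host name whose letters span more than one script."""
    return [label for label in host.split(".") if len(letter_scripts(label)) > 1]

# Different scripts in different labels: no label is flagged.
print(mixed_script_labels("普遍接受-测试.世界.com"))
# Latin and Cyrillic letters inside one label: that label is flagged.
print(mixed_script_labels("p\u0430ypal.com"))
```

A production check would more likely follow the restriction levels of Unicode UTS #39, which treat some combinations (such as Han with Hiragana and Katakana) as a single writing system; this sketch would flag those labels as mixed.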
Hmm, I don’t recall approving that principle (hopefully it was added while I was out on leave, and not just because I carelessly failed to notice it was being added).
I mention that because it seems the opposite of what we could recommend, i.e. we SHOULD allow use of Highly Restrictive and continue to discourage Moderately Restrictive. Do we need to revisit this? Sorry if I am just confused.
Note that, as Asmus points out, our concern is about script-mixing within a label, not use of different scripts in different labels. Tex’s examples are all the latter, and should be linkified cleanly by UA-ready software.
/marksv
From: ua-discuss-bounces@icann.org [mailto:ua-discuss-bounces@icann.org] On Behalf Of Asmus Freytag Sent: Friday, September 29, 2017 2:40 PM To: ua-discuss@icann.org Subject: Re: [UA-discuss] UASG010 - Quick Guide to Linkification
Thanks Mark and Asmus.
I agree about the distinction of script mixing within a label. The guide should clarify this.
I also support Asmus’s clarification regarding ASCII vs. all of Latin.
As for mixing digits within a label, I agree, although I would need to review whether I can easily know which digits are widely used vs. of historical interest. I don’t believe that being a bit broad in linkification acceptance is a problem. The domain registry (and perhaps servers) should be more restrictive to not allow domains that could represent spoofing. (I know there are problems with reliance on registries.) Being too restrictive in linkification could hurt users who need to enter a legitimate URL and can’t.
And it seems we need to clarify our implied intent for the guidance about the “implied intent of user’s entry”. :) (I couldn’t resist any longer.)
Tex
From: ua-discuss-bounces@icann.org [mailto:ua-discuss-bounces@icann.org] On Behalf Of Mark Svancarek via UA-discuss Sent: Friday, September 29, 2017 5:04 PM To: Asmus Freytag; ua-discuss@icann.org Subject: Re: [UA-discuss] UASG010 - Quick Guide to Linkification
On Fri, Sep 29, 2017 at 05:28:50PM -0700, Tex Texin wrote:
The domain registry (and perhaps servers) should be more restrictive to not allow domains that could represent spoofing.
The problem is that that is not a well-specified criterion. Do you want registration rules about l vs 1?
A
--
Andrew Sullivan
ajs@anvilwalrusden.com
Andrew,
I only meant that the registry should be more restrictive on issues such as identifying common vs. historic native digits, not the more general problem of visually similar characters.
If the list of common vs. historic digits is well-defined and easily known, then this is a non-issue, since it can be universally applied. If not, then we shouldn't prescribe that clients guess at the list.
Asmus’s message indicated it wasn't well prescribed and varied regionally, unless I misunderstood his meaning.
tex
-----Original Message----- From: ua-discuss-bounces@icann.org [mailto:ua-discuss-bounces@icann.org] On Behalf Of Andrew Sullivan Sent: Friday, September 29, 2017 5:32 PM To: ua-discuss@icann.org Subject: Re: [UA-discuss] UASG010 - Quick Guide to Linkification
On 9/29/2017 5:28 PM, Tex Texin wrote:
Thanks Mark and Asmus.
I agree about the distinction of script mixing within a label. The guide should clarify this.
I also support Asmus’s clarification regarding ASCII vs. all of Latin.
As for mixing digits within a label, I agree, although I would need to review whether I can easily know which digits are widely used vs. of historical interest. I don’t believe that being a bit broad in linkification acceptance is a problem. The domain registry (and perhaps servers) should be more restrictive to not allow domains that could represent spoofing. (I know there are problems with reliance on registries.) Being too restrictive in linkification could hurt users who need to enter a legitimate URL and can’t.
Digits come in sets that are specified as such in the Unicode Standard (although implicitly: the members of such sets have the property "decimal digit", and Unicode follows the convention of encoding these in complete sets from 0-9). Therefore, not linkifying something that contains a mixture of these sets can be implemented deterministically (regex syntax leads to particularly grim expressions for specifying this constraint, but it can be done).
Realistically, only the modern set of about 30 scripts is of practical importance, so a scheme that does not track the addition of future historic alphabets in Unicode would be adequate.
Where native digits are (largely) historic holdovers, we wouldn't need them at all, but linkification isn't a good place to filter those.
Some reluctance on automatic conversion of "risky" URLs would be a benefit; it's along the same lines as not linkifying something not under the author's control: the risk of mischief is just too great.
Forum software that I have used tended to implement three restrictions that are not related to new TLDs or IDN TLDs:
1) limit file names by extension (e.g. if the link entered was supposed to be for an image, do not allow it to link to something that doesn't have a common image file extension).
2) disallow any link with a "?" in it; rationale: it's not a static link, and who knows what will be served later (including risky stuff).
3) require http://, etc., even in text spans that are marked as being URLs or in link attributes.
In some cases these restrictions were deliberate decisions by forum operators, part of reining in certain kinds of forum spam. I feel we need to be cognizant of the need to limit the risk profile of certain operations, in particular where the result then winds up online in front of an open audience (as opposed to just sharing something in a private message).
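For illustration only, the three forum restrictions listed above might look roughly like the following. This is a sketch of one possible operator policy, not anything UASG010 prescribes; the function name, the `expect_image` parameter and the extension allow-list are all assumptions.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed allow-list; a real forum would choose its own.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif"}

def forum_accepts(url: str, expect_image: bool = False) -> bool:
    """Apply the three restrictions described above to a submitted link."""
    parts = urlsplit(url)
    # 3) require an explicit scheme such as http://
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return False
    # 2) reject any link containing "?"
    if "?" in url:
        return False
    # 1) when an image is expected, require a common image file extension
    if expect_image:
        extension = posixpath.splitext(parts.path)[1].lower()
        return extension in IMAGE_EXTENSIONS
    return True
```

None of these checks looks at the TLD or the script of the host name, which matches the observation that the restrictions are unrelated to new TLDs or IDN TLDs.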
The alternative for an operator is to simply blacklist specific TLDs and domains (and most of those will be any IDNs that are not local to the operator...).
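The digit-set point made at the start of this reply can be sketched without any regular expressions. The following is a minimal illustration, assuming Python's `unicodedata` module and relying on the convention described above that decimal digits are encoded as complete, contiguous sets from 0 to 9; the function names are hypothetical.

```python
import unicodedata

def digit_sets(label: str) -> set[int]:
    """Return the code point of the zero digit of every digit set used in label."""
    zeros = set()
    for ch in label:
        value = unicodedata.decimal(ch, None)  # None for non-decimal characters
        if value is not None:
            # Digits 0-9 of a set are contiguous, so subtracting the digit's
            # value yields the code point of that set's zero.
            zeros.add(ord(ch) - value)
    return zeros

def mixes_digit_sets(label: str) -> bool:
    """True if the label contains decimal digits from more than one set."""
    return len(digit_sets(label)) > 1
```

With this, `mixes_digit_sets("12\u0663")` reports a mixture of ASCII and Arabic-Indic digits, while a label using a single set passes. Whether a given set is in common use or only of historic interest is a separate question that this check does not answer.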
And it seems we need to clarify our implied intent for the guidance about the “implied intent of user’s entry”. :) (I couldn’t resist any longer.)
:)
Asmus,
I can understand why forum operators would apply those restrictions, but as you indicate, they don’t apply to new links in other contexts.
However, coming back to linkification, the guidelines don’t address the query and fragment portions of a URL. Just as the script set rules are applied to individual labels, it is worth pointing out in the guidelines that those rules do not apply to the portion after the “?”, unless perhaps the query portion is in turn a URL.
Example 1 http://domain.com/?refer= http://newdomain.com
Example 2 http://domain.com/?title=script1&author=script2&quote=script1+script2+script3
The guidelines don’t address escapes. Should linkification attempt to unescape escaped characters? That might make the process much more complex. However, ignoring escapes might also lead to very inconsistent results.
tex
From: Asmus Freytag (c) [mailto:asmusf@ix.netcom.com] Sent: Friday, September 29, 2017 7:41 PM To: Tex Texin; 'Mark Svancarek' Cc: 'Universal Acceptance' Subject: Re: [UA-discuss] UASG010 - Quick Guide to Linkification
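One way to picture the suggestion in Tex's message above (label rules apply to the host, not to the query, unless a query value is itself a URL) is the sketch below. It assumes Python's `urllib.parse`, and the function name is hypothetical. Note that `parse_qsl` percent-decodes query values, so an escaped nested URL is found as well; whether linkification should unescape at all is exactly the open question raised above.

```python
from urllib.parse import parse_qsl, urlsplit

def hosts_to_check(url: str) -> list[str]:
    """Host names that label-level rules would apply to.

    This is the URL's own host plus the host of any query value that is
    itself an absolute URL. Path, fragment and other query text are ignored.
    """
    parts = urlsplit(url)
    hosts = [parts.hostname] if parts.hostname else []
    for _key, value in parse_qsl(parts.query):  # values are percent-decoded
        inner = urlsplit(value)
        if inner.scheme and inner.hostname:
            hosts.append(inner.hostname)
    return hosts

print(hosts_to_check("http://domain.com/?refer=http://newdomain.com"))
print(hosts_to_check("http://domain.com/?title=script1&author=script2"))
```

The first call yields both `domain.com` and `newdomain.com`; the second yields only `domain.com`, because the query values there are plain text rather than URLs.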
I think we have also not considered RFC 3986 for URL normalization: https://tools.ietf.org/html/rfc3986
Let's also consider it and make it part of the recommendation.
Now, on mixed scripts: when a URL contains an absolute page and a parameter,
http://www.example.com/page.html?parameter=1
or
http://www.example.com/पेज.html?parameter=1
or
http://www.example.com/पेज/page.html?parameter=1
notice that the path and the page name are in Hindi, and this is allowed.
So, Tex, maybe we need to apply the rules only up to the TLD; after that, the URL can be mixed script and may have to be allowed.
Wish wishes
Ajay
On 30 September 2017 08:52:47 GMT+05:30, Tex Texin <textexin@xencraft.com> wrote:
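As a rough illustration of what RFC 3986 normalization could mean for linkification, the sketch below applies two of the syntax-based normalizations from RFC 3986 section 6.2.2: case normalization and percent-encoding normalization. Dot-segment removal is left out, the function names are hypothetical, and lowercasing the whole authority is a simplification that assumes the URL carries no userinfo.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Unreserved characters per RFC 3986 section 2.3.
UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)

def _normalize_escapes(text: str) -> str:
    """Decode escapes of unreserved characters; uppercase the hex of the rest."""
    def fix(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, text)

def normalize(url: str) -> str:
    """Apply case and percent-encoding normalization to a URL."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),  # simplification: assumes no case-sensitive userinfo
        _normalize_escapes(parts.path),
        _normalize_escapes(parts.query),
        _normalize_escapes(parts.fragment),
    ))

# %7E is an escaped "~"; %e0%a4%aa are the UTF-8 bytes of the Devanagari letter प.
print(normalize("HTTP://www.Example.COM/%7Euser/%e0%a4%aa"))
```

Under these assumptions, two spellings of the same link compare equal after normalization, which would help make linkification results consistent when escapes are present in the path or query.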
Asmus, I can understand why forum operators would apply those restrictions, but as you indicate, they don’t apply to new links in other contexts.
However, coming back to linkification, the guidelines don’t address the query and fragment portions of a URL. As with the distinction to apply the script set rules to labels, it is worth pointing out in the guidelines that those rules do not apply to the portion after the “?” unless perhaps the query portion in turn is a URL.
Example 1 http://domain.com/?refer=http://newdomain.com
Example 2 http://domain.com/?title=script1&author=script2&quote=script1+script2+script3
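One way to read the point above is that a linkifier would split a candidate URL first and hand only the host labels to the script rules. The Python sketch below assumes that reading. `split_for_label_rules` and `nested_urls` are hypothetical helper names, and treating a query value as a nested URL whenever it parses with an http or https scheme and a host is an assumption for illustration, not something the guide specifies.

```python
from urllib.parse import parse_qsl, urlsplit


def split_for_label_rules(url):
    """Return (host_labels, rest); only host_labels would face label rules."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".") if host else []
    # Path, query and fragment are passed through without script checks.
    rest = (parts.path, parts.query, parts.fragment)
    return labels, rest


def nested_urls(url):
    """Yield query values that themselves look like http(s) URLs (assumption)."""
    for _name, value in parse_qsl(urlsplit(url).query):
        inner = urlsplit(value)
        if inner.scheme in ("http", "https") and inner.hostname:
            yield value
```

With Example 1, `nested_urls` would yield the `refer` value, which could then be split and checked in turn; with Example 2 it yields nothing, so the query is left alone.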
The guidelines don’t address escapes. Should linkification attempt to unescape escaped characters? That might make the process much more complex. However, ignoring escapes might also lead to very inconsistent results.
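To make the trade-off concrete, here is a minimal Python sketch of the "unescape first" option, assuming percent-encoding of UTF-8 is the only kind of escape considered. `unescape_for_checks` is a hypothetical helper; decoding exactly once and normalizing to NFC afterwards are assumptions made for illustration, not recommendations taken from the guide.

```python
import unicodedata
from urllib.parse import unquote


def unescape_for_checks(text):
    """Decode percent-escapes once (as UTF-8), then normalize to NFC."""
    # A single pass means '%2541' becomes '%41', not 'A'.
    # errors="strict" raises UnicodeDecodeError on invalid UTF-8 sequences.
    decoded = unquote(text, encoding="utf-8", errors="strict")
    return unicodedata.normalize("NFC", decoded)
```

Under these assumptions the escaped and unescaped spellings of the same address compare equal before any script rule runs, which is the consistency concern raised above. The cost is an extra decoding step and a decision about what to do when decoding fails.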
tex
From: Asmus Freytag (c) [mailto:asmusf@ix.netcom.com] Sent: Friday, September 29, 2017 7:41 PM To: Tex Texin; 'Mark Svancarek' Cc: 'Universal Acceptance' Subject: Re: [UA-discuss] UASG010 - Quick Guide to Linkification
On 9/29/2017 5:28 PM, Tex Texin wrote:
Thanks Mark and Asmus.
I agree about the distinction of script mixing within a label. The guide should clarify this.
Also support Asmus clarification regarding ASCII vs all of Latin.
On mixing digits within a label, I agree, although I would need to check whether I can easily tell which digits are widely used and which are of historical interest. I don’t believe that being a bit broad in linkification acceptance is a problem. The domain registry (and perhaps servers) should be more restrictive and not allow domains that could be used for spoofing. (I know there are problems with relying on registries.) Being too restrictive in linkification could hurt users who need to enter a legitimate URL and can’t.
Digits come in sets that are specified as such in the Unicode Standard (although implicitly: the members of such sets have a property "decimal digit" and Unicode follows the convention of encoding these in complete sets from 0-9). Therefore, not linkifying something that contains a mixture of these sets can be implemented deterministically (although regex syntax leads to particularly grim expressions for specifying this constraint, it can be done).
Realistically, only the modern set of about 30 scripts is of practical importance, so a scheme that does not track the addition of future historic alphabets in Unicode would be adequate....
Where native digits are (largely) historic holdovers, we wouldn't need them at all, but linkification isn't a good place to filter those.
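The digit-set constraint described above can be sketched without regular expressions. The Python example below is only an illustration: it relies on the convention mentioned above that decimal digits are encoded in contiguous 0-9 runs, and the function names `digit_set_zeros` and `mixes_digit_sets` are hypothetical.

```python
import unicodedata


def digit_set_zeros(label):
    """Return the code point of the zero digit for each digit set in label."""
    zeros = set()
    for ch in label:
        if unicodedata.category(ch) == "Nd":
            # Each set is assumed to be a contiguous 0-9 run, so subtracting
            # the digit's value from its code point identifies the set.
            zeros.add(ord(ch) - unicodedata.decimal(ch))
    return zeros


def mixes_digit_sets(label):
    """True if the label uses decimal digits from more than one set."""
    return len(digit_set_zeros(label)) > 1
```

A linkifier following this approach would call `mixes_digit_sets` on each label and decline to linkify when it returns True. Because the check uses the character database rather than a hard-coded list, it picks up whatever digit sets the Python runtime's Unicode version knows about.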
Some reluctance on automatic conversions of "risky" URLs would be a benefit; it's along the same line as not linkifying something not under the author's control: the risk for mischief is just too great.
Forum software that I have been a user of tended to implement three restrictions that are not related to new TLDs or IDN TLDs:
1) limit file names by extension (e.g. if the link entered was supposed to be for an image, do not allow it to link to something that doesn't have a common image file extension).
2) disallow any link with a "?" in it - rationale: it's not a static link and who knows what will be served later (including risky stuff)
3) require http://, etc., even in text spans that are marked as being URLs or in link attributes.
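For illustration, the three forum restrictions listed above could be combined into a single check along the following lines. This is a Python sketch under stated assumptions: `forum_allows` and the `IMAGE_EXTENSIONS` allowlist are hypothetical, and limiting schemes to http and https is a guess at what "http://, etc." covered.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical allowlist; a real forum would choose its own.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif"}


def forum_allows(link, expect_image=False):
    """Apply the three restrictions: scheme required, no '?', image extension."""
    if "?" in link:
        return False  # restriction 2: not a static link
    parts = urlsplit(link)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return False  # restriction 3: explicit scheme required
    if expect_image:
        extension = posixpath.splitext(parts.path)[1].lower()
        return extension in IMAGE_EXTENSIONS  # restriction 1
    return True
```

None of these checks looks at the TLD or the script of the host name, which matches the observation that the restrictions are unrelated to new TLDs or IDN TLDs.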
In some cases these restrictions were deliberate decisions by forum operators - part of reining in certain kinds of forum spam.
I feel we need to be cognizant of the needs for limiting the risk profile of certain operations - in particular where the result then winds up online to an open audience (as opposed to just sharing something in a private message).
The alternative for an operator is to simply blacklist specific TLDs and domains (and most of those will be any IDNs that are not local to the operator...).
And it seems we need to clarify our implied intent for the guidance about the “implied intent of user’s entry”. :-) (I couldn’t resist any longer.)
:)
Tex
From: ua-discuss-bounces@icann.org [mailto:ua-discuss-bounces@icann.org] On Behalf Of Mark Svancarek via UA-discuss Sent: Friday, September 29, 2017 5:04 PM To: Asmus Freytag; ua-discuss@icann.org Subject: Re: [UA-discuss] UASG010 - Quick Guide to Linkification
Hmm, I don’t recall approving that principle (hopefully it was added while I was out on leave, and not just because I carelessly failed to notice it was being added).
I mention that because it seems the opposite of what we could recommend i.e. we SHOULD allow use of Highly Restrictive and continue to discourage Moderately Restrictive. Do we need to revisit this? Sorry if I am just confused.
Note that, as Asmus points out, our concern is about script-mixing within a label, not use of different scripts in different labels. Tex’s examples are all the latter, and should linkify cleanly by UA-ready SW.
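The distinction between mixing within a label and mixing across labels can be sketched as a per-label check. The Python example below is a rough illustration only: the standard library has no Script property, so `rough_script` (a hypothetical helper) guesses a script from the first word of the Unicode character name. That heuristic is an assumption, it does not implement any restriction level, and it would over-flag writing systems that legitimately combine scripts, such as Japanese.

```python
import unicodedata


def rough_script(ch):
    """Very rough script guess: first word of the Unicode character name."""
    category = unicodedata.category(ch)
    # Treat digits, hyphens and combining marks as script-neutral.
    if category == "Nd" or category.startswith("M") or ch == "-":
        return None
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def mixed_script_labels(host):
    """Return the labels of host that contain more than one rough script."""
    mixed = []
    for label in host.split("."):
        scripts = {rough_script(ch) for ch in label} - {None}
        if len(scripts) > 1:
            mixed.append(label)
    return mixed
```

Because each label is examined on its own, a host whose labels are each in a single script returns an empty list even when the labels differ from one another. For an email address, the local part would be split off at the "@" and only the host passed in.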
/marksv
From: ua-discuss-bounces@icann.org [mailto:ua-discuss-bounces@icann.org] On Behalf Of Asmus Freytag Sent: Friday, September 29, 2017 2:40 PM To: ua-discuss@icann.org Subject: Re: [UA-discuss] UASG010 - Quick Guide to Linkification
On 9/29/2017 2:26 PM, Tex Texin wrote:
Hi,
Some questions:
1. Do I understand correctly, that the recommendation to not linkify highly restrictive strings means that tex@普遍接受-测试.世界 would not become a link? Or http:// 普遍接受-测试.世界.com?
Highly restrictive means that latin cannot be mixed with Chinese or Japanese characters.
Some script mixing *within* a label should be restricted as it is a security risk. Script mixing across a FQDN or between local part and host seems to be a rather likely scenario instead.
For certain scripts, ASCII admixture (just ASCII, not all of Latin) would be common practice in the writing system and it may be common enough/benign enough to allow it.
However, you might also want to address European digits for those scripts where native digits exist and are widely / predominantly used, vs. scripts where the native digits are more of historic/cultural interest. (In Arabic you have both, depending on the region).
Mixing digit sets in the same label should be a no-no and indicates something's not well-formed.
2. I do not understand “Linkification should be determined by the implied intent of the user's entry” Is this intended to mean that the scheme (http, mailto, etc) should be added to form the link? Or some other determination of intent? If the former, it should be stated more clearly.
My naive interpretation had to do with things like tables or data records where the purpose of a particular field would be a URL.
tex
participants (7): AJAY DATA, Andrew Sullivan, Asmus Freytag, Asmus Freytag (c), Don Hollander, Mark Svancarek, Tex Texin