Probably, Mark, we should look more deeply at this page: https://shkspr.mobi/blog/2018/11/domain-hacks-with-unusual-unicode-character...

When I copy the link by right-clicking on the hyperlink, I get https://edent.tel/, where .tel is represented as three characters, not as one symbol. When I inspect and copy the element in the Chrome browser, I see https://edent.℡/, where ℡ is one symbol, not three characters.

Is there any encoding/decoding method, like Punycode, for these special symbols that browsers are following? What makes the browser map these symbols to three different characters?

Thanks,
Dr. Ajay DATA | Founder & CEO
Get email id like अजय@डाटा.भारत in your own language, visit www.xgenplus.com

From: "Mark Svancarek (CELA) via UA-discuss"
To: Andrew Sullivan, "ua-discuss@icann.org"
Subject: Re: [UA-discuss] Fun with Unicode
Date: 10 Nov 2018 05:52:03 AM

Should we file bugs against browsers with these behaviors?

-----Original Message-----
From: Andrew Sullivan
Sent: Friday, November 9, 2018 4:19 PM
To: Mark Svancarek (CELA); ua-discuss@icann.org
Subject: Re: [UA-discuss] Fun with Unicode

Many of these are invalid under 2008. Yet another example where sticking to the old protocol or UTS#46 hurts us.

A

--
Please excuse my clumbsy thums

----------
On November 10, 2018 04:19:35 "Mark Svancarek (CELA) via UA-discuss" wrote:

> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fshkspr.mobi%2Fblog%2F2018%2F11%2Fdomain-hacks-with-unusual-unicode-characters%2F&data=02%7C01%7Cmarksv%40microsoft.com%7C25ef81dd6f7c415d3fb008d646a22738%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636774059725382049&sdata=4tKv6zQWKCHBTTMrwz24D4d90iom%2B%2BRcFrWk00uclew%3D&reserved=0
>
> Actually, the remapping by browsers seems bad

On 11/10/2018 1:11 AM, Dr Ajay Data wrote:
> Is there any encoding/decoding method like punycode for these special symbols, which browsers are following. What makes browser map these symbols to three different characters?

Unicode *compatibility* decomposition.

Probably the browsers are applying normalization form NF*K*C to the input data. That normalization form is defined as applying compatibility decomposition followed by *canonical* composition, so data that has been through NFKC is also in NFC.

Likewise, you will find that browsers accept uppercase strings for IDNs and apply case folding to lowercase before resolving. This allows users to enter IDNs in uppercase, even though IDNs are lowercase-only per IDNA 2008.

A./
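
For anyone who wants to see the effect described above, here is a minimal Python sketch using only the standard `unicodedata` module. It is an approximation of what a browser might do (NFKC, then case folding), not the actual UTS #46 or IDNA mapping code of any browser; the helper name `approximate_browser_mapping` is made up for this illustration.

```python
import unicodedata

def approximate_browser_mapping(host: str) -> str:
    """Hypothetical helper: NFKC normalization followed by case folding.

    This only approximates the mapping step discussed in the thread;
    real browsers implement the full UTS #46 / IDNA processing.
    """
    return unicodedata.normalize("NFKC", host).casefold()

tel_sign = "\u2121"  # U+2121 TELEPHONE SIGN, the single symbol "℡"

# The character carries a *compatibility* decomposition to T, E, L.
print(unicodedata.decomposition(tel_sign))

# NFC leaves the symbol alone; NFKC replaces it with three letters.
print(unicodedata.normalize("NFC", tel_sign) == tel_sign)
nfkc = unicodedata.normalize("NFKC", "edent." + tel_sign)
print(nfkc)

# The NFKC output is already in NFC.
print(unicodedata.normalize("NFC", nfkc) == nfkc)

# Case folding then yields the lowercase name that gets resolved.
print(approximate_browser_mapping("edent." + tel_sign))
```

The single code point U+2121 goes in and the three ASCII letters come out, which matches the difference between the copied link and the inspected element reported in the question.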
Participants (2): Asmus Freytag, Dr Ajay Data