Yes, bidi is hard but fascinating.

 

From my work with text stacks, my understanding is that it is incorrect to assume that something of the form rtl.rtl.ltr has a predetermined rendering order. The result really depends on which character is seen as the first strongly typed character in the first domain name. The Arabic/Hebrew/N’ko scripts all have an RTL script order within the RTL text direction for each language. Arabic and Hebrew text both commonly use characters (Unicode common) that the BiDi algorithm is required to treat as strongly typed LTR script order. Because of that, I doubt it’s enough to specify just the text direction for each element.
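To make the "first strongly typed character" point concrete, here is a minimal Python sketch. It uses only the standard unicodedata module; the helper names first_strong_direction and label_directions are my own, made up for illustration. It is a simplification of the paragraph-direction rules (P2/P3) in the Unicode Bidirectional Algorithm: it ignores isolates and explicit embeddings, and it does no reordering at all, so treat it as a way to inspect a name rather than as a renderer.

```python
import unicodedata


def first_strong_direction(text):
    """Return 'ltr' or 'rtl' for the first strong character, else None.

    Strong bidi classes are L (left-to-right), R (right-to-left, e.g.
    Hebrew and N'Ko letters) and AL (Arabic letters). Digits, dots and
    hyphens are weak or neutral and are skipped.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None


def label_directions(domain):
    """Hypothetical helper: first-strong direction of each dot-separated label."""
    return [first_strong_direction(label) for label in domain.split(".")]


# Escapes are used so the source itself is not reordered by an editor.
hebrew_label = "\u05d0\u05d1"        # two Hebrew letters (class R)
arabic_label = "\u0645\u0635\u0631"  # three Arabic letters (class AL)
name = hebrew_label + "." + arabic_label + ".com"

print(label_directions(name))
print(first_strong_direction("123." + hebrew_label))
```

Note that a label made only of digits has no strong character, so its direction comes from whatever surrounds it. That is one reason a per-element direction attribute may not fully determine the rendered order.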

 

From: ua-discuss-bounces@icann.org [mailto:ua-discuss-bounces@icann.org] On Behalf Of Richard Merdinger
Sent: Wednesday, August 9, 2017 1:31 PM
To: Andrew Sullivan <ajs@anvilwalrusden.com>; ua-discuss@icann.org
Subject: Re: [UA-discuss] Programming Language Hacks - UA103

 

Makes sense to me; I like mentioning the major-use writing system to make the point, but it also makes it clear that it is broader than a single case.

 

--Rich

 

Richard Merdinger

VP, Domains - GoDaddy

rmerdinger@godaddy.com

 

 

 

From: <ua-discuss-bounces@icann.org> on behalf of Andrew Sullivan <ajs@anvilwalrusden.com>
Date: Wednesday, August 9, 2017 at 3:19 PM
To: "ua-discuss@icann.org" <ua-discuss@icann.org>
Subject: Re: [UA-discuss] Programming Language Hacks - UA103

 

On Wed, Aug 09, 2017 at 04:13:35PM +0000, Mark Svancarek via UA-discuss wrote:

Actually, we recently discovered an Edge bug (via the browser review) where the order of labels in an RTL.RTL.ASCII domain name was transposed during rendering.  So I like calling it out explicitly.

 

This has been a regularly-recurring bug in various rendering engines
since at least 2008, because I recall the demonstrations of it during
the idnabis WG, and then seeing it in a completely different context
during the VIP work for ICANN in 2011 or '12.  It's not always only
Arabic: at least one of the examples was reproducible in any bidi
context.  I seem to recall one example where the wire order

 

    [firstlabel]RTL[secondlabel]RTL[thirdlabel]LTR[fourthlabel]NULL

 

got rendered as

 

    RTL.LTR.RTL

 

which I thought was a pretty cool bug.  I have no idea how it happened
that way, though I recall walking myself through the bidi algorithm at
the time and figuring out what the problem must have been.  Bidi is
hard.

 

I therefore think it wise not to call out Arabic especially -- but
maybe point out that Arabic is perhaps the most prominent writing
system that uses RTL, so that programmers aren't tempted to dismiss
the problem as a "corner case".  Big corner, the Arabic-using
population!

 

Best regards,

 

A

 

--

Andrew Sullivan

ajs@anvilwalrusden.com