Hi Asmus,
Thanks for the quick response. Comments inline below.
On Tuesday, November 5, 2019, 08:06:08 PM GMT-5, Asmus Freytag <asmusf@ix.netcom.com> wrote:
Bill,
in continuation of our discussion, here's how I would have replied if you had asked this in the session:
First, you write: "The ideal solution, of course, would be for the Unicode folks to create new pre-composed code points for these problem cases. But I suspect there is little chance of them doing so before our report is due. So, we will have to figure out an alternate approach to recommend."
Unicode has an explicit policy of not adding any more precomposed code points for the kinds of combinations considered. So there's a definite answer that such will not happen. Ever.
>> Good to know what Unicode's policy is on this. I wonder why, given that they have a bunch of pre-composed code points which are not used in any of what we fondly believe are the major languages using the Latin alphabet. Presumably they had their reasons for choosing the ones that they did.
>> If those reasons include indications that we missed a major language or three that we should have included, that would be useful to know ASAP.
==> The policy is simply not to encode anything that would get
a canonical decomposition. Details in either the Unicode Standard
Core Specification or UAX#15. There may also be an FAQ out there
on Normalization. This forces vendors to support combinations if
they want to support certain languages and you see that effect.
The technology continues to get better.
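For illustration, the effect of that policy can be observed with Python's standard unicodedata module. This is only a minimal sketch: the helper name has_precomposed_form and the two sample sequences are chosen here for the example and are not taken from the Latin GP's repertoire or test data. A combination that received a precomposed code point early on collapses to one code point under NFC, while a combination without one stays a combining sequence.

```python
import unicodedata


def has_precomposed_form(sequence):
    """Return True if NFC collapses the sequence into a single code point.

    Hypothetical helper for this illustration only.
    """
    return len(unicodedata.normalize("NFC", sequence)) == 1


def describe(sequence):
    """List the code points of a string as 'U+XXXX NAME' entries."""
    return [
        "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in sequence
    ]


# e + COMBINING ACUTE ACCENT: a precomposed code point (U+00E9) exists.
e_acute = "e\u0301"

# LATIN SMALL LETTER OPEN E + COMBINING DIAERESIS: no precomposed code
# point exists, so NFC leaves the combining sequence in place.
open_e_diaeresis = "\u025B\u0308"

for label, seq in (("e + acute", e_acute), ("open e + diaeresis", open_e_diaeresis)):
    nfc = unicodedata.normalize("NFC", seq)
    print(label, "precomposed:", has_precomposed_form(seq))
    for entry in describe(nfc):
        print("   ", entry)
```

Sequences of the second kind are the ones that fonts and rendering engines have to handle as combinations.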
I would say that Courier New is perhaps an unfortunate choice of reference font. Some other people may have more details, or actual knowledge of what MSFT's plan is for that font, but it is my impression that the Courier New font was state of the art in the past, and it certainly looks like it has not been maintained actively to cover more languages (and frankly, I can't recall seeing it much recently).
>> We'll need to discuss whether to shift to a different mono-width font for our analysis. On one hand, there would be a lot of work to consider redoing -- which would take time that we probably don't have.
==> Correct, you do *not* have the time.
==> Correct, you can substitute a font in ongoing analysis (but you don't have much time for that, either), and the IP, being somewhat knowledgeable in the Latin script, do not anticipate a large set of in-script variants. Most cases would surely be "confusables" and can be documented separately (such informative documentation can also be prepared during the public comment period).
>> On the other hand, there's something to be said for using the same set of fonts throughout the analysis. Perhaps we can decide that one exception, for the non-pre-composed cases, is the least bad solution. As I say, we'll have to thrash it out.
There are other more recent monowidth fonts such as Lucida Console. See screen shot at the end. I've also appended the results for Segoe UI which is the font used in my browser (Firefox on Windows7).
>> Clearly we have been handicapped by none of us being expert in which fonts are growing obsolete and which are more current. As you say, the universe of Latin fonts is enormous. Clearly we couldn't look at anything like the whole. For example, we totally ignored all the cursive-based fonts -- which would have, among other things, generated a bunch more variants. But we are where we are at the moment.
==> If you had told us that you were looking at cursive fonts,
we would probably have had something to say about it - in our view
that is taking the issue too far, by far.
There's a near infinite universe of Latin-script fonts, and many do not attempt to cover the entire script. If we include hyperlinks in text (those showing the URL) there is no way we can predict which fonts a user will see a domain name in.
We have three choices here:
(1) remove from the Latin LGR all code points/sequences not rendered reliably in any font
(2) remove from the Latin LGR all code points/sequences not rendered reliably in any "well-known" font
(3) remove from the Latin LGR all code points/sequences not rendered reliably in common user interface fonts: Windows, iOS, Android and all browsers if they don't use platform fonts (latest version)
Because of the way Latin-script fonts tend to subset, there's no way that (1) is a reasonable choice in my view.
>> Absolutely agree. Or even possible.
==> Good.
The problem with (2) is that some "well-known" fonts are tied to early versions of a given platform and they *may* not be maintained any longer - while some of them are still widely used, they have been replaced for UI purposes by more modern / more capable fonts. Effectively, they may be retained as legacy - so that you can still view and edit documents that were created in them. Less well-known fonts (such as Arial Unicode MS) may not have made the cut and aren't routinely available any more. So the fact that a font is well-known increases the likelihood that it is a legacy font. Taken together, these considerations would argue against (2).
>> As noted, we would need outside advice on which fonts are both "well-known" and modern (i.e. not legacy) in order to attempt 2.
==> In that case, I think it's good we are not doing (2).
That leaves (3) as a "reasonable" choice for making a cut. I know you'd appreciate that choice of term :). It is also effectively forward-looking, because more support tends to be added to newer fonts/systems and that process looks like it would only continue.
By all indications, more modern text fonts like Calibri and modern UI fonts like Segoe UI do not have issues with these code points, and I simply can't imagine Google's Noto fonts would either.
>> I'm not quite clear what you are recommending that we do here. Are you suggesting that we go back and redo using these three fonts? Or something else?
==> I think your absolute and unquestioning first priority is finishing a public draft. You have done enough work and have the feedback from the IP (and will get some more next week) to complete that task.
==> If, during public comment, somebody can demonstrate an issue using a recent phone, browser or OS, you can take corrective action in the final draft and remove some code points before publication. I don't expect you will find any cases, because the modern OSs and their user interface fonts are very good at handling combining sequences.
==> The recommendation follows from the way (3) is worded.
It's not worded as "start a research project to find possible
issues in an unspecified list of fonts", but conversely: "act, if
and when you have intelligence (from whatever source) of a clear
defect."
Looking at the screen shot in context with the reasoning above, it seems to me that we are good, but if the Latin GP wants to document the issue (that many Latin fonts do subset the range of code points/glyphs/combinations that they support), that would be OK (if it doesn't otherwise delay the project).
>> OK, notwithstanding the above, I'm reading this as saying that
1) we can stick with the fonts we are using, and
2) we can continue including the combination glyphs that I was concerned about, regardless of their issues in Courier New
>> If that is not a correct understanding, please let me know.
==> Correct; at the moment you have no indication that the combining sequences should be rejected (item (3)) and for a quick check for variants your 3 fonts should be fine. If you could add Segoe UI as a 4th column in your tests, and can do that cheaply and quickly, go for it. But please come back telling us that there are very few in-script variants beyond schwa and underlining. :)
>> In Word (Windows 10), I get ɛ̱̈e rendered as problematic even in Lucida Console -- although it renders fine in Firefox for email. Just FYI. Inconsistency among word processing softwares is a real pain, but one we will probably never get away from.
==> Looks fine in Word in Windows 7. Go figure.
==> While Lucida Console is newer than Courier New, the use of monowidth fonts is problematic for so many reasons that basing the LGR design on their shortcomings is not something I would contemplate.
==> All we can do is hope that the technology is going forward.
==> Let's focus on getting this wrapped up as expeditiously as
possible.
A./
PS: I have blind copied the other IP members
Screenshot: Instead of Arial, the screenshot shows Calibri in the left column, d