At 13:08 03/09/2011, Cary Karp wrote:
I urgently suggest that we expand our descriptive terminology with the term "homoglyph" to designate situations such as the one used in the Cyrillic/Latin illustration above.
No! :-) Cary, I fully understand your point and I support it. But not at the price of added confusion. Glyphs are definitely out of our scope. Characters are signs that are graphed. The way they are graphed is not an issue for computer protocols and registries.

The problem we face here is that we use Unicode/ISO 10646, which distinguishes between graphed signs on premises that relate neither to the sign nor to its graph. ISO 10646/Unicode has pros and cons. One of the cons is that it introduces confusion in the use of some signs.

To address this "unisoconfusable" characters issue we need an anti-homographic canonicalization algorithm. This algorithm may be based on unigraph (graphic signs) or unisign (general semiotic) tables or correspondences, or on any other idea you might have.

In the current IUse work, we start from 63,000+ 16x16 or 16x8 bitmaps in an Excel table. An immediate sort shows around 10,000 strictly unisoconfusable graphs (same bitmap). Our problem is to find a complete code point description table, fill it with bitmap representations, work on their positioning (for example, all of them locked in one of the four corners, and centered), on comparabilities from human indications, etc., come up with different tables corresponding to degrees of confusability, and check the results against experimentation in real operations. The confusability algorithm should then be amended based on the experimentation inputs.

Once we have obtained this, string confusability should be added through human inputs to IANA. This is why the happiana mailing list is concerned. The resulting registry may be quite large (and therefore significant in terms of traffic), and the registration/validation process will be an industry issue, and probably a perpetual battle if confusables are not also displayed in a culturally appropriate manner.

This method should then also be applicable to checking logo confusability, etc.

jfc
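
The "same bitmap" sort mentioned above could be sketched roughly as follows. This is only an illustration, not the IUse implementation: the function names, the 8x8 toy bitmaps and the choice of the lowest code point as the canonical representative are all assumptions made for the example. It groups code points whose bitmaps are byte-for-byte identical, then folds each group onto one representative so that two strings can be compared after canonicalization.

```python
from collections import defaultdict

def group_by_bitmap(bitmaps):
    """Return groups of code points whose bitmaps are strictly identical.

    `bitmaps` maps a code point (int) to its bitmap (bytes-like).
    Only groups with more than one member are returned.
    """
    groups = defaultdict(list)
    for code_point, bitmap in bitmaps.items():
        groups[bytes(bitmap)].append(code_point)
    return sorted(sorted(cps) for cps in groups.values() if len(cps) > 1)

def canonical_map(confusable_groups):
    """Map every code point in a group to the group's lowest code point
    (an arbitrary choice made for this sketch)."""
    canon = {}
    for group in confusable_groups:
        representative = group[0]
        for code_point in group:
            canon[code_point] = representative
    return canon

def canonicalize(text, canon):
    """Replace each character by its canonical representative, if any."""
    return "".join(chr(canon.get(ord(ch), ord(ch))) for ch in text)

# Invented toy bitmaps (8 rows of 8 pixels), NOT taken from any real font.
TOY_A = bytes([0x00, 0x3C, 0x06, 0x3E, 0x66, 0x3E, 0x00, 0x00])
TOY_O = bytes([0x00, 0x3C, 0x66, 0x66, 0x66, 0x3C, 0x00, 0x00])
TOY_B = bytes([0x60, 0x60, 0x7C, 0x66, 0x66, 0x7C, 0x00, 0x00])

SAMPLE = {
    0x0061: TOY_A,  # LATIN SMALL LETTER A
    0x0430: TOY_A,  # CYRILLIC SMALL LETTER A, same toy bitmap
    0x006F: TOY_O,  # LATIN SMALL LETTER O
    0x043E: TOY_O,  # CYRILLIC SMALL LETTER O, same toy bitmap
    0x0062: TOY_B,  # LATIN SMALL LETTER B, no look-alike in this sample
}

GROUPS = group_by_bitmap(SAMPLE)
CANON = canonical_map(GROUPS)
```

With such a table, `canonicalize("\u0430b\u043e", CANON)` and `canonicalize("abo", CANON)` yield the same string, which is the property an anti-homographic comparison needs. Degrees of confusability short of strict bitmap identity would need a distance measure and human input, which this sketch does not attempt.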