In article <2189ae2a-adcd-119d-9d0d-d19d16544110@ix.netcom.com> you write:
Making a start:
* How to handle illegal sequences in complex scripts?
Don't Do That.
Guidance might consist of telling people that there are things
like illegal sequences.
(For examples see chapter 12 of the Unicode Standard and look for "Do not use" - those particular ones are cut&dried)
* How to handle non-standard sequences in complex scripts?
Don't Do That, either.
Guidance might consist of telling people what such sequences are.
(Beyond the "do not use" there is another layer of context rules
for complex scripts like the ones you find in RZ-LGR-3 that would
rule out additional sequences that are structurally invalid, or if you
want, by definition meaningless for a given script.)
In some cases, alternative sequences exist for legacy reasons and, for mailboxes, might be appropriate for fuzzy matching.
Guidance might consist of identifying these (or giving examples
plus sources for more complete info).
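To make the fuzzy-matching idea concrete, here is a minimal sketch in Python. It covers only one kind of alternative sequence, namely canonically equivalent spellings that Unicode NFC normalization already unifies (precomposed versus decomposed forms). The function names are hypothetical, and script-specific legacy alternatives would need their own mapping table, which this does not attempt.

```python
import unicodedata


def match_key(local_part: str) -> str:
    """Hypothetical comparison key: the NFC form of the local part.

    Assumption: canonical equivalence is the only kind of
    "alternative sequence" handled here.
    """
    return unicodedata.normalize("NFC", local_part)


def same_mailbox(a: str, b: str) -> bool:
    """Treat two local parts as the same mailbox if their keys agree."""
    return match_key(a) == match_key(b)


# "é" as one code point vs. "e" followed by U+0301 COMBINING ACUTE ACCENT
print(same_mailbox("jos\u00e9", "jose\u0301"))
# An unaccented spelling is a different string under this key
print(same_mailbox("jose", "jos\u00e9"))
```

Whether a mail system should match this way on delivery, or only when suggesting corrections, is a policy question the sketch leaves open.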
We know that it is technically valid to put a close paren in an ASCII mailbox name, but no sensible person expects that to work, and I can't get too upset at web forms or whatever that don't allow them.*
I think we can avoid a lot of pain by starting with the assumption that EAI addresses will be something that looks plausible in some natural language.
Tricky one, that. You don't want to go into spelling rules. Some languages don't use "w", for example, but people who speak such a language may still have names that start with W. For complex scripts, certain pairings make no sense, because they don't correspond to any realistic sound in the languages that use the script, but they are in fact no worse than "nonsense" words in ASCII. I'm not sure whether I would call "phhhbzz@example.com" plausible in any language, but we can agree it still works as a mnemonic for users of English.
ph)xxt@example.com is indeed more marginal.
Worse if you plant an accent mark on it.
That kind of thing approaches the structural issues seen in complex scripts.
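As a rough illustration of the accent-on-a-paren case, the sketch below flags combining marks that do not follow a letter (or a mark that itself follows a letter). The function name is hypothetical, and the rule is a deliberately crude assumption of mine: real context rules such as those in RZ-LGR-3 are script-specific and far more detailed than this.

```python
import unicodedata


def misplaced_marks(text: str) -> list[int]:
    """Return the indexes of combining marks with no letter to attach to.

    Assumption: a mark is acceptable only after a letter (category L*)
    or after another mark that was itself acceptable.
    """
    bad = []
    has_base = False  # True while the preceding run is letter (+ marks)
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        if category.startswith("M"):
            if not has_base:
                bad.append(i)
        else:
            has_base = category.startswith("L")
    return bad


# An acute accent planted on the close paren of "ph)xxt"
print(misplaced_marks("ph)\u0301xxt"))
# The same accent on a letter is not flagged
print(misplaced_marks("phe\u0301xxt"))
```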
A./
R's,
John
* - on the other hand, hyphens and + signs are fine