* How to handle illegal sequences in complex scripts? Don't Do That.
Guidance might consist of telling people that there are things like illegal sequences.
Sure. I don't want to be cruel; I want to encourage people to recognize when something isn't likely to work and reject it early.
We can avoid a lot of pain by starting with the assumption that EAI addresses will be something that looks plausible in some natural language.
Tricky one, that. You don't want to go into spelling rules. For example, some languages don't use "w", but people who speak those languages may still have names that start with W. For complex scripts, certain pairings make no sense because they don't correspond to any realistic sound in the languages that use those scripts, but they are in fact no worse than "nonsense" words in ASCII. I'm not sure whether I would call "phhhbzz@example.com" plausible in any language, but we can agree it still works as a mnemonic for users of English.
Plausible may not be the right word, but I think you get the idea. Perhaps say it's something that a speaker of a language would be able to type and to read or spell over the phone without undue effort.
ph)xxt@example.com is indeed more marginal.
Actually, it's invalid because local parts that contain parentheses need to be quoted. Braces, on the other hand, don't need to be quoted, so this one is valid, though I wouldn't be too upset if a web form didn't accept it: ph}xxt@example.com
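To make the quoting point concrete, here is a rough sketch that checks whether a local part can appear unquoted, using the ASCII atext set from RFC 5322 section 3.2.3. The function name is made up for illustration, and the check deliberately ignores the non-ASCII characters that EAI (RFC 6531) also allows, so treat it as an approximation rather than a full validator.

```python
import re

# ASCII "atext" characters from RFC 5322 section 3.2.3.
# Parentheses are not in this set; braces are.
ATEXT = r"A-Za-z0-9!#$%&'*+/=?^_`{|}~-"

# dot-atom: one or more atext runs separated by single dots.
DOT_ATOM = re.compile(rf"[{ATEXT}]+(?:\.[{ATEXT}]+)*")


def is_unquoted_local_part(local: str) -> bool:
    """Hypothetical helper: True if `local` is a valid ASCII dot-atom,
    i.e. it may be used as a local part without quoting."""
    return DOT_ATOM.fullmatch(local) is not None


if __name__ == "__main__":
    for candidate in ("ph)xxt", "ph}xxt", "phhhbzz"):
        print(candidate, is_unquoted_local_part(candidate))
```

A web form that wants to be stricter than this (say, rejecting braces as well) is making a policy choice, not enforcing the syntax.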
Worse if you plant an accent mark on it.
Now we're into fuzzy match territory. I think that for robustness an MTA should accept local part characters with wrong or missing diacriticals, and mail system admins shouldn't assign names that differ only in diacriticals.

Regards,
John Levine, john.levine@standcore.com
Standcore LLC
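For illustration only, one way an admin tool might flag names that differ only in diacriticals is to decompose each name and drop the combining marks before comparing. Both function names below are hypothetical, and the approach is an assumption that suits Latin-like scripts: letters such as "ø" do not decompose, and in many complex scripts the combining marks carry meaning and must not be stripped.

```python
import unicodedata


def fold_diacritics(local: str) -> str:
    """Hypothetical helper: strip combining marks after NFD decomposition.
    Only a rough comparison key, not something to store or display."""
    decomposed = unicodedata.normalize("NFD", local)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def find_collisions(names):
    """Return (existing, new) pairs whose folded forms are identical."""
    seen = {}
    collisions = []
    for name in names:
        key = fold_diacritics(name)
        if key in seen and seen[key] != name:
            collisions.append((seen[key], name))
        else:
            seen.setdefault(key, name)
    return collisions


if __name__ == "__main__":
    print(find_collisions(["jose", "jos\u00e9", "ana"]))
```

The same folded key could serve as a fallback lookup when delivery to the exact local part fails, though whether to do that is a local policy decision.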