Siavash, Daniel,
 
Thank you for your clarifications. I will research both problems further. I've also contacted my former professor of linguistics to ask for her advice and guidance on terminology and available data. I hope to gain access to some formalised knowledge on variants in different languages and scripts, and will naturally share any discoveries and data with the mailing list.
 
Regards,
Nadya


2011/6/23 Siavash Shahshahani <shahshah@irnic.ir>
Hello Nadya,
Just one remark about a part of your comments (since you asked to be
corrected about this):

On Thu, 23 Jun 2011 11:46:17 +0100, Nadya Morozova <nad.morozova@gmail.com> wrote:

> I don’t speak Arabic ...

Nor do I, and nor do some 300 million other people who use the so-called
Arabic script but not the Arabic language. And that is a major part of the
variant problem in Arabic script. Just as Latin script is used by a number
of varied languages, so is Arabic script.

> From what I see, most “variants” in Arabic scripts stem from the optional
> tashkeel diacritics modifying consonant letters to show which vowels to
> read them with. ...

No, there are more vital problems. There are even consonants with
different Unicode code points that look exactly the same when they occur
in certain positions in a word (e.g., at the beginning and in the middle,
but not at the end). What is more, about a dozen country names being
considered as IDN TLDs contain one or more of these characters, hence the
need for variant TLDs.
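
To make the point about look-alike consonants concrete, here is a minimal
sketch in Python. The choice of characters is an assumption made for
illustration and is not taken from the message above: ARABIC LETTER YEH
(U+064A) and ARABIC LETTER FARSI YEH (U+06CC) are drawn the same way in
initial and medial position but differently at the end of a word. The label
used is illustrative, and VARIANT_FOLD and fold_variants are hypothetical
names, not part of any registry's actual variant policy.

```python
import unicodedata

# Example pair (an assumption for illustration): same shape in initial and
# medial position, different shape in final and isolated position.
ARABIC_YEH = "\u064A"
FARSI_YEH = "\u06CC"


def code_points(label):
    """List each character of a label as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in label]


# Hypothetical one-entry variant table: fold Arabic yeh onto Farsi yeh.
VARIANT_FOLD = {ARABIC_YEH: FARSI_YEH}


def fold_variants(label):
    """Map every character to its chosen representative, if it has one."""
    return "".join(VARIANT_FOLD.get(ch, ch) for ch in label)


# The same five-letter word typed with each yeh. The yeh is the second
# letter, so it takes its initial form and both strings render alike.
label_farsi = "\u0627\u06CC\u0631\u0627\u0646"
label_arabic = "\u0627\u064A\u0631\u0627\u0646"

for line in code_points(label_arabic):
    print(line)

print(label_farsi == label_arabic)                                 # False
print(fold_variants(label_farsi) == fold_variants(label_arabic))   # True
```

The raw comparison fails although a reader would see the same word, which
is why such labels would need to be treated as variants of one another
rather than as unrelated names.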

Regards,
Siavash