Dear colleagues,

So that clarifies that question - thanks, Abdeslam.

Coming back to your questions, Chris: I believe combining marks could be excluded, as was done in the case of the Arabic LGR. Meanwhile, cases like ij could be declared variants of the sequence i + j, provided we see a need for including the former.

If ligatures are not part of MSR-2, then I assume the problem has solved itself.

Best,

Meikal


2016-05-11 22:27 GMT+02:00 Abdeslam Nasri <abdeslam.nasri@gmail.com>:
Dear Chris and Colleagues,


Digraphs, or more generally sequences of code points, can be specified as variants of a single code point.

An excerpt from the LAGER specification:

" A sequence of multiple code points can be specified as a variant of a
   single code point.  For example, the sequence of LATIN SMALL LETTER O
   (U+006F) then LATIN SMALL LETTER E (U+0065) might hypothetically be
   specified as a variant for an LATIN SMALL LETTER O WITH DIAERESIS
   (U+00F6) as follows:

       <char cp="00F6">
           <var cp="006F 0065"/>
       </char>
"
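As a rough illustration of how such a declaration reads as data, the fragment quoted above can be parsed with a few lines of Python. This is only a sketch: the surrounding <data> wrapper and the helper names cp_to_str and read_variants are assumptions made for the example, and the XML namespace used by a real LGR file is left out.

```python
import xml.etree.ElementTree as ET

# The quoted fragment, wrapped in a <data> element so that it parses on its own.
# A real LGR file would carry an XML namespace, which this sketch omits.
FRAGMENT = """
<data>
    <char cp="00F6">
        <var cp="006F 0065"/>
    </char>
</data>
"""


def cp_to_str(cp_attr):
    """Convert a space-separated list of hex code points into a string."""
    return "".join(chr(int(cp, 16)) for cp in cp_attr.split())


def read_variants(xml_text):
    """Map each <char> source to the list of its <var> targets, as strings."""
    root = ET.fromstring(xml_text)
    return {
        cp_to_str(char.get("cp")): [cp_to_str(var.get("cp")) for var in char.findall("var")]
        for char in root.iter("char")
    }


variants = read_variants(FRAGMENT)
print(variants)
```

The single code point U+00F6 ends up mapped to the two-code-point sequence U+006F U+0065, which is the same shape a declaration for ij (U+0133) and i + j would take.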

In the typical case of digraphs, these are the precomposed versus decomposed forms of a single letter. Normalization should exist in Unicode in order to allow these variants, or otherwise block them.
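To see what Unicode normalization actually provides for the characters mentioned in this thread, one can check with Python's standard unicodedata module. This is a minimal sketch; the helper name describe is an assumption made for the example.

```python
import unicodedata


def describe(ch):
    """Return the canonical (NFD) and compatibility (NFKC) forms of a character."""
    return {
        "nfd": unicodedata.normalize("NFD", ch),
        "nfkc": unicodedata.normalize("NFKC", ch),
    }


ij = describe("\u0133")     # LATIN SMALL LIGATURE IJ
o_uml = describe("\u00f6")  # LATIN SMALL LETTER O WITH DIAERESIS
ae = describe("\u00e6")     # LATIN SMALL LETTER AE

for name, forms in (("ij", ij), ("o-diaeresis", o_uml), ("ae", ae)):
    print(name, {k: [f"U+{ord(c):04X}" for c in v] for k, v in forms.items()})
```

U+0133 maps to i + j only under compatibility normalization (NFKC), U+00F6 decomposes canonically to o plus a combining diaeresis rather than to o + e, and U+00E6 has no decomposition at all. A relation such as ö and oe therefore does not come from normalization and would have to be declared explicitly as a variant.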


Kind Regards,
Abdeslam NASRI



2016-05-09 15:43 GMT+02:00 Dillon, Chris <c.dillon@ucl.ac.uk>:

Dear Meikal,

 

Thank you for your thoughts on digraphs.

 

In that case, we would have blocked variants like i, dotless i and iota, where an application for a label containing one would block applications for labels containing any of the others.
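The blocking behaviour described above can be sketched in Python as follows. This is an illustration under stated assumptions, not how any registry implements it: the variant set, the function names and the choice of the lowest code point as representative are all hypothetical, and only single-code-point variants are handled (a sequence such as i + j would need extra matching logic).

```python
# Hypothetical variant set: i (U+0069), dotless i (U+0131), Greek iota (U+03B9).
VARIANT_SETS = [{"\u0069", "\u0131", "\u03b9"}]


def build_index(variant_sets):
    """Map every member of a variant set to one representative of that set."""
    index = {}
    for members in variant_sets:
        representative = min(members)  # lowest code point, an arbitrary choice
        for ch in members:
            index[ch] = representative
    return index


def index_label(label, index):
    """Replace each code point by its representative to get a comparison key."""
    return "".join(index.get(ch, ch) for ch in label)


def is_blocked(candidate, allocated, index):
    """True if the candidate has the same key as any already allocated label."""
    key = index_label(candidate, index)
    return any(index_label(label, index) == key for label in allocated)


index = build_index(VARIANT_SETS)
print(is_blocked("v\u0131p", ["vip"], index))  # dotless i against plain i
print(is_blocked("vap", ["vip"], index))       # unrelated label
```

Two labels that differ only by members of the same variant set share a key, so once one of them is applied for, the others are blocked.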

 

We would also have blocked variants that are digraphs, like ij, which could never be allocated at all. If we need to do this, it will be necessary to describe variants for ligature code points we have not yet analysed in the Latin ranges, as they aren’t in MSR-2.

 

(This distinction is what I was finding difficult during the face-to-face meeting in Marrakech.)

 

Incidentally, I’m fairly sure two code points could be a variant of one. (I wonder what happens with the Arabic ligature of laam and alif that looks like Greek gamma; in Urdu the two do not combine so closely, if at all.)

 

Regards,

 

Chris.

--

Research Associate in Linguistic Computing, Centre for Digital Humanities, UCL, Gower St, London WC1E 6BT Tel +44 20 7679 1599 (int 31599) www.ucl.ac.uk/dis/people/chrisdillon

 

From: Meikal Mumin [mailto:meikal.mumin@uni-koeln.de]
Sent: 09 May 2016 09:38
To: Dillon, Chris <c.dillon@ucl.ac.uk>
Cc: latingp@icann.org
Subject: Re: [Latingp] Digraphs

 

Dear Chris and colleagues,

 

Apologies for the late reply. I believe we don't need to exclude digraphs. We could simply set them up as variants, e.g. ij as the equivalent of i + j. It could be useful to verify with the IP whether it is possible to declare a sequence of two code points as a variant of one; we had not encountered such a case with the Arabic script.

 

Best wishes,

 

Meikal

 

2016-03-29 9:54 GMT+02:00 Dillon, Chris <c.dillon@ucl.ac.uk>:

Dear colleagues,

 

Mirjana’s recent research on Montenegrin has raised some interesting issues.

 

One of them is digraphs.

Currently we have digraphs like æ and œ in our repertoire, but Dutch ij (U+0133) as in vijf ‘five’ is white in MSR-2 (not compatible with IDNA 2008). Certainly many digraphs, including ij, are visually similar to their component letters. We could consider adding all digraphs to the list of criteria for exclusion, or adding them with exceptions (less good from a usability point of view). Incidentally, ß and & are probably excluded for other reasons: the Longevity Principle and Punctuation, respectively.

 

What do you think?

 

In other words: what should we do with the digraphs in our repertoire, allow them or not?

 

Regards,

 

Chris.





