Hi Michael, All,

>> I assume that we will need to define the context those combining marks are allowed. ... But I guess Sarmad will know for certain.

This is discussed by the IP in Section 4.5, "Non-Spacing Combining Marks", of the Overview and Rationale document released as part of MSR-2. I encourage you all to review it. That section notes:

The actual set of combining marks allowable in the LGR will be smaller than the set included in the MSR, because it will be limited to those marks that are actually required for at least one combining sequence not expressible in NFC. In addition, where the number of such attested sequences is known and limited, GPs are encouraged to enumerate the sequences where feasible, rather than adding the “bare” combining mark to the repertoire. This would serve to prevent such marks from combining with every other allowed code point in the GP’s repertoire.

This suggests that where a pre-composed form is not encoded, the preferred approach is to include the combining mark only as part of a sequence with the desired code point(s), to prevent over-generation.
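As a rough illustration of the point above, the following Python sketch checks whether a base letter plus combining mark has a pre-composed form under NFC, and contrasts enumerating a sequence with adding the "bare" mark. The helper names (needs_sequence, label_allowed) and the toy repertoire of a-z are assumptions made for this example only; they are not part of MSR-2 or any LGR tooling. Note that in actual Unicode data "n" + U+0323 does compose (to U+1E47), so the example uses "q" + U+0323 as a stand-in for a sequence with no pre-composed form.

```python
import unicodedata

def needs_sequence(base: str, mark: str) -> bool:
    """True if base+mark is NOT expressible as a single code point in NFC."""
    return len(unicodedata.normalize("NFC", base + mark)) > 1

def label_allowed(label: str, code_points: set, sequences: list) -> bool:
    """Check that a label consists only of enumerated sequences
    and single code points from the repertoire (hypothetical checker)."""
    label = unicodedata.normalize("NFC", label)
    i = 0
    while i < len(label):
        # Prefer an enumerated sequence starting at this position.
        match = next((s for s in sequences if label.startswith(s, i)), None)
        if match:
            i += len(match)
        elif label[i] in code_points:
            i += 1
        else:
            return False
    return True

DOT_BELOW = "\u0323"  # COMBINING DOT BELOW
letters = set("abcdefghijklmnopqrstuvwxyz")  # toy repertoire, assumption

# Option A: enumerate only the attested sequence.
print(label_allowed("q" + DOT_BELOW, letters, ["q" + DOT_BELOW]))  # allowed
print(label_allowed("x" + DOT_BELOW, letters, ["q" + DOT_BELOW]))  # rejected

# Option B: add the bare mark; it now combines with every letter.
print(label_allowed("x" + DOT_BELOW, letters | {DOT_BELOW}, []))   # allowed
```

Option B shows the over-generation the IP text warns about: once the bare mark is in the repertoire, unattested combinations such as "x" + U+0323 are accepted too.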

Regards,
Sarmad

-----Original Message-----
From: latingp-bounces@icann.org [mailto:latingp-bounces@icann.org] On Behalf Of Michael Bauland
Sent: Tuesday, January 17, 2017 1:55 PM
To: latingp@icann.org
Subject: Re: [Latingp] How should combining diacritic marks be handled?

Hi Mats, hi all,

On 16.01.2017 17:21, Mats Dufberg wrote:

> MSR2 contains a number of combining diacritic marks, e.g. U+0323
> COMBINING DOT BELOW. It might be that we find that some of the
> languages that should be supported requires that code point in
> combination with, say, "n", i.e. "U+006E U+0323". Let us assume that
> there is no pre-composed equivalent code point.
> We can then justify the inclusion of U+0323. Will then the Integration
> Panel accept that code point in any context, or just in the specific
> context?

I assume that we will need to define the context in which those combining marks are allowed. At least we did this for the middle dot of the "ela geminada" in the Catalan language tables (see, e.g., http://www.iana.org/domains/idn-tables/tables/sap_ca_1.0.txt). But I guess Sarmad will know for certain.
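For illustration, a context restriction of this kind can be sketched in a few lines of Python. The rule below (MIDDLE DOT, U+00B7, permitted only between two "l" characters) follows the contextual rule for the ela geminada as given in RFC 5892, Appendix A.3; the function name is made up for this example, and the referenced IANA table may express the rule in a different form.

```python
MIDDLE_DOT = "\u00B7"

def middle_dot_in_context(label: str) -> bool:
    """True if every MIDDLE DOT in the label sits between two 'l' characters."""
    for i, ch in enumerate(label):
        if ch != MIDDLE_DOT:
            continue
        # A dot at either edge of the label cannot have both neighbours.
        if i == 0 or i == len(label) - 1:
            return False
        if label[i - 1] != "l" or label[i + 1] != "l":
            return False
    return True

print(middle_dot_in_context("col\u00B7legi"))  # dot between two l's
print(middle_dot_in_context("a\u00B7b"))       # dot in another context
```

A contextual rule for a combining mark would have the same shape: the mark is accepted only when the preceding code point belongs to an explicitly listed set.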

> If the IP requires that we justify combining diacritic marks for every
> context it will be allowed for, then we have to go language by
> language to find all combinations to support.
>
> If the IP accepts to include a combining diacritic mark for any
> context as long as it is justified for one language, then we can go
> code point by code point as long as we can find justification for all
> Latin code points in MSR2 and we assume no more code points are needed.
>
> If the purpose of our work is to create a Latin IDN table that
> supports all listed languages (EGIDS value 4 or 5 as decided) then I
> cannot see how we can achieve that without inspecting all those languages.

Going by language instead of going by character also has the advantage that we will be able to distribute the languages to members of the group. Then everybody can work with a certain sub-set of all languages. If we distributed the characters, everybody would have to get acquainted with every single language.

Cheers,

Michael

____________________________________________________________________
Dr. Michael Bauland, Dipl.-Informatiker
Software Development
Knipp Medien und Kommunikation GmbH
Technologiepark, Martin-Schmeisser-Weg 9, 44227 Dortmund, Germany
Fon: +49 231 9703-284
Fax: +49 231 9703-200
SIP: Michael.Bauland@knipp.de
E-mail: Michael.Bauland@knipp.de
Amtsgericht Dortmund, HRB 13728
Chief Executive Officers: Dietmar Knipp, Elmar Knipp
