Re: [Cyrillicgp] - Ukrainian character set

March 22, 2016

      On 19 March 2016, at 16:40, Dusan Stojicevic <dusan@dukes.in.rs> wrote:
...
Dear Dmitry, all,
Sorry for the late response; I had a big DIDS event in Belgrade
this week.
So, first of all, some minutes from the Cyrillic GP meeting at ICANN 55 in
Marrakesh.
No problem and thank you!
...
with one action item. By 21 March, Iliya, Dmitry and I (and all
of you who want to help) will try to finish the first step: creation of a
full set of national scripts, which will be the base table for our work.
As per this action item, attached you can find .txt files
extracted from IANA tables.
@Dmitry, @Iliya - format is ok?
Format is ok - I see there are some extra characters in the Ukrainian table.  Let me provide the proper character set.
They are, in order (I use uppercase variants here):

Base Cyrillic set:
0410 to 0429 (26 letters, A to Shcha), 042C (soft sign), 042E (YU), 042F (YA) - total 29, excludes 042A, 042B, 042D.

Cyrillic extensions:
0404 (Ukrainian IE), 0406 (Ukrainian-Belarusian I), 0407 (Ukrainian YI) - total 3 letters.

Extended Cyrillic:
0490 - Cyrillic Ghe with upturn - 1 letter.

Total is 29+3+1=33 letters (excluding the modifier letter apostrophe, 02BC, which is not part of the MSR).
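
The listing can be checked mechanically. A minimal Python sketch, assuming only the code points given in this message; the variable names are illustrative and not part of any agreed table format:

```python
import unicodedata

# Base Cyrillic letters used by Ukrainian: U+0410..U+0429, then soft sign, YU, YA.
base = list(range(0x0410, 0x0429 + 1)) + [0x042C, 0x042E, 0x042F]
# Ukrainian IE, Ukrainian-Belarusian I, Ukrainian YI.
extensions = [0x0404, 0x0406, 0x0407]
# Ghe with upturn.
extended = [0x0490]

ukrainian_upper = base + extensions + extended

# One line per character: code point, glyph, Unicode name.
for cp in ukrainian_upper:
    print(f"{cp:04X} {chr(cp)} {unicodedata.name(chr(cp))}")

# Sizes of the three groups and of the whole alphabet.
print(len(base), len(extensions), len(extended), len(ukrainian_upper))
```

The last line prints the sizes of the three groups and their total, 29 3 1 33.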

See also:
https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode
https://en.wikipedia.org/wiki/Ukrainian_alphabet
https://tools.ietf.org/html/rfc2319 (KOI8-U encoding for Internet use)

I must also note that RFC 5992 provides incorrect information about Ukrainian:
https://tools.ietf.org/html/rfc5992#section-2 says:

<<<
2.9.  Ukrainian

   The character list for modern Ukrainian has apparently not completely
   stabilized.  Some references claim 31 characters and therefore an
   additional 8 characters to the Base Cyrillic set of 23.  Others claim
   33, adding U+0438 and U+0439 and replacing U+044A (Hard Sign) with
   U+044C (Soft Sign), for a total of an additional 11 characters as
   compared to the Base Cyrillic set.  Unless better information is
   available, the prudent registry should probably assume that all 34
   characters are in use, i.e., the Base Cyrillic set plus U+0438,
   U+0439, U+0454, U+0456, U+0457, U+0491, U+0449, U+044A, U+044C,
   U+044E, U+044F.
...
...
...
Per RFC 5992, Base Cyrillic refers to:

<<<
"Base Cyrillic" consists of the following Unicode code points (names
   associated with these code points and those below appear in
   Appendix A): U+0430, U+0431, U+0432, U+0433, U+0434, U+0435, U+0436,
   U+0437, U+043A, U+043B, U+043C, U+043D, U+043E, U+043F, U+0440,
   U+0441, U+0442, U+0443, U+0444, U+0445, U+0446, U+0447, U+0448.
...
...
...
So the advice is to use the range 0430 to 0448 inclusive, plus 0454, 0456, 0457, 0491, 0449, 044A, 044C, 044E, 044F.
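
For comparison, the RFC 5992 recommendation can be set against the 33-letter list given earlier in this message. A rough Python sketch, working in lowercase because the RFC lists lowercase code points; the set names are mine, not from either source:

```python
# Lowercase code points recommended by RFC 5992 for Ukrainian, as summarised above.
rfc5992 = set(range(0x0430, 0x0448 + 1)) | {
    0x0454, 0x0456, 0x0457, 0x0491, 0x0449, 0x044A, 0x044C, 0x044E, 0x044F,
}

# The 33-letter alphabet from this message, listed in uppercase and lowercased here.
upper = (
    list(range(0x0410, 0x0429 + 1))
    + [0x042C, 0x042E, 0x042F]
    + [0x0404, 0x0406, 0x0407]
    + [0x0490]
)
alphabet = {ord(chr(cp).lower()) for cp in upper}

extra_in_rfc = sorted(rfc5992 - alphabet)
missing_in_rfc = sorted(alphabet - rfc5992)

print("only in RFC 5992:", [f"{cp:04X}" for cp in extra_in_rfc])
print("only in alphabet:", [f"{cp:04X}" for cp in missing_in_rfc])
```

Under these assumptions the only difference is U+044A (hard sign), which the RFC includes and the alphabet listed here excludes.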

I don't know which sources the authors consulted.  Since at least 1991 (independence of Ukraine),
the Ukrainian alphabet has had 33 letters and an apostrophe (34 characters in total).  In the Soviet era, 0490 was excluded (banned).
Sometimes the soft sign was considered not a letter but a modifier, and was placed at the end of the alphabet (but was still part of it).

The table at https://en.wikipedia.org/wiki/Ukrainian_alphabet shows a color-coded area of the Unicode Cyrillic block.
...
Regards,
Dusan
Please, feel free to check and add scripts as
On 17.3.2016 21:56, Dmitry Kohmanyuk wrote:
...
On 3 March 2016, at 20:29, Dusan Stojicevic <dusan@dukes.in.rs> wrote:
...
Dear all,
Let me remind you about the work done in some friendly organizations:
https://tools.ietf.org/html/rfc5992
Also, can we start to send tables?
I am ready - which format should be used?  Text - one line per character -
A 0x0410
...
works for me :)
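
A one-line-per-character text file like this is easy to read back. A minimal Python sketch, assuming each line holds a letter, whitespace, and a hexadecimal code point, and that any other line is skipped; `parse_table` is a hypothetical helper, not something the group has agreed on:

```python
def parse_table(text):
    """Parse lines such as '<letter> 0x0410' into (letter, code point) pairs."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines and anything not in 'letter code' form
        letter, code = parts
        entries.append((letter, int(code, 16)))
    return entries


# Sample input built from code points so the glyphs are unambiguous.
sample = "".join(f"{chr(cp)} 0x{cp:04X}\n" for cp in (0x0410, 0x0411, 0x0412))

for letter, cp in parse_table(sample):
    # Flag rows where the letter does not match its stated code point.
    status = "ok" if ord(letter) == cp else "MISMATCH"
    print(f"{letter} U+{cp:04X} {status}")
```

The mismatch check is there because visually identical Latin and Cyrillic letters are easy to confuse in hand-edited tables.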
We need to get this all together ASAP - I can collate tables together once we have raw data (which would be soon, right?)
so it would be like
Russian = Set 1 + Set 2
Ukrainian = Set 1 + Set 3
Kyrgyz = Set 1 + Set 2 + Set 4 
...
which can simplify our work later.  It is not strictly necessary; a "columnar table" (letter - which languages use it) would also work.
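
Both layouts can be derived from the same raw data. A minimal Python sketch, assuming per-language sets of uppercase code points: the Ukrainian set is the one listed in this thread, the Russian set (U+0410..U+042F plus U+0401) is my assumption for illustration, and the names `common`, `additions` and `columnar` are hypothetical:

```python
languages = {
    "Russian": set(range(0x0410, 0x042F + 1)) | {0x0401},
    "Ukrainian": set(range(0x0410, 0x0429 + 1))
    | {0x042C, 0x042E, 0x042F, 0x0404, 0x0406, 0x0407, 0x0490},
}

# Shared set: letters used by every language; the rest are per-language additions.
common = set.intersection(*languages.values())
additions = {name: chars - common for name, chars in languages.items()}

# Columnar table: one row per letter, listing the languages that use it.
columnar = {
    cp: [name for name, chars in languages.items() if cp in chars]
    for cp in sorted(set.union(*languages.values()))
}

print("common:", len(common))
for name, extra in additions.items():
    print(name, "adds", [f"{cp:04X}" for cp in sorted(extra)])
for cp, users in columnar.items():
    print(f"{cp:04X} {chr(cp)} {', '.join(users)}")
```

With only two languages the shared set is simply their intersection; with more languages the "Set 1 + Set 2 + Set 4" style needs a decision on how to group letters shared by some but not all, which the columnar form avoids.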
...
On 3.3.2016 10:34, Dmitry Kohmanyuk wrote:
...
Minorities or not, languages should be represented (so if there is a
majority in another country it may be covering a minority in another, as
Dusan said - but it is not a universal situation.)
Case in point: Tatar language.
-- dk@
On 2 March 2016, at 17:30, Dmitry Belyavsky <beldmit@gmail.com
<mailto:beldmit@gmail.com>> wrote:
...
Dear Dusan,
I think that if we narrow the task not taking into account the
minorities with own Cyrillic scripts, it will be better. 
For now it seems a reasonable enough simplification.
Thank you!
On Mon, Feb 15, 2016 at 5:28 PM, Dusan Stojicevic <dusan@dukes.in.rs
<mailto:dusan@dukes.in.rs>> wrote:
Dear all,
According to the Proposal... and working plan (0.2 and 0.3), let me
  suggest first stage of the work: creation of a full set of national
  scripts, which will be the base table for our work.
Please send Your national Cyrillic script table with Unicode labels.
  If You have a minority in Your country using their own Cyrillic
  script,
  please send them to the list too, but first check the EGIDS level
  (https://www.ethnologue.com/about/language-status), which has to be
  lower than 4. Also, check the Proposal...
One thing: there is no need, for example, for Serbian
  representative to
  send Bulgarian script because of the Bulgarian minority in Serbia, we
  already have Bulgarian representatives...
Do You agree with this first step? Any suggestion?
If yes, let us set first deadline - 7 days from now, or Monday, 21
  March
  2016.
Regards,
  Dusan
---
  This e-mail has been checked for viruses by Avast antivirus software.
  https://www.avast.com/antivirus
_______________________________________________
  Cyrillicgp mailing list
  Cyrillicgp@icann.org <mailto:Cyrillicgp@icann.org>
  https://mm.icann.org/mailman/listinfo/cyrillicgp
-- 
SY, Dmitry Belyavsky
<Bulgaria.txt><Macedonia.txt><Russia.txt><Serbia.txt><Ukraine.txt>