Dear Professor Zhang,

Thank you for your summary of the issues that we will face in the mid-to-long term.

· (1) seems, fortunately, to be a smaller problem than I would have expected, but there seems to be no solution.

· I suspect that a solution for (3) could be based on what happens with labels containing the sorts of variants listed in (2). However, (3) would be a new departure for labels: for example, English www.pictures.com and www.photo.com and German www.foto.com have traditionally been regarded as probably being different sites (I haven’t checked this example; if I’m unlucky, perhaps some of them belong to the same company!).

· (4) may well be desirable, but would be difficult for all panels to implement.

2. I have fallen into a trap, as I have no list of variant issues that are dealt with at the font level. I’m sure I’ve seen such a list (I did find a short one, which I have attached) and would be grateful if any colleague has a better one. In this case, the Chinese SimSun font has one dot and the Korean Malgun Gothic font has two.

I do hope we meet in Seoul.

Regards,

Chris.

Research Associate in Linguistic Computing, Centre for Digital Humanities, UCL, Gower St, London WC1E 6BT Tel +44 20 7679 1599 (int 31599) www.ucl.ac.uk/dis/people/chrisdillon

From: Joe Zhang [mailto:joezhang43@hotmail.com]
Sent: 28 April 2015 04:28
To: Dillon, Chris; hotta@jprs.co.jp; KoreanGP@icann.org; ChineseGP@icann.org; JapaneseGP@icann.org
Subject: RE: [ChineseGP] [Koreangp] Proposed Action items before Seoul meeting

Dear Chris,

I will study the to-do list and your comments in depth, which may take some more time.

Before finishing my homework, I would like to make two points for consideration:

1. Up to now, our work has been limited to character-based (including character-variant-based) label generation; we have not yet defined string- or word-based rules. In particular, we have not yet defined language- and/or context-sensitive strings or words. This seems quite complicated, but working on it in the next stage is inevitable. For example:

(1) Exception handling, such as 发-發-発 and 发/髪

(2) C-J sensitive: 滨-濱-浜, 艺-藝-芸

(3) Synonyms as labels (word-based variant strings): 數位化-数字化, 激光-镭射, 便当-弁当

(4) Rules such as: no mixing of simplified, traditional and variant characters within a label (important for the CGP).
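Rule (4) could be expressed as a simple per-label check. This is only a minimal sketch, assuming each character carries a hypothetical tag (S = simplified-only, T = traditional-only, B = shared by both) and that untagged characters count as shared; the tags below are examples, not data from the Chinese LGR.

```python
# Hypothetical per-character tags, for illustration only (not from the Chinese LGR).
# "S" = simplified-only, "T" = traditional-only, "B" = used in both.
TAGS = {
    "发": "S", "發": "T",
    "滨": "S", "濱": "T",
    "艺": "S", "藝": "T",
    "数": "S", "數": "T",
    "字": "B", "位": "B", "化": "B",
}

def mixes_simp_trad(label: str) -> bool:
    """Return True if the label contains both simplified-only and
    traditional-only characters. Untagged characters are assumed shared ("B")."""
    tags = {TAGS.get(ch, "B") for ch in label}
    return "S" in tags and "T" in tags

print(mixes_simp_trad("數位化"))  # False: traditional plus shared characters only
print(mixes_simp_trad("滨藝"))    # True: simplified 滨 mixed with traditional 藝
```

Under these assumptions 数字化 and 數位化 each pass on their own, while a label combining 滨 with 藝 would be rejected.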

2. Some visible differences among Hanzi/Hanja/Kanji are so-called Z-differences in Unicode/Unihan, for example:

“I have also been looking for differences between Traditional Chinese characters and Korean hanja. So far I have found one: characters with the progression radical tend to start with two dots in hanja: 逃 and only one in Traditional Chinese: 逃.”

Actually, both are the same code point, U+9003, rendered in different fonts.
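This can be checked with Python's standard `unicodedata` module: the character has a single code point, so the one-dot and two-dot forms cannot be told apart by comparing code points, and the difference exists only in the glyphs the fonts draw. A minimal sketch (the helper name `describe` is just for illustration):

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# Whether a font draws it with one dot (e.g. a Chinese font) or two
# (e.g. a Korean font), the text itself holds the same code point.
as_typed = "逃"
escaped = "\u9003"

print(describe(as_typed))   # U+9003 CJK UNIFIED IDEOGRAPH-9003
print(as_typed == escaped)  # True
```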

If I am able to attend the coming Seoul meeting, we can discuss this in detail.

Looking forward to seeing you there.

Regards,

Zhang

From: chinesegp-bounces@icann.org [mailto:chinesegp-bounces@icann.org] On Behalf Of Dillon, Chris
Sent: 27 April 2015 20:43
To: hotta@jprs.co.jp; KoreanGP@icann.org; ChineseGP@icann.org; JapaneseGP@icann.org
Subject: Re: [ChineseGP] [Koreangp] Proposed Action items before Seoul meeting

Dear colleagues,

Here are some comments, as requested by Hiro.

I reckon I have now caught up after missing the Dallas meeting.

I believe Mr Yoneya’s algorithm will work.

I have spent some time looking for exceptions to various statements in it, e.g. Slide 5, “there exists at least one identical ideograph”. (No exception found.)

It is fortunate that 機 ‘machine’ / 机 ‘desk’ and 発 ‘send’ / 髪 ‘hair’ seem to be the only cases where (at least commonly used) characters that are different in Japanese are the same character in Simplified Chinese. (I haven’t spent as much time looking for characters that are separate in Chinese but merged in Japanese. 弁 replaces at least three characters in Chinese, but I think none are common. I can imagine a .弁当 TLD, so that may be good news for bento companies.)

I note the options for the disposition of variants not defined in the LGR-1s (Slide 6), i.e.:

- Blocked if the variant is not in the LGR-1 / Allocatable otherwise

- Blocked if the variant is not in the LGR-1 / Inherit its original disposition in the LGR-1 (Allocatable/Simp/Trad/Both)
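As I read Slide 6, the two options agree on variants outside the LGR-1 and differ only for variants that are in it. A minimal sketch, using an illustrative dictionary in place of a real LGR-1 (its entries are invented, not copied from any panel's LGR-1) and a hypothetical `disposition` helper:

```python
# Illustrative stand-in for one panel's LGR-1: character -> disposition.
# These entries are invented for the example, not taken from any real LGR-1.
LGR1 = {
    "机": "Simp",
    "機": "Trad",
    "字": "Both",
    "弁": "Allocatable",
}

def disposition(variant: str, lgr1: dict[str, str], option: int) -> str:
    """Disposition of a variant under the two Slide 6 options (hypothetical helper)."""
    if variant not in lgr1:
        # Both options: a variant not in the LGR-1 is blocked.
        return "Blocked"
    if option == 1:
        # Option 1: any variant that is in the LGR-1 is allocatable.
        return "Allocatable"
    if option == 2:
        # Option 2: the variant inherits its own disposition from the LGR-1.
        return lgr1[variant]
    raise ValueError("option must be 1 or 2")

for opt in (1, 2):
    print(opt, [disposition(v, LGR1, opt) for v in ("機", "発")])
```

With these invented entries, 機 comes out Allocatable under option 1 and Trad under option 2, while 発 (absent from the table) is Blocked under both.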

Both case studies are most interesting. I note that there are some labels, e.g. 予园 (the first character, I think, used only in Japan and the second only in Simplified Chinese), that in an ideal world we would perhaps prefer not to be allocatable, but I suspect that blocking them would add horrendous complexity.

I note that the Japanese LGR-1 is difficult to understand, as the characters are not visible.

I have also been looking for differences between Traditional Chinese characters and Korean hanja. So far I have found one: characters with the progression radical tend to start with two dots in hanja: 逃 and only one in Traditional Chinese: 逃.

Looking forward to Seoul,

Regards,

Chris.