terminology: traditional Chinese chars?
Dear Messrs Wang and Huang:
How are you doing? I have a question about the term "traditional Chinese chars".
1) In ISO/IEC 10646, there are about 56000 Chinese chars.
2) There are 2235 simplified Chinese chars and 2261 chars corresponding to these 2235 simplified Chinese chars.
3) My questions are:
3-1) By the term "traditional Chinese chars", do you refer to just the 2261 chars, or to the 56000 - 2235 = 53000+ (roughly) chars?
3-2) I wonder what terms Chinese experts use to distinguish between a) the 2261 chars and b) the 56000 - 2235 = 53000+ chars.
Thanks in advance.
KIM, K.
-- 김 경석 KIM, Kyongsok
Dear Prof. Kim,
ISO 10646 has no classifier that distinguishes Traditional Chinese (TC) from Simplified Chinese (SC). The principle is that the same glyph gets the same code point, so one code point may represent both TC and SC.
The basic classification technique we used is to check the data sources listed at http://www.unicode.org/charts/unihan.html. As long as a glyph has a TSource, we consider it Traditional Chinese.
For example, 曾 (http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=66FE&useutf8=true) has a TSource, so it is initially considered TC. As I mentioned earlier, TC and SC can both exist at the same code point.
This method is used in the preliminary stage of developing the TC/SC table. Many language experts then work through the table glyph by glyph, validating it by hand. I can only say that there is no simple algorithm or method that produces the result.
FYI
Kenny Huang
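The preliminary TSource check described above can be sketched in a few lines. This is only a minimal sketch, assuming the Unihan database's tab-separated text layout (code point, field name, value, as in Unihan_IRGSources.txt) and the field name kIRG_TSource. The sample rows and their source values are placeholders made up for illustration, not data copied from Unihan, and the function names are hypothetical. It covers only the first, mechanical pass; the human validation described in the thread is not modeled.

```python
from collections import defaultdict

# Illustrative rows in the Unihan layout: U+XXXX <TAB> field <TAB> value.
# The values below are placeholders, NOT real Unihan data.
SAMPLE = """\
# comment lines start with '#'
U+66FE\tkIRG_GSource\tG0-0000
U+66FE\tkIRG_TSource\tT1-0000
U+4E00\tkIRG_TSource\tT1-0001
U+9999\tkIRG_GSource\tG0-0001
"""


def parse_sources(text):
    """Map each code point (int) to a dict of its Unihan fields."""
    sources = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code_point, field, value = line.split("\t", 2)
        sources[int(code_point[2:], 16)][field] = value
    return dict(sources)


def tc_candidates(text):
    """Return sorted code points that carry a kIRG_TSource entry.

    A code point listed here may also be used for Simplified Chinese;
    the check only marks it as an initial TC candidate.
    """
    parsed = parse_sources(text)
    return sorted(cp for cp, fields in parsed.items()
                  if "kIRG_TSource" in fields)


if __name__ == "__main__":
    for cp in tc_candidates(SAMPLE):
        print(f"U+{cp:04X} {chr(cp)}")
```

Run against the full Unihan source file instead of SAMPLE, the length of the returned list would give the candidate count for that one source, which is why counts differ from source to source.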
Dear Mr. Huang:
Thanks for your explanation.
On Wed, Sep 30, 2015 at 7:08 PM, Kenny Huang, Ph.D. <huangksh@gmail.com> wrote:
Dear Prof. Kim,
ISO 10646 has no classifier that distinguishes Traditional Chinese (TC) from Simplified Chinese (SC). The principle is that the same glyph gets the same code point, so one code point may represent both TC and SC.
The basic classification technique we used is to check the data sources listed at http://www.unicode.org/charts/unihan.html. As long as a glyph has a TSource, we consider it Traditional Chinese.

I see. So there will be about 50,000+ Traditional Chinese chars, right? That's what I want to know.
Thanks again.
Rgds,
KIM, K.
-- 김 경석 KIM, Kyongsok
Dear Prof. Kim,
Please see my comment below:
I see. So there will be about 50,000+ Traditional Chinese chars, right? That's what I want to know.

That can be right for the source you identified. By contrast, the IANA IDN Repository table at http://www.iana.org/domains/idn-tables/tables/tw_zh-tw_4.0.1.html lists 19520 TC characters. Different sources give different numbers of TC chars. For example, the CGP final proposal also adjusts the total number of TC/SC characters specifically for the Root Zone LGR.
Regards
Kenny Huang