International Symposium on Hanzi Character Codes and Databases (漢字字碼與資料庫國際研討會), Kyoto and Tokyo, October 4, 1996
A Descriptive Method for Re-engineering Hanzi
2. The Representation of Hanzi Knowledge
(1) Previous Studies
(a) A Fundamental Character Set for Computer Use
Around 1970, research on computer processing of Hanzi information was launched in Taiwan. At that time, statistical knowledge of characters and words was insufficient to support research needs. Therefore, Mr. Su Lin of the Department of Computer and Control Engineering at National Chiao-Tung University conducted a study to establish “A Fundamental Character Set for Computer Use”, with the support of Wang Laboratories. The research started in October 1971 and took more than 2000 man-days; a draft report was published in March 1972. Some highlights of this research are listed as follows.
The entropy of the system is 9.60. The accumulated frequencies of the 500 most frequently used characters are listed in [Table 3]. To date, this survey is still the most comprehensive one, and we use it as the foundation of our study.
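The entropy figure can be read as Shannon entropy in bits per character, H = -Σ p_i log2 p_i, although the survey's exact formula is not reproduced here, so that reading is an assumption. A minimal sketch, using a made-up toy frequency table rather than the survey data, shows how such a value and an accumulated-frequency column could be computed. The function names are hypothetical.

```python
import math
from itertools import accumulate

def entropy_bits(counts):
    """Shannon entropy, in bits per character, of a table of occurrence counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)

def accumulated_frequency(counts):
    """Running share of total usage, most frequent character first."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    shares = accumulate(c / total for _, c in ranked)
    return [(ch, share) for (ch, _), share in zip(ranked, shares)]

# Toy counts for illustration only; these are not the survey's figures.
toy_counts = {"的": 4, "一": 2, "是": 1, "不": 1}
print(entropy_bits(toy_counts))           # 1.75
print(accumulated_frequency(toy_counts))  # shares 0.5, 0.75, 0.875, 1.0
```

If the reported 9.60 is indeed in bits, it corresponds to the uncertainty of a uniform choice among roughly 2^9.60 ≈ 776 equally likely characters.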
(b) The Chiao-Tung Root System
The earliest study of glyph structure in Taiwan was carried out at National Chiao-Tung University; hence the resulting system was named the Chiao-Tung Root System. In 1972, the master's thesis of Mr. Ù¯Õ, 《中國文字之結構模式及其分析》 (The Structural Model of Chinese Characters and Its Analysis), analyzed 16 different compositional operators on glyphs and found that only three of them, namely horizontal composition, vertical composition, and containment composition, are frequently used and can be assembled into an effective system for representing the structure of glyphs. A formal representation of the system in Backus Normal Form is shown in [Table 4].
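To make the three operators concrete, the following sketch models a glyph expression as a small tree. It is only an illustration under stated assumptions: the class names `Root` and `Compose`, the operator labels `H`, `V`, and `C`, and the choice of 日, 月, 木, 目, 心, 囗, and 或 as roots are hypothetical and are not taken from Table 4 or from the Chiao-Tung root set.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Root:
    """A basic component that is not decomposed further."""
    name: str

@dataclass(frozen=True)
class Compose:
    """An operator applied to two or more sub-expressions."""
    op: str  # "H" horizontal, "V" vertical, "C" containment (labels are assumptions)
    parts: Tuple["Expr", ...]

Expr = Union[Root, Compose]

def roots_of(expr: Expr) -> list:
    """List the roots of an expression in reading order."""
    if isinstance(expr, Root):
        return [expr.name]
    return [name for part in expr.parts for name in roots_of(part)]

def show(expr: Expr) -> str:
    """Render an expression in prefix form, e.g. H(日, 月)."""
    if isinstance(expr, Root):
        return expr.name
    return f"{expr.op}({', '.join(show(p) for p in expr.parts)})"

ming = Compose("H", (Root("日"), Root("月")))   # 明: 日 beside 月
guo = Compose("C", (Root("囗"), Root("或")))    # 國: 囗 containing 或
# 想: 相 (木 beside 目) above 心, showing that expressions can nest.
xiang = Compose("V", (Compose("H", (Root("木"), Root("目"))), Root("心")))

for glyph in (ming, guo, xiang):
    print(show(glyph), roots_of(glyph))
```

Representing expressions as immutable trees keeps the operators composable: any sub-expression can appear wherever a root can, which is what lets a small operator set describe a large number of glyphs.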
Mathematically speaking, the system in [Table 4] is a production system. This means that the outcomes it can produce, such as characters, glyphs, and components, are usually far more numerous than those we would accept; whether an outcome is a legal item is up to our choice. The system is therefore expandable, which fits our needs. In this production system, the structure of a glyph is expressed as an expression of roots. A root is a basic component that is not decomposed further. An example of glyph structure is shown in [Figure 1]. The term component usually refers to the intermediate parts between a glyph and its roots. In [Figure 1], 灣 and 彎 are glyphs, 䜌 is a component, 弓, 言, and 系 are both glyphs and roots, and 氵 is a root.
The work of Mr. Ù¯Õ was done in parallel with Mr. Su Lin's work on 《中文電腦基本用字集》 (A Fundamental Character Set for Computer Use), and together they determined the set of Chiao-Tung Roots. A distinctive feature of this root set is that it was obtained by three iterations of an optimization procedure. The procedure is derived from a mathematical optimization of a polynomial expression in the total number of roots and the average number of roots per glyph. In general, the fewer the roots, the longer the decomposition of each glyph. The optimization yields the following criterion: a glyph should not be decomposed if its frequency of usage is over 0.3758%; it should be decomposed into no more than 2 roots if its frequency is from 0.1879% to 0.3758%, into no more than 3 roots from 0.1236% to 0.1879%, and into no more than 4 roots from 0.0939% to 0.1236% [1]. This criterion determines where decomposition stops. From the 9132 glyphs in 《中文電腦基本用字表》 (Table of Fundamental Characters for Chinese Computers), 496 roots were obtained, as listed in [Table 5]. In this root set, 305 roots are characters, and their combined frequency of usage exceeds 50% of total usage.
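The frequency thresholds in the criterion amount to a lookup from a glyph's usage frequency to the largest number of roots its decomposition may use. A minimal sketch follows. The function name `max_roots` is hypothetical, the treatment of a frequency that falls exactly on a boundary is an assumption (the text does not say which band a boundary value belongs to), and no limit is given for glyphs below 0.0939%, so the sketch returns `None` there.

```python
from typing import Optional

# (lower bound of the band in percent, maximum number of roots), highest band first.
# A maximum of 1 means the glyph is kept whole, i.e. it is itself a root.
BANDS = [
    (0.3758, 1),
    (0.1879, 2),
    (0.1236, 3),
    (0.0939, 4),
]

def max_roots(frequency_percent: float) -> Optional[int]:
    """Return the root limit for a glyph with the given usage frequency in percent."""
    for lower_bound, limit in BANDS:
        # Assumption: a value exactly on a boundary falls into the upper band.
        if frequency_percent >= lower_bound:
            return limit
    return None  # no limit is stated for rarer glyphs

for f in (0.5, 0.2, 0.15, 0.1, 0.01):
    print(f, max_roots(f))
```

The more frequent a glyph is, the fewer roots it is allowed, which is how the criterion keeps the usage-weighted length of decompositions short while still limiting the size of the root set.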
The accumulated frequency of the 25 most frequently used roots is 30%; it increases to 49% for 50 roots, 66.7% for 100 roots, 84.9% for 200 roots, and 95% for 300 roots [2]. As a result of the optimization, the weighted average number of roots per glyph is only 1.9. The power of this system is illustrated by the fact that, when checked against the 49905 characters of 《中文大辭典》 (Zhongwen Da Cidian), 48713 characters can be expressed by the Chiao-Tung Root System. The remaining 1129 characters are 籀文 (zhouwen, large seal script), 篆字 (seal characters), or ancient forms such as 古文 (ancient script), 反文 (reversed forms), and 圖騰 (totems). If needed, these 1129 glyphs can be included in the Chiao-Tung Root System at any time without difficulty. A pictorial illustration of this result is shown in [Figure 2].
The Chiao-Tung Root System was developed according to the 楷書 (regular script) font. Therefore, it does not cover the 篆 (seal), 隸 (clerical), 行 (running), or 草 (cursive) scripts, nor does it take account of calligraphic variants, print fonts, or modern artistic fonts. All it provides is the common structure of a glyph, which is the basis of every font design.
Figure 2. 496 roots obtained from 9129 glyphs can produce 48713 glyphs and more. (A 1972 study by [1])
[1] For the calculation of this marginal utility, see: 謝清俊, 黃永文, 林樹, 《中文字根之分析》 (An Analysis of Chinese Character Roots), 交大學刊 (journal of Chiao-Tung University), Vol. 6, No. 1, February 1973.
[2] For data on the decomposition of the 9129 glyphs and the usage frequencies of the roots, see: 劉達人, 杜敏文, 謝清俊, 張仲陶, 蔡中川, 林樹, 《漢字綜合索引字典》 (Comprehensive Index Dictionary of Hanzi), Asian Associates, Bedford, New York, 1979.
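The weighted average number of roots per glyph reported in this section can be understood as the sum, over all glyphs, of usage share times number of roots. A sketch with invented toy numbers follows; it does not use the 1972 data, and the names `weighted_average_roots` and `toy` are hypothetical.

```python
def weighted_average_roots(glyphs):
    """glyphs: iterable of (usage_share, root_count); shares need not sum to 1."""
    glyphs = list(glyphs)
    total_share = sum(share for share, _ in glyphs)
    return sum(share * n for share, n in glyphs) / total_share

# Toy data for illustration only: frequent glyphs are kept whole,
# rare glyphs are split into more roots.
toy = [(0.5, 1), (0.25, 2), (0.125, 3), (0.125, 4)]
print(weighted_average_roots(toy))  # 1.875
```

Because frequent glyphs dominate the weights, the average stays low even when rare glyphs need several roots.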