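漢字字碼與資料庫國際研討會 (International Symposium on Hanzi Codes and Databases), Kyoto and Tokyo, October 4, 1996. From the Missing-Character Problem to a Redesign of the Hanzi Interchange Code, Part 2: A Descriptive Method for Re-engineering Hanzi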
2. The Representation of Hanzi Knowledge
(4) Markup Tags for Missing Characters and Variants
Wittern and App were the first to apply SGML tags to mark up missing characters.【3】 The technique they developed is called the Kanji Placeholder (漢字位標, "placeholder" for short). A Kanji Placeholder starts with "&" and ends with ";", with two fields in between. The first field is a code identifier, and the second field is the code word at which the missing glyph is found. The Kanji Placeholder thus provides a link across Interchange Codes so that glyphs can be shared. For example, "&U4AB5;" represents a missing glyph found at location 4AB5 of Unicode, which is identified by the field "U". The Kanji Placeholder helps share glyphs collected in different Interchange Codes, provided that the user has access to a collection of those Interchange Codes. Wittern and App maintain a Kanji Base on the Internet that collects many Interchange Codes, such as CNS with more than 45 thousand glyphs and Unicode with approximately 22 thousand glyphs. In addition, they have built a bank for missing characters that cannot be found in any of the Interchange Codes they collected.
【3】 Christian Wittern and Urs App, "IRIZ Kanji Base: A New Strategy for Dealing with Missing Chinese Characters," Electronic Buddhist Text Initiative (EBTI) conference, Taipei, April 1996.
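The two-field layout of the Kanji Placeholder can be illustrated with a minimal Python sketch. It is not Wittern and App's implementation. The regular expression assumes a one-letter code identifier followed by a hexadecimal code word, which fits the "&U4AB5;" example but is otherwise an assumption. The names `CODE_SETS`, `Placeholder`, `find_placeholders`, and `describe` are hypothetical, and only the identifier "U" (Unicode) is taken from the text.

```python
import re
from typing import List, NamedTuple

# Hypothetical table of code identifiers. Only "U" = Unicode comes from the
# text; identifiers for other Interchange Codes are not given there.
CODE_SETS = {"U": "Unicode"}


class Placeholder(NamedTuple):
    code_id: str    # first field: which Interchange Code
    code_word: str  # second field: location of the glyph in that code


# Assumed syntax: "&", one uppercase identifier letter, hex digits, ";".
PLACEHOLDER_RE = re.compile(r"&([A-Z])([0-9A-Fa-f]{4,});")


def find_placeholders(text: str) -> List[Placeholder]:
    """Return every placeholder in the text, in order of appearance."""
    return [
        Placeholder(m.group(1), m.group(2).upper())
        for m in PLACEHOLDER_RE.finditer(text)
    ]


def describe(p: Placeholder) -> str:
    """Spell out what a placeholder points at."""
    name = CODE_SETS.get(p.code_id, "unknown code set " + repr(p.code_id))
    return f"glyph at {p.code_word} in {name}"


if __name__ == "__main__":
    for p in find_placeholders("A missing glyph: &U4AB5;"):
        print(describe(p))
```

A front end that actually displays the glyph would still need access to the referenced Interchange Code, which is the dependency noted in the paragraph above.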
The Kanji Placeholder uses an SGML tag as the carrier of a pointer that indicates where the missing glyph is located. We extended this idea by using the SGML tag to show the structure of the missing glyph, so that the structure itself also serves as the identifier of the missing glyph. The technique is called the "Kanji Glyphholder". To avoid using any graphic symbol of a code word as a control symbol, two special symbols are created to serve as the open delimiter and close delimiter of the glyphholder tag, respectively. The carrier formed by the glyphholder tags contains the glyph expression of the missing glyph. For convenience, a component sequence may replace the glyph expression as long as no ambiguity arises. For example, in the Buddhist canon, 阿門佛 can be expressed as 阿門人人人佛, or 阿門 佛. The glyphholder can also be used to represent a variant, by applying the glyph code of the "three segment coding scheme" of a character, which is described in a later section. For example, "芍藥•3" represents "芍葯" if 葯 is the third variant of 藥.
The glyphholder and the placeholder are compatible with each other because they share the same tagging structure. The two fields of the placeholder can also be accepted inside a glyphholder if the parser is designed to recognize them. The glyphholder has some useful properties that the placeholder cannot provide: the glyphholder is more readable than the placeholder; the user is not required to look elsewhere for the missing glyph in order to obtain an identifier or a code word for it; the user's front end need not be equipped with a database of collected glyphs and variants; and, finally, the glyphholder can express the mapping relation between character and glyph.
(5) Attributes
The attributes we have collected so far for a character are listed in the following table.
Table 6. Character attribute fields (Note: fields marked with "*" may be repeated)
A. Missing-character attributes
B. Glyph-structure attributes