UNIPEN and Kanji, Katakana, Hiragana

UNIPEN is also suitable for Kanji. The following image was captured from a window of the upWorks application:

Here are the GIF images of these characters:


An annotation coding proposal

The character .SEGMENT entries can be coded as follows (example), using hexadecimal Shift-JIS codes:

Terse annotation:
.SEGMENT CHARACTER 1-12 OK "JISx8d95"

The syntax is:              JISx .......fixed tag
                                ffff ...four hexadecimal digits

More on JIS Kanji codes is here. For UNIPEN, Shift-JIS, which is more modern and more Microsoft-compatible than JIS, is probably the preferred choice. Shift-JIS is the variant that WWW browsers such as Netscape will accept if you choose the proper font. Thus, "JIS" in this UNIPEN format proposal means Shift-JIS (SJIS).
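
As an illustration, a terse tag can be turned back into a displayable character by reading the four hexadecimal digits as two Shift-JIS bytes. The following is a minimal Python sketch and not part of the UNIPEN definition: the function name decode_jis_tag is hypothetical, and it assumes that Python's built-in shift_jis codec agrees with the code table used when the annotation was written.

```python
import re

# Fixed tag "JISx" followed by exactly four hexadecimal digits.
_TAG_RE = re.compile(r"JISx([0-9A-Fa-f]{4})")


def decode_jis_tag(tag: str) -> str:
    """Return the character for a tag such as 'JISx8d95' (Shift-JIS assumed)."""
    match = _TAG_RE.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a JISx tag: {tag!r}")
    raw = bytes.fromhex(match.group(1))  # two bytes, e.g. b'\x8d\x95'
    # Raises UnicodeDecodeError if the two bytes are not a valid Shift-JIS code.
    return raw.decode("shift_jis")
```

Calling decode_jis_tag("JISx8d95") returns the single character coded in the terse annotation above; encoding the result with "shift_jis" gives back the original two bytes.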

Here is the original UNIPEN file: masaaki-kurosu.dat

Extended annotation:
.SEGMENT CHARACTER 1-12 OK "JISx8d95_'Kuro'"

The extended syntax is:     JISx ................fixed tag
                                ffff ............four hexadecimal digits
                                    _ ...........fixed separator 
                                     'string' ...English pronunciation 
                                                 in the given context  
                                                 (or one possible and 
                                                 common English  
                                                 pronunciation)  
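
Both annotation forms can be read with one pattern, since the pronunciation part is optional. The sketch below is an illustration under assumptions, not a reference parser: the names LABEL_RE, KanjiLabel, parse_label and parse_segment_label are hypothetical, the label is taken to be the text between the first and last double quote on the line, and the pronunciation string is assumed not to contain a single quote.

```python
import re
from typing import NamedTuple, Optional

# Fixed tag, four hex digits, then an optional _'pronunciation' part.
LABEL_RE = re.compile(r"JISx([0-9A-Fa-f]{4})(?:_'([^']*)')?")


class KanjiLabel(NamedTuple):
    sjis_code: int                # numeric Shift-JIS code, e.g. 0x8D95
    pronunciation: Optional[str]  # None for the terse form


def parse_label(label: str) -> KanjiLabel:
    """Parse a label such as JISx8d95 or JISx8d95_'Kuro'."""
    match = LABEL_RE.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognised label: {label!r}")
    return KanjiLabel(int(match.group(1), 16), match.group(2))


def parse_segment_label(line: str) -> KanjiLabel:
    """Extract and parse the quoted label of a .SEGMENT CHARACTER line."""
    start = line.index('"')   # raises ValueError if there is no quote
    end = line.rindex('"')
    return parse_label(line[start + 1:end])
```

Applied to the extended example line, parse_segment_label yields the code 0x8D95 together with the pronunciation Kuro; applied to the terse line, the pronunciation field is None.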

More information on the UNIPEN format is here.

Rationale

Since the start of UNIPEN, there have been several discussions on how to incorporate other scripts. This is a bold proposal for using UNIPEN for on-line Kanji samples. The extended annotation allows for user-friendly development of ASCII-based software. The system is by no means the ultimate solution, but it will probably solve the majority of problems. We have waited a long time for Unicode-based standardization, and it still has not evolved naturally. Therefore, the proposed JIS-based hexadecimal ASCII representation, plus an additional and optional phonetic tag (a mnemonic for Western researchers), is likely to fulfil an existing need. This proposal does not solve the complex problem of mappings (one character may map to different pronunciations and meanings; one meaning may map to different characters), but at this moment there is not much else available to get Western researchers easily involved in the exciting world of Kanji character recognition, using software on a wide range of platforms.

The image above shows this handwritten sample together with the vertical pen-tip displacement as a function of time. The sampling frequency was 100 Hz, so successive sample points are 10 ms apart.
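
The relation between sample index and time follows directly from the sampling frequency. A minimal sketch, in which the function name sample_times_ms is made up for illustration and the first sample is assumed to lie at t = 0:

```python
from typing import List

SAMPLING_HZ = 100  # sampling frequency stated for this recording


def sample_times_ms(n_samples: int, rate_hz: int = SAMPLING_HZ) -> List[float]:
    """Time of each sample in milliseconds, with the first sample at t = 0."""
    period_ms = 1000.0 / rate_hz  # 10 ms at 100 Hz
    return [i * period_ms for i in range(n_samples)]
```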

The characters were kindly written by Prof. Masaaki Kurosu of Shizuoka University.

Here is another sample


And what about Hanzi for Chinese users of UNIPEN?

The HZ format and protocol already allow Chinese characters to be embedded within ASCII. Here are some examples of proper names:
   ~{3BO~~6+~}
   ~{N:QG9p~}
Of course, HZ software is needed for visualisation, as is JIS software for Kanji. Examples for Unix are the hzterm (Hanzi) and kterm (Kanji) variants of xterm. In the latter case, software that processes and visualizes UNIPEN data must convert the JISxFFFF hex codes to binary JIS. The hzterm and kterm software can easily be found using a WWW search engine.
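
Both conversions can be sketched with Python's standard codecs. This is an illustration under assumptions, not a description of what hzterm or kterm do internally: the function names are hypothetical, "binary JIS" is interpreted here as ISO-2022-JP (7-bit JIS with escape sequences), and the hex digits are taken to be Shift-JIS as proposed above.

```python
def hz_to_text(hz_bytes: bytes) -> str:
    """Decode HZ text (GB 2312 pairs between ~{ and ~}) to a Unicode string."""
    return hz_bytes.decode("hz")


def jis_tag_to_iso2022jp(tag: str) -> bytes:
    """Convert a 'JISxFFFF' tag (Shift-JIS hex) to an ISO-2022-JP byte string.

    Assumption: the target "binary JIS" is ISO-2022-JP.
    """
    if not tag.startswith("JISx") or len(tag) != 8:
        raise ValueError(f"not a JISxFFFF tag: {tag!r}")
    # bytes.fromhex raises ValueError if the digits are not hexadecimal.
    char = bytes.fromhex(tag[4:]).decode("shift_jis")
    return char.encode("iso2022_jp")
```

For instance, hz_to_text(b"~{N:QG9p~}") decodes the second proper name above into three Hanzi, and jis_tag_to_iso2022jp("JISx8d95") gives a byte string that a JIS-capable terminal could display.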


schomaker@nici.kun.nl