UNIPEN appetizers: data and software

UNIPEN is an international project for the collection and benchmarking of on-line handwriting data, in which a large number of universities and companies take part. The project began with a closed stage, in which the roughly 40 data donors wanted to perform training and benchmark tests. Now that this stage is over, the data are available from the International Unipen Foundation. The UNIPEN data format is gaining in popularity. Even if you are not involved in the official UNIPEN benchmarking process, you may want to start using this format for on-line handwriting data, as many others already have. The format is legible ASCII and very flexible; it is not particularly compact, but gzip does a good job of compressing it.
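To give a feel for how approachable a legible ASCII format is, here is a minimal Python sketch that reads a gzip-compressed UNIPEN-style file and collects its pen-down strokes. It is an illustration under stated assumptions, not a full parser: the sample data is made up, the helper names `parse_strokes` and `read_unipen_gz` are hypothetical, and the sketch assumes that keywords start with a dot, that strokes are delimited by `.PEN_DOWN` and `.PEN_UP`, and that each coordinate line holds one X Y pair.

```python
import gzip
import io

# Made-up sample in a UNIPEN-like layout (assumption: one "X Y" pair per line).
SAMPLE = """\
.VERSION 1.0
.COORD X Y
.PEN_DOWN
100 200
105 210
110 225
.PEN_UP
.PEN_DOWN
150 200
155 190
.PEN_UP
"""


def parse_strokes(lines):
    """Collect pen-down strokes as lists of (x, y) integer pairs."""
    strokes = []
    current = None  # None while the pen is up
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("."):
            keyword = line.split()[0]
            if keyword == ".PEN_DOWN":
                current = []
            elif keyword == ".PEN_UP":
                if current is not None:
                    strokes.append(current)
                current = None
            # All other keywords are ignored in this sketch.
            continue
        if current is not None:
            fields = line.split()
            current.append((int(fields[0]), int(fields[1])))
    if current is not None:
        # The file ended while the pen was still down.
        strokes.append(current)
    return strokes


def read_unipen_gz(data):
    """Decompress gzip bytes and parse the text they contain."""
    with gzip.open(io.BytesIO(data), "rt", encoding="ascii") as handle:
        return parse_strokes(handle)


compressed = gzip.compress(SAMPLE.encode("ascii"))
strokes = read_unipen_gz(compressed)
print(len(strokes), "strokes;", sum(len(s) for s in strokes), "points")
```

A real UNIPEN file carries many more keywords (segmentation, writer and device information) and may list extra columns per sample, so anything beyond a quick look at the data calls for a complete parser such as the tools listed on the FTP site.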

Several companies and universities provided UNIPEN on-line handwriting samples. There are a number of screen dumps from the browser Upview, which is a Unix/X11 program written in C. The images should give you an idea of the type of data which is being collected:

New screen dumps of UNIPEN data with Upview, showing PAGE, PARAGRAPH, LINE, WORD, CHARACTER and STROKE levels

Old screen dumps of UNIPEN data (Upview/Uplib)


UNIPEN FTP Site

The NICI/UNIPEN FTP site contains:


  • X11/Sun browser in Lisp (browse.l)

  • A UNIPEN file parser (syntax checker) (in AWK)

  • XCSTK: another X11 tool for UNIPEN files (from the research group of Colin Higgins)

    The UNIPEN toolkit upTools 3

    At NICI we are currently working very hard on version 3.0 of the UNIPEN toolkit. Some features of this next release are:

    Below is a screen dump of a window in upworks, which is the main application in the uptools3 package


    UNIPEN and Kanji

    There is increasing interest in the recognition of on-line Asian handwriting scripts. However, the coding (or rather, the visualisation) of, e.g., Kanji is still difficult in Western laboratories. Thus, although many researchers may in principle be interested in developing recognition algorithms for on-line Asian handwriting styles, the lack of visualisation tools frustrates that interest. To partly solve this problem, and to raise your interest, we have put a collection of Kanji GIF images on the WWW, in a Kanji dictionary which is based on Jim Breen's KANJIDIC. In the near future, we may expect more and more software that is based on Unicode and that can display the non-ASCII characters represented by the standard.


    More on data collection




    schomaker@computer.org