UNIPEN appetizers: data and software
UNIPEN is an international project for the collection of on-line
handwriting data and benchmarking, in which a large number of
universities and companies take part. The project had a closed stage,
reserved for the roughly 40 data donors, who wanted to perform training and benchmark tests.
After this stage, the data have become available from the International Unipen
The UNIPEN data format is gaining in popularity. Even if you are not
involved in the official UNIPEN benchmarking process, you may consider
using this format for on-line handwriting data, as many others
already have. Although the format is not particularly compact (gzip compresses it well), it is legible
ASCII and very flexible.
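Because the format is line-oriented ASCII, a short C function is enough to get a first impression of a file. The sketch below assumes a simplified reading of the format: keywords start with a dot in the first column, a `.PEN_DOWN` keyword opens a pen-down component whose following lines each hold one integer "x y" pair, and any other keyword (including `.PEN_UP`) closes it. The names `UpSummary` and `up_summarize` are hypothetical and not part of uplib. For real work, follow the format definition and use uplib.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical summary record; not a uplib type. */
typedef struct {
    int strokes; /* number of .PEN_DOWN components */
    int points;  /* coordinate lines inside pen-down components */
} UpSummary;

/* Count pen-down strokes and their sample points in UNIPEN-style text.
 * Simplifying assumptions: keywords start with '.' in column 0, and each
 * non-keyword line inside a pen-down component holds one "x y" pair. */
UpSummary up_summarize(const char *text)
{
    UpSummary s = {0, 0};
    int pen_down = 0;
    char buf[256];

    assert(text != NULL);
    while (*text != '\0') {
        const char *nl = strchr(text, '\n');
        size_t len = nl ? (size_t)(nl - text) : strlen(text);

        /* Copy the current line so sscanf cannot read past its end. */
        size_t copy = len < sizeof buf - 1 ? len : sizeof buf - 1;
        memcpy(buf, text, copy);
        buf[copy] = '\0';

        if (buf[0] == '.') {
            /* Only .PEN_DOWN opens a component that we count; any
             * other keyword (including .PEN_UP) closes it. */
            pen_down = strncmp(buf, ".PEN_DOWN", 9) == 0;
            if (pen_down)
                s.strokes++;
        } else if (pen_down) {
            int x, y;
            if (sscanf(buf, "%d %d", &x, &y) == 2)
                s.points++;
        }
        text = nl ? nl + 1 : text + len;
    }
    return s;
}
```

Fed a small excerpt with two pen-down components of two and one points, separated by a pen-up component, it reports 2 strokes and 3 points. Pen-up coordinates are deliberately not counted.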
Several companies and universities have provided UNIPEN on-line
handwriting samples. Below are a number of screen dumps from the browser
Upview, a Unix/X11 program written in C. The images should give you
an idea of the type of data being collected:
New screen dumps of UNIPEN data with Upview, showing PAGE, PARAGRAPH, LINE, WORD, CHARACTER and STROKE levels
Old screen dumps of UNIPEN data (Upview/Uplib)
UNIPEN FTP Site
- The current UNIPEN format definition
- UNIPEN Data format examples
Uptools3 is a software distribution of UNIPEN browsers and libraries.
It is the successor to Uptools2.1.
- We no longer aim at DOS compatibility.
- Win'95 is not supported. However, an experienced C programmer can do the port
in two weeks using GNU-WIN. We have a cookbook for this port.
A library on top of uplib, providing C routines for extracting signals
from UNIPEN ASCII files. This library was anticipated in
Unipen Scrawl, Issue 4.
Upsiglib does not yet contain all possible signal types (equidistant time,
equidistant space, pixels), but it is a start. If you have signal-processing functions
that you would like to add to upsiglib, please contact us. We are eager to hear about
your experiences and willing to add your code to our libraries.
The X11-based UNIPEN browser upview now has several
interactive options that were formerly selectable only via the command line, as
well as some new graphics options. Upview will continue to be supported,
as it is the fastest way to visualize a UNIPEN file. However,
new developments will also be based on Tcl/Tk (see below).
The uptools3 release contains a completely new tool based on Tcl/Tk and
X11. It provides a user interface for scrolling
through UNIPEN .SEGMENTs and editing them. It also
allows highlighting sub-hierarchies within a .SEGMENT, such as the
characters in a word.
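Highlighting the characters of a word amounts to deciding which segments lie inside another. The sketch below assumes that a segment's delineation lists stroke numbers and ranges such as `0-3,5`. The full format definition may also allow finer, point-level boundaries, and this simplified parser does not handle them. It also limits stroke numbers to 0..63 so that a segment fits in one 64-bit mask. `parse_delineation` and `segment_within` are hypothetical names, not functions of uplib or of the new Tcl/Tk tool.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a simplified delineation such as "0-3,5" into a bit mask of
 * stroke indices (strokes 0..63 only; point-level offsets unsupported).
 * Returns 0 on success, -1 on a syntax error or out-of-range stroke. */
int parse_delineation(const char *s, uint64_t *mask)
{
    assert(s != NULL && mask != NULL);
    *mask = 0;
    while (*s != '\0') {
        char *end;
        long lo = strtol(s, &end, 10);
        if (end == s)
            return -1; /* no number where one was expected */
        long hi = lo;
        s = end;
        if (*s == '-') { /* a range "lo-hi" */
            const char *start = s + 1;
            hi = strtol(start, &end, 10);
            if (end == start)
                return -1;
            s = end;
        }
        if (lo < 0 || hi > 63 || lo > hi)
            return -1;
        for (long k = lo; k <= hi; k++)
            *mask |= UINT64_C(1) << k;
        if (*s == ',')
            s++;
        else if (*s != '\0')
            return -1;
    }
    return 0;
}

/* A segment lies inside another (e.g. a CHARACTER within a WORD) when
 * every one of its strokes is also a stroke of the enclosing segment. */
int segment_within(uint64_t inner, uint64_t outer)
{
    return (inner & ~outer) == 0;
}
```

With a word delineated as `0-3`, a character on strokes `1,2` is reported as inside it and one on stroke `4` is not. That is the test a viewer would need before highlighting sub-segments.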
There are now 10 sample sources showing how to use the library routines provided by uplib, upsiglib and
- Tcl/Tk interfaces: demos showing how to load 'uptools' as a shared library in your Tcl/Tk applications.
See some examples of screen dumps here.
- Getting Binaries
- Getting Sources
- UNIPEN software tools directory
X11/Sun browser in Lisp (browse.l)
A UNIPEN file parser (syntax checker) in AWK
XCSTK: another X11 tool for UNIPEN files (from the research group of Colin Higgins)
The UNIPEN toolkit upTools 3
At NICI we are currently working hard on version 3.0 of the UNIPEN toolkit.
Below is a screen dump of a window in upworks,
which is the main application in the uptools3 package.
Some features of this next release are:
- a fully integrated ink widget written in C and callable from Tcl/Tk
- the possibility of viewing several hierarchy levels at the same time in different windows
- some basic .SEGMENT manipulation (adjust and add segments)
- the display of signal time series, coordinate smoothing, etc.,
including cursor control and inspection of numerical values
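Coordinate smoothing of the kind listed above can be as simple as a moving average applied to each coordinate signal separately. The sketch below uses a centred moving average whose window is truncated at the ends of the signal. This is an illustrative choice, not necessarily the filter used in upworks, and `smooth_moving_average` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Centred moving-average smoothing of one coordinate signal (x or y).
 * out[i] is the mean of in[i-half .. i+half], with the window truncated
 * at both ends of the signal. Illustrative filter only; the name is
 * hypothetical and not taken from uptools3. */
void smooth_moving_average(const double *in, double *out, size_t n,
                           size_t half)
{
    assert(n == 0 || in != out); /* in-place smoothing is not supported */
    for (size_t i = 0; i < n; i++) {
        size_t lo = i >= half ? i - half : 0;
        size_t hi = i + half < n ? i + half : n - 1;
        double sum = 0.0;

        for (size_t k = lo; k <= hi; k++)
            sum += in[k];
        out[i] = sum / (double)(hi - lo + 1);
    }
}
```

With `half = 0` the signal is copied unchanged. Larger windows smooth more, but they also round off sharp corners such as cusps, which may matter for recognition.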
UNIPEN and Kanji
There is increasing interest in the recognition of on-line
Asian handwriting script types. However, the coding (or rather, the visualisation)
of, e.g., Kanji is still difficult in Western laboratories.
As a result, although many researchers may in principle be interested in developing
recognition algorithms for on-line Asian handwriting styles, the lack of
visualisation tools frustrates that interest. To partly solve this problem,
and to raise your interest,
we have put a collection of Kanji GIF images on the WWW, in a
Kanji dictionary based on
Jim Breen's KANJIDIC.
In the near future, we may expect more and more software which is based on
which allows for visualization of the non-ASCII characters
represented by the standard.