This is the site for the team that builds openMind/Handwriting.

The following text describes the general idea of openMind/Handwriting. People are good at reading handwriting; computers (algorithms) are not. Humans can train the machine by providing examples. Normally each research lab does this individually, by collecting handwriting and using dedicated software to attach ASCII labels to the ink manually or semi-automatically. A neural network or hidden Markov model is then trained on the resulting pattern/label (i.e., input/output) pairs.

In this project we will focus on 'on-line' handwriting, which means that pen-tip coordinates (x,y,z) are recorded over time. The z value contains pen-up/pen-down information or pen pressure.
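To make the data format concrete, here is a minimal sketch in Python of how such a recording could be held in memory and cut into pen-down runs. The names PenSample and split_into_strokes, and the convention that z > 0 means "pen down", are assumptions made for this illustration only; they are not taken from UNIPEN or from the project software.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PenSample:
    """One pen-tip sample; samples are kept in the order they were recorded."""
    x: float
    y: float
    z: float  # assumption: z > 0 is pen down (or pressure), z <= 0 is pen up


def split_into_strokes(samples: List[PenSample]) -> List[List[PenSample]]:
    """Group consecutive pen-down samples into strokes; pen-up samples are dropped."""
    strokes: List[List[PenSample]] = []
    current: List[PenSample] = []
    for s in samples:
        if s.z > 0:
            current.append(s)
        elif current:
            # The pen was lifted: close the stroke collected so far.
            strokes.append(current)
            current = []
    if current:
        strokes.append(current)
    return strokes


# A made-up trace: two pen-down runs separated by one pen-up sample.
trace = [
    PenSample(0, 0, 1), PenSample(1, 2, 1), PenSample(2, 4, 1),
    PenSample(3, 4, 0),
    PenSample(4, 2, 1), PenSample(5, 0, 1),
]
strokes = split_into_strokes(trace)
print([len(s) for s in strokes])  # [3, 2]
```

If z carries pen pressure rather than a plain up/down flag, the same test works as long as zero pressure means the pen is lifted.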

As a first example we have set up a 'character provider'. This is a dedicated HTTP server which reads characters from the UNIPEN database (http://unipen.org). You may want to try it at: http://hclus.cogsci.kun.nl:5001/GET-NEW-CHAR

These (x,y,z) coordinates can be visualized in a Java (1.x) applet, as can be seen in similar examples.

The idea is to show the character together with a label, e.g., "A", or a question mark "?" if the system has no clue at all. The user can then confirm the label or enter another one. The result is reported back to the server (an HTTP server).
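The confirm-or-correct step could be sketched as follows in Python. Everything here is hypothetical: the LabelReport record, the make_report function, and the rule that empty input counts as "confirm" are assumptions for illustration, not the actual protocol of the server.

```python
from dataclasses import dataclass
from typing import Optional

UNKNOWN = "?"  # what the system shows when it has no proposal


@dataclass
class LabelReport:
    """What would be sent back to the server for one character (hypothetical)."""
    char_id: str
    proposed: str
    final: str
    confirmed: bool  # True if the user accepted the proposal unchanged


def make_report(char_id: str, proposed: str,
                user_input: Optional[str] = None) -> Optional[LabelReport]:
    """Build a report; empty input is taken to mean 'confirm the proposal'."""
    typed = (user_input or "").strip()
    final = typed or proposed
    if final == UNKNOWN:
        # Neither the system nor the user supplied a label: nothing to report.
        return None
    return LabelReport(char_id, proposed, final, confirmed=(final == proposed))


print(make_report("char-001", "A"))       # user confirms "A"
print(make_report("char-001", "A", "H"))  # user corrects "A" to "H"
print(make_report("char-002", "?"))       # None: still unlabeled
```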

This may all look fairly simple, but it is still a lot of work, especially since we want to create a nice, self-motivating website which people like to use. This means that users have to log in (we need an expert in browser cookies etc.!) and that their names should appear on a performance page. It should also be possible to embed the labeling in a game-like environment with points/goodies and levels. So much for the user interface.

On the server side, we need a good deal of bookkeeping: there may be many users, there are a lot of characters, and there will be several labelings for one character, given by several users. This introduces the problem of consistency and of deciding who is right (voting schemes, and weeding out errors). The results of the labeling should, naturally, be available to the general public. This means that one could download successive versions as time goes on.
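One simple voting scheme, sketched in Python, is a majority vote with a minimum number of votes and a minimum level of agreement. The function name consensus_label, the thresholds, and the character ids are assumptions chosen for the example; the project has not settled on a scheme.

```python
from collections import Counter
from typing import Dict, List, Optional


def consensus_label(votes: List[str], min_votes: int = 3,
                    min_agreement: float = 0.6) -> Optional[str]:
    """Return the majority label, or None if the votes are too few or too split."""
    if len(votes) < min_votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # With min_agreement above 0.5 a tie can never win, so the order in which
    # most_common() lists tied labels does not matter here.
    if count / len(votes) < min_agreement:
        return None
    return label


# Made-up labelings of three characters by several users.
votes_by_char: Dict[str, List[str]] = {
    "char-001": ["A", "A", "A", "H"],  # clear majority
    "char-002": ["u", "v", "u", "v"],  # split vote
    "char-003": ["b"],                 # too few votes so far
}
for char_id, votes in votes_by_char.items():
    print(char_id, consensus_label(votes))
# char-001 A
# char-002 None
# char-003 None
```

Characters that come back as None could simply be offered to more users until a consensus emerges.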

The next point on the agenda is the labeling of characters in cursive words. This really is a challenge which requires a lot of creativity, because it is a difficult task for a user. At the same time, however, this is exactly the kind of data that research currently needs badly: "where are the characters in this handwritten word?". Because such labeling is so expensive, few databases in the world provide this kind of information.

Anyway, I hope to have kept your enthusiasm to participate! What I would like to ask you to do is to send me an email with a brief description of your expertise. I would like everyone to work on what they like most and do best. Furthermore, there are the following tasks:

Please let me know what task you prefer! It may be very efficient to create subteams, too, focusing on a single task.

As regards platforms: we use as little Java as possible (things must be kept lightweight), implementing with plain HTML, frames, GIF, animated GIF and mouseover effects at most (i.e., no Shockwave/VRML-type super-duper but slow and browser-crashing gadgetry). Also, we have a strong preference for using Linux on the server side.


Screendumps of programs for the labeling of on-line handwriting

An example of a Tcl/Tk program screen dump for expert character labeling in cursive words.

Note that this is too complicated for web use!


A concept (mockup) for word labeling on the web:

The bar at the bottom represents the time axis. The color of each zone identifies a character shown in the panel above (the red zone is the m). The size and location of a zone can be changed by moving its handles, which takes over the function of the four buttons (more left/less left, more right/less right) in the previous example. To minimize noise, not all sample positions are allowed: the system proposes a stroke-based segmentation.
(Note: this UI idea is untested).
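The restriction of zone handles to stroke boundaries could work roughly as in the Python sketch below. Taking stroke boundaries to be the points where the vertical pen movement reverses is an assumption made here (a common simple heuristic, not necessarily what the system will use), and the names stroke_boundaries and snap_handle are hypothetical.

```python
from bisect import bisect_left
from typing import List


def stroke_boundaries(y: List[float]) -> List[int]:
    """Sample indices where vertical movement reverses, plus both end points."""
    if len(y) < 2:
        return list(range(len(y)))
    bounds = [0]
    for i in range(1, len(y) - 1):
        before = y[i] - y[i - 1]
        after = y[i + 1] - y[i]
        if before * after < 0:  # sign change: local minimum or maximum of y
            bounds.append(i)    # flat stretches (a zero difference) are ignored
    bounds.append(len(y) - 1)
    return bounds


def snap_handle(position: int, allowed: List[int]) -> int:
    """Move a zone handle to the nearest allowed boundary (ties go left)."""
    i = bisect_left(allowed, position)
    if i == 0:
        return allowed[0]
    if i == len(allowed):
        return allowed[-1]
    left, right = allowed[i - 1], allowed[i]
    return left if position - left <= right - position else right


# Made-up vertical coordinates: up, down, up again.
y = [0, 1, 2, 3, 4, 3, 2, 1, 0, 1, 2, 3]
allowed = stroke_boundaries(y)
print(allowed)                   # [0, 4, 8, 11]
print(snap_handle(5, allowed))   # 4
print(snap_handle(7, allowed))   # 8
print(snap_handle(10, allowed))  # 11
```

Wherever the user drops a handle, it ends up on one of a small number of boundaries, which keeps the labelings of different users comparable.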

(For now, single-character labeling is the first point on the agenda. We will get back to word labeling at a later stage. LS 28/12/1999)


Relevant sites