UNIPEN SCRAWL #9,
January 28th, 1998

- > - > - > - > - > - > - > - < - < - < - < - < - < - < - < - < -

- > UNIPEN project of data exchange and recognizer benchmarks < -

- > - > - > - > - > - > - > - < - < - < - < - < - < - < - < - < -

                                                 #
                                                #
                                               # OOOOOOOOOOOOOOOOO
                                         OOOOO# OOOOOOOOOOOOOOO
                                     OOOOOOOO#OOOOOOOOOOOOOOOOOO
                                    OOOO    # OOOOOOOOOOOOOO
                                   OOO     #   OOOOOOOOOOOOOO
                                   OOO    #   OOOOOO
                                   OOO   #    OOOO
                                    OOO #   OOOO
                                       #  OOO
                                      #
  UNIPEN SCRAWL #9                   /  January 22nd, 1998


                 by Lambert Schomaker 


1. The Pilot Benchmark, continued ...

A Happy New Year to everybody! As was the purpose of the pilot benchmarking process, a number of quirks have surfaced. For one thing, the whole process appears to be much more time-consuming than anticipated. Furthermore, several questions have been raised regarding testing details.

Serious problems with hardware at NIST

In the second week of January, the machine supporting UNIPEN data (magi at NIST) broke down:
.. We lost Magi's 9Gb system disk drive and its CPU on the same day two weeks ago ...(Stan Janet)

Reorganization takes more time than expected

.. (I) am currently coding a system to rename the files and writers randomly so that no extra info can be known from them, filtering out .DATA_SOURCE/comments/etc. similarly ... (Stan Janet)
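
To make the idea concrete, here is a minimal sketch of such a filtering and renaming step. It is not the NIST code: the function names, the list of dropped keywords and the naming scheme are assumptions for illustration only. It assumes that a keyword starts with a '.' in the first column and that its content continues until the next keyword.

```python
import random

# Keywords to filter out (assumed list for illustration, not NIST's).
DROP_KEYWORDS = {".DATA_SOURCE", ".COMMENT"}

def strip_keywords(lines, drop=DROP_KEYWORDS):
    """Drop the listed keywords together with their continuation lines."""
    kept, skipping = [], False
    for line in lines:
        if line.startswith("."):
            # A new keyword starts here; decide whether to skip it.
            skipping = line.split(None, 1)[0] in drop
        if not skipping:
            kept.append(line)
    return kept

def random_names(originals, prefix, seed=None):
    """Map each original name to an opaque name, assigned in shuffled order."""
    rng = random.Random(seed)
    shuffled = list(originals)
    rng.shuffle(shuffled)
    return {name: f"{prefix}{i:04d}" for i, name in enumerate(shuffled)}
```

The same random_names() helper could be applied once to the file names and once to the writer identifiers, so that neither carries information about the donor.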

Conclusion: we have a serious delay. I have already heard mutterings of relief here and there because this buys everyone extra training time. However, others may be disappointed with the way things are going. Also, please note that we at NICI do not have any special position as regards the accessibility of data. It is all in the hands of NIST.


2. Support for NIST

One of the problems of the current situation is that UNIPEN-related activities at NIST are on a 'low burning rate'. No matter how willing and cooperative Stan Janet is, the UNIPEN-related work must yield to the activities and projects which enjoy heavy funding and sponsoring at NIST. Personally, I have tried to collect grants for UNIPEN-related work (e.g., a European central point, or foundation). However, most agencies are surprised that such a large consortium of 40+ members including industry cannot support itself in a more solid way, and generating money is quite difficult. As a consortium, we could indeed try to sponsor manpower at NIST (e.g., 40 * $1000 would be about half a man-year of research assistance). What are your opinions? Please react.


3. Testing: Rules of the game

Below are some thoughts on the testing process, as collected during
a discussion between Isabelle Guyon, Stan Janet and Lambert Schomaker.


A taxonomy of recognition tests

1) Walk-up Recognition

Walk-up recognition is, by definition, writer-independent recognition. The paradigm is that, e.g.,

"... you put a pen-based PDA in a store, a potential customer comes in and starts to write a number of words: What is the recognition rate obtained under this condition?"

The test is sequential in real time. This means that a preparatory scan on a file, followed by a parameterized run of the recognizer, is not allowed. Words are recognized as they come in; future words are unknown. Repeated tests or parameter adjustments are not allowed.

Note:
As the UNIPEN construction entails that each donator may see his/her own handwriting samples again in a particular test set, this ideal is not completely met. However, a score for the whole set, and the set with exclusion of the self-donated samples could be envisaged.
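
The two scores mentioned in this note could be computed along the following lines. This is only a sketch: the record layout (a 'donor' identifier and a 'correct' flag per test sample) and the function names are assumptions, not part of any UNIPEN format.

```python
def recognition_rate(results):
    """Fraction of correctly recognized samples; None for an empty set."""
    results = list(results)
    if not results:
        return None
    return sum(1 for r in results if r["correct"]) / len(results)

def walkup_scores(results, own_donor_id):
    """Return (rate on the whole set, rate excluding own_donor_id's samples)."""
    results = list(results)
    others = [r for r in results if r["donor"] != own_donor_id]
    return recognition_rate(results), recognition_rate(others)
```

A large gap between the two numbers would indicate that the self-donated samples inflate the walk-up score.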

2) Writer-dependent Recognition

2.1) Lumped Writer-dependent Recognition

A test set is used from a writer who has been seen before in a (potentially very large) training set. No other knowledge on the writer's identity is used directly or indirectly. The test results are biased positively due to the advance knowledge, which is unlikely to exist in a real application.

2.2) Adapted Writer-dependent Recognition

2.2.1) Autonomous Adaptation (Unsupervised)

The recognizer adapts to the current writer, e.g. in an enrollment stage. Possibly, the system tries to correlate known handwriting styles and shapes to the current writer's input script. The reported recognition rate is the rate after adaptation. However, truth labels have never been used in the adaptation process.

2.2.2) Guided Adaptation/Manual Labeling (Supervised)

Writer-specific labeling has been used to train the recognizer. This is a case of supervised learning. Knowledge about the identity of the writer and details on his/her writing style are put into the system. For example, writer-specific HMMs or MLPs or rule bases are used. It is uncertain whether this category of tests is useful. It may yield an 'in-house' overview of the performance asymptote, but is unlikely to be useful as a basis for comparative benchmarking.


PROPOSAL:

The rationale is that for practical systems, only "Walk-up Recognition" and "Autonomous Adaptation" are of real value.

If there is sufficient interest in "Guided Adaptation", a scheme might be developed in the future which standardizes a series of training, development, adaptation and test sets for a limited number of writers. Given the amount of work involved (at NIST), such a scheme is not likely to be implemented in the near future.


Test-specific remarks

Below are a number of remarks pertaining to individual tests.
If you have additions or notes, send them to schomaker@nici.kun.nl.
Since we do not have much experience with isolated-character tests,
all hints are welcome.

(to refresh your memory: The test overview)

Benchmark  Description

1a         isolated digits
1b         isolated upper case
1c         isolated lower case
1d         isolated symbols (punctuations etc.)
2          isolated characters, mixed case
3          isolated characters in the context of words or texts
4          isolated printed words, not mixed with digits and symbols
5          isolated printed words, full character set
6          isolated cursive or mixed-style words (without digits and symbols)
7          isolated words, any style, full character set
8          text: (minimally two words of) free text, full character set

Ad tests 6 and 7

Tests 6 and 7 tap the recognizer's ability to correctly segment an unknown word into its constituent characters. Therefore, no character-identity information about a given input word may be used at any time. Also, manual word segmentation (i.e., manual improvement of the word boundaries) is not allowed. The system should perform any necessary dehooking or pen-lift removal autonomously.

Ad test 8

The essence of this test is twofold: First, it captures the capacity of the recognizer to segment the data into isolated words. This must be achieved completely autonomously: Manual word segmentation is not allowed in this category. The second purpose of the test is to see whether the full character set can be handled.

Please send your comments to unipen-donators@magi.nist.gov.


4. Formats of *.RES and *.REC files

There have been a number of small changes in this area. Examples of the *.REC and *.RES formats are given at:


5. Proposal for a benchmark task descriptor format

Isabelle Guyon has written up a proposal for a benchmark descriptor format to be distributed with the test set. This format has not yet been finalized. Remarks are welcome.


6. Miscellaneous notes on signal quality in set train_r01_v07

Here are a few observations made at NICI. Maybe you have similar or other observations. Please share!
  1. The phi sets are missing .X|Y_POINTS_PER_MM and .X|Y_POINTS_PER_INCH. We have a C program (guess_resolution) which relies on the observation that corpus-sized letters (x-height letters) are, on average over the writer population, about 2.5 mm high. Although the actual setup may elicit other writing sizes, this is a good rule of thumb. The resulting value for the phi files is 100 points per mm. (Is this true, donator phi?).
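
The rule of thumb in item 1 amounts to a single division. The sketch below is not the guess_resolution program itself; the function name, the use of the median and the assumption that x-heights have already been measured in tablet units are all illustrative choices.

```python
from statistics import median

ASSUMED_X_HEIGHT_MM = 2.5  # rule-of-thumb value quoted in the text

def guess_points_per_mm(x_heights_in_points, x_height_mm=ASSUMED_X_HEIGHT_MM):
    """Estimate resolution from x-heights measured in tablet units."""
    if not x_heights_in_points:
        raise ValueError("need at least one x-height measurement")
    # The median is an assumed choice, made here to limit the effect of outliers.
    return median(x_heights_in_points) / x_height_mm
```

For example, x-heights of around 250 tablet units would give an estimate of 100 points per mm.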


  2. The kai sets have a .POINTS_PER_SECOND of 9600?! This incredibly high number is undoubtedly the result of confusing the sampling rate with the Baud rate. Interestingly, with 9600 bps and a typical tablet format of 7 bytes/coordinate, the often-used sampling rate of 200 Hz is impossible: 200 * 7 * (8 + 2 start/stop bits) = 14000 bps > 9600 bps. Actually, the maximum sampling rate would be 137 Hz for a 7-byte format, or 192 Hz for an unlikely terse 5-byte format. Spectral analysis of the kai data shows that, with an assumed (guessed) sampling rate of 100 Hz, the spectral power is concentrated below 10 Hz, as expected (Fig. 1). Thus, a value for .POINTS_PER_SECOND of 100 seems reasonable (Is this true, donator kai?).

    Figure 1. The power-spectral density functions (PSDF), averaged over words, in the 6/data/kai sets.

    See the following reference for more information concerning the bandwidth of pen-tip movement in handwriting:

    Teulings, H.L., & Maarse, F.J. (1984). Digital recording and processing of handwriting movements. Human Movement Science, 3, 193-217.
    Here is another example of a handwriting power spectrum
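
The Baud-rate arithmetic in item 2 can be checked with a few lines. The function names are hypothetical; the framing of 8 data bits plus 2 start/stop bits per byte is the one used in the text.

```python
def required_baud(rate_hz, bytes_per_sample, bits_per_byte=8, framing_bits=2):
    """Line speed (bps) needed to transmit rate_hz samples per second."""
    return rate_hz * bytes_per_sample * (bits_per_byte + framing_bits)

def max_sampling_rate_hz(baud, bytes_per_sample, bits_per_byte=8, framing_bits=2):
    """Upper bound on samples per second over a serial line of the given speed."""
    bits_per_sample = bytes_per_sample * (bits_per_byte + framing_bits)
    return baud / bits_per_sample
```

With these, required_baud(200, 7) gives 14000 bps, and max_sampling_rate_hz(9600, 7) and max_sampling_rate_hz(9600, 5) give roughly 137 Hz and 192 Hz, matching the figures above.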


  3. Figure 2 shows the dangers of resampling. The pcl set contains data which are resampled spatially with a very small delta increment. However, this was applied to the raw input coordinates coming from the digitizer. This means that the original low spatial resolution is retained! Low-pass filtering should be applied before resampling, not the other way around...

    Figure 2. An example of what may happen if indiscriminate use (no offense intended) is made of resampling techniques. Each red dot represents a sample (coordinate pair) as it exists in the UNIPEN file. However, from the coarse trajectory and the quantized lines it can be inferred that the original data is of a lower spatial resolution. It is much more difficult to remove the quantization noise on the basis of the UNIPEN version: many recipients probably would rather have had access to the original raw data.
    Possible post-hoc repair of this problem (not tested, this is only a hint):
    1. Perform subsampling, e.g. 1 in 4,
    2. then low-pass filtering,
    3. then high-resolution resampling again
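
The three repair steps above can be sketched as follows. Like the hint itself, this has not been tested on the pcl data. The moving average stands in for whatever low-pass filter one prefers, the final step uses plain linear interpolation between neighbouring points rather than true equidistant spatial resampling, and all function names and default parameters are assumptions. The input is assumed to be the (x, y) points of a single pen-down stroke.

```python
def subsample(points, step=4):
    """Step 1: keep every step-th point."""
    return points[::step]

def low_pass(points, width=3):
    """Step 2: centered moving average over x and y (a simple low-pass filter)."""
    half = width // 2
    out = []
    for i in range(len(points)):
        window = points[max(0, i - half): i + half + 1]
        out.append((sum(p[0] for p in window) / len(window),
                    sum(p[1] for p in window) / len(window)))
    return out

def resample(points, factor=4):
    """Step 3: linear interpolation, factor - 1 new points between neighbours."""
    out = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        for k in range(factor):
            t = k / factor
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    if points:
        out.append(points[-1])
    return out

def repair(points, step=4, width=3, factor=4):
    """Apply the three steps in the order given in the hint."""
    return resample(low_pass(subsample(points, step), width), factor)
```

Note that the moving average shortens the windows at the stroke ends, so the first and last points are pulled slightly inwards.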


7. Important WWW sites


Lambert Schomaker
NICI, Nijmegen Institute for Cognition and Information
University of Nijmegen, P.O.Box 9104
6500 HE Nijmegen, The Netherlands
Phone: +31 24 3616029 / Fax: +31 24 3616066
E-mail: schomaker@nici.kun.nl

