UNIPEN SCRAWL #9,
January 28th, 1998
- > - > - > - > - > - > - > - < - < - < - < - < - < - < - < - < -
- > UNIPEN project of data exchange and recognizer benchmarks < -
- > - > - > - > - > - > - > - < - < - < - < - < - < - < - < - < -
UNIPEN SCRAWL #9 / January 22nd, 1998
by Lambert Schomaker
1. The Pilot Benchmark, continued ...
A Happy New Year to everybody! As was the purpose of the pilot benchmarking
process, a number of quirks have surfaced. For one thing, the whole process
appears to be much more time-consuming than anticipated. Furthermore,
several questions have been raised regarding testing details.
Serious problems with hardware at NIST
In the second week of January, the machine supporting UNIPEN data
(magi at NIST) broke down:
.. We lost Magi's 9Gb system disk drive and its CPU on the
same day two weeks ago ...(Stan Janet)
Reorganization takes more time than expected
.. (I) am currently coding
a system to rename the files and writers randomly so that no extra
info can be known from them, filtering out .DATA_SOURCE/comments/etc.
similarly ... (Stan Janet)
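To make the kind of processing described in this quote concrete, here is a
minimal sketch. It is not the NIST system, whose details are unknown to us.
The function names, the alias format and the list of dropped keywords are
assumptions made for illustration only; the sketch also assumes that an entry
runs from its keyword up to the next line starting with a '.' keyword.

```python
import random

# Keywords whose entries are dropped. .DATA_SOURCE and comments are named in
# the quote; the complete list used at NIST is not known (assumption).
DROPPED_KEYWORDS = (".DATA_SOURCE", ".COMMENT")


def random_aliases(names, prefix, seed=None):
    """Map each original name to a meaningless alias such as 'w0003'."""
    rng = random.Random(seed)
    unique = sorted(set(names))
    rng.shuffle(unique)  # alias order carries no information about the name
    return {name: f"{prefix}{i:04d}" for i, name in enumerate(unique, start=1)}


def strip_identifying_entries(lines):
    """Drop dropped-keyword entries, including their continuation lines."""
    kept, skipping = [], False
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith("."):
            # A new keyword starts here; decide whether to skip this entry.
            keyword = stripped.split(None, 1)[0]
            skipping = keyword in DROPPED_KEYWORDS
        if not skipping:
            kept.append(line)
    return kept
```

For example, random_aliases(["alice", "bob"], "w") yields the aliases w0001
and w0002 in random order, and strip_identifying_entries() leaves the pen
data untouched.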
Conclusion: we have a delay. I have already heard mutterings of relief
here and there because this buys everyone extra training time. However,
others may be disappointed with the way things are going. Also, please note
that we at NICI do not have any special position as regards the accessibility
of data. It is all in the hands of NIST.
2. Support for NIST
One of the problems of the current situation is that UNIPEN-related
activities at NIST are on a 'low burning rate'. No matter how willing and
cooperative Stan Janet is, the UNIPEN-related work must yield to the
activities and projects which enjoy heavy funding and sponsorship at NIST.
Personally, I have tried to collect grants for UNIPEN-related work
(e.g., a European central point, or foundation). However, most agencies
are surprised that such a large consortium of 40+ members including
industry cannot support itself in a more solid way, and generating money
is quite difficult. As a consortium, we could indeed try to sponsor
manpower at NIST (e.g., 40 * $1000 would be about half a man-year of research
assistance). What are your opinions? Please react.
3. Testing: Rules of the game
Below are some thoughts on the testing process, as collected during
a discussion between Isabelle Guyon, Stan Janet and Lambert Schomaker.
A taxonomy of recognition tests
1) Walk-up Recognition
Walk-up recognition is, by definition, writer-independent
recognition. The paradigm is that, e.g.,
"... you put a pen-based PDA in a store,
a potential customer comes in and starts to write a number of words:
What is the recognition rate obtained under this condition?"
The test is sequential in real time. This means that a preparatory
scan on a file, followed by a parameterized run of the
recognizer is not allowed. Words are recognized as they come in,
future words are unknown. Repeated tests or parameter adjustments
are not allowed.
Note: Because of the way UNIPEN is constructed, each donator may
encounter his/her own handwriting samples in a particular
test set, so this ideal is not completely met. However, two
scores could be envisaged: one for the whole set, and one for
the set excluding the self-donated samples.
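The two scores mentioned in the note could be computed along the following
lines by whoever holds the truth labels. This is only a sketch: the record
layout (the keys "donor", "truth" and "recognized") and the function names
are hypothetical and not part of any UNIPEN definition.

```python
def recognition_rate(results):
    """Fraction of correctly recognized samples; None for an empty set."""
    results = list(results)
    if not results:
        return None
    correct = sum(1 for r in results if r["recognized"] == r["truth"])
    return correct / len(results)


def walkup_scores(results, own_donor):
    """Return (rate on the whole set, rate without own_donor's samples)."""
    whole = recognition_rate(results)
    others = recognition_rate(r for r in results if r["donor"] != own_donor)
    return whole, others
```

A large gap between the two rates would indicate that a recognizer profits
from having seen its developer's own handwriting.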
2) Writer-dependent Recognition
2.1) Lumped Writer-dependent Recognition
A test set is used from a writer who has been seen before in a
(potentially very large) training set.
No other knowledge about the writer's identity is used directly or
indirectly. The test results are biased positively due to
the advance knowledge, which is unlikely to exist in a real application.
2.2) Adapted Writer-dependent Recognition
2.2.1) Autonomous Adaptation (Unsupervised)
The recognizer adapts to the current writer, e.g. in
an enrollment stage. Possibly, the system tries to
correlate known handwriting styles and shapes to the current
writer's input script. The reported recognition rate is the rate
after adaptation. However, truth labels are never used in
the adaptation process.
2.2.2) Guided Adaptation/Manual Labeling (Supervised)
Writer-specific labeling has been used to train the
recognizer. This is a case of supervised learning.
Knowledge about the identity of the writer
and details on his/her writing style are put into the system.
For example, writer-specific HMMs or MLPs or rule bases
are used. It is uncertain whether this category of tests
is useful. It may yield an 'in-house' overview of the
performance asymptote, but is unlikely to be useful as
a basis for comparative benchmarking.
PROPOSAL:
- To report minimally on category 1) "Walk-up Recognition"
- To report optionally on 2.2.1) "Autonomous Adaptation"
- To NOT report on 2.1) "Lumped Writer-dependent Recognition"
- nor on 2.2.2) "Guided Adaptation/Manual Labeling"
The rationale is that for practical systems, only "Walk-up Recognition"
and "Autonomous Adaptation" are of real value.
If there is sufficient interest in "Guided Adaptation", a scheme might be
developed in the future that standardizes a series of training,
development, adaptation and test sets for a limited number of writers.
Given the amount of work involved (at NIST), such a scheme is not likely
to be implemented in the near future.
Test-specific remarks
Below are a number of remarks pertaining to individual tests.
If you have additions or notes, send them to
schomaker@nici.kun.nl.
Since we do not have much experience with isolated-character tests,
all hints are welcome.
(to refresh your memory: The test overview)
| Benchmark | Description                                                        |
| 1a        | isolated digits                                                    |
| 1b        | isolated upper case                                                |
| 1c        | isolated lower case                                                |
| 1d        | isolated symbols (punctuations etc.)                               |
| 2         | isolated characters, mixed case                                    |
| 3         | isolated characters in the context of words or texts               |
| 4         | isolated printed words, not mixed with digits and symbols          |
| 5         | isolated printed words, full character set                         |
| 6         | isolated cursive or mixed-style words (without digits and symbols) |
| 7         | isolated words, any style, full character set                      |
| 8         | text: (minimally two words of) free text, full character set       |
Ad tests 6 and 7
Tests 6 and 7 tap the recognizer's ability to correctly
segment an unknown word into its constituent characters.
Therefore, no character-identity information about a given input
word may be used at any time. Also, manual word segmentation
(i.e., manual improvement of the word boundaries) is not
allowed. The system should perform any necessary dehooking
or pen-lift removal autonomously.
Ad test 8
The essence of this test is twofold: First, it captures the
capacity of the recognizer to segment the data into isolated
words. This must be achieved completely autonomously: Manual
word segmentation is not allowed in this category.
The second purpose of the test is to see whether the full
character set can be handled.
Please send your comments
to unipen-donators@magi.nist.gov.
4. Formats of *.RES and *.REC files
There have been a number of small changes in this area.
Examples of the *.REC and *.RES formats are given at:
- *.RES example and comments
AS A NEW RULE, .SEGMENT ENTRIES ARE FORBIDDEN IN .RES FILES!
============================================================
There was a logical inconsistency in previous examples in that
the truth value is actually not known for test sets. Furthermore,
the pen-stream delineation information is already present
in the .REC_LABELS field, which makes the presence of .SEGMENT
completely redundant (even if the label contained 'meta text' like
"?" the entry would be non-informative).
Therefore the use of the .SEGMENT keyword is prohibited in *.RES
files.
I. Guyon/L. Schomaker, January 1998.
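Before submitting, a *.RES file can be checked against this rule with a few
lines of code. The sketch below only looks for lines whose first token is
.SEGMENT; it does not validate the rest of the format, and the function name
is our own invention.

```python
def find_segment_entries(res_lines):
    """Return the 1-based numbers of lines that start a .SEGMENT entry."""
    offending = []
    for number, line in enumerate(res_lines, start=1):
        tokens = line.split()
        if tokens and tokens[0] == ".SEGMENT":
            offending.append(number)
    return offending
```

An empty result means that the file does not contain the forbidden keyword.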
- *.REC examples
5. Proposal for a benchmark task descriptor format
Isabelle Guyon has written up a proposal for a benchmark descriptor format
to be distributed with the test set. This format has not been finalized
yet. Remarks are welcome.
6. Miscellaneous notes on signal quality in set train_r01_v07
Here are a few observations made at NICI. Maybe you have made similar or
other observations. Please share!
- Sets phi are missing .X|Y_POINTS_PER_MM and
.X|Y_POINTS_PER_INCH. We have a C program (guess_resolution)
which exploits the fact that corpus-sized letters (x-height letters)
average about 2.5 mm in height across the writer population. Although
the actual setup may elicit other writing sizes, this is a good rule of
thumb. The resulting estimate for the phi files is 100 points per mm.
(Is this true, donator phi?).
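The rule of thumb can be sketched as follows. This is not the guess_resolution
program itself: how that program measures x-heights is not described here, so
the sketch simply takes a list of x-height measurements in tablet units as
input, and the use of the median is our own choice.

```python
from statistics import median

TYPICAL_X_HEIGHT_MM = 2.5  # rule-of-thumb value quoted in the text


def guess_points_per_mm(x_heights_in_points):
    """Estimate tablet resolution from x-heights given in device units."""
    if not x_heights_in_points:
        raise ValueError("need at least one x-height measurement")
    # The median is less sensitive to a few oversized letters than the mean.
    return median(x_heights_in_points) / TYPICAL_X_HEIGHT_MM
```

For instance, x-heights of 240, 250 and 260 tablet units give an estimate of
100 points per mm.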
- The sets kai have a .POINTS_PER_SECOND of 9600?!
This incredibly high number is undoubtedly the result of mistaking
the Baud rate of the serial line for the sampling rate. Interestingly, at
9600 bps and with a typical tablet format of 7 bytes per coordinate, the
often-used sampling rate of 200 Hz is impossible:
200 * 7 * (8 + 2 start/stop bits) = 14000 bps > 9600 bps.
Actually, the maximum sampling rate would be 137 Hz for a 7-byte format,
or 192 Hz for an unlikely terse 5-byte format.
Spectral analysis of the kai
data shows that, with an assumed (guessed) sampling rate of 100 Hz,
the spectral power is below 10 Hz, as expected (Fig. 1). Thus, a value
for .POINTS_PER_SECOND of 100 seems reasonable
(Is this true, donator kai?).
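The serial-line arithmetic above can be reproduced with two small helper
functions. The function names are ours; the default of 10 bits on the line
per byte corresponds to the 8 data bits plus 2 start/stop bits assumed above.

```python
def required_bps(sample_rate_hz, bytes_per_sample, bits_per_byte=10):
    """Line speed needed to transmit every sample without loss."""
    return sample_rate_hz * bytes_per_sample * bits_per_byte


def max_sample_rate_hz(line_bps, bytes_per_sample, bits_per_byte=10):
    """Highest sampling rate that fits on a serial line of line_bps."""
    return line_bps / (bytes_per_sample * bits_per_byte)


print(required_bps(200, 7))               # 14000
print(int(max_sample_rate_hz(9600, 7)))   # 137
print(int(max_sample_rate_hz(9600, 5)))   # 192
```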
Figure 1.
The power-spectral density functions (PSDF), averaged over words,
in the 6/data/kai sets.
See the following reference for more information concerning the bandwidth
of pen-tip movement in handwriting:
Teulings, H.L., & Maarse, F.J. (1984).
Digital recording and processing of handwriting movements.
Human Movement Science, 3, 193-217.
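A plausibility check of a guessed sampling rate, in the spirit of the
spectral analysis mentioned for the kai sets, might look like the sketch
below. It is not the analysis used for Figure 1: it uses a direct (slow)
discrete Fourier transform on a single signal, e.g. the x velocity of one
word, and the function names and the 10 Hz cutoff argument are our own.

```python
import cmath
import math


def power_spectrum(signal, sample_rate_hz):
    """Return (frequencies, power) of a real signal via a direct DFT."""
    n = len(signal)
    if n == 0:
        raise ValueError("empty signal")
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the DC offset
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        # Bins other than DC and Nyquist also stand for their mirror image.
        weight = 1 if k == 0 or 2 * k == n else 2
        freqs.append(k * sample_rate_hz / n)
        power.append(weight * abs(coeff) ** 2)
    return freqs, power


def power_fraction_below(signal, sample_rate_hz, cutoff_hz):
    """Fraction of the total spectral power found below cutoff_hz."""
    freqs, power = power_spectrum(signal, sample_rate_hz)
    total = sum(power)
    if total == 0:
        return 0.0
    return sum(p for f, p in zip(freqs, power) if f < cutoff_hz) / total
```

If the guessed sampling rate is reasonable, nearly all power should lie
below about 10 Hz; with a rate such as 9600 the same samples would appear to
contain implausibly fast movements.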
Here is another example of a handwriting power spectrum
- Figure 2 shows the dangers of resampling. The pcl set contains
data which are resampled spatially with a very small delta increment.
However, this was applied to the raw input coordinates coming from the
digitizer. This means that the original low spatial resolution is
retained! Low-pass filtering should be applied before resampling, not the
other way around...
Figure 2.
An example of what may happen if resampling techniques are used
indiscriminately (no offense intended). Each red dot
represents a sample (coordinate pair)
as it exists in the UNIPEN file. However, from the coarse trajectory and
the quantized lines it can be inferred that the original data are of
a lower spatial resolution. It is much more difficult to remove the
quantization noise on the basis of the UNIPEN version: many recipients
would probably rather have had access to the original raw data.
Possible post-hoc repair of this problem (not tested, this is only a hint):
- Perform subsampling, e.g. 1 in 4,
- then low-pass filtering,
- then high-resolution resampling again
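The three steps could be strung together as in the sketch below. Like the
hint itself, it has not been tried on the pcl data. The moving average is
only a crude stand-in for a proper low-pass filter, and the function names
and default parameters (step, window, delta) are assumptions for the sake of
illustration. Points are (x, y) tuples.

```python
import math


def subsample(points, step=4):
    """Keep one point in every step."""
    return points[::step]


def low_pass(points, window=5):
    """Smooth x and y with a centered moving average (crude low-pass)."""
    half = window // 2
    smoothed = []
    for i in range(len(points)):
        lo, hi = max(0, i - half), min(len(points), i + half + 1)
        xs = [p[0] for p in points[lo:hi]]
        ys = [p[1] for p in points[lo:hi]]
        smoothed.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return smoothed


def resample_spatially(points, delta):
    """Interpolate linearly so that points lie delta apart along the path."""
    if not points:
        return []
    out = [points[0]]
    needed = delta  # path length still to cover before the next output point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        pos = 0.0
        while seg - pos >= needed:
            pos += needed
            t = pos / seg
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            needed = delta
        needed -= seg - pos
    return out


def repair(points, step=4, window=5, delta=1.0):
    """Subsample, then low-pass filter, then resample at high resolution."""
    return resample_spatially(low_pass(subsample(points, step), window), delta)
```

Note that the moving average pulls the first and last points of a stroke
inwards, so a real implementation would need to treat the stroke ends more
carefully.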
7. Important WWW sites
Lambert Schomaker
NICI, Nijmegen Institute for Cognition and Information
University of Nijmegen, P.O.Box 9104
6500 HE Nijmegen, The Netherlands
Phone: +31 24 3616029 / Fax: +31 24 3616066
E-mail: schomaker@nici.kun.nl