(COPY OF ORIGINAL E-MAIL MESSAGE)
Subject: Action plan / Results of meeting at NIST
Date: Thu, 28 May 1998 17:43:32 +0200
From: Lambert Schomaker
Organization: NICI
To: unipen-participants
Dear Unipen participant,
As you may have noticed, we have experienced increasing
difficulties during the past few months with the
organization of the first Unipen benchmark because of funding
cuts at NIST.
In any case, NIST seems committed to resuming and finishing the
work, but only up to the benchmark workshop and under the same
conditions as before, i.e., without extra support. There will be
no long-term commitment, due to a changed agenda.
Therefore, a phase-out/phase-in process must be started right
now, in order to migrate towards a 'post-NIST' stage and
to ensure that the long-term interests of Unipen are safeguarded.
Please read this long document carefully, and execute the
action items as requested, in order to make sure that the
Unipen project may be continued.
Contents:
o TO DO list: ACTION ITEMS 1, 2 and 3
o Report on meeting at NIST
o Other email exchanges with NIST management
DEADLINE:
=========
We ask you kindly to execute all action items
before * June 21, 1998 * and let us know immediately
if you need more time. Your prompt answer will ensure
that your input will be taken into account prior to
making any decision.
========================================================================
1) First action item
-----------------
Please acknowledge having received and read the present
message by replying to it, i.e., to schomaker@nici.kun.nl.
This will make sure that you will be kept part of the
important decisions that have to be made regarding the fate
of the benchmark and the data.
2) Second action item
------------------
Please send the following statement by regular mail. It
ensures that the original Unipen goal of public
availability of the data does not encounter legal
impediments. Send it to:
L. Schomaker
Chairman of the Unipen project
Nijmegen Institute for Cognition and Information
Nijmegen University
P.O. Box 9104
6500 HE Nijmegen, The Netherlands
-------------------cut here and add institution letterhead---
Data release consent form
-------------------------
I, ---name----, representing ---institution---, hereby
authorize public distribution of the data donated to the
Unipen consortium as of the date of the end of the first
realized official Unipen benchmark and in any case before
January 2000.
Date:
Authorized person name (printed):
Signature:
------------------------------- cut here ---------------------
3) Third action item
--------------------
A later section of this document contains a report of LS
visiting NIST. Although the meeting initially appeared to
generate some hope for improvement, as may be witnessed from
the text, the bottom line is that robust support of Unipen at
NIST is virtually impossible. Therefore, the following action
plan had to be developed.
Please return your comments and suggestions on the new action
plan to Lambert Schomaker (schomaker@nici.kun.nl). We
understand that you may have concerns and will examine all of
them. But if you do not get involved now in resolving the
present problems, you run the risk of losing your influence
should it come to a decision to cancel the benchmark
altogether and release the data to the public.
New action plan:
================
You will notice that our strategy is to split the
organization of the Unipen project into several independent
subtasks:
(0) ORGANIZATION (may continue at NICI, 'as is')
(1) FUND RAISING
(2) DATA REPOSITORY
(3) ARBITRATION
(4) WORKSHOP ORGANIZATION
(5) PUBLIC DATA DISTRIBUTION
This division gives us more flexibility and opportunities
for choosing alternatives. In addition, each individual
subtask is more likely to be picked up by voluntary work
(i.e., it cuts down the expenses). Also, the goal of
this plan is to continue the good work prepared by NIST,
along a similar path as planned.
- FUND RAISING:
Raise money to fund the benchmark process, if necessary.
This is a responsibility of the organizers and the
participants.
- DATA REPOSITORY:
Choose a competent organization that will be responsible for
hosting the data and ensuring its security. NIST can remain the
data host but we will accept other candidates. We already
have contacts with several institutions and government
agencies including ISO, IAPR and LDC. NIST will ensure data
integrity and security during the transfer, if needed.
- ARBITRATION:
Arbitration entails (a) scoring software development, (b) testing
and (c) running the benchmark itself. The aim is to entrust
competent people to write the scoring software (these people
may or may not belong to the data host organization). We
already have several options, including continuing with NIST and
using the work of Isabelle Guyon, who has started implementing
scoring software and is interested in continuing if support can
be found to cover the expenses.
We will start seeking other proposals.
- Test the scoring software with participant results using
the development test set (a collaboration between the host
organization, the scoring software people and all the
participants).
- Run the benchmark with participant results using the
benchmark test set (a collaboration between the host
organization and the scoring software people).
- WORKSHOP ORGANIZATION:
Organize a workshop to report on the results. NIST may
co-sponsor the workshop and it may take place on NIST's
premises.
- PUBLIC DATA DISTRIBUTION:
Make the data publicly available for download from the web or
by ftp and print a CDROM to distribute it (done by the data
host organization or in collaboration between the data host
organization and another organization of our choice). Both
NIST and LDC have offered to do it at moderate or no cost.
=-=-=-=-=-=-=-=-=-=-=--=-=-=-=-=-=
While we all want NIST to finish the project it started, we
have to face the eventuality that this will not happen.
There are also positive sides of the alternative plan:
* We have more control over the modalities of the benchmark.
In particular, we can decide again on whether or not we
want the results to remain anonymous.
* We have more control over the scoring software. We can
impose that the source of the code be published (ensuring
that everybody will have an opportunity to test it before
the benchmark and verify the accuracy of the results after
the benchmark). We can impose the metrics.
* We can decide on how to publish the results.
The down sides are:
* We lose the quality label of NIST. However, Unipen, as a
consortium of 40 companies and universities, is a credible
institution to organize a benchmark. Unipen can ensure by
itself the quality of the scoring software and entrust a
competent organization to host the data.
* We need to raise some money. However, our budget can be
very small (on the order of half a man-year) if we can find
free data hosting and rely mostly on voluntary work for the
scoring software.
============== END NEW ACTION PLAN ============================
Alternative Options
===================
- distribute training data to the public now, transfer remaining data
to an independent repository. (I think there are many problems
associated with this option, LS)
- use the money and influence of new companies which appear to be
interested in Unipen and which would like to participate: Our strategy
in this variant would be to facilitate fund raising by opening the
benchmark to participants who have not donated data.
(this may cause problems w.r.t. the interests of existing corporate
participants of Unipen, LS)
More background information:
---------------------------
Report of LS visiting NIST
Below is a report of a visit to NIST, Gaithersburg, on 14
May 1998, 8:30-13:00. A meeting took place between Charles
Wilson (CW), Stan Janet (SJ) and Lambert Schomaker (LS) at
Charles' office. The purpose of this visit was to make clear
to NIST that the Unipen participants are really very active
and eager to see benchmark results. In the course of the
visit, it became evident that current interests at NIST are
now mainly focussed on (pattern recognition in) digital
video and face recognition. Handwriting OCR, and especially
on-line handwriting recognition, have a distinctly low
priority. However, there are possibilities to continue the
Unipen work, since it is also apparent that much work has
already been invested, and it would be a pity if it
were lost.
Some things Lambert Schomaker said
* some companies (Unipen participants) are eager to get
the benchmark going
* universities are very active in the Unipen area: at
CIFED'98: 4 papers on-line, plus one paper where Unipen
was used to generate bitmap images of characters
* outside interest is very strong (it appears that Stan
also receives requests for Unipen data)
* the rising interest is caused by the Goldberg&Richardson
and subsequent Graffiti-type simplified classifiers
and the 3Com PalmPilot phenomenon
* outside companies may occasionally even offer money for Unipen data
(which is of course intolerable, beware!)
* LS told more about the current situation, about all the
researchers (i.e., non-participants) asking for data
and asking about the status of the Unipen benchmark
process. CW said that from his place he could not
at all see any clear activity.
* The Unipen keyword cannot be found on any NIST WWW site,
which raises suspicion about the status of Unipen at
NIST, not only among donators, but also among those
who know Unipen.
Some things Charles Wilson said
* NIST appears to lose a lot of time and work on
correcting file format errors. Some groups have been
very cooperative in producing corrections, whereas with
others there was only discussion on what is right and
what is wrong. Note that this is not even about label
quality but about Unipen format consistency.
* Charles appears to be completely tired of improving on
other people's bad labels. He says he will never make
manpower available to do that. LS said global visual
inspection could already reveal many problems.
* The main goal for NIST is organizing the evaluation
conference and publishing the results of the test (on paper
and CDROM). Publication is essential.
There will be no secrecy/anonymity of submitted benchmark
recognition results.
* Examples of the OCR evaluation are available from
NIST as very thick reports. This is the degree of
openness which will be the case with the Unipen
benchmark.
* Charles will need a number of letters, esp. from the
larger U.S. corporate donators, stating that they are
committed to Unipen, and want to see the benchmark
finished. The combined pile of letters should be sent
to his boss (S. Wakid) directly by Lambert in order to
prevent the impression that CW is pushing for personal
hobbies to be funded from within NIST.
(But: note subsequent developments as described below, LS)
* As regards software use, the issue is the following.
When the test sets and profiling data are put on
CDROMs for donators, the software on that disk
must be free of restrictions other than the copyleft
requirement that the original author's name etc. need
to be left intact, as is done in GNU.
* The difference between Unipen and the other sets is
the diversity of donators in the case of Unipen. The
other sets in, e.g., OCR are produced by one or two
data sources.
* If the evaluation is not done this year, it may never
happen.
* Some mention of Unipen on the NIST website
could be arranged.
Remarks by Stan Janet:
* The latest versions of the scoring software should always be
available from a specified FTP server too (apart from residing
on the evaluation CDROMs)
* With only a handful of exceptions, the donated datasets had
errors of varying degrees of seriousness. NIST believes that
datafiles or datasets that still have errors may have to be
left out.
* As regards future benchmarks: Based on how my partitioning
software has been written, there will be training data, a
development test set (for a pilot test, a "run-through" to
iron out kinks), and two benchmark test sets. The training
data will all go out for the pilot test. The plan has been to
use it for that and for both benchmark tests. The second
benchmark test will provide a mechanism to track improvements
since the first.
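The partitioning Stan describes (training data, a development
test set, and two benchmark test sets) can be illustrated with
a minimal Python sketch. This is not NIST's partitioning
software, whose details are not given here. The function name,
the random split by file identifier, the fractions and the seed
are all assumptions made for illustration only.

```python
import random

# Hypothetical sketch: NOT the NIST partitioning software.
# Fractions, seed and split-by-file strategy are assumptions.
def partition(file_ids, fractions=(0.6, 0.1, 0.15, 0.15), seed=1998):
    """Split file ids into training, development test and two benchmark test sets."""
    names = ("train", "devtest", "bench1", "bench2")
    ids = sorted(file_ids)              # fixed starting order for reproducibility
    random.Random(seed).shuffle(ids)    # seeded shuffle: same split on every run
    n = len(ids)
    splits = {}
    start = 0
    cumulative = 0.0
    for name, frac in zip(names, fractions):
        cumulative += frac
        # The last subset takes whatever remains, so no file is dropped.
        end = n if name == names[-1] else round(cumulative * n)
        splits[name] = ids[start:end]
        start = end
    return splits

sets = partition(["file%03d" % i for i in range(100)])
print({name: len(members) for name, members in sets.items()})
```

Every file lands in exactly one subset, so the benchmark test
sets stay disjoint from the training data.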
Other things that came up
Profile scoring
LS said it is not fully clear what a profile should look
like. CW said they had a lot of experience already. LS
brought the topic of Isabelle Guyon's Java software to the
table. The NIST point of view is that there is no in-house
experience with Java and it would be counterproductive to
switch. Platform portability is (said to be) guaranteed by
use of Perl software for which several interpreter
implementations exist. LS proposed a dual approach. If Perl
routines have a nice function, they can be implemented in
Java and vice versa. LS proposes to function as a bridge
between the two types of profilers.
Data quality
Some people have apparently sent in badly formatted data.
NIST thinks that there are many sets that would be better
thrown away because correcting them costs too much manpower.
These are the error categories, in order of increasing
severity:
* handwriting errors (I think that we need them: they
will occur in real life too. As long as a human reader
can read it, there is usable information in it, LS)
* understandable labeling errors (i/l/1 0/O/o)
* blatant labeling errors
* severe file format ('syntax') errors
and errors in segmentation
* (binary) nonsense bytes in the ASCII data stream
In future work, it will be necessary to make the signal
concept of donators more explicit (mouse-based vs
equidistant-time sampling, for instance).
The proposal is to ask donators who have provided bad data
to correct these themselves within a fixed time frame. Data which
cannot be corrected because the background information and
knowledge is gone must be considered to be lost. LS proposes
to identify sets which could still be used as a test set and
to put them in a 'garbage bin', keeping only the XY-coordinates,
which should be clean in this case.
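The most severe error category listed above, binary nonsense
bytes in the ASCII data stream, lends itself to an automatic
check before any manual inspection. The following is a minimal
Python sketch, not a tool used by NIST or Unipen. The function
name and the choice of allowed bytes (printable ASCII plus tab,
line feed and carriage return) are assumptions.

```python
# Hypothetical sketch: the allowed byte set is an assumption,
# not part of any Unipen format definition.
def find_nonsense_bytes(data: bytes):
    """Return (offset, byte value) pairs for bytes outside plain ASCII text."""
    allowed = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
    return [(offset, value) for offset, value in enumerate(data)
            if value not in allowed]

# Made-up sample: two coordinate lines with stray bytes in between.
sample = b"100 200\n\x00\xff101 202\n"
print(find_nonsense_bytes(sample))
```

A file for which this returns an empty list may still contain
format or labeling errors, so the check only screens out the
worst case.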
Schedule
The following schedule was proposed.
.
.
.
(deleted matter: Item 1. In a previous version of the present
document that we submitted to CW for reviewing, our main action
item was to send letters to CW's manager to convince him to
secure funding for Unipen at NIST, as suggested by CW.
Presenting this text would be confusing now because things
changed, see below. The text is available for those interested.)
.
.
.
2. Define the clean sets. Tag the bad ones as problematic.
Two schemes are possible: tagging the labels, or putting
aside the bad data. CW proposed not to let everyone
waste a lot of computation on bad data.
Donators get a fixed period (2-4 weeks?) to improve the
format.
3. The idea of a pilot benchmark is maintained. The details
of the *.REC and *.RES have to be finalized.
4. Workshop with evaluation. This is CW's one and only
goal. The workshop generates internal NIST funding
for Unipen-based activities. Features:
o public scores!
o (combined) system behavior analysis.
o a CDROM with test results and scoring S/W will be
distributed among the donators after the evaluation
5. Produce the CDROM with the final training data which is
made available to the public. The results of the
recognizers could be used in principle to clean the data
up once more. Legal issues have to be solved at that
stage. Letters from all participants will be needed,
stating that no claims rest on these data when they finally
go public (note that this was the whole idea of this
IAPR-based activity!, LS).
6. The next round takes place, first with the as yet
unused training data still available at NIST. At a later
stage, even new calls for data may follow.
End of NIST meeting report
-------------------------------------------------------------
Subsequent developments
=======================
Unfortunately, although the result of the meeting
appeared to generate some hope for substantial
improvement of the situation, later developments
show that such an improvement may not be possible,
after all.
> From: wilson@magi.nist.GOV
> I will forward Stan's comments after sending this.
>
> My only comment is that all references to letters of support
> need to be removed. Such letters would be counterproductive
> at this time.
>
> C. L. Wilson
----------------------
From: schomaker@nici.kun.nl
I am rather surprised and do not understand. As I said,
it will be relatively easy for me to mobilize the participants
in order to obtain such letters. Has something changed?
--
Lambert Schomaker
-----------------------
> From: wilson@magi.nist.GOV
> Letters, to have the desired effect, would need to appear to be
> unsolicited. After discussing the letter here my management
> would interpret this as an attempt to apply pressure for additional
> funds. Since next year's budget is now fixed, this will not be
> appreciated and will make future funding more difficult for a
> small effort of the kind we expect.
>
> C. L. Wilson
-----------------------
Conclusion
==========
As a result of these developments a revised and new action
plan as described above was developed after lengthy
discussions with a number of people, notably Isabelle
Guyon and others: Thank you!
When you have read everything until this line, please take a
look back at the Action Items above. Make sure to reply
promptly to defend your rights and interests with respect to
the Unipen data and benchmark!
Best regards,
--
Lambert Schomaker /
NICI, Nijmegen Institute for Cognition and Information #/########
University of Nijmegen, P.O.Box 9104 ##/########
6500 HE Nijmegen, The Netherlands # / #######
Tel: +31 24 3616029 / Fax: +31 24 3616066 #/####
E-*: schomaker@nici.kun.nl http://hwr.nici.kun.nl/
-----------------------------------------------------------------------