The following UNIPEN data sets have been released by NIST:
| Date | Release | Remark |
| Wed, 09 Aug 1995 | "train_r01_v01" | Initial release |
| Wed, 20 Sep 1995 | "train_r01_v02" | Supersedes previous |
| Tue, 14 Nov 1995 | "train_r01_v03" | Supersedes previous |
| Fri, 26 Jan 1996 | "train_r01_v04" | Supersedes previous |
| Mon, 15 Jul 1996 | "train_r01_v05" | Supersedes previous |
| Tue, 01 Oct 1996 | "train_r01_v06" | Supersedes previous |
| Tue, 01 Oct 1996 | "devtest_r01_v01" | First development test release |
| Fri, 25 Oct 1996 | "train_r01_v07" | Supersedes previous |
| Fri, 25 Oct 1996 | "devtest_r01_v02" | Supersedes previous devtest |
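Since each release supersedes the previous one of the same set, the current release can be picked out mechanically. The following is a minimal Python sketch, not part of the UNIPEN distribution. It assumes that release identifiers follow the pattern `<set>_r<release>_v<version>` seen in the table; the function names `parse_release`, `latest_releases` and `citation_form` are hypothetical helpers introduced for illustration only.

```python
import re

# Release identifiers copied from the table above.
RELEASES = [
    "train_r01_v01", "train_r01_v02", "train_r01_v03", "train_r01_v04",
    "train_r01_v05", "train_r01_v06", "devtest_r01_v01",
    "train_r01_v07", "devtest_r01_v02",
]

# Assumed naming pattern: <set>_r<release>_v<version>, e.g. "train_r01_v07".
_PATTERN = re.compile(r"^(?P<set>[a-z]+)_r(?P<release>\d+)_v(?P<version>\d+)$")

# Assumed mapping from set name to the spelling used when citing a release.
_LABELS = {"train": "Train", "devtest": "DevTest"}


def parse_release(name):
    """Split an identifier into (set name, release number, version number)."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized release identifier: {name!r}")
    return match.group("set"), int(match.group("release")), int(match.group("version"))


def latest_releases(names):
    """Return, for each set, the identifier with the highest (release, version)."""
    latest = {}
    for name in names:
        set_name, release, version = parse_release(name)
        if set_name not in latest or (release, version) > latest[set_name][0]:
            latest[set_name] = ((release, version), name)
    return {set_name: entry[1] for set_name, entry in latest.items()}


def citation_form(name):
    """Format an identifier in the style used for citations, e.g. Train-R01/V07."""
    set_name, release, version = parse_release(name)
    label = _LABELS.get(set_name, set_name.capitalize())
    return f"{label}-R{release:02d}/V{version:02d}"
```

For example, `latest_releases(RELEASES)` selects `train_r01_v07` and `devtest_r01_v02`, and `citation_form` turns these into the spelling used in the reference example.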
Reference example:
"As a training set, we used UNIPEN [xx] Train-R01/V07,
benchmark ... (see III, below), subsets .....
As a test set, we used UNIPEN DevTest-R01/V02,
benchmark ..., subsets ....
To the raw UNIPEN data, the following pre-processing
was applied: ...."
.
.
.
[xx] Guyon, I., Schomaker, L., Plamondon, R.,
Liberman, M. & Janet, S. (1994).
UNIPEN project of on-line data exchange and recognizer
benchmarks, Proceedings of the 12th International
Conference on Pattern Recognition, ICPR'94,
pp. 29-33, Jerusalem, Israel, October 1994. IAPR-IEEE.
Note that there is a problem in the use of test sets: iterated use of a particular training/test set pair during development amounts to indirect training. Even if a development test set is never formally used for training, every parameter adjustment or code improvement made in response to its results is a form of training, regardless of the type of pattern recognition algorithm used. It is therefore good practice to report in publications how much effort was spent on iterated testing.
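One simple way to keep track of this effort is to count every evaluation run per training/test set pair during development. The sketch below is only an illustration under that assumption; the class `EvaluationLog` and its methods are hypothetical and not part of any UNIPEN tool.

```python
from collections import Counter


class EvaluationLog:
    """Counts how often each (training set, test set) pair has been evaluated."""

    def __init__(self):
        self._runs = Counter()

    def record(self, train_set, test_set):
        """Register one evaluation run and return the running count for the pair."""
        key = (train_set, test_set)
        self._runs[key] += 1
        return self._runs[key]

    def summary(self):
        """Return one line per pair, suitable for quoting in a publication."""
        return [
            f"{train} / {test}: {count} evaluation run(s)"
            for (train, test), count in sorted(self._runs.items())
        ]


# Usage: call record() each time the recognizer is scored on the test set.
log = EvaluationLog()
for _ in range(3):
    log.record("Train-R01/V07", "DevTest-R01/V02")
```

The output of `log.summary()` can then be reported alongside the recognition results.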
| Benchmark | 1a | 1b | 1c | 1d | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
Note that only Benchmark #8 is a realistic, application-oriented test, because the recognizer must also solve the word segmentation problem itself. No manual word segmentation is allowed in Benchmark #8.