SIMILARITY MEASURES FOR WRITER CLUSTERING

Jayashree SUBRAHMONIA

IBM T.J. Watson Research, P.O. Box 218 / Route 134,
Yorktown Heights, NY 10598, U.S.A.
E-mail: jays@watson.ibm.com

This paper addresses the problem of improving the performance of an online, writer-independent, large-vocabulary, unconstrained handwriting recognition system by clustering writers with similar writing styles. Recognition performance is enhanced by identifying the writer cluster that a test writer is closest to and decoding with a model trained for that cluster. The recognition system is based on hidden Markov models. A common set of features is computed for all writers and then projected to a lower-dimensional space that preserves most of the information in the original feature set. This reduced-dimensional space varies from writer to writer. This paper describes two measures of similarity between writing styles. The first is based on the distance between the writer-dependent reduced-dimensional feature subspaces. The second is based on the hidden Markov model output probabilities.
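
As a rough illustration of the first measure, the sketch below compares two writers by the distance between their reduced-dimensional feature subspaces. The abstract does not say which projection or which subspace distance is used, so both choices here are assumptions: the projection is taken to be a principal component basis, and the distance is the chordal distance computed from the principal angles between the two subspaces. The function names pca_basis and subspace_distance are hypothetical.

```python
import numpy as np


def pca_basis(features, k):
    """Return a d x k orthonormal basis for the top-k principal directions.

    `features` is an n x d matrix of feature vectors for one writer.
    Assumption: PCA stands in for the paper's (unspecified) projection.
    """
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T


def subspace_distance(basis_a, basis_b):
    """Chordal distance between two subspaces given orthonormal bases.

    Assumption: this is one common subspace distance, not necessarily
    the one used in the paper.
    """
    # Singular values of A^T B are the cosines of the principal angles.
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    cosines = np.clip(cosines, 0.0, 1.0)
    # Sum of squared sines of the principal angles, then square root.
    return float(np.sqrt(np.sum(1.0 - cosines ** 2)))
```

Under these assumptions, a distance of zero means the two writers' subspaces coincide, and larger values indicate more dissimilar writing styles. The resulting pairwise distances could then be passed to any standard clustering procedure.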

In: L.R.B. Schomaker and L.G. Vuurpijl (Eds.)
Proceedings of the Seventh International Workshop on Frontiers
in Handwriting Recognition, September 11-13 2000, Amsterdam,
Nijmegen: International Unipen Foundation,
ISBN 90-76942-01-3
pp. 541-546.