Methods for detecting and extracting whole text lines from unconstrained online handwritten text are described. The general approach is a "bottom-up" clustering of discrete strokes into small groups that are then merged into isolated lines of text. Initial clustering of strokes into groups is based on combined temporal and spatial stroke proximity. Spatial stroke proximity is gauged relative to estimated inter-line distance and mean character height. Two methods applicable to off-line or on-line data are described for estimating the inter-line distance: autocorrelation (self-convolution) of the Y-axis projection histogram, and a fitting function. Inter-line distance is accurately determined for 99% of all text pages. Text line extraction accuracy on letters (correspondence) is 98.7% and on tables is 94.9%.
In: L.R.B. Schomaker and L.G. Vuurpijl (Eds.)
Proceedings of the Seventh International Workshop on Frontiers
in Handwriting Recognition, September 11-13 2000, Amsterdam,
Nijmegen: International Unipen Foundation,
ISBN 90-76942-01-3
pp. 33-42.
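
As an illustration of the first inter-line distance estimator named in the abstract, the following is a minimal sketch of autocorrelating a Y-axis projection histogram. The abstract does not give the paper's binning, normalization, or peak-picking rules, so everything here is an assumption made for illustration: the function names, the `bin_size` parameter, the mean-centering step, and the rule of taking the strongest peak after the zero-lag lobe. The fitting-function method is not sketched.

```python
def y_projection_histogram(ys, bin_size=1.0):
    """Count ink sample points per horizontal band of height bin_size."""
    y_min = min(ys)
    n_bins = int((max(ys) - y_min) / bin_size) + 1
    hist = [0] * n_bins
    for y in ys:
        hist[int((y - y_min) / bin_size)] += 1
    return hist


def autocorrelation(hist):
    """Mean-centered, unnormalized autocorrelation for lags 0..len(hist)-1."""
    n = len(hist)
    mean = sum(hist) / n
    centered = [h - mean for h in hist]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag))
        for lag in range(n)
    ]


def estimate_line_distance(ys, bin_size=1.0):
    """Estimate inter-line distance from the Y coordinates of ink points.

    Returns the distance in the same units as ys, or None when no
    periodicity can be found (for example, a page with a single line).
    """
    if not ys:
        return None
    ac = autocorrelation(y_projection_histogram(ys, bin_size))

    # Assumed heuristic: walk past the zero-lag lobe, which ends where
    # the autocorrelation first drops to zero or below.
    lag = 1
    while lag < len(ac) and ac[lag] > 0:
        lag += 1
    if lag >= len(ac):
        return None

    # The strongest remaining peak is taken as one line period. Because
    # the sum is not normalized by the number of terms, later multiples
    # of the period are naturally damped.
    best = max(range(lag, len(ac)), key=lambda k: ac[k])
    return best * bin_size if ac[best] > 0 else None
```

Given the Y coordinates of all pen samples on a page, `estimate_line_distance(ys)` returns the lag at which the histogram best matches a shifted copy of itself. The direct sum is quadratic in the number of bins, which is acceptable for a single page; an FFT-based autocorrelation would be the usual substitute for larger inputs.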