A GENERIC SYSTEM TO EXTRACT AND CLEAN HANDWRITTEN DATA FROM BUSINESS FORMS

Xiangyun YE1,2, Mohamed CHERIET1,2 and Ching Y. SUEN1

1Centre for Pattern Recognition and Machine Intelligence
Concordia University, Suite GM606, 1455 de Maisonneuve Blvd. West
Montréal, Québec H3G 1M8, Canada

2Imagery, Vision and Artificial Intelligence Laboratory
École de Technologie Supérieure, University of Québec
1100, Notre-Dame West, Montréal, Québec H3C 1K3, Canada
E-mail: {xyye, suen}@cenparmi.concordia.ca, cheriet@gpa.etsmtl.ca

A generic system is proposed to automatically extract and clean handwritten items from business forms. Handwritten data often touch or cross preprinted form frames and text. Assuming that the item-of-interest can be located roughly by existing form registration methods, we focus only on the extraction and cleaning of the filled-in items. The proposed system has a training phase and a cleaning phase. In the training phase, a model template is generated automatically from a blank form, and features such as the position and stroke width of the preprinted entities (including form frames and instructions) are extracted. In the cleaning phase, the system registers the template to the input form by landmark alignment. The form frames are then removed, and the handwriting is restored by morphological operations. Where the handwriting touches or crosses preprinted text, morphological operations based on statistical features are used to clean it. Both subjective and objective evaluations show promising results for the proposed system.
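
As an illustration only, the two phases can be sketched in Python on small binary images (lists of 0/1 rows). This is not the authors' implementation. It assumes the filled form is already registered to the template, treats only long horizontal runs as form frames, and replaces the morphological restoration with a simple rule: a frame pixel is kept when ink lies directly above and below the frame line in the same column. All function names (frame_mask, vertical_runs, stroke_width, clean) and the min_len parameter are hypothetical.

```python
from collections import Counter


def frame_mask(template, min_len):
    """Training: mark pixels on horizontal ink runs of at least min_len pixels."""
    h, w = len(template), len(template[0])
    mask = [[0] * w for _ in range(h)]
    for r in range(h):
        c = 0
        while c < w:
            if template[r][c]:
                start = c
                while c < w and template[r][c]:
                    c += 1
                if c - start >= min_len:  # long run: assumed to be a frame line
                    for k in range(start, c):
                        mask[r][k] = 1
            else:
                c += 1
    return mask


def vertical_runs(mask, c):
    """Yield (top, bottom) row indices of each vertical run of 1s in column c."""
    r, h = 0, len(mask)
    while r < h:
        if mask[r][c]:
            top = r
            while r < h and mask[r][c]:
                r += 1
            yield top, r - 1
        else:
            r += 1


def stroke_width(mask):
    """Training: most common vertical run length, i.e. the frame line thickness."""
    counts = Counter(
        bottom - top + 1
        for c in range(len(mask[0]))
        for top, bottom in vertical_runs(mask, c)
    )
    return counts.most_common(1)[0][0] if counts else 0


def clean(filled, mask):
    """Cleaning: erase frame pixels unless a stroke crosses the line there."""
    h, w = len(filled), len(filled[0])
    out = [row[:] for row in filled]
    for c in range(w):
        for top, bottom in vertical_runs(mask, c):
            above = top > 0 and filled[top - 1][c] == 1
            below = bottom < h - 1 and filled[bottom + 1][c] == 1
            if not (above and below):  # no crossing stroke: remove the frame
                for r in range(top, bottom + 1):
                    out[r][c] = 0
    return out


# Blank form: one horizontal frame line, two pixels thick (rows 3 and 4).
template = [[0] * 10 for _ in range(7)]
for r in (3, 4):
    for c in range(10):
        template[r][c] = 1

# Filled form: the same line crossed by a vertical stroke in column 4.
filled = [row[:] for row in template]
for r in range(1, 7):
    filled[r][4] = 1

mask = frame_mask(template, min_len=5)
width = stroke_width(mask)
cleaned = clean(filled, mask)
```

In this toy case the frame line is erased everywhere except where the vertical stroke crosses it, so the stroke stays connected. A real system would need the registration step and proper morphological operators, which this sketch leaves out.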

In: L.R.B. Schomaker and L.G. Vuurpijl (Eds.)
Proceedings of the Seventh International Workshop on Frontiers
in Handwriting Recognition, September 11-13 2000, Amsterdam,
Nijmegen: International Unipen Foundation,
ISBN 90-76942-01-3
pp. 63-72.