WandaML - a markup language for digital document annotation

Katrin Franke[2], Isabelle Guyon[1], Lambert Schomaker[3], and Louis Vuurpijl[4]
1. ClopiNet, 955 Creston Rd, Berkeley, USA (corresponding author).
2. Fraunhofer Institute, Berlin, Germany.
3. Rijksuniversiteit Groningen, The Netherlands.
4. University of Nijmegen, The Netherlands.

WandaML is an XML-based markup language for the annotation and filter journaling of digital documents. It addresses in particular the needs of forensic handwriting examination by allowing experts to enter information about the writer, material (pen, paper), script, and content, and to record the chains of image filtering and feature extraction operations applied to the data. We present the design of this format and some annotation examples in the more general perspective of digital document annotation. Annotations may be organized in a structure that reflects the document layout via a hierarchy of document regions. WandaML lends itself to a variety of applications, including the annotation of all kinds of handwritten documents (on-line or off-line), images of printed text, medical images, and satellite images.

Keywords: Handwriting, forensic data, XML, annotations, data format, document analysis.

WandaML was developed in the Wanda project, which aims at standardization, objectification, and replicability of results in computer-based forensic handwriting examination and writer identification.
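
The abstract describes two structural ideas: annotations attached to a hierarchy of document regions, and a recorded chain of filtering operations. The sketch below only illustrates that general shape using Python's standard xml.etree.ElementTree module. Every element name, attribute name, and value in it (document, region, annotation, filterJournal, filter, scan_001.tif, and so on) is a hypothetical placeholder chosen for illustration and is not taken from the WandaML DTD; consult the DTD linked below for the actual vocabulary.

```python
import xml.etree.ElementTree as ET


def build_annotation() -> ET.Element:
    """Build a small annotated document tree.

    All element and attribute names are hypothetical placeholders,
    NOT the vocabulary defined by the WandaML DTD.
    """
    doc = ET.Element("document", {"source": "scan_001.tif"})

    # Region hierarchy mirroring the document layout: page > line > word.
    page = ET.SubElement(doc, "region", {"type": "page", "id": "p1"})
    line = ET.SubElement(page, "region", {"type": "line", "id": "p1.l1"})
    word = ET.SubElement(line, "region", {"type": "word", "id": "p1.l1.w1"})

    # Annotations can be attached at any level of the hierarchy.
    ET.SubElement(word, "annotation", {"category": "content"}).text = "example"
    ET.SubElement(
        page, "annotation",
        {"category": "material", "pen": "ballpoint", "paper": "lined"},
    )

    # Filter journal: an ordered record of the operations applied to the data.
    journal = ET.SubElement(page, "filterJournal")
    ET.SubElement(journal, "filter", {"order": "1", "name": "binarize"})
    ET.SubElement(journal, "filter", {"order": "2", "name": "extract_contours"})
    return doc


def region_ids(root: ET.Element) -> list:
    """Return the ids of all regions in document order."""
    return [r.get("id") for r in root.iter("region")]


def filter_chain(root: ET.Element) -> list:
    """Return the recorded filter names sorted by their 'order' attribute."""
    filters = sorted(root.iter("filter"), key=lambda f: int(f.get("order")))
    return [f.get("name") for f in filters]


if __name__ == "__main__":
    root = build_annotation()
    print(ET.tostring(root, encoding="unicode"))
    print(region_ids(root))
    print(filter_chain(root))
```

Nesting the regions keeps each annotation next to the part of the layout it describes, and storing an explicit order on each filter entry makes the processing chain replayable, which matches the replicability goal stated above. A real WandaML file would additionally need to validate against the project's DTD, which this sketch does not attempt.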

Links

  • WandaML Document Type Definition (DTD)
  • A report describing the design of WandaML, its applications, and providing guidelines on how to use it
  • The Wanda project and a description of the Wanda framework
  • A zip file containing the entire WandaML DTD distribution

Last modified by Louis Vuurpijl, Tue Apr 6 08:59:44 MET DST 2004