UPX is the new format for specifying annotated databases of "online ink".
It contains elements to describe information about datasets, about writers, the data collection setup, the data annotation process, etcetera. Furthermore, it contains pointers to refer to raw ink traces specified in inkML. UPX is designed to support the specification of heterogeneous documents containing, e.g., handwriting, drawings and sketches. Annotation of traces can be performed on various levels of hierarchy, like PARAGRAPH, LINE, or WORD. UPX can be used to annotate any scripts and enables linking online ink traces to regions of interest in scanned images.
UPX version 0.9.5 complies with the latest inkML specifications and is to a large extent based on its predecessor, UNIPEN. Earlier versions of UPX have been assessed in the collection and annotation of significant datasets [1, 2, 3]. The current version will be used in the conversion of existing UNIPEN databases and other databases that the international Unipen Foundation is planning to release.
UPX version 0.9.5 is now published as the first UPX Working Draft. Any comments or suggestions for improvement are welcome.
2. The UPX 0.9.5 version and ongoing efforts
As of October 2006, the iUF and HP Labs are actively promoting UPX. At
the same time, the new final release of inkML was announced. Although
there remain a few issues to be discussed in the UPX definition,
we are now transforming UNIPEN and the IBM collection (see below)
into UPX. We are confident that the format is well suited for specifying
annotated databases containing online ink. This confidence is based on
three observations. First, HP Labs has already shown [1, 2, 3] that the format works
well for specifying their databases. Second, during summer 2006 the
iUF carried out two pilot studies to assess the UPX
format for existing UNIPEN data and other databases. Third, UPX provides
elaborate elements for specifying annotations of online ink and is loosely
connected to the results from inkML, which was designed for specifying
the raw ink trajectories and corresponding recording devices.
The current version is now used and further assessed in the conversion of
(i) the UNIPEN train_r01_v07 database, (ii) the Firemaker on/off database
(containing both online trajectories and scanned offline images), (iii)
the ICIS database [4] that contains pen gestures
collected in the context of annotations of images of photographs, and
(iv) the IBM databases that have been generously made available by
Michael Perrone from IBM.
We realize that many people interested in UPX will appreciate
the availability of software: software for extracting "ink" from UPX
documents, software for viewing and browsing through UPX documents, and
software for annotating online pen input. At the moment, we have various
implementations of such software in use in our labs. Unfortunately,
we have not yet decided when and how to release stable versions of
these tools. However, we can guarantee that when datasets become
available in UPX, we will at least have a converter that converts UNIPEN
to UPX and vice versa.
In the next section, we present some examples of UPX. For more information,
please consult the working draft specification (which is open for review and comments); it is part of the technical documentation listed at the bottom of this document.
3. Examples of UPX 0.9.5
The <upx> element is the root element of each document that is part of the Dataset. The Dataset
consists of multiple InkML and UPX documents organized in a directory structure,
possibly with common information stored in a separate UPX document and referred to by
other UPX documents. The <upx> element contains three main sub-elements for specifying annotated online ink:
- <datasetInfo> specifies metadata related to the dataset as a whole: a
high-level characterization of the Dataset.
- <datasetDefs> contains information about writers,
sources of annotation and annotation hierarchies referred to in the UPX document.
- <hwData> contains detailed labeling of digital ink traces, organized in a
customizable hierarchy. Gestures and other notations like music and math can also be
accommodated in the same structure.
In this section, some examples of using the <hwData>
element are described. The <hwData> element contains (i) the
<hLevel> element, through which annotations and references to
digital ink can be made at a certain level of hierarchy, (ii) the
<uiInfo> element, via which the layout of the user-interface
that was used during data capture can be described, and (iii) the
<imgInfo> element, via which regions of interest of a background
image (on which ink was captured) or scanned offline image (corresponding
to the online ink) can be specified.
Below, in Section 3.1, we present an example of a hierarchical
annotation of handwriting (through <hLevel>). In Section 3.2, we
show how the same online handwriting can be coupled to a scanned
document (through <imgInfo>). In Section 3.3, an example of
heterogeneous hierarchies of online ink collected in the context of
photograph annotations (with some references to the image material)
is presented. Finally, in Section 3.4, the UPX concept of specifying
information about datasets, the data collection setup, annotation schemes,
and label sources is explained.
3.1. The hLevel element: specifying hierarchy in ink trajectories
To illustrate UPX "in action", consider the following handwritten
fragment, which contains a line of text from the Firemaker on/off
database. A writer with writerID 0629 wrote the line "Bob,
David en sexy Xantippe sparen postzegels". An annotator segmented this
online ink into a line, a word and 5 characters (see Figure 1 below).

Figure 1. An example
piece of handwriting with (sparsely annotated) different hierarchy
levels. See text below for more details.
Please note that in UNIPEN, there is no way to explicitly specify this
hierarchy, since UNIPEN entries can be located anywhere in a document. So
the following two UNIPEN specifications of the above ink are identical:
.SEGMENT LINE 0-51 OK "Bob,...postzegels" .SEGMENT LINE 0-51 OK "Bob,...postzegels"
.SEGMENT CHAR 1-3:184 OK "B" .SEGMENT WORD 21-23 OK "sexy"
.SEGMENT CHAR 19:0-19:80 OK "e" .SEGMENT CHAR 1-3:184 OK "B"
.SEGMENT WORD 21-23 OK "sexy" .SEGMENT CHAR 19:0-19:80 OK "e"
.SEGMENT CHAR 21:90-21:160 OK "e" .SEGMENT CHAR 21:90-21:160 OK "e"
.SEGMENT CHAR 23:60-23:161 OK "y" .SEGMENT CHAR 23:60-23:161 OK "y"
.SEGMENT CHAR 41:112-41:223 OK "p" .SEGMENT CHAR 41:112-41:223 OK "p"
Code snippet 1. Two ways of specifying the example ink in UNIPEN.
Only by parsing the UNIPEN file and comparing the delineations of each .SEGMENT entry can it be
decided which entries are parents or children of each other. The software in the iUF uptools distribution
makes this possible.
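To make the comparison of delineations concrete, the following is a minimal Python sketch that recovers the parent of each .SEGMENT entry from the example above. It is not the uptools implementation. The function names are hypothetical, and the sketch assumes that a bare stroke index (such as "21") covers the whole stroke and that the fourth field (here "OK") can be ignored.

```python
import math
import re

# Matches lines such as: .SEGMENT WORD 21-23 OK "sexy"
SEGMENT_RE = re.compile(r'\.SEGMENT\s+(\w+)\s+(\S+)\s+\S+\s+"([^"]*)"')


def parse_position(text, is_end):
    """Turn 'stroke' or 'stroke:sample' into a comparable (stroke, sample) tuple."""
    stroke, _, sample = text.partition(":")
    if sample:
        return (int(stroke), int(sample))
    # Assumption: a bare stroke index covers the whole stroke.
    return (int(stroke), math.inf if is_end else 0)


def parse_segment(line):
    """Parse one .SEGMENT line into its level, label and start/end positions."""
    level, delineation, label = SEGMENT_RE.match(line).groups()
    start, _, end = delineation.partition("-")
    return {
        "level": level,
        "label": label,
        "start": parse_position(start, is_end=False),
        "end": parse_position(end or start, is_end=True),
    }


def contains(outer, inner):
    """True if inner lies within outer and the two ranges are not identical."""
    same = outer["start"] == inner["start"] and outer["end"] == inner["end"]
    return not same and outer["start"] <= inner["start"] and inner["end"] <= outer["end"]


def find_parents(segments):
    """For each segment, return the index of its innermost enclosing segment (or None)."""
    parents = []
    for child in segments:
        enclosing = [i for i, seg in enumerate(segments) if contains(seg, child)]
        # The innermost enclosing segment is the one that contains no other candidate.
        innermost = [i for i in enclosing
                     if not any(contains(segments[i], segments[j]) for j in enclosing)]
        parents.append(innermost[0] if innermost else None)
    return parents


LINES = [
    '.SEGMENT LINE 0-51 OK "Bob,...postzegels"',
    '.SEGMENT CHAR 1-3:184 OK "B"',
    '.SEGMENT CHAR 19:0-19:80 OK "e"',
    '.SEGMENT WORD 21-23 OK "sexy"',
    '.SEGMENT CHAR 21:90-21:160 OK "e"',
    '.SEGMENT CHAR 23:60-23:161 OK "y"',
    '.SEGMENT CHAR 41:112-41:223 OK "p"',
]
segments = [parse_segment(line) for line in LINES]
parents = find_parents(segments)
for seg, parent in zip(segments, parents):
    parent_level = segments[parent]["level"] if parent is not None else "(root)"
    print(f'{seg["level"]} "{seg["label"]}" -> {parent_level}')
```

The result does not depend on the order of the entries, which is why the two UNIPEN listings above describe the same hierarchy.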
In UPX, this hierarchy can be specified through the hLevel element. Each hLevel can contain any number of child hLevel elements of any hierarchy level. This makes it possible, for example, to specify that a LINE contains a WORD and three sibling CHAR elements (as in the example above). The recursive definition of hLevel also makes it possible to specify that the WORD in the example
contains two child hLevel elements from the CHAR hierarchy level:
<hwData annotationSchemeRef="#annotationScheme_0" writerRef="#w0629">
<hLevel id="LINE_0" level="LINE"><label id="LINE_0">
<alternate rank="1" score="1">Bob,...postzegels</alternate>
</label>
<hwTraces> <!-- from UNIPEN segment [.SEGMENT LINE 0-51 OK "Bob,...postzegels"] --><inkml:traceView traceRef="w0629.inkml#dataSet_0_traces" from="1" to="52"/>
</hwTraces>
<hLevel id="WORD_0" level="WORD"><label id="WORD_0">
<alternate rank="1" score="1">sexy</alternate>
</label>
<hwTraces> <!-- from UNIPEN segment [.SEGMENT WORD 21-23 OK "sexy"] --><inkml:traceView traceRef="w0629.inkml#dataSet_0_traces" from="22" to="24"/>
</hwTraces>
<hLevel id="CHAR_0" level="CHAR"><label id="CHAR_0">
<alternate rank="1" score="1">e</alternate>
</label>
<hwTraces> <!-- from UNIPEN segment [.SEGMENT CHAR 21:90-21:160 OK "e"] --><inkml:traceView traceRef="w0629.inkml#dataSet_0_traces" from="22:91" to="22:161"/>
</hwTraces> </hLevel>
<hLevel id="CHAR_1" level="CHAR"><label id="CHAR_1">
<alternate rank="1" score="1">y</alternate>
</label>
<hwTraces> <!-- from UNIPEN segment [.SEGMENT CHAR 23:60-23:161 OK "y"] --><inkml:traceView traceRef="w0629.inkml#dataSet_0_traces" from="24:61" to="24:162"/>
</hwTraces> </hLevel> </hLevel>
<hLevel id="CHAR_2" level="CHAR"><label id="CHAR_2">
<alternate rank="1" score="1">B</alternate>
</label>
<hwTraces> <!-- from UNIPEN segment [.SEGMENT CHAR 1-3:184 OK "B"] --><inkml:traceView traceRef="w0629.inkml#dataSet_0_traces" from="2" to="4:185"/>
</hwTraces> </hLevel>
<hLevel id="CHAR_3" level="CHAR"><label id="CHAR_3">
<alternate rank="1" score="1">e</alternate>
</label>
<hwTraces> <!-- from UNIPEN segment [.SEGMENT CHAR 19:0-19:80 OK "e"] --><inkml:traceView traceRef="w0629.inkml#dataSet_0_traces" from="20" to="20:81"/>
</hwTraces> </hLevel>
<hLevel id="CHAR_4" level="CHAR"><label id="CHAR_4">
<alternate rank="1" score="1">p</alternate>
</label>
<hwTraces> <!-- from UNIPEN segment [.SEGMENT CHAR 41:112-41:223 OK "p"] --><inkml:traceView traceRef="w0629.inkml#dataSet_0_traces" from="42:113" to="42:224"/>
</hwTraces> </hLevel> </hLevel>
</hwData>
Code snippet 2. Hierarchical organization of <hLevel> elements.
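The comments in Code snippet 2 suggest how UNIPEN delineations map onto the from and to attributes of traceView: stroke and sample indices shift from 0-based to 1-based, and a start position at sample 0 collapses to the stroke index alone. The Python sketch below reproduces that mapping. It is inferred only from the examples in the snippet, not from the UPX or inkML specifications, and the function names are hypothetical.

```python
def shift_position(position, is_start):
    """Shift a 0-based UNIPEN 'stroke' or 'stroke:sample' position to 1-based."""
    stroke, _, sample = position.partition(":")
    stroke_1 = int(stroke) + 1
    # Assumption (from the snippet): a start at sample 0 means "from the beginning
    # of the stroke", so only the stroke index is written.
    if not sample or (is_start and int(sample) == 0):
        return str(stroke_1)
    return f"{stroke_1}:{int(sample) + 1}"


def unipen_to_trace_view(delineation):
    """Return the (from, to) attribute values for a UNIPEN delineation."""
    start, _, end = delineation.partition("-")
    return shift_position(start, is_start=True), shift_position(end or start, is_start=False)


for delineation in ["0-51", "21-23", "21:90-21:160", "1-3:184", "19:0-19:80"]:
    print(delineation, "->", unipen_to_trace_view(delineation))
```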
3.2. The imgInfo element: UPX allows linking online ink to scanned documents
To illustrate that UPX now supports the linking of online ink to scanned documents containing handwriting, consider the
following image, which depicts the online trajectory superimposed on the corresponding scanned ink image of the words "en sexy Xantippe".
The original scanned document is located here.

Figure 2. Scanned offline document with superimposed ink trajectories in light blue.
Given a correct procedure for aligning online traces with offline fragments, automatic extraction of
scanned ink fragments becomes possible. Alternatively, regions of interest in the image that correspond to online trajectories can be specified manually. In UPX, it is possible to link such a region of interest on an image to a
<hLevel> element through the <imgInfo> element.
For example, consider the <hLevel> element describing the line "Bob, ... postzegels", shown below in Code snippet 3.
<hLevel id="LINE_0" level="LINE"><label id="LINE_0">
<alternate rank="1" score="1">"Bob,...postzegels"</alternate>
</label>
<hwTraces> <!-- from UNIPEN segment [.SEGMENT LINE 0-51 OK "Bob,...postzegels"] --><inkml:traceView traceRef="w0629.inkml#dataSet_0_traces" from="1" to="52"/>
</hwTraces>
<imgInfo href="w0629/0629.gif">
<roi points="317 249 2059 249 2059 369 317 369 317 249"/>
</imgInfo>
</hLevel>
Code snippet 3. A <hLevel> element linked to a background image through the <imgInfo> element. Here, the roi is specified as a bounding box. Other specifications are possible, like a convex hull or any other closed polygon.
Note that in this example, the <imgInfo> element overrides any <imgInfo> elements defined earlier (e.g. in the parent <hwData>).
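As an illustration of how a tool might read the roi from Code snippet 3, the Python sketch below parses the points attribute into vertices and derives the bounding box. It assumes that points is a whitespace-separated list of alternating x and y pixel coordinates, which matches the example but is not confirmed here against the schema. The function names are hypothetical.

```python
def parse_roi_points(points):
    """Parse 'x1 y1 x2 y2 ...' into a list of (x, y) vertices."""
    values = [int(value) for value in points.split()]
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of coordinates")
    return list(zip(values[0::2], values[1::2]))


def bounding_box(vertices):
    """Return (x_min, y_min, x_max, y_max) of a polygon."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


vertices = parse_roi_points("317 249 2059 249 2059 369 317 369 317 249")
is_closed = vertices[0] == vertices[-1]  # the polygon returns to its first vertex
x_min, y_min, x_max, y_max = bounding_box(vertices)
print(is_closed, x_max - x_min, y_max - y_min)
```

The same two functions also work for a convex hull or any other closed polygon, since the bounding box only depends on the extreme coordinates.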
3.3. An example in the annotation of photographic content: heterogeneous hierarchy
In Figure 3, another example is presented, but now in the context of a person marking situations in
photographic material. The example is taken from
the ICIS database [4]. The writer was asked to describe a car accident on a
map. Typically, in such descriptions of scenes on photographs, writers tend to combine sketches, drawings, and marking gestures with text.
In this example, the top level of annotation consists of multiple
sub-levels of different categories. A pipeline hierarchy as used in UNIPEN
cannot describe the hierarchical organization of the collected
online trajectories. The top level "semantic-unit" can consist of deictic gestures,
sketched objects (houses/people/cars/etc.), and handwriting. Here,
a tree-based hierarchy is needed, where each node can represent a
collection of ink trajectories from a certain hierarchical level which
is distinct from the hierarchy of other nodes.
In Code snippet 4 below, one can see how this is solved
in UPX. The <hLevel> element at level "semantic-unit" has 4
sub-<hLevel> elements: two objects (cars involved in the accident),
one sentence of handwriting containing two words ("burning cars"), and one
deictic gesture (linking the handwriting with the cars that are burning).
To clearly distinguish the hierarchy and different levels, the code is
coloured in blue (objects), green (deictic gesture), and various shades
of red (handwriting).
The <hwData> element refers to an annotationScheme that was defined elsewhere in
the same document. Note that this may also refer to another document on a file system or
on the internet.

Figure 3. An example of heterogeneous hierarchy levels. See text for more details.
<hwData annotationSchemeRef="#ICISHF2005_Annotation_hierarchy" id="PARTICIPANTID-dwillems-1124455034791-12-1"><hLevel level="semantic-unit" id="SU1"><!-- The first level is the semantic-unit, which in this instance contains two objects (rectangles), a deictic gesture (arrow)
and two lines of handwriting. --><label id="SU1label" labelSrcRef="#labelref_DW" labelType="truth"><alternate/>
</label>
<hwTraces><!-- These trace references contain all traces that make up the semantic unit (from trace 2 to 34). --><inkml:traceView traceRef="./example-HF05.inkml" from="2" to="34"/>
</hwTraces>
<hLevel level="object" id="OBJECT1"><!-- The first sub level contains an object. --><label id="OBJECT1label" labelSrcRef="labelref_DW" labelType="truth"><alternate/>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="2" to="2"/>
</hwTraces>
<hLevel level="rectangle" id="RECT1"><!-- The object is of type rectangle. --><label id="RECT1label" labelSrcRef="labelref_DW" labelType="truth"><alternate/>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="2" to="2"/>
</hwTraces> </hLevel> </hLevel>
<hLevel level="object" id="OBJECT2"><label id="OBJECT2label" labelSrcRef="labelref_DW" labelType="truth"><alternate/>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="4" to="4"/>
</hwTraces>
<hLevel level="rectangle" id="RECT2"><label id="RECT2label" labelSrcRef="labelref_DW" labelType="truth"><alternate/>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="4" to="4"/>
</hwTraces> </hLevel> </hLevel>
<hLevel level="deictic" id="DEICTIC1"><!-- This gesture is a deictic gesture. --><label id="DEICTIC1label" labelSrcRef="labelref_DW" labelType="truth"><alternate/>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="6" to="10"/>
</hwTraces>
<hLevel level="mark" id="MARK1"><!-- The deictic gesture contains a marking gesture. --><label id="MARK1label" labelSrcRef="labelref_DW" labelType="truth"><alternate/>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="6" to="10"/>
</hwTraces>
<hLevel level="arrow" id="ARROW1"><!-- The marking gesture consists of an arrow. --><label id="ARROW1label" labelSrcRef="labelref_DW" labelType="truth"><alternate/>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="6" to="10"/>
</hwTraces>
<hLevel level="tail" id="TAIL1"><!-- The first part of the arrow that was drawn is the tail of the arrow. --><label id="TAIL1label" labelSrcRef="labelref_DW" labelType="truth"><alternate/>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="6" to="6"/>
</hwTraces> </hLevel>
<hLevel level="head" id="HEAD1"><!-- The second part of the arrow that was drawn is the head of the arrow. --><label id="HEAD1label" labelSrcRef="labelref_DW" labelType="truth"><alternate/>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="8" to="10"/>
</hwTraces> </hLevel> </hLevel> </hLevel> </hLevel>
<hLevel level="handwriting" id="HWR1"><!-- This part of the ink trace contains handwriting. --><label id="HWR1label" labelSrcRef="labelref_DW" labelType="truth"><alternate>"burning cars"</alternate>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="12" to="34"/>
</hwTraces>
<hLevel level="line" id="HWRLINE1"><!-- The first line of handwriting. --><label id="HWRLINE1label" labelSrcRef="labelref_DW" labelType="truth"><alternate>"burning"</alternate>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="12" to="26"/>
</hwTraces>
<hLevel level="word" id="WORD1"><!-- The first word in this line of handwriting. --><label id="WORD1label" labelSrcRef="labelref_DW" labelType="truth"><alternate>"burning"</alternate>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="12" to="26"/>
</hwTraces>
<hLevel level="character" id="CHAR1"><!-- The first character in this word of handwriting. --><label id="CHAR1label" labelSrcRef="labelref_DW" labelType="truth"><alternate>"b"</alternate>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="12" to="12"/>
</hwTraces> </hLevel>
<hLevel level="character" id="CHAR2"><label id="CHAR2label" labelSrcRef="labelref_DW" labelType="truth"><alternate>"u"</alternate>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="14" to="14"/>
</hwTraces> </hLevel>
<hLevel level="character" id="CHAR2b"><label id="CHAR2blabel" labelSrcRef="labelref_DW" labelType="truth"><alternate>"r"</alternate>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="16" to="16"/>
</hwTraces> </hLevel>
<hLevel level="character" id="CHAR3"><label id="CHAR3label" labelSrcRef="labelref_DW" labelType="truth"><alternate>"n"</alternate>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="18" to="18"/>
</hwTraces> </hLevel>
<hLevel level="character" id="CHAR4"><label id="CHAR4label" labelSrcRef="labelref_DW" labelType="truth"><alternate>"i"</alternate>
</label>
<hwTraces><!-- The character 'i' is constructed by first drawing the long stroke, which is followed by other characters, and then the dot.
Two traceViews are necessary because the stroke and the dot are separated. --><inkml:traceView traceRef="/example-HF05.inkml" from="20" to="20"/>
<inkml:traceView traceRef="/example-HF05.inkml" from="26" to="26"/>
</hwTraces> </hLevel>
<hLevel level="character" id="CHAR5"><label id="CHAR5label" labelSrcRef="labelref_DW" labelType="truth"><alternate>"n"</alternate>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="22" to="22"/>
</hwTraces> </hLevel>
<hLevel level="character" id="CHAR6"><label id="CHAR6label" labelSrcRef="labelref_DW" labelType="truth"><alternate>"g"</alternate>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="24" to="24"/>
</hwTraces> </hLevel> </hLevel> </hLevel>
<hLevel level="line" id="HWRLINE2"><label id="HWRLINE2label" labelSrcRef="labelref_DW" labelType="truth"><alternate>"cars"</alternate>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="28" to="34"/>
</hwTraces>
<hLevel level="word" id="WORD2"><label id="WORD2label" labelSrcRef="labelref_DW" labelType="truth"><alternate>"cars"</alternate>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="28" to="34"/>
</hwTraces>
<hLevel level="character" id="CHAR7"><label id="CHAR7label" labelSrcRef="labelref_DW" labelType="truth"><alternate>"c"</alternate>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="28" to="28"/>
</hwTraces> </hLevel>
<hLevel level="character" id="CHAR8"><label id="CHAR8label" labelSrcRef="labelref_DW" labelType="truth"><alternate>"a"</alternate>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="30" to="30"/>
</hwTraces> </hLevel>
<hLevel level="character" id="CHAR9"><label id="CHAR9label" labelSrcRef="labelref_DW" labelType="truth"><alternate>"r"</alternate>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="32" to="32"/>
</hwTraces> </hLevel>
<hLevel level="character" id="CHAR10"><label id="CHAR10label" labelSrcRef="labelref_DW" labelType="truth"><alternate>"s"</alternate>
</label>
<hwTraces><inkml:traceView traceRef="/example-HF05.inkml" from="34" to="34"/>
</hwTraces> </hLevel> </hLevel> </hLevel> </hLevel> </hLevel>
<uiInfo xOrigin="1024" yOrigin="800" xDim="1024" yDim="800" top="true" bottom="true" left="true" right="true"/>
<imgInfo src="/images/barcelonaMap.jpg" id="image-task-43"><!-- A reference to the background image
The region of interest (roi) is the full image (but without the area in which interface controls were
situated).
One could also use for instance the convex hull of the ink trajectory.
--><roi id="roi01" points="0 0 1024 768"/>
<imgPreproc id="image-task-43-prepro"><!-- The image preprocessing steps taken before the image was displayed.
In this case only a scaling of the image was used.
--><scale xFactor="1" yFactor="0.9"/>
</imgPreproc> </imgInfo> </hwData>
Code snippet 4. A hwData element containing hLevel elements from different hierarchies.
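Because hLevel is defined recursively, a heterogeneous tree like the one in Code snippet 4 can be traversed with a short recursive function. The Python sketch below uses the standard xml.etree.ElementTree module on a heavily trimmed copy of the snippet. The hwTraces elements are left out so that the fragment parses without an inkml namespace declaration, and the function name outline is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of Code snippet 4: only hLevel and label elements are kept.
FRAGMENT = """
<hwData>
  <hLevel level="semantic-unit" id="SU1">
    <hLevel level="object" id="OBJECT1">
      <hLevel level="rectangle" id="RECT1"/>
    </hLevel>
    <hLevel level="deictic" id="DEICTIC1">
      <hLevel level="mark" id="MARK1"/>
    </hLevel>
    <hLevel level="handwriting" id="HWR1">
      <label id="HWR1label"><alternate>"burning cars"</alternate></label>
      <hLevel level="line" id="HWRLINE1"/>
      <hLevel level="line" id="HWRLINE2"/>
    </hLevel>
  </hLevel>
</hwData>
"""


def outline(node, depth=0):
    """Return (depth, level, id, label text) for every nested hLevel, in document order."""
    rows = []
    for child in node.findall("hLevel"):  # direct children only
        text = child.findtext("label/alternate") or ""
        rows.append((depth, child.get("level"), child.get("id"), text))
        rows.extend(outline(child, depth + 1))
    return rows


rows = outline(ET.fromstring(FRAGMENT))
for depth, level, node_id, text in rows:
    print("  " * depth + f"{level} [{node_id}] {text}".rstrip())
```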
3.4. Dataset specifications and annotation schemes in UPX.
The examples given above originate from different datasets. The
Firemaker on/off collection is the result of a significant data
acquisition effort in the context of forensic writer identification. The
ICIS database contains pen input collected in "interactive map"
scenarios. Furthermore, the two datasets employ different annotation
schemes. The first is based on segmentation levels in handwritten
text (paragraphs, lines, words, characters), whereas the second
uses a taxonomy of semantically well-defined units containing
objects, deictic gestures, and text. Information about
such databases and about how they are segmented and labeled
can be specified through the UPX <datasetInfo>
and <datasetDefs>
elements. The exact segmentation and
labeling is defined in the previously described <hwData>
element.
Typically, such information describing a data collection is stored in a location shared by
multiple documents. To support such an organization, many UPX elements
contain a href attribute, which may refer to such shared locations.
For the same reason, most UPX elements have an xsd:ID, which
other elements can use to refer to them.
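The references in the snippets above, such as "#annotationScheme_0" and "w0629.inkml#dataSet_0_traces", look like URI references with a fragment identifier. Assuming that this reading is correct, a tool could split a reference into a document part and an ID part as in the Python sketch below. The function name split_reference is hypothetical, and the sketch does not resolve the document part against a base location.

```python
from urllib.parse import urldefrag


def split_reference(reference):
    """Split 'document#id' into (document, id).

    An empty document part is taken to mean "the current document" (assumption).
    """
    document, fragment = urldefrag(reference)
    return document, fragment


for reference in ["#annotationScheme_0", "w0629.inkml#dataSet_0_traces", "./example-HF05.inkml"]:
    print(reference, "->", split_reference(reference))
```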
To assess the UPX 0.9.5 definition, we are currently transforming both datasets from UNIPEN to UPX. The Firemaker on/off collection contains handwriting
from 30 writers. A more detailed
definition of one or more writers can be given via the <writerDefs>
element, which contains a block of writer definitions, each captured in a <writer>
element. Details about the sources of the labels contained in the
annotations are specified in a block of definitions called the <labelSrcDefs>
element. Detailed information about one or more employed
annotation schemes can be specified in a block called <annotationDefs>. Below,
the organization of these blocks in UPX is depicted.

Figure 4. A UPX document contains (i) broad meta-information about the dataset specified in <datasetInfo>, (ii) details about the
dataset organization and content via <datasetDefs>, and (iii) segmentations/labeling of the data specified through <hwData>.
3.4.1 The UPX <datasetInfo> element
In the case of the Firemaker on/off collection,
the <datasetInfo> element contains the following information:
<datasetInfo id="dataSet_Firemaker-On/Off-Natural">
<name>
Firemaker-On/Off-Natural
</name>
<category>
As part of the Wanda project, we collected various multimodal
(i.e. online+offline) data. Subjects were asked to write their
signature, to copy signatures from others and to write various
pages of text (in natural or forged conditions).
The text each writer had to write was the same for all writers.
Texts were designed in cooperation with two forensics labs and
included city names, legal amounts, numbers, etcetera.
This Firemaker-On/Off-Natural collection contains handwriting
acquired in natural writing conditions. Writers (mainly students)
were rewarded with 5 euro and were asked to write the texts as
they normally would write.
</category>
<version>
October 2006
</version>
<contact>
Louis Vuurpijl, vuurpijl@nici.ru.nl
</contact>
<source>
NICI Wanda/Firemaker Project
</source>
<setup>
Each writer wrote on an A4 delineated paper with digital inking pen. The
paper was optimally aligned and taped to the tablet (Wacom A4) to minimize
translation and rotation effects. On each of the four corners of the
paper, writers marked a cross for later calibration purposes. All writers
managed to write all lines of text on a single A4 paper.
</setup>
<dataInfo>
<numWriters>
30
</numWriters>
<quality>
good
</quality>
<style>
mixed
</style>
<groundThruth>
Bob, David and sexy Xantippe....
</groundThruth>
</dataInfo>
</datasetInfo>
Code snippet 5. The specification of <datasetInfo> for the Firemaker on/off collection.
3.4.2. Definition of label sources: the <labelSrcDefs> element.
The process of segmenting an online pen input signal into meaningful
(pieces of) ink trajectories and assigning corresponding labels
to these segments is laborious work, especially when performed by human
annotators. In many data collection efforts,
this process is performed by a team of persons. Increasingly,
this process is supported by automated labelers:
recognizers that support the human annotator. In UPX, the sources
of the segmentation and labeling process are specified in the <labelSrcDefs>
element, which contains a block of
definitions of label sources, each captured in a <labelSrc>
element. In the <labelSrc> element, the type of annotator (person or machine), the
name of the label source, the organization and contact information are specified. Furthermore,
a description of the label source can be given, which is particularly valuable in the case of
automated annotators.
3.4.3. Definition of annotation schemes: the <annotationDefs> element.
UPX provides support for specifying multiple annotation schemes. A UPX document could, for example, contain definitions for the hierarchical organization of gestures and text from Figure 3 and at the same time specify traditional text-based schemes like the one depicted in Figure 1. One or more of such annotation schemes are specified in the <annotationDefs> element. An annotation scheme is described by the
<annotationScheme> element. Such an element contains one or more <annotationLevel> elements, each presenting a specific view of the online traces. As an example, consider the following code, which describes the annotation scheme employed for specifying Figure 1:
<annotationScheme id="annotationScheme_0">
<annotationLevel name="LINE" rank="1" desc="instance of LINE">
<labelTypes>
<labelName labelFormat="ASCII"> thruth </labelName>
<labelName> quality </labelName>
</labelTypes>
</annotationLevel>
<annotationLevel name="WORD" rank="2" desc="instance of WORD">
<labelTypes>
<labelName labelFormat="ASCII"> thruth </labelName>
<labelName> quality </labelName>
</labelTypes>
</annotationLevel>
<annotationLevel name="CHAR" rank="3" desc="instance of CHAR">
<labelTypes>
<labelName labelFormat="ASCII"> thruth </labelName>
<labelName> quality </labelName>
</labelTypes>
</annotationLevel>
</annotationScheme>
Code snippet 6. The specification of the annotation levels in the Firemaker on/off collection.
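As a sketch of how software might consume such a scheme, the Python fragment below reads the annotation levels from a shortened copy of Code snippet 6 and orders them by their rank attribute. It assumes that rank expresses the depth of a level in the hierarchy, which the LINE, WORD, CHAR example suggests. The function name levels_by_rank is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of Code snippet 6, with the levels deliberately out of order.
SCHEME = """
<annotationScheme id="annotationScheme_0">
  <annotationLevel name="CHAR" rank="3" desc="instance of CHAR">
    <labelTypes>
      <labelName labelFormat="ASCII"> thruth </labelName>
      <labelName> quality </labelName>
    </labelTypes>
  </annotationLevel>
  <annotationLevel name="LINE" rank="1" desc="instance of LINE">
    <labelTypes>
      <labelName labelFormat="ASCII"> thruth </labelName>
      <labelName> quality </labelName>
    </labelTypes>
  </annotationLevel>
  <annotationLevel name="WORD" rank="2" desc="instance of WORD">
    <labelTypes>
      <labelName labelFormat="ASCII"> thruth </labelName>
      <labelName> quality </labelName>
    </labelTypes>
  </annotationLevel>
</annotationScheme>
"""


def levels_by_rank(scheme_xml):
    """Return (rank, name, label names) for each annotationLevel, sorted by rank."""
    root = ET.fromstring(scheme_xml)
    levels = []
    for level in root.findall("annotationLevel"):
        label_names = [name.text.strip() for name in level.findall("labelTypes/labelName")]
        levels.append((int(level.get("rank")), level.get("name"), label_names))
    return sorted(levels)


levels = levels_by_rank(SCHEME)
for rank, name, label_names in levels:
    print(rank, name, label_names)
```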
4. Join the UPX mailing list
Please follow this website for news about UPX. If you would like to receive news or announcements regarding UPX, subscribe to our upcoming newsletters, or announce your participation, please send an email to vuurpijl@nici.ru.nl.
5. UPX technical documentation
- Report describing Schema version 0.9.5
- The xml Schema version 0.9.5 (v0.9.5 is dated Thu Oct 19 14:36:47 CEST 2006)
- The html version of the Schema version 0.9.5
References