VINDEX: Visual Indexing of Objects
Human-Machine interaction yielding mutual benefits

NICI contributions to ToKeN2000



Louis Vuurpijl, Lambert Schomaker and Eduard Hoenkamp
NICI/Cognitive Engineering (NICI/CE)

Abstract

The automatic classification of objects within photographic images is still a far-fetched goal in pattern recognition. However, human users are quite capable of separating objects from the background and naming or labeling such objects. In order to bootstrap content-based image search methods, it may be fruitful to exploit the perceptual/cognitive capabilities in a user population. By allowing users to draw object outlines on photographs by using a pen & digitizer (or mouse) for image annotation and queries, an incremental training set can be collected. The availability of object-based outlines allows for new shape matching schemes, as well as class-dependent contour detection. The rationale for object-based matching and indexing of image collections is that users are predominantly interested in object content, not in layout or abstract visual features. By combining textual and visual annotation methods, multimodal queries can be realized. As a test case, a large image collection of paintings developed at the Rijksmuseum in Amsterdam will be used. The results of the project will consist of (1) interactive tools for the creation of annotated image material, (2) pattern matching tools for the automatic indexing and search of objects on images, (3) conversion tools to generate dynamic web pages from the annotated database, (4) a database of object outlines for use by researchers in pattern recognition, and last but not least, (5) publications in the area of pattern recognition and human-computer interaction.

1  Project Description

It has become clear that, for improving human-computer interaction, technological development as such is a necessary but insufficient factor in the success of an information processing system. There are many examples of perfectly logically designed applications which are nonetheless difficult for the intended user group to use. Apparently, an additional source of constraints is needed to guide the development of software applications which are targeted at a general audience. Conversely, humans display a wide range of perceptual and cognitive functions which we are only beginning to be able to implement as algorithms. Thus, the use of knowledge on human perception and cognition in the area of intelligent system design may serve a dual purpose: (1) increasing the ease of use of complex systems, and (2) improving machine intelligence. Using insights from the fields of cognitive science and of pattern recognition, the VINDEX project at NICI will concentrate on an exciting area of applications: image annotation and search in a large database of photographs or scans of paintings. This is a difficult domain, in which both the aspect of interface usability (1) and the utilization of human cognitive abilities (2) need to be addressed, benefitting human and machine, respectively. The basic paradigm is outline-based object search, where users indicate the presence of an object by drawing a closed curve as the object outline and adding a text annotation. The choice for object-based image search is based on earlier research [1,2,3], which identified the following problems with other approaches:

  1. Full-image template matching yields bad retrieval results
  2. Feature-based matching requires a lot of input and knowledge by the user
  3. Layout-based search only suits a subset of image needs
  4. Reasons behind a retrieved image list are unclear
  5. Features and matching scheme are not easily explainable to the user

An object-based search method will alleviate many of the problems above. In a collaborative user group, working on a given domain, sufficient examples of objects may be generated to bootstrap a pattern-search algorithm which will find unseen similar objects elsewhere in the database. Within ToKeN2000, five research themes have been identified for common research in this area. Our perspective on the research themes of ToKeN2000 will be as follows, in order of decreasing amount of attention:

1.1  Adaptation and Learning

The theme of adaptation and learning plays an important role in human use of computer applications. In VINDEX, however, we will address this theme from the perspective of the machine. As will be explained later, the goal is to collect large numbers of graphical outlines of objects on images, together with their textual labels. Current machine-learning technologies, combined with the computing power now available 'at the server side', allow for autonomous and adaptive learning processes to take place. For instance, the presence of a large collection of sub-images of objects of a single class makes it possible to find a robust representation of that class, using straightforward techniques (e.g. k-Nearest Neighbour) that were hitherto impractical due to the absence of sufficient numbers of examples and/or limited computing power. At NICI, there is extensive experience with tree-based data representations, fast search algorithms, and support-vector classifiers. The data which are collected via collaborative image annotation will be used to train an image search system in an incremental fashion.
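As an illustration of why k-Nearest Neighbour becomes attractive once enough labelled examples exist, the following is a minimal sketch, not the project's implementation. It assumes that each annotated object has already been reduced to a fixed-length feature vector; the two-dimensional features and the labels are hypothetical. Incremental training then amounts to appending newly annotated examples to the list.

```python
import math
from collections import Counter

def knn_classify(query, examples, k=3):
    """Return the majority label among the k examples nearest to query.

    examples: list of (feature_vector, label) pairs (hypothetical features).
    """
    # Rank all stored examples by Euclidean distance to the query vector
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training set collected from user annotations
training = [
    ((0.90, 0.10), "candlestick"),
    ((0.80, 0.20), "candlestick"),
    ((0.85, 0.15), "candlestick"),
    ((0.20, 0.70), "lute"),
    ((0.30, 0.80), "lute"),
]

print(knn_classify((0.75, 0.25), training, k=3))

# Incremental learning: a new annotation simply extends the example set
training.append(((0.25, 0.75), "lute"))
```

The linear scan over all examples is the simplest possible search; the tree-based representations mentioned above would replace it when the collection grows large.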

1.2  Knowledge Enrichment

The envisaged process of annotating multimedia material is an example of knowledge enrichment in itself: raw pixel images are annotated with text and object-outline curves. However, there is a second aspect of knowledge enrichment, which lies in the collaborative character of the annotation process. The concept of having several users work together on the annotation of an image collection over the Internet introduces a whole new range of possibilities in knowledge-discovery techniques. More specifically, new operators are needed for merging or splitting the contributions (i.e., content interpretations) of the different users, at the level of the object outlines and at the level of the corresponding textual annotation. In this area, use can be made of algorithms for multiple-expert decision fusion such as voting, weighting and rank combination schemes. A recurrent research question is the role of hand-crafted knowledge representations versus autonomously computed knowledge patterns. Whereas precision in the former approach may be higher, the latter approach is more error prone but much more convenient.
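Two of the fusion schemes named above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the operators VINDEX will define: majority voting is applied to the text labels that several users gave to the same object, and a Borda count (one common rank combination scheme, chosen here as an example) merges ranked hit lists. All labels and image identifiers are hypothetical.

```python
from collections import Counter, defaultdict

def majority_vote(labels):
    """Return the most frequent label and the fraction of users who chose it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)

def borda_fuse(rankings):
    """Merge ranked lists (best first) by Borda count.

    A candidate at position i in a list of length n receives n - i points;
    candidates missing from a list receive nothing from it.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for i, candidate in enumerate(ranking):
            scores[candidate] += n - i
    # Highest score first; equal scores are ordered by name for determinism
    return sorted(scores, key=lambda c: (-scores[c], c))

# Three users label the same outlined object
print(majority_vote(["candlestick", "candlestick", "lamp"]))

# Three matchers (or users) rank the same candidate images
print(borda_fuse([
    ["img12", "img07", "img03"],
    ["img07", "img12", "img03"],
    ["img12", "img03", "img07"],
]))
```

A weighting scheme would follow the same pattern, with each user's or matcher's contribution multiplied by a reliability weight before summing.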

1.3  Control and navigation

The paradigmatic application context for VINDEX is the interactive user of a browser program who is looking for image material, for instance to create a new multimedia document (a report, home page, etc.). The sub theme Control refers to the transfer of responsibility from user to machine. Since content-based image search using pattern recognition techniques is still computationally very intensive, it cannot be guaranteed that all forms of image search can deliver results within a time frame of a minute. In such a case, a query leads to a deferred return of results. It is important to know to what extent users can accept such delays. Will they be able to remember the actual query context in which they needed a particular hit list of images if it arrives after a few days by email? This is a very relevant question, but it will not be a focus of attention in VINDEX. The second sub theme will be addressed in more detail, however. The process of Navigation is essential in image search because it plays a role in (1) the initial search for a subset of a huge database, i.e., navigating towards the right ballpark of possible target images, and (2) the final search through a hit list of images returned by the image retrieval system. It is known that the subjective experience of system quality is largely determined by the ease of navigation at the motoric level (agility) and perceptual level (recognizability of whereabouts). It is one of the goals of VINDEX to design a system in which users themselves can quickly develop traces of image-based navigation, in a much more fluid way than is currently done in, e.g., WWW-page design.

1.4  Delivery and Presentation Technology

Presentation methods are a focus area of VINDEX: the proposed system will return lists of images, and will also be able to generate new (HTML) documents on the basis of annotated material. Prototypes of new presentation methods will be proposed by VINDEX. These may be elaborated in more detail by project partners in ToKeN2000 who are specialized in the theme. Delivery technologies as such (i.e., network & database architectures), which are at a lower, more technical level than presentation methods, are not a central point of research interest in VINDEX.

1.5  Language Technology

Within the VINDEX project, language technology as such is not a research area. However, there are fundamental relations between image and text content which can be explored together with interested partners in ToKeN2000. Areas of overlap include semantic ontologies and information retrieval or text matching techniques. The goal is to allow both text-based and image-based query methods in VINDEX, so cooperation with a language-oriented group is expected to be fruitful.

2  Project Goals

To introduce our envisaged system [2,3], consider the situation of a user searching for images containing certain objects, e.g., "I am looking for pictures containing 17th century candlesticks, positioned on a table".

Initially, no images contained in the database will be annotated, so image retrieval will be performed via textual queries only. We assume that the other partners will contribute browsing and some standard querying techniques for this purpose. Given one or more retrieved images, a collaborating user may annotate the images by outlining and specifying the objects (e.g., candlesticks, musical instruments) they contain. Subsequently, the database will contain an ever-growing amount of annotated information, i.e., a set of objects grouped into object classes. Both the number of objects and the number of classes will grow. Images containing these objects can be retrieved using standard retrieval techniques. An example of such an image retrieval system will be demonstrated. Given such an annotated database, novel image retrieval techniques can be applied, comprising template matching and outline matching techniques. Using the increasing set of annotated objects, (1) image retrieval techniques can be continuously trained and improved, (2) the information needs of users of the system can be further classified, and (3) the user interface can be continuously improved and adapted to the information needs. In the proof of concept of multimedia information retrieval, a user will be able to (1) browse the database, searching for information using standard keyword-based retrieval techniques based on information indexing, (2) browse the database, searching for objects already annotated, and (3) search for images matching a set of specifications, using the proposed novel image retrieval techniques.
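To make the notion of outline matching concrete, the sketch below compares two user-drawn closed curves. It is one simple possibility under stated assumptions, not the matching scheme of [2,3]: each outline is resampled to a fixed number of points along its perimeter, normalised for position and size, and compared point by point over all cyclic starting points. It assumes both outlines are drawn in the same direction and is not rotation invariant; all function names are hypothetical.

```python
import math

def resample(outline, n=32):
    """Resample a closed polygon to n points equally spaced along its perimeter."""
    pts = list(outline) + [outline[0]]  # close the curve
    seg = [math.dist(pts[i], pts[i + 1]) for i in range(len(pts) - 1)]
    total = sum(seg)
    out, i, walked = [], 0, 0.0
    for k in range(n):
        target = total * k / n
        # Advance to the segment that contains the target arc length
        while i < len(seg) - 1 and walked + seg[i] < target:
            walked += seg[i]
            i += 1
        t = (target - walked) / seg[i] if seg[i] > 0 else 0.0
        (x0, y0), (x1, y1) = pts[i], pts[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

def normalise(points):
    """Centre the points on their centroid and scale to unit RMS radius."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    centred = [(x - cx, y - cy) for x, y in points]
    scale = math.sqrt(sum(x * x + y * y for x, y in centred) / len(centred)) or 1.0
    return [(x / scale, y / scale) for x, y in centred]

def outline_distance(a, b, n=32):
    """Mean point-to-point distance, minimised over cyclic starting points."""
    pa, pb = normalise(resample(a, n)), normalise(resample(b, n))
    return min(
        sum(math.dist(pa[i], pb[(i + shift) % n]) for i in range(n)) / n
        for shift in range(n)
    )

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
moved_square = [(10, 10), (14, 10), (14, 14), (10, 14)]  # shifted and enlarged
triangle = [(0, 0), (1, 0), (0.5, 1)]

print(outline_distance(square, moved_square))  # same shape: close to zero
print(outline_distance(square, triangle))      # different shape: clearly larger
```

A query outline could then be ranked against all stored outlines of a class by this distance, with smaller values indicating a closer shape match.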

The goals of the project are as follows:

  1. research and development of interactive tools for the creation of annotated image material;
  2. research and development of pattern matching tools for the automatic indexing and search of objects on images;
  3. development of conversion tools to generate dynamic web pages from the annotated image database;
  4. collection of an open database of object outlines for use by researchers in pattern recognition by allowing collaborative annotation by a large group of users via Internet;
  5. generating publications in the area of pattern recognition and human-computer interaction.

At the end of the first phase, the stage is set for a number of PhD projects, addressing fundamental problem areas at the level of human-computer interaction, at the level of cognitively plausible and computationally tractable representations of image and text, and at the level of pattern recognition, which is fully compatible with the interdisciplinary nature of research at NICI. The collected data will provide fruitful training and test sets for the PhD projects.

3  Deliverables

  1. outline2html: a conversion tool to automatically generate dynamic web (mouse-over) pages from an annotated image database (3 months)

  2. Web site for image annotation, including login, user database and annotation database. The original image set is derived from the Rijksmuseum CDROM. A set of pilot experiments will be run with selected subjects (6 months).

  3. pattern matching tools for the automatic indexing and search of objects on images: outline based and pixel-content based. This includes a new adapted method for fast search in feature-vector lists.

  4. an open database of object outlines (> 12 months)

  5. outline-to-edge matching algorithms based on the training outlines provided by the users (12 months)
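The first deliverable, outline2html, can be pictured with the following sketch. It is only an assumption about how such a conversion might work, not a description of the actual tool: each annotated outline becomes a polygonal hot spot in a standard HTML image map, and the title attribute supplies the mouse-over text. The function name, the image file name and the example annotation are hypothetical.

```python
from html import escape

def outline_to_html(image_url, objects, map_name="objects"):
    """Return an HTML fragment with one polygonal hot spot per annotated object.

    objects: list of (label, outline) pairs, where outline is a list of
    (x, y) pixel coordinates of the closed curve drawn by the user.
    """
    lines = [
        f'<img src="{escape(image_url, quote=True)}" '
        f'usemap="#{map_name}" alt="annotated image">',
        f'<map name="{map_name}">',
    ]
    for label, outline in objects:
        coords = ",".join(f"{round(x)},{round(y)}" for x, y in outline)
        text = escape(label, quote=True)
        # Browsers show the title attribute as a tooltip on mouse-over
        lines.append(
            f'  <area shape="poly" coords="{coords}" title="{text}" alt="{text}">'
        )
    lines.append("</map>")
    return "\n".join(lines)

# Hypothetical annotation record for one painting
page = outline_to_html(
    "painting.jpg",
    [("candlestick", [(120, 40), (150, 40), (150, 200), (120, 200)])],
)
print(page)
```

Because the output is plain HTML generated from the annotation database, the pages can be regenerated whenever users add or correct outlines.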

4  Cooperation with other groups

Cooperation with other groups can be envisaged as follows. Input: NICI is interested in sharing image representation and feature schemes with Maastricht and Delft. Output: The outline-based software may be interesting to the hypermedia group (Amsterdam), as it constitutes a specialized mode of Presentation/Navigation. Input/Output: Cooperation with Leiden will yield a standardized approach for semantic object representation. The outline-based user interface allows for new query modes in generating text by pointing to image components in sequence. Furthermore, the outline drawing method is generic, such that the video-desk based interface developed at IPO may serve as one of the input modes.

5  Work Plan

Month    Activity                                Output
1-3      basic system architecture               web site for annotation
4-6      pilot experiments with subjects         technical report
7-9      development of 1st generation matcher   preliminary S/W version
10-12    dev. of outline-to-edge matcher         final test bed software

6  Personnel

Person                   Task                                 fte    months
Dr. L.R.B. Schomaker     supervision, pattern recognition     0.2    12
Dr. Ir. E. Hoenkamp      supervision, semantics               0.2    12
Dr. L. Vuurpijl          postdoc, system architecture         1.0    12
Dr. F. Wang              pattern recognition                  1.0    12
Dhr. E. Heyne            system support                       0.1    12

7  Infrastructure

The computing infrastructure consists of a Sun Ultra SPARC II, an HP-UX 700 workstation, and a number of Pentium systems (133-400 MHz) running Linux. For fast image matching, a dedicated 'number cruncher' will have to be purchased in the form of a bare-bones Pentium 800+ MHz or G4 computing server running Linux. There is a permanent web site for ToKeN2000.

References

[1] R.W. Picard. Light-years from Lena: Video and image libraries of the future. In Proceedings of the International Conference on Image Processing (ICIP), volume I, pages 310-313, October 1995.

[2] L. Schomaker, L. Vuurpijl, and E. de Leau. New use for the pen: outline-based image queries. In ICDAR'99 Fifth Conference on Document Analysis and Recognition, pages 293-296. IEEE Computer Society, September 1999.

[3] L. Schomaker, L. Vuurpijl, and E. de Leau. Using pen-based outlines for object-based queries. In Visual Information and Processing Systems, pages 585-592. Springer Verlag, 1999.


File translated from TEX by TTH, version 2.51.
On 22 Mar 2000, 15:58.