Louis Vuurpijl, Lambert Schomaker and Eduard Hoenkamp
NICI/Cognitive Engineering (NICI/CE)
In improving human-computer interaction, it has become clear that technological development as such is a necessary but insufficient factor in determining the success of an information processing system. There are many examples of perfectly logically designed applications that are nonetheless difficult for the intended user group to use. Apparently, an additional source of constraints is needed to guide the development of software applications that are targeted at a general audience. Conversely, humans still display a wide range of perceptual and cognitive functions that we are only beginning to be able to implement as algorithms. Thus, the use of knowledge on human perception and cognition in intelligent system design may serve a dual purpose: (1) increasing the ease of use of complex systems, and (2) improving machine intelligence.
Using insights from the fields of cognitive science and pattern recognition, the VINDEX project at NICI will concentrate on an exciting area of applications: image annotation and search in a large database of photographs or scans of paintings. This is a difficult domain, in which both interface usability (1) and the exploitation of human cognitive abilities (2) need to be addressed, benefiting human and machine, respectively. The basic paradigm is outline-based object search, where users indicate the presence of an object by drawing a closed curve as the object outline and adding a text annotation. The choice for object-based image search is based on earlier research [2,3,1]:
An object-based search method will alleviate many of the problems above. In a collaborative user group, working on a given domain, sufficient examples of objects may be generated to bootstrap a pattern-search algorithm which will find unseen similar objects elsewhere in the database. Within ToKeN2000, five research themes have been identified for common research in this area. Our perspective on the research themes of ToKeN2000 will be as follows, in order of decreasing amount of attention:
The theme of adaptation and learning plays an important role in human use of computer applications. In VINDEX, however, we will address this theme from the perspective of the machine. As will be explained later, the goal is to collect large numbers of graphical outlines of objects in images, together with their textual labels. Current machine-learning technologies, combined with the computing power now available 'at the server side', allow autonomous and adaptive learning processes to take place. For instance, the presence of a large collection of sub-images of objects of a single class makes it possible to find a robust representation of that class, using straightforward techniques (e.g., k-Nearest Neighbour) that were hitherto impractical due to the lack of sufficient numbers of examples and/or limited computing power. At NICI, there is extensive experience with tree-based data representations, fast search algorithms, and support-vector classifiers. The data collected via collaborative image annotation will be used to train an image search system in an incremental fashion.
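As an illustration of the kind of straightforward technique referred to here, the following minimal Python sketch classifies a query by majority vote among its k nearest labelled examples. The feature vectors, the class labels and the function name `knn_classify` are hypothetical and are not part of the VINDEX design; in practice the features would have to be derived from the annotated sub-images.

```python
import math
from collections import Counter


def knn_classify(query, examples, k=3):
    """Return the majority label among the k examples nearest to `query`.

    `examples` is a list of (feature_vector, label) pairs. The feature
    vectors are hypothetical fixed-length descriptors of object sub-images.
    """
    if not examples:
        raise ValueError("need at least one labelled example")
    # Sort by Euclidean distance to the query and keep the k closest.
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up 2-D features for two object classes.
examples = [
    ((0.90, 0.10), "candlestick"),
    ((0.80, 0.20), "candlestick"),
    ((0.85, 0.15), "candlestick"),
    ((0.10, 0.90), "lute"),
    ((0.20, 0.80), "lute"),
]
label = knn_classify((0.7, 0.3), examples, k=3)
```

Because such a classifier stores its examples directly, newly annotated objects can simply be appended to `examples`, which fits the incremental training described in the text.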
The envisaged process of annotating multimedia material is an example of knowledge enrichment in itself: raw pixel images are annotated with text and object-outline curves. However, there is a second aspect of knowledge enrichment, which stems from the collaborative character of the annotation process. Having several users work together over the Internet on the annotation of an image collection introduces a whole new range of possibilities for knowledge-discovery techniques. More specifically, new operators are needed for merging or splitting the contributions (i.e., content interpretations) of the different users, at the level of the object outlines and at the level of the corresponding textual annotation. In this area, use can be made of algorithms for multiple-expert decision fusion, such as voting, weighting and rank-combination schemes. A recurrent research question is the role of hand-crafted knowledge representations versus autonomously computed knowledge patterns. Whereas precision in the former approach may be higher, the latter approach is more error-prone but much more convenient.
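Two of the fusion schemes named here can be sketched in a few lines of Python. The sketch below is an assumption-laden illustration, not the project's method: `majority_vote` merges the text labels that several annotators gave to the same object, and `borda_fuse` combines ranked candidate lists with a Borda count, one common rank-combination scheme. The function names and the example labels are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Return the most frequent label and the fraction of annotators who chose it."""
    if not labels:
        raise ValueError("need at least one label")
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def borda_fuse(rankings):
    """Combine ranked candidate lists with a Borda count.

    Each ranking lists candidates from best to worst; a candidate at
    position i in a list of length m earns m - i points.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        m = len(ranking)
        for i, candidate in enumerate(ranking):
            scores[candidate] += m - i
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Three hypothetical annotators label the same outlined object.
label, agreement = majority_vote(["candlestick", "candlestick", "lamp"])

# Three hypothetical experts each rank the candidate labels.
fused = borda_fuse([
    ["candlestick", "lamp", "vase"],
    ["lamp", "candlestick", "vase"],
    ["candlestick", "vase", "lamp"],
])
```

A weighted variant could multiply each annotator's points by a reliability weight; how such weights would be estimated is left open here.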
The paradigmatic application context for VINDEX is the interactive user of a browser program who is looking for image material, for instance to create a new multimedia document (a report, home page, etc.). The sub-theme Control refers to the transfer of responsibility from user to machine. Since content-based image search using pattern recognition techniques is still computationally very intensive, it cannot be guaranteed that all forms of image search will deliver results within a minute. In such cases, a query leads to a deferred return of results. It is important to know to what extent users can accept such delays. Will they be able to remember the actual query context in which they needed a particular hit list of images if it arrives after a few days by email? This is a very relevant question, but it will not be a focus of attention in VINDEX. The second sub-theme, however, will be addressed in more detail. The process of Navigation is essential in image search because it plays a role in (1) the initial search for a subset of a huge database, i.e., navigating towards the right ballpark of possible target images, and (2) the final search through a hit list of images returned by the image retrieval system. It is known that the subjective experience of system quality is largely determined by the ease of navigation at the motor level (agility) and the perceptual level (recognizability of whereabouts). One of the goals of VINDEX is to design a system in which users themselves can quickly develop traces of image-based navigation, in a much more fluid way than is currently done in, e.g., WWW-page design.
Presentation methods are a focus area of VINDEX: the proposed system will return lists of images, and will also be able to generate new (HTML) documents on the basis of annotated material. Prototypes of new presentation methods will be proposed by VINDEX. These may be elaborated in more detail by project partners in ToKeN2000 who specialize in this theme. Delivery technologies as such (i.e., network and database architectures), which are at a lower, more technical level than presentation methods, are not a central point of research interest in VINDEX.
Within the VINDEX project, language technology as such is not a research area. However, there are fundamental relations between image and text content which can be explored together with interested partners in ToKeN2000. Areas of overlap include semantic ontologies and information retrieval or text-matching techniques. The goal is to allow both text-based and image-based query methods in VINDEX, so cooperation with a language-oriented group would be fruitful.
To introduce our envisaged system [2,3], consider the situation of a user searching for images containing certain objects, e.g., "I am looking for pictures containing 17th century candlesticks, positioned on a table".
Initially, none of the images in the database will be annotated, so image retrieval will be performed via textual queries only. We assume that the other partners will contribute browsing and some standard querying techniques for this purpose. Given one or more retrieved images, a collaborating user may annotate the images by outlining and specifying the objects (e.g., candlesticks, musical instruments) they contain. As a result, the database will contain an ever-growing amount of annotated information, i.e., a set of objects that can be grouped into object classes. Both the number of objects and the number of classes will grow. Images containing these objects can be retrieved using standard retrieval techniques. An example of such an image retrieval system will be demonstrated. Given such an annotated database, novel image retrieval techniques can be applied, comprising template matching and outline matching techniques. Using the increasing set of annotated objects, (1) image retrieval techniques can be continuously trained and improved, (2) the information needs of users of the system can be further classified, and (3) the user interface can be continuously improved and adapted to these information needs. In the proof of concept of multimedia information retrieval, a user will be able to (1) browse the database, searching for information using standard keyword-based retrieval techniques based on information indexing, (2) browse the database, searching for objects already annotated, and (3) search for images matching a set of specifications, using the proposed novel image retrieval techniques.
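To make the idea of outline matching more concrete, the sketch below shows one simple way in which two closed outlines could be compared; it is an illustrative assumption, not the matcher proposed in VINDEX. Each outline is resampled to a fixed number of points along its perimeter, normalised for position and size, and compared by the mean point-to-point distance, minimised over the choice of starting point. The sketch assumes that both outlines are drawn in the same direction and does not compensate for rotation. All function names are hypothetical.

```python
import math


def resample_closed(points, n=32):
    """Resample a closed polygon to n points spaced evenly along its perimeter."""
    closed = list(points) + [points[0]]
    seg = [math.dist(a, b) for a, b in zip(closed, closed[1:])]
    total = sum(seg)
    if total == 0:
        raise ValueError("outline has zero length")
    out, i, start = [], 0, 0.0  # start = arc length at the beginning of segment i
    for k in range(n):
        target = total * k / n
        # Advance to the segment that contains the target arc length.
        while i < len(seg) - 1 and start + seg[i] < target:
            start += seg[i]
            i += 1
        t = (target - start) / seg[i] if seg[i] > 0 else 0.0
        (x0, y0), (x1, y1) = closed[i], closed[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def normalise(points):
    """Move the centroid to the origin and scale to unit RMS radius."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centred = [(x - cx, y - cy) for x, y in points]
    scale = math.sqrt(sum(x * x + y * y for x, y in centred) / n)
    if scale == 0:
        raise ValueError("outline collapses to a single point")
    return [(x / scale, y / scale) for x, y in centred]


def outline_distance(a, b, n=32):
    """Mean point-to-point distance, minimised over the choice of start point."""
    pa = normalise(resample_closed(a, n))
    pb = normalise(resample_closed(b, n))
    return min(
        sum(math.dist(pa[i], pb[(i + shift) % n]) for i in range(n)) / n
        for shift in range(n)
    )


# Made-up outlines: a unit square, the same shape moved, enlarged and
# started at a different corner, and a triangle.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
shifted_square = [(12, 10), (12, 12), (10, 12), (10, 10)]
triangle = [(0, 0), (2, 0), (1, 2)]

d_same = outline_distance(square, shifted_square)
d_diff = outline_distance(square, triangle)
```

In this toy example `d_same` is (numerically) zero while `d_diff` is clearly larger, so ranking annotated objects by increasing distance to a query outline would place the square first. Matching an outline against image edges, as opposed to other outlines, would require additional machinery not shown here.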
The goals of the project are as follows:
At the end of the first phase, the stage is set for a number of PhD projects, addressing fundamental problem areas at the level of human-computer interaction, at the level of cognitively plausible and computationally tractable representations of image and text, and at the level of pattern recognition. This combination is fully compatible with the interdisciplinary nature of research at NICI. The collected data will provide fruitful training and test sets for the PhD projects.
Cooperation with other groups can be envisaged as follows. Input: NICI is interested in sharing image representation and feature schemes with Maastricht and Delft. Output: the outline-based software may be of interest to the hypermedia group (Amsterdam), as it constitutes a specialized mode of Presentation/Navigation. Input/Output: cooperation with Leiden will yield a standardized approach for semantic object representation. The outline-based user interface allows for new query modes in which text is generated by pointing to image components in sequence. Furthermore, the outline drawing method is generic, so that the video-desk-based interface developed at IPO may serve as one of the input modes.
| Months | Activity | Output |
| --- | --- | --- |
| 1-3 | basic system architecture | web site for annotation |
| 4-6 | pilot experiments with subjects | technical report |
| 7-9 | development of first-generation matcher | preliminary software version |
| 10-12 | development of outline-to-edge matcher | final test bed software |

| Person | Task | FTE | Months |
| --- | --- | --- | --- |
| Dr. L.R.B. Schomaker | supervision, pattern recognition | 0.2 | 12 |
| Dr. Ir. E. Hoenkamp | supervision, semantics | 0.2 | 12 |
| Dr. L. Vuurpijl | postdoc, system architecture | 1.0 | 12 |
| Dr. F. Wang | pattern recognition | 1.0 | 12 |
| Mr. E. Heyne | system support | 0.1 | 12 |
The computing infrastructure consists of a Sun Ultra SPARC II, an HP-UX 700 workstation, and a number of Pentium systems (133-400 MHz) running Linux. For fast image matching, a dedicated 'number cruncher' will have to be purchased, in the form of a bare-bones Pentium 800+ MHz or G4 computing server running Linux. There is a permanent web site for ToKeN2000.