CHOICE Home
The CHOICE project is part of the CATCH program of the NWO. See this document for details regarding CATCH.
The CATCH research programme will develop key technology to ensure continuous access to the cultural riches of the world. The CHOICE project seeks to chart the uncharted information landscape, focusing on semi-automatic semantic annotation and employing context information.
Semantic annotation involves the annotation of archived objects, such as video, images and books, with semantic categories from some standardized metadata repository, such as domain thesauri and ontologies. The use of semantic annotation allows one to widen the search facilities in a collection. For example, annotating a photograph with the semantic category "bed" (in the sense of: to sleep in) from the WordNet thesaurus makes it possible to search for "sleeping beds" while not retrieving other "beds" such as "river beds". As most thesauri have a hierarchical broader/narrower structure, it also makes it possible to generalize or specialize a query in semantic terms: e.g. retrieving photographs of "cribs" (a narrower semantic category) when searching for beds in the "sleeping" sense. Hyvonen (2003) describes an example of a working system in the cultural heritage domain that allows semantic search.
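The two effects described above (separating word senses and following the broader/narrower hierarchy) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the concept identifiers, the toy thesaurus and the photo annotations are invented and do not come from WordNet or from any CHOICE software.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One thesaurus entry; identifiers and labels here are hypothetical."""
    concept_id: str
    label: str
    narrower: list = field(default_factory=list)


# Toy thesaurus: two senses of "bed" are distinct concepts, and "crib"
# is a narrower concept of the furniture sense only.
THESAURUS = {
    "bed/furniture": Concept("bed/furniture", "bed", ["crib/furniture"]),
    "crib/furniture": Concept("crib/furniture", "crib"),
    "bed/river": Concept("bed/river", "bed"),
}

# Toy collection: each object is annotated with concept identifiers,
# not with plain words.
ANNOTATIONS = {
    "photo-1": {"bed/furniture"},
    "photo-2": {"crib/furniture"},
    "photo-3": {"bed/river"},
}


def expand(concept_id, thesaurus):
    """Return the concept together with all transitively narrower concepts."""
    seen = set()
    stack = [concept_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(thesaurus[current].narrower)
    return seen


def search(concept_id, annotations, thesaurus, include_narrower=True):
    """Return the sorted ids of objects annotated with the wanted concepts."""
    wanted = expand(concept_id, thesaurus) if include_narrower else {concept_id}
    return sorted(obj for obj, cats in annotations.items() if cats & wanted)
```

With this toy data, searching for "bed/furniture" returns photo-1 and photo-2 (the crib is found through the narrower relation), while photo-3, annotated with the river sense, is not retrieved.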
The driving use case of this project is the Sound and Vision video archive. The objective is twofold: to show how semantic annotation can be supported in the archiving process by exploiting the available context information, and to show how these annotations can subsequently be used to improve search facilities.
Hollink et al. (2003) show that linking a number of diverging thesauri to an annotation application for images of paintings can improve both the semantic annotation process for human annotators and the search process. In the CHOICE project, the annotation application developed by Hollink et al. will be adjusted for video annotation. The aim is to construct a video annotation system based on a shared annotation structure (in the Sound and Vision case: iMMix), allowing annotators to mark up video with relevant semantic categories from multiple thesauri relevant for the field.
At the moment, automatic techniques for video analysis are still of limited value for the derivation of semantic categories (e.g., Hollink et al., 2004). On the other hand, manual semantic annotation is time-consuming. Therefore, this project will focus on speeding up the manual annotation process by applying natural language processing (NLP) techniques to generate, from (textual) context information, candidate semantic categories that appear in the selected thesauri.
Context information provides peripheral insights into an object: how it was perceived, how it was created, how it relates to other objects made during the same era, and so on. Having access to these sources enables users to explore in greater depth. In the audiovisual realm, examples of sources to be linked to objects include commentary sheets, external reviews, broadcast schedules, viewer ratings and awards.
Within CHOICE, possibly relevant statements and setting descriptions from the textual context information will be offered to the human annotator for approval or rejection. Whether a fragment of the context information is (possibly) relevant for semantic annotation is determined by checking whether concepts from relevant thesauri or from the metadata belonging to the video occur in it. Machine learning and statistical methods for natural language processing and information extraction are applied to determine which terms from fragments or sentences will be used in the statements that are offered to the annotator (Hearst, 1999; Jackson and Moulinier, 2002; Mitchell, 1999).
For the development of a semantic-annotation system for video annotation the following research issues need to be tackled:
How should the annotation interface for images, as developed by Hollink et al., be adapted to video annotation? In the Sound and Vision case this means integrating the iMMix model into the annotation architecture and incorporating facilities for video browsing and searching, and for viewing context information.
Which thesauri and/or ontologies can be used as repositories of relevant semantic categories for archive search? Typical example corpora could be WordNet, a geographical thesaurus such as TGN, and the "Gemeenschappelijke Thesaurus Audiovisuele Archieven" developed by Sound and Vision and the Filmmuseum.
How can these thesauri/ontologies be partially mapped/integrated? This issue will build upon the work in the STITCH project, also carried out within the CATCH framework.
How can we use NLP and learning techniques to derive relevant semantic categories from the text? There is a link here to the MITCH project of CATCH.
How can these semantic categorization techniques be used to support the search process? For example, when searching for video fragments about Limburg, one could use TGN to find geographical parts of Limburg (towns, rivers, lakes, mountains) to enhance the search. As another example, when searching for videos about "crime" it should be possible to find fragments about "murder".
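The candidate-generation step described before the list of research issues (checking whether thesaurus concepts occur in a fragment of context information, then offering the fragment to the annotator) can be sketched as follows. This is only a rough illustration under stated assumptions: plain word matching stands in for the machine learning and statistical NLP methods the project intends to apply, and the label table, concept identifiers and sample text are all invented.

```python
import re

# Hypothetical mapping from thesaurus labels (lower-cased) to concept ids.
LABELS = {
    "murder": "crime/murder",
    "maastricht": "place/maastricht",
    "river": "geo/river",
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def candidate_annotations(context_text, labels):
    """Return (sentence, concept ids) pairs for sentences that mention
    at least one thesaurus label; other sentences are dropped."""
    candidates = []
    for sentence in split_sentences(context_text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        hits = sorted(labels[w] for w in words if w in labels)
        if hits:
            candidates.append((sentence, hits))
    return candidates


def review(candidates, approve):
    """Keep only the concepts the human annotator approves.
    `approve` is a hypothetical callback: (sentence, concept_id) -> bool."""
    accepted = set()
    for sentence, concept_ids in candidates:
        for concept_id in concept_ids:
            if approve(sentence, concept_id):
                accepted.add(concept_id)
    return accepted
```

For a context text such as "The documentary was filmed in Maastricht. It won two awards. Police investigated the murder near the river.", the first and third sentences would be offered to the annotator with their matching concepts, and the second would be skipped because it mentions no thesaurus label. The final decision stays with the annotator, which is what the `approve` callback models.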
Scoping remarks:
- Allowing all visitors and experts to add additional (semantic) annotation is a [...] avid voluntary cataloguers who will find surprising ways to mine and exploit the treasure trove offered. However, conducting extensive research on this topic is expected to be out of scope for this particular project.
- Integration into the Sound and Vision business process is, strictly speaking, not part of the project. However, the project will consider business-integration issues that have a general flavor, such as the storage of the actual context information objects and the storage of resulting annotations.
Scientific approach and methodology
The proposed research is methodological. It is aimed at exploiting the possibilities of combining semantic categorization techniques with techniques for natural language processing to make semi-automatic semantic annotation possible. The NLP techniques are provided with relevant concepts (e.g. from thesauri, term lists and metadata) to focus the processing. Thus, the research is not aimed at developing new techniques for natural language processing but at applying existing techniques in a goal-oriented way. The project will build on existing open standards for data and metadata representation, such as XML and RDF/OWL.
Scientific relevance
The CHOICE project will explore a novel combination of existing semantic categorization techniques and NLP techniques in the context of semantic video annotation. These techniques will be useful in all situations where there are textual annotations of multimedia material and also a set of relevant (possibly heterogeneous) thesauri and/or ontologies. This is a common theme in the cultural-heritage setting. Almost all collections have been annotated with text. In some collections there is some degree of formality because characteristics have already been described with standardized metadata repositories such as AAT. But even in those collections the textual parts may contain relevant parts suitable for semantic search. For example, in painting collections the subject of the painting is typically only described with an informal piece of text. The techniques developed in this project could thus help make semantic subject search possible. A possible use case could be: searching for paintings about fruit will retrieve paintings about apples, pears, grapes, etc.
Related work
CHOICE is a project on the intersection of semantic annotation and natural language processing with an emphasis on (semi-automatic) semantic annotation. CHOICE builds on several projects and working groups in which the project members are or were involved, concerning the Semantic Web (e.g., W3C SWBPD), semantic annotation (Hollink et al., 2003; Schreiber et al., 2001), video annotation (iMMix), semantics-based presentation (CHIME, Topia) and semantic interoperability (Wittenburg et al., 2004a, 2004b). Semantic annotation is studied in the semantic-web research field. Both manual techniques and automatic techniques are being used. Annotea is a W3C project targeted at baseline semantic annotation. The CREAM toolset (Handschuh and Staab, 2002b) provides a mix of manual and semi-automatic annotation techniques. The Armadillo approach (Ciravegna et al., 2004) is mainly aimed at using automatic (natural-language) techniques for constructing semantic annotations. These efforts are mainly aimed at text documents. There is relatively little work on semantic annotation of multimedia documents. One of the few examples is the PhD work of Troncy (2003), who did a case study with the archives of INA, the French equivalent of Sound and Vision. A good overview of current research on semantic annotation can be found in the proceedings of recent Semantic Annotation and Knowledge Markup Workshops (Handschuh et al., 2002a; Handschuh, 2003).
Hyvonen et al. (2003) describe work related to CHOICE and STITCH in the cultural heritage domain. The joint Finnish national museum network developed by the University of Helsinki and the Helsinki Institute for Information Technology (HIIT) has recently been taken into trial use. The system is based on semantic web technology and is seemingly the first of its kind in the world. This project is unique in that it includes a semantic data search system connecting the various collections with each other.
Main project location
Nederlands Instituut voor Beeld en Geluid
