Transcript Document
Vocabulary Matching for Book Indexing Suggestion in Linked Libraries – A Prototype Implementation & Evaluation
Antoine Isaac, Dirk Kramer, Lourens van der Meij, Shenghui Wang, Stefan Schlobach, Johan Stapel

Problem: subject indexing
• Describing subjects of books
• Using concepts from vocabularies (e.g. thesauri)

Problem: re-indexing
• Describing a book that has already been described
• With a new vocabulary
– Fitting a different context (e.g., different libraries)

Why re-indexing at KB?
• The Dutch National Library (KB) holds many books that are also in other Dutch public libraries
• KB deposit uses Brinkman thesaurus for indexing
• Public Libraries use Biblion thesaurus
[Figure: overlap between book collections. Biblion: Dutch Public Libraries. Brinkman: KB Deposit Collection.]

A wider issue
• KB shares books with many other libraries
• All having their own description practices
[Figure: map of KOSs and the book collections they describe.
Other classifications: Doelgroep (audience), BISAC subject codes.
Domain/discipline classifications: NBC class., DDC Dewey decimal class.
Subject thesauri / subj. heading lists: Brinkman, GTT, LCSH subject headings, RAMEAU subject headings, SWD subject headings.
Book collection datasets: KB Deposit Coll., KB Scientific Coll., LC (US Nat. Lib), BnF (French Nat. Lib), DNB (German Nat. Lib).
Person/corporation data: LC authority file, Autorités BNF, Personennamendatei.
Also shown: NUR, UNESCO class., Biblion, Dutch Public Libraries, Dutch Booktrade, KB Corporatie + Persoon.
Legend: overlap between book collections (thickness indicates degree of overlap); vertical adjustment between a coll. and KOSs denotes the KOSs being used to describe that coll.]

Room for improvement?
• Libraries devote large resources to indexing
– 20 people at KB
– About 20,000 books per year
• Leveraging already existing descriptions for re-indexing can be beneficial for both sides

Alignment and re-indexing
• STITCH project
– Tackling semantic interoperability in Cultural Heritage
– Using ontology alignment
• Mappings between concepts from different vocabularies can be used for re-indexing
Basic idea: replace concepts in descriptions by conceptually equivalent concepts

Goal: a re-indexing prototype
• Past: preliminary experiments with KB data
• Now: building a prototype and
– plugging it into the KB production system
– having it evaluated by its potential users (indexers)
• Prototype case: Dutch public libraries / KB
Suggesting Brinkman subjects based on Biblion ones

Alignment and re-indexing: requirements
Subjects can be complex
• Mappings between groups of concepts
"Travel guides" + "Spain" → "Spain; travel guides"
Concepts are used in descriptions
• Mappings taking into account extensional semantics
"Building engineering" → "Learning material ; building engineering"

Obtaining re-indexing rules
• Lexical alignments are not good enough
• Probabilistic rules are calculated
– Using extension of concepts: existing indexing
– Simple probabilities, with ad hoc adjustment
"Travel guides", "Spain" → "Spain; travel guides", 0.982
• Not only based on Biblion subjects
– AUT – main authors of books
– KAR – “characteristic”
– DGP – intellectual level/target group

Demo
Doesn't work?

User study
• Quantitative aspect
– How well does the tool compare to human subject indexing?
• Qualitative aspect
– User satisfaction
– Improvement suggestions

Evaluation setting
• 6 indexers
• 6 weeks
• 284 books
• Evaluation integrated in daily indexing work
• Pre-evaluation briefing
• Questionnaire during evaluation
• Post-evaluation de-briefing & questionnaire

User study results
Suggestion class   # suggestions   precision   recall
blue               308             72.7%       47.9%
purple             1,188           10.7%       27.1%
red                2,525           1.11%       5.98%
non suggested      89              -           19.0%
• Top ranked mappings are indeed much better
• Individual book satisfaction level > 70%

User study results (1)
• But the general satisfaction is lower
– Only two out of six would use the tool as such
• Quality of suggestions
– Lower-level suggestions are often not meaningful
• Perception of suggestions' quality
– Long lists with wrong suggestions at the end are bad
– Ranking is appreciated, but it is not enough

User study results (2)
Suggestions were found promising
• Bridging the indexing gap between collections
– Different indexing strategies
"Persian language" (Biblion) vs. "Iranian language and literature" (Brinkman)
Lots of suggestions for improvement
• More re-indexing!
– Suggesting concepts from other vocabularies
– More context metadata as input

Conclusions
• Shows the potential of re-using data in a library network
• Alignment approach fitting indexing practice
• Concrete demonstration, in KB production environment
• Technology transfer: KB wants to continue efforts
• Flexibility: architecture ready to exploit other vocabularies
– Linked data & SKOS

Prototype components
[Figure: prototype architecture. Labels: GGC cataloguing system; STITCH script (VisualBasic); WinIBW cataloguing interface; suggestion service (SWI-Prolog); lexical alignments; Sesame RDF store; Sesame SKOS RDF store; Indexer; IE; STITCH stylesheet (XSLT); vocabulary service (Java/Tomcat); LOD SPARQL endpoints.]

Linked libraries?
[Figure: the map of KOSs and book collections from "A wider issue", extended with alignments and links to the LOD cloud.
Other classifications: Doelgroep (audience), BISAC subject codes.
Domain/discipline classifications: NBC class., DDC Dewey decimal class.
Subject thesauri / subj. heading lists: Brinkman, GTT, LCSH subject headings, RAMEAU subject headings, SWD subject headings.
Book collection datasets: KB Deposit Coll., KB Scientific Coll., LC (US Nat. Lib), BnF (French Nat. Lib), DNB (German Nat. Lib).
Person/corporation data: LC authority file, Autorités BNF, Personennamendatei.
Also shown: wikipedia .de, wikipedia .nl, others, NUR, UNESCO class., Biblion, Dutch Public Libraries, Dutch Booktrade, KB Corporatie + Persoon.
Legend: existing KOS alignment; potential KOS alignment of interest; overlap between book collections (thickness indicates degree of overlap); LCSH: currently available entry point to the LOD cloud; vertical adjustment between a coll. and KOSs denotes the KOSs being used to describe that coll.]

Thank you!
• Questions?

Screenshots
WinIBW production tool
STITCH suggestion tool
Original metadata
Concept suggestions
Comparing with human re-indexing
Complement: lexical alignments
Adding subjects using thesaurus access
Concept suggestions
Saving and back to WinIBW

Screenshots
• Back
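
Illustration for "Obtaining re-indexing rules"
The slide only says that rules are "simple probabilities" computed from existing indexing, with an ad hoc adjustment that is not described. The sketch below is one possible reading, not the STITCH implementation: for books indexed in both vocabularies, the confidence of a rule is the share of books carrying a group of source concepts that also carry the target subject. The function names learn_rules and suggest, the max_group parameter and the toy books are all hypothetical, and the ad hoc adjustment is left out.

```python
from collections import Counter
from itertools import combinations


def learn_rules(books, max_group=2):
    """Estimate P(target subject | group of source concepts).

    books: iterable of (source_concepts, target_subjects) pairs, one per
    dually indexed book; both elements are sets of strings.
    Returns {(frozenset_of_source_concepts, target_subject): confidence}.
    Assumption: plain relative frequencies, no ad hoc adjustment.
    """
    source_counts = Counter()
    joint_counts = Counter()
    for source, targets in books:
        for size in range(1, max_group + 1):
            for group in combinations(sorted(source), size):
                key = frozenset(group)
                source_counts[key] += 1
                for target in targets:
                    joint_counts[(key, target)] += 1
    return {
        (key, target): n / source_counts[key]
        for (key, target), n in joint_counts.items()
    }


def suggest(rules, source_concepts, top=5):
    """Rank target subjects for one book by the best matching rule."""
    source = frozenset(source_concepts)
    best = {}
    for (key, target), confidence in rules.items():
        if key <= source:  # rule fires if all its source concepts are present
            best[target] = max(confidence, best.get(target, 0.0))
    return sorted(best.items(), key=lambda item: -item[1])[:top]


# Toy data (hypothetical): (Biblion concepts, Brinkman subjects) per book.
books = [
    ({"Travel guides", "Spain"}, {"Spain; travel guides"}),
    ({"Travel guides", "Spain"}, {"Spain; travel guides"}),
    ({"Travel guides", "France"}, {"France; travel guides"}),
    ({"Spain"}, {"Spain; history"}),
]
rules = learn_rules(books)
ranked = suggest(rules, {"Travel guides", "Spain"})
print(ranked[0])
```

Grouping source concepts is what lets "Travel guides" + "Spain" outrank either concept taken alone. Other metadata named on the slide (AUT, KAR, DGP) could be fed in as additional source "concepts" under the same scheme, although the slides do not say how the prototype combines them.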