Transcript Tiina Ison
National Library of Finland
Metadata in the Digitisation Process
Cultural unity and diversity of the Baltic Sea Region – common history, different languages, mixed culture
Helsinki, 21st–22nd October 2010
Tiina Ison, Senior Analyst, National Library of Finland

Outline
1. Front End - National Digital Library and Long Term Preservation (KDK/PAS)
2. Back End - Digitisation Production Process, METS Profiles
3. Descriptive Metadata
4. Administrative/Technical Metadata
5. Structural Metadata
6. Wrapping things together: METS Profile
7. Processes towards distributed work, crowd sourcing, annotation and ontologies

1. Front End: National Digital Library and Long-Term Preservation Infrastructure
Infrastructure initiatives: National Digital Library, National Long-Term Preservation
http://www.kdk2011.fi
Rights Management ... METS profiles
Libraries / Archives / Museums
BACK END SYSTEMS
In their digitisation production, memory institutions produce authentic, trustworthy digitised content and collections.
OPM-KD Project 2007-2009, digitisation production reviewed
http://www.kansalliskirjasto.fi/extra/vanhat_bulletinit/bulletin09/article6.html
Ministry of Education
www.kdk2011.fi/fi/tietoa-hankkeesta
www.minedu.fi
Architecture of the National Digital Library (Kansallisen Digitaalisen Kirjaston Arkkitehtuuri)
http://www.kdk2011.fi/images/stories/Kokonaisarkkitehtuuri-yleiskuva-fi_iso.jpg

2. Back End: Digitisation Production Processes, METS Profiles
[Process diagram]
PHYSICAL COLLECTIONS / SOURCE MATERIAL: Newspapers, Serials, Books, Parchments, Notes, Maps, Audio
CATALOGUING: Descriptive metadata, MARC21/MODS, two bibliographic records
SCANNING: Administrative/technical metadata, MIX/PREMIS
POST PROCESSING: Structural metadata, METS, ALTO
LEVEL OF MARK UP: Articles, Illustrations, Poems
METS EXPORT: Standards & OAI-PMH compliant METS SIP packages
Packages include: JPEG2000, OCR TXT as ALTO XML, PDF, JPEG (150), METSXML, MARCXML
DIGITAL RESOURCE: COMPREHENSIVE DIGITAL COLLECTIONS

3. Descriptive Metadata
CATALOGUING
Catalogued items
Un-catalogued items: minimal bibliographic record
Bar code IDs: unique IDs for physical items
Ingest of bibliographic metadata into digitisation production
MARC21 conversion into MARCXML (MODS)
Two bibliographic records, physical and digital (link 776)
Post-cataloguing for minimal records
Enrichment of catalogue

4. Administrative/Technical Metadata
SCANNING
MIX: an XML Schema designed for expressing technical metadata for digital still images, following Technical Metadata for Digital Still Images (NISO Z39.87 Data Dictionary)
MIX: Image width, Color space, color profile, Scanner metadata, Digital camera settings
Preservation metadata / PREMIS (information about actions on object, on events, on technical environment)
Rights metadata (access restriction)
Persistent IDs

5. Structural Metadata
POST PROCESSING
Navigation, use and access
Logical structure
Physical structure
METS structMap: relationships between parts

6. Level of Structural Mark Up
LEVEL OF MARK UP
Material types: books, serials, newspapers, audio, projects
Granularity - different levels of structural mark up, i.e. article, illustration, poem
Granularity - all material types: pages, footnotes, running title, tables, advertisements, image (captions and categories)
Labour intensive
Phased approach in production
Crowd sourcing

7. Wrapping things together: METS Profiles
METS profiles for different material types
• monographs, serials, newspapers, audio…
Export files: JPEG2000 (lossless), PDF, OCR TXT as ALTO XML, JPEG (150dpi), METSXML and MARCXML
The METS container or wrapper provides an OAI-PMH compliant SIP package for delivery and exchange of digital objects across systems. It wraps descriptive, administrative and structural metadata + PREMIS.
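
To make the wrapper idea concrete, here is a minimal sketch in Python of how a METS document can hold the three kinds of metadata named above. It is an illustration only, not the National Library of Finland's actual METS profile. The function names, the IDs (DMD1, TECH1, PROV1, IMG0001), the file paths and the USE="master" label are all hypothetical. The MIX and PREMIS payloads are left empty, and only a MODS title is filled in.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
XLINK = "http://www.w3.org/1999/xlink"

for prefix, uri in (("mets", METS), ("mods", MODS), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)


def q(ns, tag):
    """Return a namespace-qualified tag name for ElementTree."""
    return f"{{{ns}}}{tag}"


def md_wrap(parent, section_tag, section_id, mdtype):
    """Add a metadata section with an empty <xmlData> payload and return the payload."""
    section = ET.SubElement(parent, q(METS, section_tag), {"ID": section_id})
    wrap = ET.SubElement(section, q(METS, "mdWrap"), {"MDTYPE": mdtype})
    return ET.SubElement(wrap, q(METS, "xmlData"))


def build_mets(title, pages):
    """Build a METS tree; pages is a list of (label, image_href) in reading order."""
    mets = ET.Element(q(METS, "mets"))

    # Descriptive metadata: a MODS record with a title only (hypothetical content).
    dmd = md_wrap(mets, "dmdSec", "DMD1", "MODS")
    mods = ET.SubElement(dmd, q(MODS, "mods"))
    title_info = ET.SubElement(mods, q(MODS, "titleInfo"))
    ET.SubElement(title_info, q(MODS, "title")).text = title

    # Administrative metadata: empty slots for MIX (NISOIMG) and PREMIS records.
    amd = ET.SubElement(mets, q(METS, "amdSec"), {"ID": "AMD1"})
    md_wrap(amd, "techMD", "TECH1", "NISOIMG")
    md_wrap(amd, "digiprovMD", "PROV1", "PREMIS")

    # File inventory and physical structure map, one entry per page image.
    file_sec = ET.SubElement(mets, q(METS, "fileSec"))
    grp = ET.SubElement(file_sec, q(METS, "fileGrp"), {"USE": "master"})
    struct = ET.SubElement(mets, q(METS, "structMap"), {"TYPE": "PHYSICAL"})
    book = ET.SubElement(struct, q(METS, "div"), {"TYPE": "book", "DMDID": "DMD1"})
    for n, (label, href) in enumerate(pages, start=1):
        file_id = f"IMG{n:04d}"
        f = ET.SubElement(grp, q(METS, "file"),
                          {"ID": file_id, "ADMID": "TECH1", "MIMETYPE": "image/jp2"})
        ET.SubElement(f, q(METS, "FLocat"),
                      {"LOCTYPE": "URL", q(XLINK, "href"): href})
        page = ET.SubElement(book, q(METS, "div"),
                             {"TYPE": "page", "ORDER": str(n), "LABEL": label})
        ET.SubElement(page, q(METS, "fptr"), {"FILEID": file_id})
    return mets


if __name__ == "__main__":
    tree = build_mets("Example title", [("1", "img/0001.jp2"), ("2", "img/0002.jp2")])
    print(ET.tostring(tree, encoding="unicode"))
```

In a real profile the empty xmlData elements would carry the MIX and PREMIS records produced during scanning, and a second structMap could describe the logical structure (articles, illustrations, poems).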
• MODS and MARCXML for descriptive and bibliographical metadata (http://www.loc.gov/standards/mods/) (http://www.loc.gov/standards/marcxml/)
• MIX for image technical metadata (http://www.loc.gov/standards/mix/)
• PREMIS for preservation metadata (http://www.loc.gov/standards/premis/)
(standards portfolio)

8. Processes towards distributed work, crowd sourcing, annotation and ontologies
Content and context as part of digitisation processes…
Automatic and semiautomatic processes for data extraction…
Distributed work processes, i.e. for:
• Mark up level
• OCR correction
• Controlled annotation
• Social tagging
OCR Correction
THANK YOU