Geoscience Knowledge Representation Using the SWEET Ontologies
Rob Raskin
Jet Propulsion Laboratory

Transforming Data into Knowledge
  Columns: Data, Information, Knowledge (entries in each row run from the Data end to the Knowledge end)
  Basic Elements: Bytes, Numbers, Models, Facts
  Services: Ingest, Archive, Visualize, Infer, Understand, Predict
  Storage: File, Database, HDF-EOS, GIS, MIS, Ontology, Mind
  Interoperability: Syntactic, OPeNDAP, WMS/WCS, Semantic
  Volume/Density: High/Low, Low/High
  Statistics: Checksum, Moments, Descriptive, Inferential
  Analysis: Fourier, Wavelet, EOF, SSA
  Methodology: Exploratory-analysis, Model-based-mining
  Spectrum: Syntax to Semantics

What is Knowledge?
  Facts, relations, meanings, contexts
  Organized information
  Core ingredient in “common sense”
  Common understanding
  In a form to apply reasoning/inference
  Dynamic
  Expandable

Semantic Understanding is Difficult!
  Sea surface temperature: measured 3 m above surface
  Sea surface temperature: measured at surface
  Variable t: temperature
  Variable t: time
  Data quality = 5
  Let’s eat, Grandma.
  Let’s eat Grandma.
  Time flies like an arrow.
  Fruit flies like a pie.
  “Mission accomplished. Major combat operations in Iraq have ended” (LA Times headline)

Database vs Knowledge Base
  Database
    Entities and Relations
    Closed world
    All facts included
  Knowledge base
    Classes and Properties
    Collection of facts
    Captures corporate memory
    Open world
    Facts not stated may be either true or untrue

PO.DAAC Knowledge Bases
  Public access, People, Roles/Tasks, Data Processing, Data Products, Metadata, Tools/Services, Web Pages, Science Concepts, Missions, Instruments, Organizations, Applications, Announcements, Inquiries, Computers, Documents (Docushare)
  Relations
    People have roles
    Instruments measure science parameters
    Inquiries relate to data products
    etc.
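
The closed-world versus open-world contrast, applied to relations like the PO.DAAC ones just listed, can be sketched in a few lines of Python. This is only an illustration: the `FACTS` set, the helper names `closed_world` and `open_world`, and every individual in it (`Person_A`, `Instrument_A`, `Inquiry_1`, and so on) are hypothetical and do not come from any PO.DAAC system.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical facts modeled on the relation types named on the slide.
FACTS: Set[Triple] = {
    ("Person_A", "hasRole", "DataEngineer"),
    ("Instrument_A", "measures", "SeaSurfaceTemperature"),
    ("Inquiry_1", "relatesTo", "DataProduct_X"),
}


def closed_world(fact: Triple) -> bool:
    """Database-style answer: anything not stored is treated as false."""
    return fact in FACTS


def open_world(fact: Triple) -> Optional[bool]:
    """Knowledge-base-style answer: True if asserted, otherwise None (unknown)."""
    return True if fact in FACTS else None


stated = ("Instrument_A", "measures", "SeaSurfaceTemperature")
unstated = ("Instrument_A", "measures", "WindSpeed")

print(closed_world(stated), open_world(stated))
print(closed_world(unstated), open_world(unstated))
```

The sketch never returns `False` under the open-world reading because it stores no negative assertions; a fuller knowledge base would need explicit statements (or inference) to conclude that something is untrue.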

Example of Knowledge-Assisted Service
  Yellow Page Lookup:
    Cars vs automobiles
    Hotels vs motels vs resorts

Semantic-based Service Example: Google
  Type into Google: “gymnasiums in Seattle”
  Google understands that
    Seattle is a place
    Gymnasiums is a place-based service
  Generates map of Seattle with dots locating gyms
  Google understands semantics so that the search results also could include
    Locations near Seattle
    Similar services (e.g., health club)

Assertion of Facts as Triples
  Subject-Verb-Object representation
    Flood subClassOf WeatherPhenomena
    HDF subClassOf FileFormat
    Pressure subClassOf PhysicalProperty
    Ocean hasSubstance Water
    AIRS measures Temperature

Applications
  Software tools can find “meaning” in resources for
    Discovery
    Fusion
    Lineage
    …

Requirements
  Data products associated with objects in “science concept space”
    Richer descriptions than DIFs
  Data services associated with objects in “service concept space”
    Richer descriptions than SERFs
  Search/fusion tools that exploit ontologies

Semantic Web Vision
  Web page creators place XML tags around technical terms on web pages
  XML tags point to knowledge base where term is “defined”
  Search tools use this information to provide value-added services
  Common search engines (Google) use these capabilities only minimally, at present

Ontologies
  Current preferred method to store “facts”
  General definition: “all that is known”
  Computer science definition: machine-readable definition of terms and how they relate to one another
  As with a dictionary, terms are defined in terms of other terms
  Provide shared understanding of concepts
  Support knowledge reuse
  Support machine-to-machine communications with deeper semantics than controlled vocabulary

XML-based Ontology Languages
  XML satisfies desired properties for language syntax
    Readable by both humans and machines
  However, there are too many possible ways that XML tags can be named and used
  No standardization of XML tag meanings as in HTML (<b> </b> pair => renders in bold)
  Additional standardized semantics needed to exploit shared understanding of concepts

RDF and OWL
  W3C has adopted languages that specialize XML
    Resource Description Framework (RDF)
    Web Ontology Language (OWL)
  Languages predefine specific tags
    RDF: Class, subclass, property, subproperty, …
  RDF and OWL form a nested collection of languages, each roughly a specialization of the preceding language with further shared understanding
    XML, RDF, RDFS, OWL Lite, OWL DL, OWL Full

Semantic Web for Earth and Environmental Terminology (SWEET)
  SWEET is a concept space
  Enables scalable classification of Earth system science concepts
  Anybody can import, expand, and specialize the work of others
  Currently being expanded to space science
  No need to regenerate a physics, chemistry, or math ontology
  Concept space is translatable into other languages/cultures using “sameAs” notions

SWEET Ontologies and Their Interrelationships
  (Diagram) Groups: Integrative Ontologies, Faceted Ontologies
  (Diagram) Ontologies shown: Living Substances, Non-Living Substances, Natural Phenomena, Physical Processes, Human Activities, Earth Realm, Physical Properties, Data, Space, Time, Numerics, Units

SWEET as an Upper Level Earth Science Ontology
  Math, Space, Time, Physics, Chemistry
    imported by
  SWEET: Property, EarthRealm, Process, Phenomena, Substance, Data
    imported by
  Specialized domains: Stratospheric Chemistry, Biogeochemistry

Why an Upper-Level Ontology for Earth System Science?
  Many common concepts used across Earth Science disciplines (such as properties of the Earth)
  Provides common definitions for terms used in multiple disciplines or communities
  Provides common language in support of community and multidisciplinary activities
  Provides common “properties” (relations) for tool developers
  Reduced burden (and barrier to entry) on creators of specialized domain ontologies
    Only need to create ontologies for incremental knowledge

How SWEET was Initially Populated
  Initial sources
    GCMD
      Over 10,000 datasets
      Over 1000 keywords
      Data providers submit far more than the 1000 terms for “free-text” search
    CF
      Over 500 keywords
      Very long term names
        surface_downwelling_photon_spherical_irradiance_in_sea_water
  Decomposed into facets

Spatial Ontology
  Concepts of 0-D, 1-D, 2-D, and 3-D objects
  Default coordinate system: lat/lon/up
  Polygons used to store spatial extents
  Spatial attributes added (population, area, etc.)
  Scientific applications include: geology to represent 3-D structure

Numerical Ontologies
  Numerics
    Extents: interval, point, 0, positiveIntegers, …
    Relations: lessThan, greaterThan, …
  SpatialEntities
    Extents: country, Antarctica, equator, inlet, …
    Relations: above, northOf, …
  TemporalEntities
    Extents: duration, century, season, …
    Relations: after, before, …

Numerical Ontologies (cont.)
  Numeric concepts defined in OWL only through standard XML XSD spec
  Numerical relations defined in SWEET
  Intervals defined as restrictions on real line
  lessThan, max, …
  Cartesian product (multidimensional spaces) added in SWEET
  Numeric ontologies used to define spatial and temporal concepts

Conceptual Ontologies
  Phenomena
    (ElNino, Volcano, Thunderstorm, Deforestation)
    Each has associated spatial/temporal extent, EarthRealms, PhysicalProperties, etc.
    Specific instances included
      e.g., 1997-98 ElNino
  Human Activities
    Fisheries, IndustrialProcessing, Economics, Public Good
  State
    History or state of planet or component

SWEET Users
  ESML - Earth Science Markup Language
  ESIP - Earth Science Information Partner Federation
  GEON - Geosciences Network
  GENESIS - Global Environmental & Earth Science Information System
  IRI - International Research Institute (Columbia)
  LEAD - Linked Environments for Atmospheric Discovery
  MMI - Marine Metadata Initiative
  NOESIS
  PEaCE - Pacific Ecoinformatics and Computational Ecology
  SESDI - Semantically Enabled Science Data Integration
  VSTO - Virtual Solar-Terrestrial Observatory

Collaboration Web Site
  Discussion tools
  Version Control/Configuration Management
  Trace dependencies on external ontologies
  Tools to search for existing concepts in registered ontologies
  Ontology Validation Procedure
  Blog, wiki, moderated discussion board
  W3C note is formal submission method
  Registry/discovery of ontologies
  Support workflows/services for ontology development

Community Issues
  Content Standards and Conventions
    Agreement on standards for use of OWL
    Fuzzy representation conventions
  Review Board
    Maintain alignment given expansion of classes and properties
    Who will oversee and maintain for perpetuity (or at least through the next funding cycle)
    ESIP Federation? ESSI?
  Global Support
    Provide tools to visualize and appreciate the big picture

Update/Matching Issues
  No removal of terms except for spelling or factual errors
  Must avoid contradictions
  Additions can create redundancy if sameAs not used
  Humans must oversee “matching”
  Subscription service to notify affected ontologies when changes made
  CF has established moderator to carry out analogous additions
  OWL “import” imports entire file
  Associate community with ontology terms
  Community tagging

Best Practices
  Keep ontologies small, modular
  Be careful that owl:imports imports everything
  Use higher level ontologies where possible
  Identify hierarchy of concept spaces
  Model schemas
  Try to keep dependencies unidirectional
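
Two of the practices above (owl:imports pulls in everything; keep dependencies unidirectional) lend themselves to a mechanical check. The following is a minimal sketch, assuming a hypothetical map from each ontology module to the modules it imports directly. The module names echo the SWEET layers mentioned earlier, but the specific import edges and the helper names `import_closure` and `has_cycle` are invented for illustration and do not describe the actual SWEET files.

```python
from typing import Dict, List, Set

# Hypothetical direct-import map; the edges are invented for illustration.
IMPORTS: Dict[str, List[str]] = {
    "Biogeochemistry": ["Substance", "Process"],
    "Substance": ["Chemistry"],
    "Process": ["Physics", "Time"],
    "Chemistry": ["Physics"],
    "Physics": ["Math"],
    "Time": ["Math"],
    "Math": [],
}


def import_closure(module: str, imports: Dict[str, List[str]]) -> Set[str]:
    """Return every module loaded, directly or indirectly, by importing `module`."""
    seen: Set[str] = set()
    stack = list(imports.get(module, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(imports.get(current, []))
    return seen


def has_cycle(imports: Dict[str, List[str]]) -> bool:
    """True if some module ends up importing itself (dependencies not unidirectional)."""
    return any(m in import_closure(m, imports) for m in imports)


print(sorted(import_closure("Biogeochemistry", IMPORTS)))
print(has_cycle(IMPORTS))
```

In this made-up graph, a domain ontology that states only two imports ends up loading six modules, which is the reason for keeping each module small. Adding an edge from `Math` back to `Biogeochemistry` would make `has_cycle` report a circular dependency.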