Foundations II: Ontology Engineering Class Session 3 Deborah McGuinness and Joanne Luciano with Peter Fox and Li Ding CSCI-6962-01 September 20, 2010


Foundations II: Ontology
Engineering
Class Session 3
Deborah McGuinness and Joanne Luciano
with Peter Fox and Li Ding
CSCI-6962-01
September 20, 2010
Review of reading Assignment
• Semantic Web for the Working Ontologist
(Allemang and Hendler), first few chapters.
• Rector et al. OWL Pizzas: Practical
Experience of Teaching OWL-DL: Common
Errors & Common Patterns.
• Any comments, questions?
• Homework assignment due at 1200h today
Semantic Web Methodology and
Technology Development Process
• Establish and improve a well-defined methodology vision for
Semantic Technology based application development
• Leverage controlled vocabularies, etc.
[Figure: development cycle with the stages Use Case; Small Team, mixed skills; Analysis; Develop model/ontology; Use Tools; Science/Expert Review & Iteration; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy]
Semantic Web Layers
http://www.w3.org/2003/Talks/1023-iswc-tbl/slide26-0.html, http://flickr.com/photos/pshab/291147522/
Ontology Spectrum
An ontology specifies a rich description of the
• Terminology, concepts, nomenclature
• Properties explicitly defining concepts
• Relations among concepts (hierarchical and lattice)
• Rules distinguishing concepts, refining definitions and relations
(constraints, restrictions, regular expressions)
relevant to a particular domain or area of interest.
www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
slide from Kendall/McGuinness SemTech Tutorial
Ontology-based Technologies
• Ontologies provide a common vocabulary for use by independently
developed resources, processes, services
• Agreements among organizations sharing common services can be made
with regard to their usage; the meaning of relevant concepts can be
expressed unambiguously
• By composing / mapping ontologies and mediating terminology across
participating events, resources and services, independently-developed
services can work together to share information and processes consistently,
accurately, and completely
• Ontologies also ensure
– Valid conversations among agents to collect, process, fuse, and
exchange information
– Accurate searching by ensuring context using concept definitions and
relations instead of/in addition to statistical relevance of keywords
slide from Kendall/McGuinness SemTech Tutorial 2008
Background Knowledge
• We need to provide machine understandable
encodings of terms that are used in
applications
• Approaches to drive ontology creation
– Bottom up (using data from databases or
scraping)
– **Mid level (using use cases and knowledge of
the subject area)
– Top down (using foundational or upper level
ontologies and building “down”)
Reuse existing knowledge
• Standards exist in most domains, and many of them
overlap
• Identify the set that is most relevant to the problem
and business issue
• A component-based approach helps deal with
overlapping standards; complex relationships can
and must be defined such that term usage and
overlap is unambiguous and machine interpretable
• Brainstorming with domain experts can be useful to
start; then refine and iterate to the level required by
the application
adapted from Kendall/McGuinness SemTech Tutorial
2009
Use Case Example
• We will look at one example use case and the
thought process involved in generating the
plan for the ontology encoding from the
SESDI project (Semantically-Enabled
Scientific Data Integration)
Selected VxyO Motivation: Mt. Spurr,
AK. 8/18/1992 eruption, USGS
http://www.avo.alaska.edu/image.php?id=319
Eruption cloud movement from Mt. Spurr, AK, 1992 (USGS)
Tropopause
http://aerosols.larc.nasa.gov/volcano2.swf
Atmosphere Use Case
• Determine the statistical signatures of both
volcanic and solar forcings on the height of
the tropopause
From paleoclimate researcher – Caspar Ammann – Climate and Global
Dynamics Division of NCAR - CGD/NCAR
Layperson perspective:
- look for indicators of acid rain in the part of the atmosphere we
experience…
(look at measurements of sulfur dioxide in relation to sulfuric acid after
volcanic eruptions at the boundary of the troposphere and the stratosphere)
NASA-funded effort with Fox - NCAR, Sinha - Va. Tech, Raskin - JPL,
McGuinness
Gather the Thought Process
• Ask which questions are being focused on
• Ask for an answer to the questions
• Ask how the questions are answered
• Ask for criteria for a “good” answer
Use Case detail: A volcano erupts
• Preferentially it’s a tropical mountain (+/- 30 degrees of the equator) with ‘acidic’ magma (more SO2), and it erupts with great intensity so that material and large amounts of gas are injected into the stratosphere.
• The SO2 gas converts to H2SO4 (sulfuric acid) + H2O (75% H2SO4 + 25% H2O). The half-life of SO2 is about 30 - 40 days.
• The sulfuric acid condenses into little super-cooled liquid droplets. These are the volcanic aerosol that will linger around for a year or two.
• The Brewer-Dobson circulation of the stratosphere will transport aerosol to higher latitudes. The particles generate great sunsets, most commonly first seen in fall of the respective hemisphere. The sunlight gets partially reflected, and some part gets scattered in the forward direction.
• The result is that the direct solar beam is reduced, yet diffuse skylight increases. The scattering is responsible for the colorful sunsets as more and more of the blue wavelengths are scattered away. In mid-latitudes the volcanic aerosol starts to settle, but the most efficient removal from the stratosphere is through tropopause folds in the vicinity of the storm tracks.
• If particles get over the pole, which happens in spring of the respective hemisphere, then they will settle down and fall onto polar ice caps. It’s from these ice caps that we recover annual records of sulfate flux or deposit.
• We get ice cores that show continuous deposition information. Nowadays we measure sulfate, or SO4(2-). Earlier measurements were indirect, putting an electric current through the ice and measuring the delay. With acids present, the electric flow would be faster.
• What we are looking for are pulse-like events with a build-up over a few months (mostly in summer, when the vortex is gone), and then a decay of the peak of about 1/e in 12 months.
• The distribution of these pulses was found to follow an extreme value distribution (Frechet) with a heavy tail.
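The time scales quoted in this use case can be turned into a small calculation. The sketch below assumes plain exponential decay both for the SO2 half-life (30 to 40 days) and for the 1/e decay of the sulfate peak over 12 months; the function names are illustrative and are not part of the SESDI work.

```python
import math

def fraction_remaining_half_life(t_days: float, half_life_days: float) -> float:
    """Fraction of SO2 left after t_days, assuming exponential decay."""
    return 0.5 ** (t_days / half_life_days)

def fraction_remaining_efold(t_months: float, efold_months: float = 12.0) -> float:
    """Fraction of the sulfate peak left after t_months (drops to 1/e every efold_months)."""
    return math.exp(-t_months / efold_months)

# SO2 left after 90 days, for both ends of the quoted 30-40 day half-life range.
for half_life in (30.0, 40.0):
    print(half_life, fraction_remaining_half_life(90.0, half_life))

# Sulfate peak left after one and two years.
print(fraction_remaining_efold(12.0), fraction_remaining_efold(24.0))
```

With a 30 day half-life, one eighth of the SO2 is left after 90 days; the sulfate signal in the ice, by contrast, is still at about 37% of its peak a year later.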
Use Case detail: … climate
• So reflection reduces the total amount of energy; forward scattering just changes the beam and its path length, but that's it.
• The dry fogs in the sky (even after a thunderstorm) are still up there, so they are in the stratosphere, not the troposphere.
• The tropical reservoir will keep delivering aerosol for about two years after the eruption.
• The particles are excellent scatterers at short wavelengths. They do absorb in NIR and in IR. Because of absorption, there is a local temperature change in the lower stratosphere.
• This temperature change has two effects. First, it causes some convective motion that further spreads the aerosol. Second, once the layer warms up, it generates a temperature gradient. Horizontal temperature gradients increase the baroclinicity and thus storms, and they speed up the local zonal winds. This change in zonal wind in high latitudes is particularly large in winter. This increased zonal wind (westerly) will remove all cold air that tries to build up over winter in the high Arctic.
• Therefore, the temperature anomaly in winter time is actually quite okay.
• The impact of volcanoes is to cool the surface through scattering of radiation.
• In winter time over the continents there might be some warming. In the stratosphere, the aerosol warms.
• The amount of GHG emitted is small compared to the reservoir in the air.
• The hydrologic cycle responds to a volcanic eruption.
Stepping back
• We have identified a number of noun phrases
and verbs that will be needed if we are to
answer the questions
• Noun phrases are typically modeled as
classes
• Verbs are typically modeled as properties
• Constraints are typically modeled as value
(and other) restrictions
Starting Points
• When building a background ontology for an
application, we need to decide whether it is
best to start from scratch or to reuse other
ontologies.
• Look around for existing resources
• These can be:
– Existing ontologies
– Database schemas
– Controlled vocabularies
– Table of contents like material (on a web page, in
a book, catalog, etc.)
How to find starting points
• Web searches for content area
• SWOOGLE
• Talk to experts
• Standards bodies (IEEE, OMG, etc.)
• In this case, SWEET (Semantic Web Earth
and Environmental Terminology) was a
reasonable starting point
– Why: it was reasonably well used, it
included terminology we needed, and it incorporated
some standard terminologies we cared about
Atmosphere (portions from SWEET)
Atmosphere II
More on Scoping
• Focus initially on:
– Class hierarchy
– Important relationships (yielding properties and
sometimes property hierarchies)
– Important restrictions (yielding classes to be used
as value restrictions)
• Acknowledge other important issues such as:
– Required vs. optional (yielding cardinality
restrictions)
– Disjointness
– Processes
Representing processes
Developing ontologies in VSTO
• Use cases and small team (7-8; 2-3 domain experts, 2
knowledge experts, 1 software engineer, 1 facilitator, 1
scribe)
• Identify classes and properties (leverage controlled
vocab.)
– Start with narrower terms, generalize when needed or
possible
– Adopt a suitable conceptual decomposition (e.g. SWEET)
– Import modules when concepts are orthogonal
• Review, vet, publish
• Only code them (in RDF or OWL) when needed
(CMAP, …)
• Ontologies: small and modular
Use Case example
• Plot the neutral temperature from the Millstone-Hill
Fabry Perot, operating in the vertical mode during
January 2000 as a time series.
• Objects:
– Neutral temperature is a (temperature is a) parameter
– Millstone Hill is a (ground-based observatory is an) observatory
– Fabry-Perot is an interferometer is an optical instrument is an instrument
– Vertical mode is an instrument operating mode
– January 2000 is a date-time range
– Time is an independent variable / coordinate
– Time series is a data plot is a data product
Class and property example
• Parameter
– Has coordinates (independent variables)
• Observatory
– Operates instruments
• Instrument
– Has operating mode
• Instrument operating mode
– Has measured parameters
• Date-time interval
• Data product
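The is-a chains and class/property pairs from this use case can be written down as plain data before any OWL is coded. The following is a minimal sketch; the identifiers (`IS_A`, `PROPERTIES`, `hasCoordinate`, and so on) are hypothetical names chosen for illustration, not the actual VSTO ontology terms.

```python
# Hypothetical encoding of the is-a chains listed for the use case.
IS_A = {
    "NeutralTemperature": "Temperature",
    "Temperature": "Parameter",
    "MillstoneHill": "GroundBasedObservatory",
    "GroundBasedObservatory": "Observatory",
    "FabryPerot": "Interferometer",
    "Interferometer": "OpticalInstrument",
    "OpticalInstrument": "Instrument",
    "TimeSeries": "DataPlot",
    "DataPlot": "DataProduct",
}

# Property name -> (class that has the property, class of its values).
PROPERTIES = {
    "hasCoordinate": ("Parameter", "IndependentVariable"),
    "operatesInstrument": ("Observatory", "Instrument"),
    "hasOperatingMode": ("Instrument", "InstrumentOperatingMode"),
    "hasMeasuredParameter": ("InstrumentOperatingMode", "Parameter"),
}

def ancestors(term):
    """Follow the is-a chain upward and return the more general terms in order."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain

def applicable_properties(term):
    """Properties a term can carry, directly or through its ancestors."""
    kinds = {term, *ancestors(term)}
    return sorted(name for name, (owner, _) in PROPERTIES.items() if owner in kinds)

print(ancestors("FabryPerot"))
print(applicable_properties("MillstoneHill"))
```

Keeping the model in this form makes it easy to review with domain experts before committing to RDF or OWL.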
Modeling Advice
• As we model, we want to think about how we
will represent the information.
• When we clean things up, we will want to
follow best practices:
– Consistent
– Understandable
– Extensible
– Longevity (e.g., prices on wines in wine agent
may need to change frequently and may be best
in a separate file)
Domain Modeling
• Next simple domain modeling and evaluation
using a simplified example from the domain
of wine and foods
General Nature of Descriptions
General Categories: a WINE, a LIQUID, a POTABLE
Structured Components:
grape: chardonnay, ... [>= 1]
sugar-content: dry, sweet, off-dry
color: red, white, rose
price: a PRICE
winery: a WINERY
Interconnections Between Parts:
grape dictates color (modulo skin)
harvest time and sugar are related
General Nature of Descriptions
[Figure: the same description annotated with the labels Class, Superclass, Roles / Properties, Value Restrictions, and Number / Card Restrictions]
General Categories: a WINE, a LIQUID, a POTABLE
Structured Components:
grape: chardonnay, ... [>= 1]
sugar-content: dry, sweet, off-dry
color: red, white, rose
price: a PRICE
winery: a WINERY
Interconnections Between Parts:
grape dictates color (modulo skin)
harvest time and sugar are related
Ontology Development
• Define domain terms and inter-relationships
– Define concepts in the domain (classes, nouns)
– Identify subclass/superclass relationships
– Identify attributes/properties/slots (verbs)
– Identify any general properties (relations, functions,
verbs)
– Restrict slot values
– Define individuals
– Define relationships between individuals (filling in
slots)
More:
http://www.bell-labs.com/project/classic/papers/sowabook.ps.gz
slide from Kendall/McGuinness SemTech Tutorial 2009
Classes & Class Hierarchy
• A class is a concept in the domain
– Vintage – a wine made from grapes grown in a specified year
– A class of properties (flavor, body, color, sugar…)
• A class is a collection of elements with similar properties
– White wine – wines made from white grapes
– White table wine – wines made from white grapes that are not
appellations or regional (not “quality wine” in the EU)
• A class contains necessary conditions for membership (specific
network broadcast properties, frequency, time & location)
• Instances of classes
– Marietta Old Vines Red -> Red Wine
– Forman Vineyards -> Winery
slide from Kendall/McGuinness SemTech Tutorial 2009
Class Inheritance
• Classes are organized into subclass-superclass (or generalization-specialization) hierarchies
• True subclass relationships are the basis of a formal is-a hierarchy
Classes are “is-a” related if an instance of the subclass is an
instance of the superclass
• Classes may be viewed as sets
• The instances of a subclass form a subset of the instances of its superclass
• Examples
– RedWine is a subclass of Wine
Every red wine is a wine or every instance of a red wine (e.g.,
Marietta Old Vines Red) is an instance of wine
– NapaValleyWine is a subclass of CaliforniaWine
Every wine from Napa Valley is a wine from California
Levels in the Class Hierarchy
• Class inheritance is transitive
– B is a subclass of C (white wine and dessert wine are
subclasses of wine)
– A is a subclass of B (viognier is a subclass of
white wine, late harvest wine is a subclass of
dessert wine)
– therefore A is a subclass of C (late harvest
viognier is a subclass of white wine, dessert wine
and wine)
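Transitivity means that the full set of superclasses can be computed by walking the direct subclass links. This is a minimal sketch; the direct parents given to `LateHarvestViognier` are an assumption made for the example.

```python
# Direct subclass links only; a class may have more than one parent.
SUBCLASS_OF = {
    "WhiteWine": {"Wine"},
    "DessertWine": {"Wine"},
    "Viognier": {"WhiteWine"},
    "LateHarvestWine": {"DessertWine"},
    "LateHarvestViognier": {"Viognier", "LateHarvestWine"},  # assumed parents
}

def superclasses(cls):
    """All direct and inherited superclasses of cls."""
    found = set()
    stack = [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

print(sorted(superclasses("LateHarvestViognier")))
```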
Properties & Slots
• Slots in a class definition describe attributes of members of
a class
each wine will have color, sugar content, flavor, body, etc.
• Types of properties
– “intrinsic” properties: flavor and color of wine
– “extrinsic” properties: name and price of wine
– parts: ingredients in a recipe
– relations to other objects: producer of wine (winery)
• Data and object properties
– simple (datatype) properties contain primitive values (strings,
numbers)
– complex (object) properties contain other objects (e.g., a winery
instance)
Class & Slot Inheritance
• A subclass inherits all the slots from its superclass
If a wine has a name and flavor, a red wine also has a name
and flavor
• If a class has multiple superclasses, it inherits slots and
restrictions from all of them
Port is both a dessert wine and a red wine. It inherits “sugar
content: sweet” from dessert wine and “color: red” from
red wine
slide from Kendall/McGuinness SemTech Tutorial 2009
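The Port example can be sketched as a lookup that merges slots from every superclass. The dictionaries below are illustrative; `None` stands for a slot that is declared but not yet restricted to a value.

```python
PARENTS = {
    "Wine": [],
    "RedWine": ["Wine"],
    "DessertWine": ["Wine"],
    "Port": ["DessertWine", "RedWine"],
}

# Slots declared directly on each class.
LOCAL_SLOTS = {
    "Wine": {"name": None, "flavor": None},
    "RedWine": {"color": "red"},
    "DessertWine": {"sugar-content": "sweet"},
    "Port": {},
}

def inherited_slots(cls):
    """Collect slots from all superclasses, then apply the class's own slots."""
    slots = {}
    for parent in PARENTS[cls]:
        slots.update(inherited_slots(parent))
    slots.update(LOCAL_SLOTS[cls])
    return slots

print(inherited_slots("Port"))
```

This sketch does not detect conflicting restrictions from two parents (a later parent would silently overwrite an earlier one); a real system would flag such a clash.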
Property or Slot Constraints
• Constraints on properties describe or limit the set
of possible values
– A channel adapter in a message bus must be associated
with at least one channel
– A policy applies for exactly one frequency range
• Slot cardinality – the number of values a slot can or
must have
– Cardinality – cardinality N means that the slot must have
exactly N values
– Minimum cardinality - 1 means that the slot must have a
value (required), 0 means that the slot value is optional
– Maximum cardinality - 1 means that the slot can have at
most one value (single-valued slot), N means that the slot
can have up to N values (N > 1, multi-valued slot)
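Cardinality constraints reduce to a minimum and an optional maximum on the number of slot values. A minimal sketch, with the class name `Cardinality` chosen for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Cardinality:
    minimum: int = 0                 # 0 = optional, 1 = required
    maximum: Optional[int] = None    # None = no upper bound

    def allows(self, values) -> bool:
        """True if the number of values lies within [minimum, maximum]."""
        n = len(values)
        return n >= self.minimum and (self.maximum is None or n <= self.maximum)

EXACTLY_ONE = Cardinality(1, 1)      # "a policy applies for exactly one frequency range"
AT_LEAST_ONE = Cardinality(1)        # "associated with at least one channel"
OPTIONAL_SINGLE = Cardinality(0, 1)  # optional, single-valued slot

print(EXACTLY_ONE.allows(["range-1"]), AT_LEAST_ONE.allows([]))
```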
Slot Value Constraints
• Slot value type – defines the set of possible
values for the property
– String: a string of characters (“Château Lafite”)
– Number: an integer or a float (15, 4.5)
– Boolean: a true/false flag
– Enumerated type: a list of allowed values (red, white,
rose)
– Filler: a single value (e.g., the color slot for a
RedWine must be filled with the single value “red”)
– Object type – a class defined in an ontology (e.g.,
Winery is the value restriction on the hasMaker slot
on the class Wine)
slide from Kendall/McGuinness SemTech Tutorial 2009
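The value types listed above can be sketched as small predicate functions, one per kind of constraint. The slot table `RED_WINE_SLOTS` is a hypothetical example; object-type constraints are left out because they need a class hierarchy.

```python
WINE_COLORS = {"red", "white", "rose"}

def is_string(value):
    return isinstance(value, str)

def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def is_boolean(value):
    return isinstance(value, bool)

def one_of(allowed):
    """Enumerated type: the value must be one of the allowed values."""
    return lambda value: value in allowed

def filler(required):
    """Filler: the slot must hold exactly this value."""
    return lambda value: value == required

# Hypothetical slot constraints for the class RedWine.
RED_WINE_SLOTS = {"name": is_string, "price": is_number, "color": filler("red")}

def violations(slots, individual):
    """Names of filled slots whose value breaks its constraint."""
    return sorted(s for s, ok in slots.items() if s in individual and not ok(individual[s]))

print(violations(RED_WINE_SLOTS, {"name": "Château Lafite", "price": 15, "color": "white"}))
```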
Domain & Range Properties
• In OWL and many other KR languages, relations (properties,
slots) are strictly binary
• The domain & range represent the source & target
arguments, respectively, for the property
• Domain of a slot – the class (or classes) that may have the
slot - Wine is the domain of the slot hasWineColor
• Range of a slot – the class (or classes) to which slot values
belong - everything that fills the hasWineColor slot is an
instance of the enumerated class {red, white, rose}
• Some KR languages that inherently support n-ary relations,
such as CL, do not make this distinction
– More flexible, intuitively more like mathematics, where
functions have ranges (or return types) but not all
relations are functions
– Requires additional relations to specify argument order,
which can be critical for ontology alignment
slide from Kendall/McGuinness SemTech Tutorial 2009
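Domain and range can be illustrated with a simple check on a binary statement. This sketch treats them as validation rules over asserted types, which is a simplification: an OWL reasoner would instead use domain and range to infer the types of the subject and object. The data is illustrative.

```python
# Asserted types of a few individuals (illustrative data).
TYPES = {
    "MariettaOldVinesRed": {"Wine"},
    "FormanVineyards": {"Winery"},
    "red": {"WineColor"},
}

# Property -> (domain class, range class); every property is binary.
PROPERTIES = {
    "hasWineColor": ("Wine", "WineColor"),
    "hasMaker": ("Wine", "Winery"),
}

def conforms(subject, prop, obj):
    """True if the subject is in the property's domain and the object in its range."""
    domain, range_ = PROPERTIES[prop]
    return domain in TYPES.get(subject, set()) and range_ in TYPES.get(obj, set())

print(conforms("MariettaOldVinesRed", "hasWineColor", "red"))
print(conforms("FormanVineyards", "hasWineColor", "red"))
```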
Property Inheritance
• A subclass inherits all the slots of its
superclass(es)
• A subclass can add constraints to
“narrow” the set of allowed values
– Make the cardinality range smaller
– Replace a class in the range with a subclass
slide from Kendall/McGuinness SemTech Tutorial 2009
Individuals or Instances of
Classes
• An Individual (instance, object in other
paradigms)
– Any class that an individual is a member of, or is an
individual of, is a type of the individual
– Any superclass of a class is an ancestor (or type) of
the individual
• Specify slot values for the individual
– Slot values should conform to the constraints such as
range, value type, cardinality restrictions, etc.
slide from Kendall/McGuinness SemTech Tutorial 2009
Vehicle Example: OWL Individuals
[Figure: the OWL individuals BASF, Dupont, Daimler-Chrysler, Boeing, BMW]
* Adapted from Evan Wallace, NIST
slide from Kendall/McGuinness SemTech Tutorial 2009
OWL Statements
[Figure: the individuals Dupont, Boeing, BASF, Daimler-Chrysler, BMW, a Mini Cooper S, a Dakota]
* Adapted from Evan Wallace, NIST
slide from Kendall/McGuinness SemTech Tutorial 2009
OWL ObjectProperty
<owl:ObjectProperty rdf:ID="builtBy">
  <rdfs:range rdf:resource="#Enterprise"/>
  <rdfs:domain rdf:resource="#DurableGood"/>
  <owl:inverseOf rdf:resource="#hasBuilt"/>
</owl:ObjectProperty>
[Figure, built up over three slides: the individuals Dupont, Boeing, BASF, Daimler-Chrysler, BMW, a Mini Cooper S, a Dakota, and the label VIN, with the labels "range" and "domain" added in the later builds]
* Adapted from Evan Wallace, NIST
slide from Kendall/McGuinness SemTech Tutorial 2009
Inverse Properties
<owl:ObjectProperty rdf:ID="hasBuilt">
  <rdfs:range rdf:resource="#DurableGood"/>
  <rdfs:domain rdf:resource="#Enterprise"/>
  <owl:inverseOf rdf:resource="#builtBy"/>
</owl:ObjectProperty>
[Figure: the individuals Dupont, Boeing, BASF, Daimler-Chrysler, BMW, a Mini Cooper S, a Dakota, with the labels "domain" and "range"]
* Adapted from Evan Wallace, NIST
slide from Kendall/McGuinness SemTech Tutorial 2009
Inverse Properties
• Inverse slots contain redundant information, but
– Allow acquisition of the information in either direction
– Enable additional verification
– Allow presentation of information in both directions
• The actual implementation may vary from system to system
– Are both values stored?
– When are the inverse values filled in?
– What happens if we change the link to an inverse slot?
• Repository models often provide support for traversing
relationships (domain, domainOf; range, rangeOf), allowing
where-used kinds of searches
• One of the most common uses of
owl:InverseFunctionalProperty is to conceptualize relational
database keys
slide from Kendall/McGuinness SemTech Tutorial 2009
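One possible answer to "are both values stored?" is to fill in the inverse value at the moment a statement is added. The sketch below takes that approach for builtBy and hasBuilt; the `Store` class and the pairing of vehicles with makers are assumptions for illustration.

```python
from collections import defaultdict

INVERSE = {"builtBy": "hasBuilt", "hasBuilt": "builtBy"}

class Store:
    """Keeps both directions of an inverse pair in step at insertion time."""

    def __init__(self):
        self.values = defaultdict(set)  # (subject, property) -> set of objects

    def add(self, subject, prop, obj):
        self.values[(subject, prop)].add(obj)
        inverse = INVERSE.get(prop)
        if inverse is not None:
            # The redundant inverse statement is stored eagerly.
            self.values[(obj, inverse)].add(subject)

    def get(self, subject, prop):
        return set(self.values.get((subject, prop), ()))

store = Store()
store.add("aMiniCooperS", "builtBy", "BMW")
store.add("Daimler-Chrysler", "hasBuilt", "aDakota")
print(store.get("BMW", "hasBuilt"), store.get("aDakota", "builtBy"))
```

Storing both directions makes where-used lookups cheap, at the cost of having to update two entries whenever a link changes.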
More on Properties
• Symmetric
[Figure: Deborah and Peter]
• Transitive
[Figure: RPI, SoScience, and TWC, with the property hasPart]
* Adapted from Evan Wallace, NIST
slide from Kendall/McGuinness SemTech Tutorial 2009
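Symmetric and transitive properties both let a system derive statements that were never asserted. A minimal sketch over sets of (subject, object) pairs follows; the direction of the hasPart links is an assumption based on the figure labels.

```python
def symmetric_closure(pairs):
    """If (a, b) holds for a symmetric property, (b, a) holds too."""
    return set(pairs) | {(b, a) for a, b in pairs}

def transitive_closure(pairs):
    """If (a, b) and (b, c) hold for a transitive property, (a, c) holds too."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new

# Assumed direction: RPI hasPart SoScience, SoScience hasPart TWC.
has_part = {("RPI", "SoScience"), ("SoScience", "TWC")}
print(sorted(transitive_closure(has_part)))
print(sorted(symmetric_closure({("Deborah", "Peter")})))
```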
Class Descriptions
1. class identifier
2. enumeration
3. property restriction
4. intersection
5. union
6. complement
slide from Kendall/McGuinness SemTech Tutorial 2009
Class Descriptions – Property Restriction
[Figure: an individual of class C with property P]
• quantified property restriction (type)
– Universally quantified – allValuesFrom
– Existentially quantified - someValuesFrom
• hasValue property restriction (value)
• property cardinality restriction (# of values)
* Adapted from Evan Wallace, NIST
slide from Kendall/McGuinness SemTech Tutorial 2009
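The three kinds of property restriction can be sketched as checks on the list of values an individual has for property P. This is a closed-world simplification (it only looks at values that are actually listed), and the grape names are illustrative.

```python
WHITE_GRAPES = {"chardonnay", "viognier", "riesling"}

def all_values_from(values, allowed):
    """Universal: every value must come from allowed (true when there are no values)."""
    return all(v in allowed for v in values)

def some_values_from(values, allowed):
    """Existential: at least one value must come from allowed."""
    return any(v in allowed for v in values)

def has_value(values, required):
    """hasValue: one specific value must be present."""
    return required in values

grapes = ["chardonnay", "merlot"]
print(all_values_from(grapes, WHITE_GRAPES))   # merlot is not a white grape
print(some_values_from(grapes, WHITE_GRAPES))
print(all_values_from([], WHITE_GRAPES))       # vacuously satisfied
```

The last line shows a common surprise: a universal restriction is satisfied by an individual with no values at all, which is why it is often paired with an existential or minimum-cardinality restriction.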
Class Axioms
• subsumption (necessary)
– A ⊆ B where B is a class description
– partial or primitive class
[Figure: sets labeled B and A]
• definition (necessary and sufficient)
– C ≡ D where D is a class description
– complete or defined class
[Figure: sets labeled D and C]
* courtesy of Evan Wallace, NIST
slide from Kendall/McGuinness SemTech Tutorial 2009
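The practical difference between the two kinds of axiom is that a definition lets a system recognize new members, while a subsumption only constrains members that were asserted. The sketch below uses two hypothetical axioms: WhiteWine defined as a Wine whose color is white, and Wine as a primitive class whose members must have a maker.

```python
# Defined class: the condition is necessary and sufficient.
DEFINED = {
    "WhiteWine": lambda ind: "Wine" in ind["types"] and ind.get("color") == "white",
}

# Primitive class: the condition is necessary only.
PRIMITIVE = {
    "Wine": lambda ind: bool(ind.get("maker")),
}

def classify(individual):
    """Asserted types plus every defined class whose condition is met."""
    types = set(individual["types"])
    types |= {name for name, condition in DEFINED.items() if condition(individual)}
    return types

def unmet_necessary_conditions(individual):
    """Asserted primitive classes whose necessary condition fails."""
    return sorted(
        name for name, condition in PRIMITIVE.items()
        if name in individual["types"] and not condition(individual)
    )

bottle = {"types": {"Wine"}, "color": "white", "maker": "FormanVineyards"}
print(sorted(classify(bottle)))
print(unmet_necessary_conditions({"types": {"Wine"}, "color": "red"}))
```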
Disjoint Classes
[Figure: two sets labeled A and B]
• Classes are disjoint if they cannot have common instances
• Disjoint classes cannot have any common subclasses either
• For example, if winery and wine are disjoint, then there is no
instance that is both a winery and a wine. Similarly, there is no
class that is both a subclass of winery and simultaneously a
subclass of wine
• Disjointness is often used to aid consistency checking
• Disjointness is also helpful in teasing out
subtle distinctions among classes across
multiple ontologies
slide from Kendall/McGuinness SemTech Tutorial 2009
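Consistency checking with disjointness can be sketched as a search for any class or individual that ends up under both members of a disjoint pair. The class `WineryWine` and the individual `oddity` are deliberately faulty examples invented for this sketch.

```python
SUBCLASS_OF = {
    "RedWine": {"Wine"},
    "WineryWine": {"Winery", "Wine"},  # deliberately inconsistent
}

DISJOINT = {frozenset({"Wine", "Winery"})}

def all_types(cls):
    """cls together with all of its direct and inherited superclasses."""
    result = {cls}
    for parent in SUBCLASS_OF.get(cls, ()):
        result |= all_types(parent)
    return result

def is_unsatisfiable(cls):
    """A class under two disjoint classes can never have an instance."""
    return any(pair <= all_types(cls) for pair in DISJOINT)

def clashing_individuals(asserted):
    """Individuals whose inferred types include a disjoint pair."""
    bad = []
    for name, classes in asserted.items():
        inferred = set().union(*(all_types(c) for c in classes))
        if any(pair <= inferred for pair in DISJOINT):
            bad.append(name)
    return sorted(bad)

print(is_unsatisfiable("WineryWine"), is_unsatisfiable("RedWine"))
print(clashing_individuals({"oddity": {"RedWine", "Winery"}, "FormanVineyards": {"Winery"}}))
```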
Siblings in the Class Hierarchy
• All siblings should be
specified at roughly
the same level of
generality
• Compare to section
and subsections in a
book
slide from Kendall/McGuinness SemTech Tutorial 2009
Class Specification
• If a class has only one
child, there may be a
modeling problem – often
a sign that a definition is
incomplete
• If the only Red Burgundy
we have is Côtes d’Or,
why introduce the
subclass?
Class Specification (2)
• Subclasses of a class
usually have
– Additional properties
– Additional slot restrictions
– Participate in different
relationships
• Compare to bullets in a
bulleted list
slide from Kendall/McGuinness SemTech Tutorial 2009
Cyclic Definitions
• Cycles are common in many KR
systems, though rarely “a good
thing”
• Cycles are disallowed by some
tools because they prohibit
“code generation”, including
RDF/OWL
• Classes A, B, and C have
equivalent sets of instances
– By many definitions, A, B, and
C are equivalent
– Use owl:equivalentClass
instead of creating cycles
slide from Kendall/McGuinness SemTech Tutorial 2009
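A tool can find subclass cycles and report the classes involved as candidates for owl:equivalentClass. This is a minimal sketch using reachability over the direct subclass links; the class names A, B, C follow the slide and D is added as a class outside the cycle.

```python
SUBCLASS_OF = {"A": {"B"}, "B": {"C"}, "C": {"A"}, "D": {"A"}}

def reachable(cls):
    """All classes reachable from cls by following subclass links upward."""
    seen, stack = set(), [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def equivalence_groups():
    """Groups of classes that are subclasses of one another (that is, in a cycle)."""
    groups = set()
    for cls in SUBCLASS_OF:
        group = frozenset(c for c in reachable(cls) if cls in reachable(c))
        if group:
            groups.add(group)
    return groups

print([sorted(group) for group in equivalence_groups()])
```

Each reported group can then be replaced by one class plus explicit equivalence statements, which removes the cycle from the hierarchy.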
Creating Levels and Subclasses
• If a class has a large
number of subclasses, it
may be useful to define
intermediate levels
• For example, in the
domain of wines, there
are natural groupings
around wine color
• However, if no natural
classification exists, the
long list may be
appropriate
slide from Kendall/McGuinness SemTech Tutorial 2009
Inheritance, Naming, Synonyms
[Figure: a Class and an Instance (MariettaOldVinesRed) linked by instance-of]
• A “wine” is not a subclass of
“wines”
• A particular vintage should be
classified as an instance of the
class Wines
• Class names should be either
– all singular
– all plural
• Synonym names for the same
concept are not different
classes
slide from Kendall/McGuinness SemTech Tutorial 2009
Inheritance, Naming, Synonyms (2)
• Many systems and metadata
standards support
synonymous terms as part of
a class definition
• OWL allows defining
necessary and sufficient
conditions, thereby
allowing synonym definitions
to be “first class” terms
[Figure: a Class and an Instance (MariettaOldVinesRed) linked by instance-of]
slide from Kendall/McGuinness SemTech Tutorial 2009
Class vs. Property Value
• Do concepts with different slot values become
restrictions for different slots?
• How important is the distinction for the domain?
• Class definitions for most domains should be fairly
stable – i.e., they should not change frequently once
the definitions are established and individuals
created
slide from Kendall/McGuinness SemTech Tutorial 2009
Class vs. Individual
• Individual instances are the most specific objects
in an ontology
• If concepts form a natural hierarchy, represent
them as classes
• If they will have instances below them, represent
them as classes
slide from Kendall/McGuinness SemTech Tutorial 2009
Group Exercise
• Domain Modeling Exercise
• Wine Agent Revisited
• VSTO revisited
• (or another topic of the class's choosing)
Initial Question
• Questions to answer:
– Which wine goes with meal x?
– What is the statistical signature (of x at height y)?
– What is the neutral temperature (and how should it
be plotted)?
– What components should I buy in my home
theater system?
– What components should customer x buy in their
switching system?
– …..
Logistics
• Hand in assignment by 6 pm if you have not
already done so
• Reading assignment for next week
– reading on use cases. (note that there are
mandatory and optional readings this week)
• Next week we will do an in-class group
exercise on use cases and…