Metadata Architecture at StatCan
MSIS 2008
Luxembourg, April 7-9, 2008
Karen Doherty
Director General Informatics Branch
Statistics Canada
Table of Contents
• Building Blocks
• Viewing Metadata Via STCwiki
• Metadata Architecture
• Conclusion
2
Background
• Large collection of data offerings led to
the need for metadata on the surveys and
administrative data sources
• A lot of effort was spent initially on
defining the surveys to support the
dissemination function
• Focus has shifted to the descriptions of
the variables and code sets used
3
Business Drivers
• Many programs now load their data into
warehouses to support analysis and data
confrontation
• Users were asking for the supporting metadata
to be available from within the warehouses
• Users wanted to be able to submit corrections
to the metadata as they were working on the
data
4
Building Blocks
• Metadata collection
– Integrated Metadata Base (IMDB)
• Data manipulation and data display tools
– Data Warehouse Framework
– EzWeb
– STCwiki
5
Integrated Metadata Base
• Contains information on our 590 active surveys
• Benefits:
– Improves the interpretability of our surveys
– Assists in assuring the coherence of our data
– Promotes knowledge sharing within StatCan and
with external users
– Preserves corporate memory
– Promotes the reuse and standardization of metadata
assets including definitions and code sets
6
Original Vision for the IMDB
7
Integrated Metadata Base
• Architecture:
– Oracle Database
– Java
• Model based on ISO/IEC 11179 Metadata
Registries and the Corporate Metadata
Repository (CMR) from the US Census
Bureau
8
Integrated Metadata Base
• Data dimension model:
– Describes the data
– Based on ISO/IEC 11179
– Data elements (variables) specified by an
object class with properties and values (code
sets, classifications)
– Can be used to describe any set of objects,
not just metadata
9
Integrated Metadata Base
• ISO/IEC 11179 data element definition:
– Object class: type of object being described (person,
establishment, household, etc.)
– Property: attribute that describes the object (sex,
age, etc.)
– Conceptual Domain: list of possible settings (value
meanings) for a property (the property Sex has two
possible meanings: Male and Female)
– Value Domain: code set used to represent the value
meanings (1 = Male, 2 = Female, etc.)
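The four parts of the data element definition above can be sketched as a small data structure. This is only an illustration of the ISO/IEC 11179 concepts as the slide describes them, not the IMDB schema; the class and field names (`ConceptualDomain`, `ValueDomain`, `DataElement`, `property_name`) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptualDomain:
    """The value meanings a property can take (hypothetical class name)."""
    name: str
    value_meanings: tuple[str, ...]


@dataclass(frozen=True)
class ValueDomain:
    """A code set representing the value meanings of a conceptual domain."""
    conceptual_domain: ConceptualDomain
    codes: dict[str, str]  # code -> value meaning

    def __post_init__(self) -> None:
        # Every code must point at a meaning declared in the conceptual domain.
        declared = set(self.conceptual_domain.value_meanings)
        unknown = set(self.codes.values()) - declared
        if unknown:
            raise ValueError(f"codes map to undeclared meanings: {sorted(unknown)}")

    def decode(self, code: str) -> str:
        """Translate a stored code into its value meaning."""
        return self.codes[code]


@dataclass(frozen=True)
class DataElement:
    """A variable: an object class plus a property, represented by a value domain."""
    object_class: str
    property_name: str
    value_domain: ValueDomain

    @property
    def name(self) -> str:
        return f"{self.object_class}.{self.property_name}"


# The example from the slide: property Sex of object class Person.
sex_meanings = ConceptualDomain("Sex", ("Male", "Female"))
sex_codes = ValueDomain(sex_meanings, {"1": "Male", "2": "Female"})
person_sex = DataElement("Person", "Sex", sex_codes)

print(person_sex.name)                      # Person.Sex
print(person_sex.value_domain.decode("2"))  # Female
```

Keeping the conceptual domain separate from the value domain means the same meanings could be represented by more than one code set (for example a numeric and an alphabetic coding) without redefining the variable.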
10
Data Warehouse Framework
• Standard development framework for
data warehouses at StatCan
• Based on Microsoft .NET / SQL Server
• Users can access metadata from within
the warehouse
11
EzWeb & STCwiki
• EzWeb
– used to develop intranet sites in a standardized
way
– incorporates MS Excel pivot tables to allow users to
view data in a warehouse via OLAP
– lets users navigate easily from one OLAP report to another
• STCwiki
– used to display metadata from the IMDB
– allows users to collaborate and to submit proposed
changes to the IMDB
– based on MediaWiki (used by Wikipedia)
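The idea that wiki users propose changes rather than edit the IMDB directly can be sketched as a small review queue. This is a minimal sketch under assumptions: the slides do not describe how the review process works, so the names `ProposedChange`, `MetadataStore`, `propose`, and `review`, and the sample definitions, are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ProposedChange:
    """A change submitted by a wiki user; it does not alter the metadata yet."""
    variable: str
    new_definition: str
    submitted_by: str
    status: Status = Status.PENDING


@dataclass
class MetadataStore:
    """Stand-in for the metadata base: definitions plus a queue of proposals."""
    definitions: dict[str, str] = field(default_factory=dict)
    queue: list[ProposedChange] = field(default_factory=list)

    def propose(self, variable: str, new_definition: str, submitted_by: str) -> ProposedChange:
        change = ProposedChange(variable, new_definition, submitted_by)
        self.queue.append(change)
        return change

    def review(self, change: ProposedChange, approve: bool) -> None:
        if change.status is not Status.PENDING:
            raise ValueError("change has already been reviewed")
        if approve:
            # Only an approved proposal is written to the stored definitions.
            self.definitions[change.variable] = change.new_definition
            change.status = Status.APPROVED
        else:
            change.status = Status.REJECTED


# Made-up sample data for illustration only.
store = MetadataStore({"age_range": "Age group"})
change = store.propose("age_range", "Age group of the person at last birthday", "analyst1")
before = store.definitions["age_range"]   # still the old definition
store.review(change, approve=True)
after = store.definitions["age_range"]    # now the proposed definition
```

Separating submission from approval keeps the stored metadata authoritative while still letting analysts flag corrections as they work with the data.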
12
Access from a Data Warehouse
A user viewing data in a warehouse by age range and year can see the associated metadata by clicking on the Metadata icon (top menu).
13
Metadata Browsing in STCwiki
The user can view any of the metadata associated with the table they were looking at, in this case the definitions of the age range variable.
14
Metadata Browsing in STCwiki
The user can view any metadata stored in the IMDB; all of it is accessible from STCwiki.
15
Put it All Together
• Now that we have the building blocks, how
do we put them together into a coherent
architecture?
16
Objectives
• Leverage investment in IMDB to describe not
only statistical metadata but also metadata on IT
systems and the enterprise architecture
• Develop user-friendly interfaces to metadata from
within a warehouse
• Improve support for coding activities and
classification systems
• Allow users to submit proposed changes to the
metadata in the IMDB
17
The IMDB is Expanding
Expansion plans for this year are highlighted in blue.
18
Multiple Formats
Warehouse data and IMDB metadata can be combined and rendered in any format via converter modules.
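One way to picture converter modules is as a registry of functions that each take the same data and metadata and return a different rendering. The sketch below is an assumption about the pattern, not the StatCan design: it uses JSON and CSV as simple stand-ins for richer formats, and the names `converter`, `render`, and `CONVERTERS`, as well as the sample row, are invented for the example.

```python
import csv
import io
import json
from typing import Callable

Row = dict[str, str]
Converter = Callable[[list[Row], dict[str, str]], str]

# Registry mapping a format name to the function that renders it.
CONVERTERS: dict[str, Converter] = {}


def converter(fmt: str) -> Callable[[Converter], Converter]:
    """Decorator that registers a converter module under a format name."""
    def register(func: Converter) -> Converter:
        CONVERTERS[fmt] = func
        return func
    return register


@converter("json")
def to_json(rows: list[Row], metadata: dict[str, str]) -> str:
    # Data and metadata travel together in one document.
    return json.dumps({"metadata": metadata, "data": rows}, indent=2)


@converter("csv")
def to_csv(rows: list[Row], metadata: dict[str, str]) -> str:
    out = io.StringIO()
    # Variable definitions are written as comment lines above the table.
    for name, definition in metadata.items():
        out.write(f"# {name}: {definition}\n")
    fieldnames = list(rows[0]) if rows else []
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def render(fmt: str, rows: list[Row], metadata: dict[str, str]) -> str:
    """Combine data and metadata and render them in the requested format."""
    if fmt not in CONVERTERS:
        raise ValueError(f"no converter registered for format {fmt!r}")
    return CONVERTERS[fmt](rows, metadata)


# Made-up sample data for illustration only.
rows = [{"age_range": "0-14", "year": "2007", "count": "10"}]
metadata = {"age_range": "Age group of the person"}
csv_text = render("csv", rows, metadata)
json_text = render("json", rows, metadata)
```

Adding a new output format then means registering one more function, without touching the code that assembles the data and metadata.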
19
Metadata Architecture
Target architecture: take data and metadata and render/publish them in a variety of modes via software-based format converters.
Most of this architecture exists today at StatCan.
20
Conclusion
• IMDB has the design required to support a full
range of metadata functionality at StatCan
• A significant amount of variable-related
information already exists in the IMDB
• Work for 2008/09
– Completing the process for reviewing and approving
metadata changes from wiki users
– Completing the Classification Management System
– Mapping out a proposal for funding to develop a format
converter (DDI or SDMX)
21