Health GIS 2008, Bangkok, Thailand

Data Warehousing For JalaSRI To Help Transition To Better Governance And Data Empowerment

JalaSRI Watershed Surveillance & Research Institute, Jalgaon, India
Vinay Dharmadhikari & Sanjay Pawde

Outline
1. Background
2. Motivation
3. Scope
4. Data Storage And Processing Strategies
5. Data Warehousing
6. Design Strategy And Sequence Of Steps
7. Data Warehouse Configuration
8. Data Warehouse Architecture Review & Design
9. Illustrative Data Sources In Context Of JalaSRI
10. Illustrative Data Requirements
11. The Path Forward..!

Background
There is a need to make the necessary data, and data handling tools and techniques, available to decision makers, planners, researchers and public institutions.
The ultimate goal is to develop an integrated system for data sharing, data access and data use, for solving locale-specific problems.
Geo-Informatics as conceived here will address all the vital elements, viz. geographic measurements, geo-accounting, spatial analysis and integrated spatio-temporal decision-making.

Motivation
The data management practices at district level are not yet fully geared to collect the information needed.
The conventional methods of data collection/collation and storage are not amenable to easy, quick updating, retrieval and holistic analysis.
District-level data management requires an integrated approach, data analysis tools and a large matrix of sectoral data in digital format.

Motivation (contd..)
There is a critical need for data empowerment of people, communities and institutions of self-governance, to enable informed decision-making.
Identification of subject-relevant information/data for JalaSRI.
Identification of relevant, valid data sources, and of requirements for historical and real-time data in consistent formats, to reduce data redundancy.

Scope
In this initiative, subject-relevant requirements are being progressively identified, recorded and refined by the JalaSRI team.
Based on the information and the data analysis, the various JalaSRI working groups are in the process of developing appropriate recommendations.
Socio-economic impact and on-ground evaluation data are being loaded into the data warehouse.

Types Of Data Encountered @JalaSRI
• Tabular (Ex: Transaction data)
  – Relational
  – Multi-dimensional
• Spatial (Ex: Remote sensing data)
• Temporal (Ex: Log information)
  – Streaming (Ex: multimedia, network traffic)
  – Spatio-temporal (Ex: GIS)
• Tree (Ex: XML data)
• Graphs
• Sequence
• Text, Multimedia …

Evolving Spatial & Non-Spatial Databases At JalaSRI
[Diagram] Spatial & Non Spatial Data Warehouse, linked to:
• Forest & Biodiversity Data
• Weather Data
• Watershed Data
• GIS Data
• Agriculture Data
• Health Data
• Poverty & Unemployment Data
• Socio-economic Data
• Financial Data

Data Storage And Processing Strategies
Design to ensure adequate storage of data and an efficient transaction-processing environment.
The main concern in the JalaSRI database (DB) system is to ensure concurrent access and recovery techniques that guarantee data consistency.
On-Line Transaction Processing (OLTP) systems are designed to manage a high number of concurrent transactions and to offer on-line interaction.

Data Storage And Processing Strategies (contd..)
Information System design for multiple views, snapshots and manipulations of data.
Logically separate the Information System from the Operational Database System.
The ultimate goal is to evolve to a Decision Support System (DSS).

Why Data Warehousing ?
[Diagram] Questions and outputs served by the Data Warehouse:
Operational decision making
• Significance of change / difference ?
• Trends ?
• Temporal pattern ?
• Spatial variation ?
Data Warehouse outputs
• Statistical summarization
• Charts / graphs
• Maps
Executive decision / policy making
• Relationships with demographics ?
• Relationships with social fabric ?
• Relationships with neighborhood characteristics ?
• Relationships with physical environment ?
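
The questions on the slide above (trends, temporal pattern, spatial variation, statistical summarization) can be illustrated with a small sketch. This is only a minimal illustration under stated assumptions, not JalaSRI's implementation: the block names, years and water-table depths are invented, and the helper names `slope` and `summarize` are hypothetical. It groups observations by block (spatial variation) and fits a least-squares slope over years (trend).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations: (block name, year, water-table depth in metres).
# Names and values are invented for illustration only.
OBSERVATIONS = [
    ("Block A", 2005, 10.0),
    ("Block A", 2006, 11.0),
    ("Block A", 2007, 12.0),
    ("Block B", 2005, 8.0),
    ("Block B", 2006, 8.0),
    ("Block B", 2007, 8.0),
]


def slope(xs, ys):
    """Least-squares slope of ys against xs (change in y per unit of x)."""
    x_bar, y_bar = mean(xs), mean(ys)
    denom = sum((x - x_bar) ** 2 for x in xs)
    if denom == 0:
        return 0.0  # a single time point carries no trend
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / denom


def summarize(observations):
    """Group by block (spatial variation) and summarize over years (trend)."""
    by_block = defaultdict(list)
    for block, year, depth in observations:
        by_block[block].append((year, depth))
    summary = {}
    for block, series in by_block.items():
        series.sort()  # order each block's series by year
        years = [y for y, _ in series]
        depths = [d for _, d in series]
        summary[block] = {
            "mean_depth": mean(depths),
            "trend_per_year": slope(years, depths),
            "change": depths[-1] - depths[0],
        }
    return summary


if __name__ == "__main__":
    for block, stats in sorted(summarize(OBSERVATIONS).items()):
        print(block, stats)
```

In a warehouse setting the same grouping would typically be pushed down to the database as an aggregate query; plain Python is used here only to keep the sketch self-contained.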
JalaSRI - Data Warehousing
The Data Warehouse (DW) has been adopted at JalaSRI as the core technology for the DSS, to validate and help improve local wisdom.
The DW is designed with the capability to interface with modern geo-informatics tools and software.
The Data Warehouse is designed as a 'reliable source' for such scientific data.
At JalaSRI, the DW is perceived and designed as critical software infrastructure which collects, cleans, integrates and organizes the data.

JalaSRI - Data Warehousing (contd..)
JalaSRI has started building the DW as an overall strategy and as a continuously evolving and refining process.
The Data Warehouse environment design will provide JalaSRI with capabilities for trend identification, forecasting, summarization of significant data, competitive analysis, and targeted market research.

JalaSRI Data Warehouse Services
[Diagram] Services around the Spatial Warehouse Data Server:
• Metacontent maintenance
• Version integration
• Topology integration
• Visualization & reporting
• Schema transform
• Load & QA
• Best path
• Feature extraction
• Generalization

Design Strategy Adopted
Accurately identify the information that must be contained in the Data Warehouse.
Identify, prioritize and manage the scope of the subject areas to be included in the Data Warehouse.
Design for a scalable DW architecture.
Identify and select the hardware, software and middleware components.
Design the DW to extract, cleanse, aggregate, transform and validate the data to ensure accuracy and consistency.
Define the correct level of summarization to support decision-making.

Design Strategy Adopted (contd..)
Provide user-friendly, powerful desktop tools.
Educate the user and community.
Establish the processes for maintaining, enhancing, and ensuring the ongoing evolution and applicability of the Warehouse.
Establish a Data Warehouse Help Desk.

Design Strategy And Sequence Of Steps (contd..)
The Data Warehouse is designed around the major subject areas of JalaSRI.
The data within the Data Warehouse design is integrated.
All data in the Data Warehouse is being validated to ensure that it is accurate and time-consistent.

JalaSRI Data Warehouse Configuration
A Data Warehouse design-time configuration, also known as the logical architecture, includes the following components:
One Enterprise Data Store (EDS) - a central repository, which supplies atomic (detail-level) integrated information to the whole organization.
One Operational Data Store - a "snapshot" of enterprise-wide data at a moment in time.
One Data Mart - summarized subsets of the enterprise's data specific to a functional area or department, geographical region, or time period.
One Metadata Store - catalogue(s) of reference information about the primary data. Metadata is divided into two categories: information for technical use, and information for end-users.

Logical View: Multi-tier JalaSRI Data Warehouse
[Diagram; labels recovered from the slide]
Tier 1: Data files
Tier 2: Application Servers
Tier 3: Data Management & Data Server Environment
Integration Tools: tools for modeling, cleaning, integrating and loading data.
Access Tools: tools for query, analysis and reporting (Web-based preferred).
Other components shown: GIS & applications; DB server; File server; File manager; Appl. Server; ORACLE 10 i + with Spatial enhancements; Data Management Tools; Application Environments; Data Access Protocols & APIs; Network Interface APIs (OGDI, OGC & CGI); Meta-Data Management (Repository).

JalaSRI DW Architecture Review & Design
The logical architecture includes a central Enterprise Data Store, an Operational Data Store, one Data Mart per subject area, and one Metadata Store.
After the logical configuration, the Data Architecture, Application Architecture, Technical Architecture and Support Architecture are to be defined and designed to physically implement it.

DW Architecture Review & Design (contd..)
Conduct a gap analysis.
The Data Architecture is to define the quality and management standards for data and metadata.
The Application Architecture is to control the movement of data from source to user.
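
The logical configuration described above (an Enterprise Data Store of atomic records, a snapshot-style Operational Data Store, a Data Mart per subject area, and a Metadata Store with technical and end-user information) can be sketched in a few lines. This is a toy model under stated assumptions, not the JalaSRI design: the class and method names (`Record`, `Warehouse`, `describe`, `load`, `snapshot`, `data_mart`, `build_demo`), the subject-area labels and all values are hypothetical, and the "snapshot" is approximated as the records for one year.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """One atomic (detail-level) fact held in the Enterprise Data Store."""
    subject_area: str  # hypothetical labels such as "watershed" or "health"
    region: str
    year: int
    measure: str
    value: float


@dataclass
class Warehouse:
    """Toy logical configuration: EDS, data marts and a metadata store."""
    eds: list = field(default_factory=list)       # atomic records
    metadata: dict = field(default_factory=dict)  # measure -> descriptions

    def describe(self, measure, technical, end_user):
        # Metadata is kept in the two categories named on the slide.
        self.metadata[measure] = {"technical": technical, "end_user": end_user}

    def load(self, record):
        # Refuse records whose measure has no metadata entry.
        if record.measure not in self.metadata:
            raise KeyError(f"no metadata for measure {record.measure!r}")
        self.eds.append(record)

    def snapshot(self, year):
        """Snapshot-style view (an assumption): the records for one year."""
        return [r for r in self.eds if r.year == year]

    def data_mart(self, subject_area):
        """Summarize one subject area: total value per (region, measure)."""
        totals = defaultdict(float)
        for r in self.eds:
            if r.subject_area == subject_area:
                totals[(r.region, r.measure)] += r.value
        return dict(totals)


def build_demo():
    """Load a few invented records so the views can be inspected."""
    dw = Warehouse()
    dw.describe("wells", technical="count, float", end_user="Wells recorded")
    dw.describe("cases", technical="count, float", end_user="Disease cases")
    dw.load(Record("watershed", "Block A", 2007, "wells", 120.0))
    dw.load(Record("watershed", "Block A", 2008, "wells", 15.0))
    dw.load(Record("watershed", "Block B", 2008, "wells", 40.0))
    dw.load(Record("health", "Block A", 2007, "cases", 7.0))
    return dw


if __name__ == "__main__":
    demo = build_demo()
    print(demo.data_mart("watershed"))
    print(demo.snapshot(2008))
```

Keeping the detail records in one store and deriving each mart from it on request mirrors the separation the slides draw between atomic and summarized data; a real deployment would materialize the marts in the database rather than recompute them in memory.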
DW Architecture Review & Design (contd..)
The Technical Architecture is to provide the underlying computing infrastructure that will enable the data and application architectures.
The Support Architecture will include the software components for performance management.
Architecture Review and Design for development and refinement of the overall Data Warehouse.

9. Illustrative Data Sources In Context Of JalaSRI
S1 - AISLUS - All India Soil And Land Use Survey, M/O Agriculture
S2 - NBSSLUP - National Bureau Of Soil Survey And Land Use Planning, Indian Council of Agricultural Research
S3 - NNRMS - National Natural Resources Management System, and SRSAC - State Remote Sensing Application Center, D/O Space
S4 - Population Census / NSS
S5 - Agriculture Census
S6 - Animal Husbandry Census
S7 - BPL Census
S8 - Land Records
S9 - Net Area Sown, crop-wise
S10 - Flood Control Agencies
S11 - Meteorological Data
S12 - Open domain Analysis of Paper on Watershed Surveillance and Inventions

9. Illustrative Data Sources In Context Of JalaSRI (contd..)
S13 - Agriculture Research Outputs
S14 - Irrigation Department Records
S15 - Fertilizer Company Soil Tests
S16 - Seed Companies Data
S17 - Employment Exchanges
S18 - Industry Associations
S19 - Vocational Education Surveys
S20 - Department Of Forest Data
S21 - Bio Diversity Registers From PRIs
S22 - Ecological NGOs (WWF etc.)
S23 - Department Of CO-OP / Registrar
S24 - Rural Banks / Credit Societies
S25 - TIFR, TISS databases
S26 - Labor Bureau
S27 - Factory Statistics from ASI

10. Identified Data Requirements For JalaSRI
R1 - Latest maps of various scales
R2 - Satellite imagery for various seasons
R3 - Land use and soil maps
R4 - Employment seekers
R5 - District irrigation and water needs
R6 - Crop information: new varieties, seed/fertilizers, pest management
R7 - Land allotment to landless
R8 - Credit delivery
R9 - Disease incidences
R10 - Water quality
R11 - Sanitation data

10. Identified Data Requirements For JalaSRI (contd..)
R12 - Number of wells, with locations and water levels
R13 - Fertilizer usage data
R14 - Crop area sown
R15 - Climate records
R16 - Water table depths
R17 - Acreage under HYV
R18 - Child labour / bonded labour
R19 - Species availability / abundance / scarcity
R20 - Web links to relevant Data Warehouses like PASDA, EPA, USGS etc.

11. Future Direction
The field of spatio-temporal data warehousing is new and still not very well exploited; it needs to integrate knowledge from three different research topics: data warehouses, spatial databases, and temporal databases.
While JalaSRI researchers will have free access to all raw data and analyses, other collaborators will be given access only through a designated JalaSRI interface, and a policy on attribution and on feeding back use/publication of material using our data will be enunciated.
Others may be given access only to public-domain material.
Studies and comparisons will be undertaken of other existing data warehouses, like PASDA of Penn State University and those of the EPA and USGS of the Government of the USA, and the learnings will be made use of.

The JalaSRI Data Warehousing Path… !
[Diagram; stages recovered from the slide]
1 Integration and cleaning: Transactional Databases into the Data Warehouse
2 Selection and transformation: Data Warehouse into Flat files
3 Data Mining, with possible refinement (Also see Figure 13.3): Intermediate results
4 Knowledge discovery and construction: Characteristics / Associations / Patterns / Trends; contribute to Domain Knowledge Base; Concepts / Metadata
5 Decision Support: Use, Deploy
Roles shown: Subject Expert, Analyst, Decision Maker
User Interface: Control / Interact; Acquire / Enhance Knowledge / Intelligence

REFERENCES
1. Agrawal R., Gupta A., Sarawagi S. Modeling Multidimensional Data. In Proc. of the 13th Int. Conf. on Data Engineering, ICDE, 1997.
2. Los Alamos National Laboratory. Earth & Environmental Sciences. GISLab. Spatial Data Warehouse. http://www.gislab.lanl.gov/data_warehouse.html, 2003.
3. http://www.esri.com.software/arcgis/arcinfo, 2003.
4. Bauer A., Hümmer W., Lehner W. An Alternative Relational OLAP Modeling Approach. In Proc. of the 2nd Int. Conf. on Data Warehousing and Knowledge Discovery, DaWaK, 2000.
5. http://www.isprs.org/commission4/proceedings/paper.html, 2002.
6. Berson A., Smith S. Data Warehousing, Data Mining and OLAP. McGraw-Hill, 1997.
7. Borges K., Laender A., Davis C. Spatial Data Integrity Constraints in Object-Oriented Geographic Data Modeling. In Proc. of the ACM Symposium on Advances in Geographic Information Systems, ACM GIS, 1999.

Thank you, and looking forward to feedback/collaborations!