New and easier ways of working with aggregate data and geographies from UK censuses
Justin Hayes
UK Data Service Census Support

Overview
• The UK Data Service
• UK Data Service Census Support
• Census background
• Aggregate data
• Geographies
• Dissemination challenges
• InFuse
• Data model
• Geography model
• Live demonstration!

The UK Data Service
• An ESRC initiative integrating several previous resources
• A single, comprehensive and integrated point of access to a wide range of social science data
• Access beyond traditional academic audience where possible
• Support, training and guidance
• ukdataservice.ac.uk

UK Data Service Census Support (CS)
• A specialist unit of the UK Data Service
• Access to, and support for use of data from the last five UK censuses (1971 – 2011)
• Add value by making census outputs easy to find, understand and use
• Extend audience beyond ‘experts’
• Long history of innovation
• census.ukdataservice.ac.uk

CS@Mimas
• Aggregate component of census outputs
• Justin Hayes, Richard Wiseman, Rob Dymond-Green

UK Censuses
• Decennial questionnaire surveys
• Entire UK population every ten years* since 1801
• Questions about people and households
• 2011 Census cost ~ £500m
• Primary evidence for government policy and spending
• Wide range of high quality demographic and socio-economic characteristics
• What? - Detailed combinations of characteristics
• Where? - Small areas
• When? - Long history
• Rich secondary source of information
• Open Government Licence!

UK 2011 Census
• 27 March 2011
• Three UK census agencies (ONS, NRS, NISRA)
• New questions and variables
• Targeted enumeration
• Online and postal completion
• Sophisticated quality assurance
• Best census ever!

Census aggregate outputs
• Counts of people and households* with particular combinations of characteristics for particular geographical areas
• Example: females aged 16-74 in employment in associate professional and technical occupations and usually resident in wards in the County of Devon
• Derived from unit-level questionnaire responses
• Variables and categories
• Sex - Male and Female and All
• Age - single and multiple years
• Ethnicity and Occupation – standard classifications
• Traditionally specified by tables combining one or more variables

Aggregate specification tables

Census aggregate data
• Age : Age 16 to 74 - Economic activity : in employment the week before the census - Occupation : 3. Associate professional and technical occupations - Sex : Female - Unit : Persons
• Age : Age 16 to 74 - Economic activity : in employment the week before the census - Occupation : All categories: Occupation - Sex : Female - Unit : Persons

Aggregate data

2011 Census geographies
• Subdivisions of the UK into smaller areas
• Sets of similar areas called geographies
• Functional and statistical geographies
• Local government districts
• Wards and electoral divisions
• Expecting around 100 different geographies
• Hierarchies of geographies with nesting areas
• Administrative
• Statistical
• Health, Electoral, Postcode, etc

UK administrative geographies

UK statistical geographies

Dissemination challenges
• Size and complexity of planned outputs
• Ongoing releases from three different agencies
• Inconsistencies in definitions
• Categorisation differences within and between countries
• Table universes
• Inconsistent labelling
• Incomplete geographical availability of data
• Disclosure control
• Lower Threshold (LT), Higher Threshold (HT) and other data
• Thousands of separate datasets
• Restricted global operation and understanding

InFuse
• Live service with 2001 census data since 2012
• 2011 data since 2013
• Tip of the iceberg!
• Data model
• Geography model

InFuse data model
• Single multidimensional dataset
• Deconstruction, rationalisation and re-integration of variables and categories
• All UK table specifications processed
• Integration of table universes as variables
• Enforce consistency across dataset
• Library of variables and categories to describe all counts
• Re-insertion of counts into model
• Retain original cell identifiers
• Attachment of metadata

2011 census variables
• 97 variables and counting!

InFuse geography model(s)
• Raw geography model
• All original geographies and their areas
• Direct and indirect hierarchical relationships
• Simplified geography model
• Combinations of equivalent geographies into geography sets with UK coverage where possible
• Condensed standard/merged geographies in England
• Selections of areas across the UK
• Multiple geographies in one operation
• Geography jumps in interface
• Currently administrative and statistical geographies
• More to follow

Raw geography entities and relationships

InFuse administrative and statistical geographies

InFuse features
• Open access
• All data is open via Open Government Licence
• Global search across entire UK dataset
• Variable combinations
• No tables!
• Guide users to find data
• Populated variable combinations
• Available geographies
• More data for more geographies
• All LT and HT data available for all areas above LT
• Improved contextual information
• No data fast!

InFuse 2011 demonstration
• http://infuse.mimas.ac.uk/

What’s next?
• Big data release imminent!
• Progressive release of UK 2011 outputs
• Scottish and Northern Ireland
• Integrated boundary data in GIS formats
• Interface design and features
• More contextual information

After that?
• Integration of multiple censuses
• Non-census data
• External access to API for application development
• Development of data and geography models
• Continued engagement with NSIs
• Data production using multidimensional approach
• Automated disclosure control
• No all or nothing table constraints
• Use InFuse!
• Let us know what you think
• [email protected]
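
Illustrative sketch: counts in a single multidimensional dataset

The "InFuse data model" and "InFuse geography model(s)" slides describe counts held in one dataset, each described by a combination of variable categories and an area, rather than by a cell in a table. The Python below is a minimal sketch of that idea only; it is not the InFuse implementation or its API. The names `Cell`, `combo`, `search`, `select` and `roll_up` are hypothetical, the areas "Ward A", "Ward B" and "District X" are made up, and every count is an invented placeholder rather than a census figure. The category labels are taken from the "Census aggregate data" slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One count, described by its variable categories and its area."""
    categories: frozenset  # set of (variable, category) pairs
    area: str
    count: int


def combo(**pairs):
    """Build a hashable variable combination from keyword arguments."""
    return frozenset(pairs.items())


# Category labels from the "Census aggregate data" slide.
COMMON = dict(
    age="Age 16 to 74",
    economic_activity="in employment the week before the census",
    sex="Female",
    unit="Persons",
)
ASSOC_PROF = combo(
    occupation="3. Associate professional and technical occupations", **COMMON
)
ALL_OCCUPATIONS = combo(occupation="All categories: Occupation", **COMMON)

# Hypothetical areas and invented counts: placeholders, not census figures.
CELLS = [
    Cell(ASSOC_PROF, "Ward A", 120),
    Cell(ASSOC_PROF, "Ward B", 80),
    Cell(ALL_OCCUPATIONS, "Ward A", 900),
    Cell(ALL_OCCUPATIONS, "Ward B", 700),
    Cell(combo(sex="Female", unit="Persons"), "Ward A", 2500),
]

# Hypothetical nesting of two wards inside one larger area.
PARENT = {"Ward A": "District X", "Ward B": "District X"}


def variables_of(cell):
    """Return the set of variable names that describe a cell."""
    return frozenset(variable for variable, _ in cell.categories)


def search(cells, variables):
    """Find cells described by exactly this combination of variables."""
    wanted = frozenset(variables)
    return [cell for cell in cells if variables_of(cell) == wanted]


def select(cells, combination, areas):
    """Return {area: count} for one category combination in chosen areas."""
    return {
        cell.area: cell.count
        for cell in cells
        if cell.categories == combination and cell.area in areas
    }


def roll_up(cells, parent):
    """Sum counts into parent areas (ignores disclosure control entirely)."""
    totals = {}
    for cell in cells:
        key = (cell.categories, parent[cell.area])
        totals[key] = totals.get(key, 0) + cell.count
    return totals
```

In this sketch, `search(CELLS, {"age", "economic_activity", "occupation", "sex", "unit"})` finds counts by variable combination without reference to any table, and `select(CELLS, ASSOC_PROF, {"Ward A", "Ward B"})` picks out one combination for chosen areas. Keying each count by a set of categories is one simple way to get "no tables" behaviour; a real service would also need the metadata, original cell identifiers and disclosure control mentioned in the slides.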