Building Peta-Byte Data Stores
Jim Gray @ Claus Shira Anniversary
European Media Lab
12 February 2001
How Much Information Is There?
• Soon everything can be recorded and indexed
• Most data will never be seen by humans
• Precious resource: human attention
• Auto-summarization and auto-search are the key technologies
www.lesk.com/mlesk/ksg97/ksg.html
[Chart: data volumes on the SI-prefix scale from Kilo through Mega, Giga, Tera, Peta, Exa, Zetta, Yotta: a book, a photo, a movie, all LoC books (words), all books in multimedia, everything recorded.]

ops/s/$ Had Three Growth Phases
• Now doubling every year:
– 1890-1945 Mechanical, relay: 7-year doubling
– 1945-1985 Tube, transistor: 2.3-year doubling
– 1985-2000 Microprocessor: 1.0-year doubling
[Chart: ops per second per $, 1880-2000, log scale; doubling time falls from 7.5 years to 2.3 years to 1.0 year.]

Gilder's Law: 3x bandwidth/year for 25 more years
• Today:
– 10 Gbps per channel (per lambda)
– 4 channels per fiber: 40 Gbps
– 32 fibers/bundle = 1.2 Tbps/bundle
• In lab: 3 Tbps/fiber (400 x WDM)
• In theory: 25 Tbps per fiber (1 fiber = 25 Tbps)
• 1 Tbps = USA 1996 WAN bisection bandwidth
• Aggregate bandwidth doubles every 8 months!
[Map: HSCC (High Speed Connectivity Consortium, DARPA) route from Redmond/Seattle, WA (Microsoft, University of Washington, Pacific Northwest Gigapop, Qwest) to Arlington, VA (Information Sciences Institute) via San Francisco, CA and New York: 5626 km, 10 hops.]

Storage capacity beating Moore's law
• 3 k$/TB today (raw disk)
• 3 M$/PB
[Chart: disk TB shipped per year, 1988-2000, log scale up to 1E+7 (an exabyte).]
Source: 1998 Disk Trend (Jim Porter) http://www.disktrend.com/pdf/portrpkg.pdf
• Disk TB growth: 112%/y vs. Moore's Law: 58.7%/y

    Moore's law:     58.70% /year
    Revenue:          7.47% /year
    TB growth:      112.30% /year (since 1993)
    Price decline:   50.70% /year (since 1993)

Microsoft TerraServer: http://TerraServer.Microsoft.com/
• Build a multi-TB SQL Server database
• Data must be:
– 1 TB
– Unencumbered
– Interesting to everyone everywhere
– And not offensive to anyone anywhere
• Loaded:
– 1.5 M place names from Encarta World Atlas
– 7 M sq km USGS DOQ (1-meter resolution)
– 10 M sq km USGS topos (2 m)
– 1 M sq km from the Russian Space Agency (2 m)
• On the web (world's largest atlas)
• Sell images with commerce server

TerraServer 4.0 Configuration
• 3 active database servers (Compaq 8500), plus a passive failover server:
– SQL\Inst1: topo and relief data
– SQL\Inst2: aerial imagery
– SQL\Inst3: aerial imagery
• Web servers: 8 2-proc "Photon" DL360s
• Logical volume structure:
– One rack per database
– All volumes triple mirrored (3x)
– MetaData on 15k rpm 18.2 GB drives (101 GB)
– Image data on 10k rpm 72.8 GB drives
– Image1-10: 3.4 TB cooked, 10 x 339 GB volumes
– Spread across 3 servers: 2x4 to the photo servers, 1x2 for the topo/relief server

    File Group    Rows (millions)  Data Size (GB)  Index Size (GB)  Total Size (GB)
    Admin                  1             0.1              0                0
    Gazetteer             17             1                3                5
    Image                254         2,220               17            2,237
    Meta                 254            53               17               70
    Search                46             5                5               10
    Grand Total          572         2,280               42            2,322

TerraServer.Microsoft.NET: A Web Service
• Before .NET: a web browser fetches HTML pages and image tiles over the Internet from the TerraServer web site, which queries the TerraServer SQL database.
• With .NET: an application program calls the TerraServer web service over the Internet:
– GetAreaByPoint
– GetAreaByRect
– GetPlaceListByName
– GetPlaceListByRect
– GetTileMetaByLonLatPt
– GetTileMetaByTileId
– GetTile
– ConvertLonLatToNearestPlace
– ConvertPlaceToLonLatPt
– . . .
The web service, like the web site, is backed by the TerraServer SQL database.

TerraServer Recent/Current Effort
• Added USGS topographic maps (4 TB)
• High availability (4-node cluster with failover)
• Integrated with Encarta Online
• The other 25% of the US DOQs (photos)
• Adding digital elevation maps
• Open architecture: publish SOAP interfaces
• Adding multi-layer maps (with UC Berkeley)
• Geo-spatial extension to SQL Server

Astronomy Is Changing (and so are other sciences)
• The World Virtual Observatory
• Data doubles every 2 years; astronomers have a few PB
• Data is public after 2 years
• So: everyone has half the data; some people have 5% more "private data"
• So it is a nearly level playing field:
– Most accessible data is public
• Cyberspace is the new telescope:
– Multi-spectral, very deep, ...
• Computer science challenge: organize these datasets and provide easy access to them

The Sloan Digital Sky Survey
Goal: create a detailed multicolor map of the Northern Sky over 5 years.
• Special 2.5 m telescope
• Two surveys in one:
– Photometric survey in 5 bands
– Spectroscopic redshift survey
• Huge CCD mosaic:
– 30 CCDs 2K x 2K (imaging)
– 22 CCDs 2K x 400 (astrometry)
• Two high-resolution spectrographs:
– 2 x 320 fibers, with 3 arcsec diameter
– R=2000 resolution with 4096 pixels
– Spectral coverage from 3900 Å to 9200 Å
• Automated data reduction:
– Over 70 man-years of development effort (Fermilab + collaboration scientists)
• Very high data volume:
– 40 TB of raw, 3 TB of cooked data (all public)
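The talk quantifies exponential growth sometimes as an annual rate (disk TB shipped: 112%/year; Moore's law: 58.7%/year) and sometimes as a doubling time (astronomy data doubles every 2 years). The two forms are interchangeable via the compound-growth identity; a minimal sketch (the rates are from the slides above, the helper functions and their names are illustrative):

```python
import math

def doubling_time(annual_growth):
    """Years to double at a compound annual growth rate (1.123 means 112.3%/yr)."""
    return math.log(2) / math.log(1 + annual_growth)

def annual_growth(doubling_years):
    """Compound annual growth rate implied by a doubling time in years."""
    return 2 ** (1 / doubling_years) - 1

# Rates quoted in the talk:
print(f"disk TB shipped, 112.3%/yr -> doubles every {doubling_time(1.123):.2f} yr")
print(f"Moore's law, 58.7%/yr      -> doubles every {doubling_time(0.587):.2f} yr")
print(f"doubling every 2 yr        -> {annual_growth(2):.0%}/yr growth")
```

This reproduces the talk's correspondences: 112.3%/year is roughly a 0.9-year doubling ("doubling every year"), and 58.7%/year is the classic 1.5-year (18-month) Moore's-law doubling.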
Collaboration: The University of Chicago, Princeton University, The Johns Hopkins University, The University of Washington, Fermi National Accelerator Laboratory, US Naval Observatory, The Japanese Participation Group, The Institute for Advanced Study. Funding: SLOAN Foundation, NSF, DOE, NASA.

The Cosmic Genome Project
The SDSS will create the ultimate map of the Universe, with much more detail than any other measurement before.
[Figure: successive redshift survey maps: Gregory and Thompson 1978; deLapparent, Geller and Huchra 1986; daCosta et al. 1995; SDSS Collaboration 2002.]

Area and Size of Redshift Surveys
[Chart: number of objects (1E+3 to 1E+9) versus survey volume in Mpc^3 (1E+4 to 1E+11) for SAPM, CfA+SSRS, QDOT, LCRS, 2dF, 2dFR, and the SDSS main, red, abs line, and photo-z samples.]

Experiment with Relational DBMS
• See if SQL's good indexing and scanning compensates for poor object support
• Leverage fast/big/cheap commodity hardware
• Ported a 40 GB sample database (from the SDSS sample scan) to SQL Server 2000
• Building a public web site and data server

20 Astronomy Queries
• Implemented a spatial access extension to SQL (HTM)
• Implemented the 20 astronomy queries in SQL (see paper for details)
• 15 M rows, 378 columns, 30 GB; can scan it in 8 minutes (disk-IO limited)
• Many queries run in seconds
• Create covering indexes on queried columns
• Create a 'Neighbors' table listing objects within 1 arcminute (5 neighbors on average) for spatial joins
• Install some more disks!

Query to Find Gravitational Lenses
Find all objects within 1 arc-minute of each other that have very similar colors (the color ratios u-g, g-r, r-i differ by less than 0.05 magnitudes).

SQL Query to Find Gravitational Lenses
Find nearby objects with similar color ratios.
select count(*)
from Objects L, Objects O, Neighbors N
where L.Obj_id = N.Obj_id
  and O.Obj_id = N.neighbor_Obj_id
  and L.Obj_id < O.Obj_id               -- no dups
  and ABS((L.u-L.g)-(O.u-O.g)) < 0.05   -- similar color
  and ABS((L.g-L.r)-(O.g-O.r)) < 0.05   -- ratios
  and ABS((L.r-L.i)-(O.r-O.i)) < 0.05   -- (= dif of log)
  and ABS((L.z-L.r)-(O.z-O.r)) < 0.05

Finds 5,223 objects; executes in 6 minutes.

SQL Results So Far
• Have run 17 of the 20 queries so far; working on the spectra load and queries now
• Most queries are IO-bound (80 MB/sec on 4 disks, 6 minutes)
• Covering indexes reduce execution to under 30 seconds
• Common to get grid distributions:

select convert(int, ra*30)/30.0  as ra_bucket,
       convert(int, dec*30)/30.0 as dec_bucket,
       count(*)                  as bucket_count
from Galaxies
where (u-g) > 1 and r < 21.5
group by convert(int, ra*30)/30.0, convert(int, dec*30)/30.0

Summary
• Technology:
– 1 M$/PB: store everything online (twice!)
– Gigabit to the desktop: store it anywhere
– So: you can store everything, anywhere in the world, online everywhere
• Research driven by apps:
– TerraServer
– National Virtual Astronomy Observatory
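The gravitational-lens query shown earlier is a plain self-join through the precomputed Neighbors table, so its logic can be exercised on any SQL engine. A minimal sketch using Python's sqlite3 with an in-memory toy catalog (all object IDs and magnitudes below are invented for illustration; the real query ran against the 15 M-row SDSS table):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Objects (Obj_id INTEGER PRIMARY KEY, u REAL, g REAL, r REAL, i REAL, z REAL);
CREATE TABLE Neighbors (Obj_id INTEGER, neighbor_Obj_id INTEGER);
-- two objects with nearly identical colors (a lens candidate) ...
INSERT INTO Objects VALUES (1, 18.0, 17.0, 16.5, 16.2, 16.0);
INSERT INTO Objects VALUES (2, 18.3, 17.3, 16.8, 16.5, 16.3);
-- ... and one neighbor with very different colors
INSERT INTO Objects VALUES (3, 20.0, 17.0, 16.0, 15.0, 14.0);
-- pairs within 1 arcminute (precomputed, stored in both directions)
INSERT INTO Neighbors VALUES (1,2),(2,1),(1,3),(3,1);
""")

n = db.execute("""
SELECT count(*)
FROM Objects L, Objects O, Neighbors N
WHERE L.Obj_id = N.Obj_id
  AND O.Obj_id = N.neighbor_Obj_id
  AND L.Obj_id < O.Obj_id                  -- no dups
  AND ABS((L.u-L.g)-(O.u-O.g)) < 0.05      -- similar color
  AND ABS((L.g-L.r)-(O.g-O.r)) < 0.05      -- ratios
  AND ABS((L.r-L.i)-(O.r-O.i)) < 0.05
  AND ABS((L.z-L.r)-(O.z-O.r)) < 0.05
""").fetchone()[0]
print(n)  # 1: only the pair (1, 2) passes every color cut
```

Note how the `L.Obj_id < O.Obj_id` predicate removes the duplicate that would otherwise arise because each neighbor pair is stored in both directions.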