X-RAY: A Non-Invasive Exclusive Caching Mechanism for RAIDs
Lakshmi N. Bairavasundaram, Muthian Sivathanu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
ADvanced Systems Laboratory
Computer Sciences Department
University of Wisconsin – Madison

Introduction
  - Caching in modern systems
    - Multiple levels
    - Storage: 2-level hierarchy
  - Level 1: File system (FS) cache
    - Software-managed
    - Main memory of host/client
    - LRU-like cache replacement
  - Level 2: RAID cache
    - Firmware-managed
    - Memory inside RAID system
    - Usually LRU replacement
  [Figure: Host (application, file system cache) above RAID (RAID cache)]

Introduction – contd.
  - LRU
    - Cache placement on read
    - Replace LRU block
  [Figure: Read Block no. 10. LRU list (LRU to MRU) before the read: 39, 23, ..., 45; after the read: 23, ..., 45, 10]

Introduction – contd.
  - 2 levels of LRU
    - Redundant contents
  - Goal: Exclusive caching
  [Figure: Read Block no. 10 is placed at the MRU end of both the FS cache and the RAID cache]

Improved RAID Caching
  - Multi-Queue (Zhou et al. 2001)
    - Add frequency component to cache policy
    - Not strictly exclusive!
  - DEMOTE (Wong and Wilkes 2002)
    - Change interface to disk
    - File system issues “cache place” command
    - Has perfect information and hence perfectly exclusive caches
    - Interface changes – difficult to deploy

Ideal RAID Cache
  - Exclusive caching
    - File system and RAID caches should have different contents
  - Global LRU
    - Known to work well
    - RAID cache should be a victim cache
  - No interface changes
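The redundancy of two LRU levels, and the effect of treating the RAID cache as a victim cache, can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function names, cache sizes, and trace are hypothetical, and the victim variant assumes the RAID is told exactly which block the FS cache evicted, which is the information DEMOTE obtains through an interface change and X-RAY has to infer.

```python
from collections import OrderedDict

def lru_touch(cache, block, capacity):
    """Place block at the MRU end; return the evicted LRU block, if any."""
    if block in cache:
        cache.move_to_end(block)
        return None
    cache[block] = True
    if len(cache) > capacity:
        victim, _ = cache.popitem(last=False)  # drop the LRU entry
        return victim
    return None

def run_inclusive(trace, fs_size, raid_size):
    """Two independent LRU levels: both place every block read from disk."""
    fs, raid = OrderedDict(), OrderedDict()
    for block in trace:
        if block in fs:
            fs.move_to_end(block)          # FS hit: the RAID never sees it
            continue
        lru_touch(raid, block, raid_size)  # FS miss reaches the RAID
        lru_touch(fs, block, fs_size)
    return set(fs), set(raid)

def run_victim(trace, fs_size, raid_size):
    """Idealized victim cache: the RAID holds only FS cache victims."""
    fs, raid = OrderedDict(), OrderedDict()
    for block in trace:
        if block in fs:
            fs.move_to_end(block)
            continue
        raid.pop(block, None)              # block moves up to the FS cache
        victim = lru_touch(fs, block, fs_size)
        if victim is not None:
            lru_touch(raid, victim, raid_size)
    return set(fs), set(raid)

# Hypothetical trace and sizes, chosen only to make the overlap visible.
trace = [10, 11, 12, 13, 10, 11]
fs_i, raid_i = run_inclusive(trace, fs_size=2, raid_size=2)
fs_v, raid_v = run_victim(trace, fs_size=2, raid_size=2)
print("inclusive overlap:", sorted(fs_i & raid_i))
print("victim overlap:   ", sorted(fs_v & raid_v))
```

With this trace the two plain LRU levels end up holding the same blocks, while the victim arrangement keeps the two caches disjoint, so the same total memory covers more distinct blocks.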
  [Figure: FS cache and RAID cache ordered LRU to MRU; labels: Victim Block, Block Read]

X-RAY
  - Observes disk traffic
    - Reads and writes to data and metadata
  - Builds a model of the FS cache
    - Uses semantic knowledge
    - Predicts size and contents of FS cache
  - Identifies set of exclusive blocks
  - Reads blocks from disk into cache
  - Result
    - Recent victims of the FS cache
    - A nearly exclusive cache without interface changes
  [Figure: Host (file system cache) above RAID (X-RAY, model of FS cache, RAID cache)]

Talk Outline
  - Introduction
  - File Systems
  - Information and Inferences
  - X-RAY Cache Design
  - Results
  - Conclusion

File System Operation
  - Applications perform file reads and writes
  - File system (Unix)
    - Translates file accesses to disk block requests
  - Metadata
    - To maintain application data on disk and manage disk blocks
    - Periodically written to disk
    - Examples: inodes, bitmap blocks

File System Operation
  - Inode
    - Pointers to data blocks
    - File access information
      - Latest access time
  [Figure: File: inode, pointers to data blocks, data blocks]

File System Operation
  - File access
    - Use inode to obtain pointers to disk data blocks
    - Read corresponding blocks from disk if they are not in FS cache
    - Update the access time information in inode
  - Metadata updates
    - Periodically check for “dirty” inodes and write to disk

The Problem
  - To observe disk traffic and infer the contents of FS cache
  - Why difficult?
    - FS cache size changes over time
      - Shares main memory with virtual memory system
    - Disk cannot observe all FS-level accesses
  [Figure: animation of block reads (blocks 10 to 13) against the FS cache, the resulting disk reads at the RAID, and the FS cache model, each ordered LRU to MRU]

The Problem – contd.
  - To observe disk traffic and infer the contents of FS cache
  - Why difficult?
    - FS cache size changes over time
    - Disk cannot observe all FS-level accesses
  [Figure: after "Read block", FS cache (LRU to MRU): 12, 10, 13; FS cache model at the RAID (LRU to MRU): 11, 12, 13]
  - Key observation
    - We need information about accesses that hit in FS cache
    - File system maintains access information in inodes

Information
  - Obtain information from observing disk traffic
  - Knowledge of file system structures and operations
    - File system maintains time of last access in inodes
    - Periodic inode writes
    - Assuming whole file access, all blocks are in FS cache
  - Assume file system cache policy is LRU

Inferences
  - Read for data block
    - Block will be placed in file system cache (MRU block)
  - Read for previously read data block
    - Block became victim in file system cache
    - Blocks with an earlier access time should also be victims
  - Inode write: new access time, no disk read observed
    - All blocks belonging to file are in FS cache
    - Other blocks with later access time should also be present

Design
  - Recency list (R-list)
    - List of data blocks ordered by access time
  - Cache Begin (CB) pointer
    - Divides R-list into inclusive and exclusive regions
    - Exclusive region: blocks the RAID should cache
    - Inclusive region: blocks expected to be in FS cache
  - RAID Cache contents
    - Subset of blocks in exclusive region
  [Figure: R-list entries shown as (block number, access time), LRU to MRU: A,1 B,1 C,2 D,3 E,3 F,5; CB separates the exclusive region (LRU side) from the inclusive region (MRU side)]

Disk Read
  - Read Block ‘D’ ; time = 6
  [Figure, three frames, R-list from LRU to MRU:
    1. A,1 B,1 C,2 | CB | D,3 E,3 F,4
    2. A,1 B,1 C,2 D,3 E,3 | CB | F,4
    3. A,1 B,1 C,2 E,3 | CB | F,4 D,6
   Exclusive region to the left of CB, inclusive region to the right]

Inode Write – Access time change
  - Inode “23” : access time = 6
  - Semantic knowledge
    - Inode “23” == blocks D & E
  - Blocks D, E : access time = 6
  [Figure: R-list from LRU to MRU before the inode write: A,1 B,1 C,2 D,3 E,4 F,5 G,7; after: A,1 B,1 C,2 F,5 | CB | D,6 E,6 G,7]

X-RAY Cache
  - RAID Cache (size = 2 blocks)
  - Keep track of additions to window in exclusive region
  - Read newly-added blocks from disk
  - Replace blocks no longer in the window
  - Additional disk bandwidth
    - Idle time, extra internal bandwidth, freeblock scheduling
  [Figure: R-list from LRU to MRU: A,1 B,1 C,2 F,5 | CB | D,6 E,6 G,7, with the RAID cache window inside the exclusive region]

Talk Outline
  - Introduction
  - File Systems
  - Information and Inferences
  - X-RAY Cache Design
  - Results
    - Tracking FS Cache Contents
    - RAID Cache Performance
  - Conclusion

Results – Tracking
  - Accurate size and content prediction
  - Highly responsive to FS cache size changes
  - Tolerates changes in inode write interval
  - Partial file reads
    - X-RAY performs well if percentage of partially accessed files is < 40% (typical traces have less than 30%)

Results – Cache Performance
  - Performs better than LRU and Multi-Queue
  - Close to DEMOTE, in spite of imperfect information
  - Hit rate advantage translates to lower read latency

Additional Results
  - File system cache policy is not LRU
    - Clock, 2Q
    - X-RAY performs nearly as well as before
    - It performs better than both LRU and Multi-Queue
  - Idle time requirements
    - X-RAY reads blocks into cache only during idle time
    - It performs well if idle time is greater than one-third of actual idle time observed in the trace
  - More in the paper …

Conclusion
  - Easy deployment is an important goal in developing technology
  - Higher-level systems maintain various pieces of information about data they manage
    - Provide low-level systems with basic semantic knowledge
  - Semantic intelligence for managing RAID caches
    - Avoid interface changes – use non-invasive mechanisms
    - Use access information in metadata to track file system cache contents and cache exclusive blocks
    - In spite of imperfect information, X-RAY performs nearly as well as changing the interface
  - Semantically-smart Disk Systems
    - Availability, security and performance improvements

Questions?
ADvanced Systems Laboratory (ADSL)
Computer Sciences, University of Wisconsin-Madison
http://www.cs.wisc.edu/adsl
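
The R-list, the CB pointer, and the two inferences from the Design slides (disk read of a previously read block, inode write with a new access time) can be sketched as a toy model. This is a simplified illustration, not the implementation from the paper: the class `FSCacheModel`, its method names, and the choice to represent CB as a single access-time boundary (`victim_time`) are all assumptions made here, as is the starting CB position in the inode-write example.

```python
import math

class FSCacheModel:
    """Toy R-list: block -> last known access time, plus a CB boundary."""

    def __init__(self):
        self.atime = {}                # block -> access time
        # Assumption: CB is modelled as a time boundary. Blocks with
        # atime <= victim_time are exclusive (evicted from the FS cache).
        self.victim_time = -math.inf

    def r_list(self):
        """Blocks ordered LRU to MRU (ties keep insertion order)."""
        return sorted(self.atime, key=self.atime.get)

    def exclusive(self):
        return [b for b in self.r_list() if self.atime[b] <= self.victim_time]

    def inclusive(self):
        return [b for b in self.r_list() if self.atime[b] > self.victim_time]

    def disk_read(self, block, now):
        old = self.atime.get(block)
        if old is not None and old > self.victim_time:
            # A block believed cached was read again, so it was a victim,
            # and so was everything accessed no later than it.
            self.victim_time = old
        self.atime[block] = now        # the block is now at the MRU end

    def inode_write(self, blocks, atime):
        """New access time in an inode write, with no disk read observed."""
        for b in blocks:
            self.atime[b] = atime      # the file's blocks hit in the FS cache
        if atime <= self.victim_time:
            # Blocks accessed at or after atime must still be cached too.
            earlier = [t for t in self.atime.values() if t < atime]
            self.victim_time = max(earlier, default=-math.inf)

    def raid_window(self, size):
        """Most recent victims: the exclusive blocks closest to CB."""
        return self.exclusive()[-size:] if size > 0 else []

# Disk Read slide: CB sits between C,2 and D,3, then block D is read at time 6.
m = FSCacheModel()
m.atime = {"A": 1, "B": 1, "C": 2, "D": 3, "E": 3, "F": 4}
m.victim_time = 2
m.disk_read("D", 6)
print(m.exclusive(), m.inclusive())

# Inode Write slide: inode 23 (blocks D and E) gets access time 6.
# Assumed starting point: CB just before G,7.
n = FSCacheModel()
n.atime = {"A": 1, "B": 1, "C": 2, "D": 3, "E": 4, "F": 5, "G": 7}
n.victim_time = 5
n.inode_write(["D", "E"], 6)
print(n.exclusive(), n.inclusive(), n.raid_window(2))
```

Under these assumptions the two examples end in the same states as the final frames of the Disk Read and X-RAY Cache slides: A, B, C, E exclusive and F, D inclusive after the read; A, B, C, F exclusive and D, E, G inclusive after the inode write, with a 2-block RAID cache window of C and F.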