Transcript Efficient Processing of Spatial Joins Using R
The R+-Tree
A Dynamic Index for Multi-Dimensional Objects
Timos K. Sellis et al.
VLDB 1987
Jae-hoon Kim
PNU
Introduction
STEM
Conventional DBMSs store one-dimensional data
  Integers, real numbers, strings
They do not handle multi-dimensional data adequately
  Boxes, polygons, points in multi-dimensional space
Method for Multi-dimensional Data
The most common case of multi-dimensional data is points
The main idea is to divide the whole space into disjoint sub-regions
Each sub-region contains no more than C points
C is the capacity of a disk page
Inserting new points can overflow a region → the region is partitioned (split)
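
The overflow-and-split rule above can be sketched in a few lines. This is only an illustration: the function name `split_region`, the tiny capacity, and the choice of a median cut along one axis are assumptions, not something the slides prescribe.

```python
def split_region(points, capacity, axis=0):
    """Return the region unchanged if it fits in one page, else two halves.

    Assumption: the cut is placed at the median along `axis`; the slides
    only say that an overflowing region is partitioned.
    """
    if len(points) <= capacity:
        return [points]
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return [ordered[:mid], ordered[mid:]]


C = 3                                # hypothetical page capacity
region = [(1, 5), (2, 1), (4, 4)]    # exactly C points: still fits
region.append((3, 2))                # the insertion overflows the page
buckets = split_region(region, C)    # two sub-regions, each within capacity
```

A real index would apply the split recursively and record the position of the cut so that later searches can be routed to the right sub-region.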
Classification of known methods
Position
  Fixed : position of the splitting hyperplane is predetermined (grid file)
  Adaptable : data points determine the position of the hyperplane (k-d tree)
Dimensionality
  1-d cut : k-d tree
  k-d cut : quad-tree, oct-tree
Locality
  Grid : splits not only the affected region, but also all the other regions
  Brickwall : restricts the splitting hyperplane to extend solely inside the region

Method            Position    Dimension   Locality
point quad-tree   adaptable   k-d         brickwall
k-d tree          adaptable   1-d         brickwall
grid file         fixed       1-d         grid
K-D-B-tree        adaptable   1-d         brickwall
Methods for Rectangles
Transform rectangles into points in a higher-dimensional space
  2-d rectangle → a point in 4-d space
  Index the points with k-d trees, or with a grid file after a rotation of the axes
Use a space-filling curve
  Map a k-d space to a 1-d space
  Transform a k-dimensional object into line segments (z-transform)
Divide the original space into sub-regions
  Disjoint : can use the methods mentioned before
  Rectangle overlapping a split : cut it in two pieces and tag the pieces
  R-tree : first proposed use of overlapping sub-regions
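
The first two transformations can be illustrated with a short sketch. The function names, the ordering of the four coordinates, and the 4-bit grid are assumptions made for the example. The z-value shown here maps a single grid cell to a 1-d key; the full z-transform of an object would produce a set of such keys.

```python
def rect_to_point(x_low, x_high, y_low, y_high):
    """Treat a 2-d rectangle as one point in 4-d space.

    Assumption: the point is (lower-left corner, upper-right corner).
    """
    return (x_low, y_low, x_high, y_high)


def z_value(x, y, bits=4):
    """Interleave the bits of integer grid coordinates x and y (Z-order)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return z


point = rect_to_point(1, 3, 2, 5)   # a 4-d point that a k-d tree could index
key = z_value(3, 5)                 # a 1-d key that a B-tree could index
```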
R-Tree
Extension of the B-tree
Height-balanced tree
Nodes consist of MBRs (minimum bounding rectangles)
Guarantees that space utilization is at least 50%
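
As a small illustration of what an MBR is, the sketch below computes the minimum bounding rectangle of a group of rectangles. The tuple layout (x_low, x_high, y_low, y_high) and the name `mbr` are assumptions chosen for the example.

```python
from typing import Iterable, Tuple

# Assumed layout: (x_low, x_high, y_low, y_high)
Rect = Tuple[float, float, float, float]


def mbr(rects: Iterable[Rect]) -> Rect:
    """Smallest axis-aligned rectangle that covers every input rectangle."""
    rects = list(rects)
    return (
        min(r[0] for r in rects),
        max(r[1] for r in rects),
        min(r[2] for r in rects),
        max(r[3] for r in rects),
    )


node_mbr = mbr([(0, 2, 0, 1), (1, 4, 3, 5)])   # covers both rectangles
```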
R-Tree Split
Requirements of a “good” split
  Minimize the total area of the resulting nodes
  Minimize the overlap between them
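
The two requirements can be made concrete by scoring a candidate split. The sketch below is not the R-tree split algorithm itself; it only evaluates a given two-way grouping, and all names and the sample rectangles are assumptions.

```python
def mbr(rects):
    """Minimum bounding rectangle, layout (x_low, x_high, y_low, y_high)."""
    return (min(r[0] for r in rects), max(r[1] for r in rects),
            min(r[2] for r in rects), max(r[3] for r in rects))


def area(r):
    return max(0, r[1] - r[0]) * max(0, r[3] - r[2])


def overlap_area(a, b):
    w = min(a[1], b[1]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[2], b[2])
    return w * h if w > 0 and h > 0 else 0


def split_cost(group_a, group_b):
    """Return (total area, overlap) of the two node MBRs of a candidate split."""
    ma, mb = mbr(group_a), mbr(group_b)
    return area(ma) + area(mb), overlap_area(ma, mb)


r1, r2, r3, r4 = (0, 1, 0, 1), (0, 1, 2, 3), (4, 5, 0, 1), (4, 5, 2, 3)
good = split_cost([r1, r2], [r3, r4])   # groups neighbours together
bad = split_cost([r1, r4], [r2, r3])    # groups distant rectangles together
```

Grouping spatial neighbours gives both a smaller total area and no overlap, which is the kind of split the slide calls "good".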
R-Tree Insert & Split
[Figure: rectangles 1 to 8 grouped under R-tree nodes A and B, shown spatially and as a tree]
Bad Search in R-Tree
[Figure: rectangles 1 to 8 under R-tree nodes A and B, shown spatially and as a tree]
R+-Tree
Variant of the R-tree
Avoids overlap among internal nodes by inserting an object into multiple leaves when necessary
Leaf node : (oid, RECT)
  RECT : (x_low, x_high, y_low, y_high)
Intermediate node : (p, RECT)
  p → pointer to a lower-level node
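
The two entry formats can be written down as plain data types. This is a sketch only: the class names and the small example tree are assumptions, while the field names oid, p, and the four RECT coordinates follow the slide.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Rect:
    x_low: float
    x_high: float
    y_low: float
    y_high: float


@dataclass
class LeafEntry:
    oid: int      # object identifier
    rect: Rect    # bounding rectangle of the data object


@dataclass
class Node:
    is_leaf: bool
    entries: List[Union["LeafEntry", "IntermediateEntry"]] = field(default_factory=list)


@dataclass
class IntermediateEntry:
    p: Node       # pointer to a lower-level node
    rect: Rect    # region covered by that node


# Object 4 crosses the boundary x = 5, so it is stored in both leaves.
obj4 = LeafEntry(4, Rect(3, 7, 1, 2))
leaf_a = Node(True, [LeafEntry(1, Rect(0, 2, 0, 2)), obj4])
leaf_c = Node(True, [obj4, LeafEntry(5, Rect(6, 8, 3, 4))])
root = Node(False, [IntermediateEntry(leaf_a, Rect(0, 5, 0, 5)),
                    IntermediateEntry(leaf_c, Rect(5, 10, 0, 5))])
```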
Properties of R+-Tree
For each entry (p, RECT) in an intermediate node, the subtree rooted at the node pointed to by p contains a rectangle R if and only if R is covered by RECT
  → the only exception is when R is a rectangle at a leaf node; in that case R only needs to overlap RECT
For any two entries (p1, RECT1) and (p2, RECT2) of an intermediate node, the overlap between RECT1 and RECT2 is zero
The root has at least two children unless it is a leaf
All leaves are at the same level
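
The zero-overlap property is easy to check mechanically. The sketch below tests the entries of one intermediate node pairwise; the function names and the tuple layout (x_low, x_high, y_low, y_high) are assumptions for the example.

```python
from itertools import combinations


def overlap_area(a, b):
    """Area shared by two rectangles given as (x_low, x_high, y_low, y_high)."""
    w = min(a[1], b[1]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[2], b[2])
    return w * h if w > 0 and h > 0 else 0


def siblings_disjoint(rects):
    """True if every pair of sibling rectangles has zero overlap."""
    return all(overlap_area(a, b) == 0 for a, b in combinations(rects, 2))


ok = siblings_disjoint([(0, 2, 0, 2), (2, 4, 0, 2)])         # touch at x = 2 only
violated = siblings_disjoint([(0, 3, 0, 2), (2, 4, 0, 2)])   # share the strip 2..3
```

Rectangles that merely share an edge have zero overlap area, so they satisfy the property under this reading.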
R+-Tree
[Figure: rectangles 1 to 8 indexed by an R+-tree with nodes A, B, and C, shown spatially and as a tree]
Operations to keep the R+-tree
Searching operation
  First decompose the search space into disjoint sub-regions
  For each sub-region, descend the tree until the actual data objects are found in the leaves
Insertion operation
  Search the tree and add the rectangle to the leaf nodes
  Difference from the R-tree → the rectangle may be added to more than one leaf node
Deletion operation
  Locate the rectangle to be deleted and remove it from every leaf node that holds it
Node splitting operation
  The two sub-nodes cover disjoint areas
  Contrary to the R-tree → splits also propagate downward
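
The search, insertion, and deletion steps can be sketched on a deliberately simplified structure: one level of fixed, disjoint regions, each with its own leaf. The representation and all function names are assumptions, and node splitting is left out entirely.

```python
def intersects(a, b):
    """True if rectangles (x_low, x_high, y_low, y_high) share interior area."""
    return a[0] < b[1] and b[0] < a[1] and a[2] < b[3] and b[2] < a[3]


def insert(tree, oid, rect):
    """Add the rectangle to every leaf whose region it overlaps."""
    for region, leaf in tree:
        if intersects(region, rect):
            leaf.append((oid, rect))


def search(tree, window):
    """Visit only regions that meet the window; a set removes duplicates."""
    found = set()
    for region, leaf in tree:
        if intersects(region, window):
            found.update(oid for oid, rect in leaf if intersects(rect, window))
    return found


def delete(tree, oid):
    """Remove the object from every leaf that holds a copy of it."""
    for _, leaf in tree:
        leaf[:] = [entry for entry in leaf if entry[0] != oid]


# Two disjoint regions split at x = 5, each paired with an empty leaf.
tree = [((0, 5, 0, 10), []), ((5, 10, 0, 10), [])]
insert(tree, 1, (1, 2, 1, 2))
insert(tree, 2, (4, 6, 4, 6))      # crosses x = 5, so it lands in both leaves
insert(tree, 3, (7, 8, 7, 8))
hits = search(tree, (3, 7, 3, 7))
copies = sum(1 for _, leaf in tree for oid, _ in leaf if oid == 2)
delete(tree, 2)
hits_after = search(tree, (3, 7, 3, 7))
```

Because an object can live in several leaves, the search has to remove duplicates and the deletion has to visit every leaf that holds a copy.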
Packing Algorithm
Pack attempts to set up an R+-tree with good search performance
  Built from three routines: Partition, Sweep, Pack
Selection of the x- or y-cut for Partition
  Nearest neighbors
  Minimal total x- and y-displacement
  Minimal total space coverage accrued by the two sub-regions
  Minimal number of rectangle splits
Aims of these criteria
  Reduce the height expansion of the R+-tree
  Reduce the coverage of “dead space”
Operations to build the R+-tree
Partition operation
  Decomposes the total space into sub-regions that are locally optimal in terms of search performance
  Uses the Sweep routine, sweeping parallel to the x- or y-axis
Sweep operation
  Scans the rectangles and identifies points where space partitioning is possible
Pack operation
  Organizes an R+-tree from a set S of rectangles and the fill-factor ff of the tree
  Recursively packs the entries of each level of the tree from the bottom up
  At each level, when non-leaf nodes are partitioned and some rectangles are split by the chosen partition, the split is propagated recursively downward and, if necessary, the changes are also propagated upward
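
A much simplified sweep along the x-axis is sketched below. It is not the Sweep routine from the paper: the rule for placing the cut (at the x_low of the first rectangle beyond the fill-factor) and the cost measure (number of rectangles the cut would split) are assumptions chosen to illustrate the idea.

```python
def sweep_x(rects, ff):
    """Scan rectangles (x_low, x_high, y_low, y_high) in order of x_low.

    Returns (cut, splits): a candidate x-coordinate for the partition and
    the number of rectangles that the cut would split. Returns (None, 0)
    when all rectangles fit within the fill-factor ff.
    """
    ordered = sorted(rects, key=lambda r: r[0])
    if len(ordered) <= ff:
        return None, 0
    cut = ordered[ff][0]   # assumption: cut where rectangle number ff+1 begins
    splits = sum(1 for r in ordered if r[0] < cut < r[1])
    return cut, splits


rects = [(0, 2, 0, 1), (1, 3, 0, 1), (4, 5, 0, 1), (6, 7, 0, 1)]
clean_cut = sweep_x(rects, 2)    # cut at x = 4 crosses no rectangle
costly_cut = sweep_x(rects, 1)   # cut at x = 1 crosses the first rectangle
```

A Partition step could run such a sweep along both axes and keep the cheaper cut, for example the one with the minimal number of rectangle splits.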
Analysis
[Figure: Disk access for two-size segments: point query]
[Figure: Disk access for two-size segments: segment query]
Summary
Advantages of the R+-tree
  Improved search performance, especially for point queries
  More than 50% savings in disk accesses
Disadvantages of the R+-tree
  The tree is taller than the R-tree
  Uses more space (duplicated entries)