National GIS Project
Spatial Databases: Indexing
Spring, 2015
Ki-Joune Li
PNU
STEM
What is Indexing?
Indexing: a fight against TIME
Example
Suppose that you have a copy of Hamlet and you want to know the name of Hamlet's father.
Without an index: full (sequential) scan of the book
With an index: direct access to the page
Some Constraints
Modern databases
Very large volume: e.g. several petabytes
Storage on disk
Inevitable
But slow compared with main memory: milliseconds vs. nanoseconds
Even in main-memory database systems
What should we do?
Minimize the number of disk accesses
The Objective of Indexing
[Figure: query condition → index → disk address (block number) → database on disk]
Classification of Indexing
According to the type of query and data
Alphanumeric query
Image
Spatial
Example of a spatial query: What is the nearest post office to the Louvre Museum?
[Figure: spatial query (spatial predicate) → spatial index → disk address (block number) → database on disk]
Spatial Query
Sophisticated
Types of spatial query
One-scan query
Region query: containment, intersection
K-nearest neighbor query
Multi-scan query: join
Spatial join
Distance join
Spatial query processing
Tightly coupled with the spatial indexing method
Spatial Processing Strategy
Filtering and Refinement Strategy
[Figure: spatial query → filtering with an index on simplified geometry → candidates → refinement by verifying the geometry against the complete data → result]
1. A lighter index: e.g. < 1 MB
2. Removes unnecessary disk accesses
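A minimal sketch of filtering and refinement for a point query ("which polygons contain q?"). It assumes polygons given as vertex lists, uses each polygon's MBR as the simplified geometry, and uses a plain dictionary of MBRs in place of a real spatial index; all names are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr_of(poly: Polygon) -> Rect:
    """Simplified geometry: the minimum bounding rectangle of a polygon."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return (min(xs), min(ys), max(xs), max(ys))


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Exact geometry test by ray casting (even-odd rule)."""
    x, y = p
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_and_refine(q: Point, polygons: Dict[str, Polygon]):
    # Stand-in "index": a dict of MBRs (a real system would use a spatial index).
    mbrs = {oid: mbr_of(poly) for oid, poly in polygons.items()}
    # Filtering: cheap MBR test; candidates may include false hits.
    candidates = [oid for oid, (x0, y0, x1, y1) in mbrs.items()
                  if x0 <= q[0] <= x1 and y0 <= q[1] <= y1]
    # Refinement: verify each candidate against its complete geometry.
    result = [oid for oid in candidates if point_in_polygon(q, polygons[oid])]
    return candidates, result


polys = {"T": [(0, 0), (10, 0), (0, 10)],              # triangle
         "S": [(20, 20), (30, 20), (30, 30), (20, 30)]}  # square
print(filter_and_refine((8, 8), polys))  # (['T'], []): a false hit removed by refinement
```

The point (8, 8) lies inside the triangle's MBR but outside the triangle itself, so filtering keeps it as a candidate and refinement discards it.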
Classification of Spatial Indexing Methods
Hashing and indexing
Index (in the wide sense): hashing, or indexing (in the narrow sense)
Space decomposition vs. MBR
Decomposition of a space: the whole space
Bounding rectangle: only the area of interest
Dimensionality
No transformation
Transformation to a higher dimension
Transformation to a lower dimension: linearization
Indexing vs. Hashing
Hashing
1. b = h(r.key)
2. Store(r, b)
The block number is determined by the hashing function or mechanism
Only for a primary index
Search by the hashing function
Indexing (in the narrow sense)
1. b = Store(r)
2. Insert(B, (r.key, b))
The block number is independent of the indexing mechanism
For a primary or secondary index
Search by a data structure called an index
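A toy contrast of the two storage schemes, assuming integer keys and in-memory lists as "blocks". `HashedFile` computes the block from the key; `IndexedFile` stores the record wherever there is room and then records (key, block) in a sorted list that stands in for a real index such as a B-tree. Class names and capacities are hypothetical.

```python
import bisect

NUM_BLOCKS = 4       # hypothetical number of blocks in the hashed file
BLOCK_CAPACITY = 2   # hypothetical records per block in the indexed file


class HashedFile:
    """Hashing: the block number is computed from the key, b = h(r.key)."""

    def __init__(self):
        self.blocks = [[] for _ in range(NUM_BLOCKS)]

    def h(self, key: int) -> int:
        return key % NUM_BLOCKS

    def store(self, rec: dict) -> int:
        b = self.h(rec["key"])
        self.blocks[b].append(rec)  # overflow handling omitted in this sketch
        return b

    def search(self, key: int) -> list:
        # Search by the hashing function: only one block is examined.
        return [r for r in self.blocks[self.h(key)] if r["key"] == key]


class IndexedFile:
    """Indexing: store the record first, then insert (key, block) into an index."""

    def __init__(self):
        self.blocks = []
        self.index = []  # sorted (key, block) pairs standing in for an index

    def store(self, rec: dict) -> int:
        # 1. b = Store(r): the block is chosen independently of the key.
        if not self.blocks or len(self.blocks[-1]) == BLOCK_CAPACITY:
            self.blocks.append([])
        self.blocks[-1].append(rec)
        b = len(self.blocks) - 1
        # 2. Insert(index, (r.key, b))
        bisect.insort(self.index, (rec["key"], b))
        return b

    def search(self, key: int) -> list:
        # Search through the index, then read only the referenced blocks.
        i = bisect.bisect_left(self.index, (key, -1))
        blocks = set()
        while i < len(self.index) and self.index[i][0] == key:
            blocks.add(self.index[i][1])
            i += 1
        return [r for b in sorted(blocks) for r in self.blocks[b] if r["key"] == key]
```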
Decomposition vs. Bounding Region
[Figure panels: Decomposition; Bounding Region]
Decomposition Methods
Grid file: an extension of hashing to 2-D
Variations
Fixed grid
Grid file
Multi-level grid file
Hierarchical data structures
KD-tree
Quadtree
skd-tree
etc.
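Fixed Grid
The simplest method
Minimum data for hashing
Each grid cell: 1 disk page
Query with a query window:
1. Find the grid cells intersecting the window
2. Find the corresponding blocks
3. Read the objects from the blocks
4. Refinement
[Figure: fixed grid of 10-unit cells over a 50 × 40 space, with a query window]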
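A sketch of the four query steps for point objects on a fixed grid, assuming a 10-unit cell size as in the figure and one in-memory list per cell standing in for a disk block. The class name is hypothetical, and the returned block count shows how many pages a real system would read.

```python
from collections import defaultdict

CELL = 10  # cell size; matches the 10-unit grid of the figure (assumed)


class FixedGrid:
    """Fixed grid for point objects: each cell maps to one disk block."""

    def __init__(self, cell: float = CELL):
        self.cell = cell
        self.blocks = defaultdict(list)  # (i, j) -> points stored in that block

    def _cell_of(self, x: float, y: float):
        return int(x // self.cell), int(y // self.cell)

    def insert(self, p):
        self.blocks[self._cell_of(*p)].append(p)

    def window_query(self, xmin, ymin, xmax, ymax):
        (i0, j0), (i1, j1) = self._cell_of(xmin, ymin), self._cell_of(xmax, ymax)
        result, blocks_read = [], 0
        for i in range(i0, i1 + 1):              # 1. cells intersecting the window
            for j in range(j0, j1 + 1):
                block = self.blocks.get((i, j))  # 2. corresponding block
                if block is None:
                    continue                     # empty cell: no page allocated (assumption)
                blocks_read += 1                 # 3. read the objects of the block
                result += [p for p in block      # 4. refinement against the window
                           if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]
        return result, blocks_read


g = FixedGrid()
for p in [(5, 5), (15, 15), (25, 35), (45, 5), (18, 2)]:
    g.insert(p)
```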
Problems of Fixed Grid
Only for point objects
Objects with extent (measure): duplicated storage in every cell they overlap, which degrades performance
Large dead space, which causes unnecessary disk accesses for a query window
Not very flexible with respect to the data distribution
[Figure: fixed grid over a 50 × 40 space with a query window]
Grid File
To overcome the problems of the fixed grid
Reduce dead space within a cell
Increase the blocking factor
Directory
Grid | Boundary | Block#
A | (0,0),(15,20) | Page 0
B | (15,0),(30,20) | Page 1
... | ... | ...
I | (30,28),(50,40) | Page 15
[Figure: grid file over a 50 × 40 space with non-uniform cells and a query window]
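A sketch of a grid-file window query through linear scales and a directory. It assumes the nine cells A to I are numbered row by row from the lower left, which matches the boundaries given for A, B and I; the page numbers of the other cells are hypothetical, and D and E deliberately share a page to show that a page is read only once.

```python
import bisect

# Linear scales (split positions) consistent with the directory:
# A = (0,0)-(15,20), B = (15,0)-(30,20), I = (30,28)-(50,40).
X_SCALE = [0, 15, 30, 50]
Y_SCALE = [0, 20, 28, 40]

# Directory: cell (i, j) -> disk page. A, B and I follow the slide; the
# other page numbers are hypothetical placeholders.
DIRECTORY = {
    (0, 0): 0, (1, 0): 1, (2, 0): 2,   # A, B, C
    (0, 1): 3, (1, 1): 3, (2, 1): 4,   # D, E (sharing a page), F
    (0, 2): 5, (1, 2): 6, (2, 2): 15,  # G, H, I
}


def cell_range(scale, lo, hi):
    """Indices of the scale intervals that intersect [lo, hi]."""
    first = max(bisect.bisect_right(scale, lo) - 1, 0)
    last = min(bisect.bisect_right(scale, hi) - 1, len(scale) - 2)
    return range(first, last + 1)


def pages_for_window(xmin, ymin, xmax, ymax):
    """Scales -> directory cells -> distinct pages that must be read."""
    return {DIRECTORY[(i, j)]
            for i in cell_range(X_SCALE, xmin, xmax)
            for j in cell_range(Y_SCALE, ymin, ymax)}


print(pages_for_window(5, 5, 20, 25))  # cells A, B, D, E -> pages {0, 1, 3}
```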
Blocking Factor
A key factor in performance
Number of objects in a disk block:
$Bf = \frac{N_{TotalObjects}}{N_{blocks}}$
Number of disk accesses:
$DA = \frac{N_{Selected}}{Bf}$
How to increase Bf?
Increase the block size: not always possible
Packing
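The two formulas can be checked with a little arithmetic. The numbers below are illustrative, not from the slides; the DA formula presumably assumes that the selected objects are packed into as few blocks as possible, and rounding up to whole blocks is an added assumption.

```python
import math


def blocking_factor(n_total_objects: int, n_blocks: int) -> float:
    """Bf = N_TotalObjects / N_blocks: average number of objects per disk block."""
    return n_total_objects / n_blocks


def disk_accesses(n_selected: int, bf: float) -> int:
    """DA = N_Selected / Bf, rounded up because a partly used block is still one read."""
    return math.ceil(n_selected / bf)


bf_packed = blocking_factor(10_000, 100)   # 100 objects per block
bf_sparse = blocking_factor(10_000, 400)   # 25 objects per block
print(disk_accesses(500, bf_packed))       # 5
print(disk_accesses(500, bf_sparse))       # 20
```

Quadrupling the number of blocks for the same data cuts Bf by four and multiplies the disk accesses for the same query by four, which is why packing matters.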
Problems of Grid File
Only for point objects
Still large dead space
Large directory size
[Figure: the grid file directory (Grid, Boundary, Block#), with one entry per grid cell]
Hierarchical Decomposition
To overcome the large directory size of the grid file
Hierarchical structure of the directory
Acceleration of search
KD-tree: Index
Extension of the binary tree to K dimensions (K = 2 for us)
Example: suppose Bf = 3
[Figure: KD-tree directory with split nodes x=20, y=10, y=20 and x=30, leaf regions A to E, and the corresponding partition of the space]
Each leaf node points to a disk page
KD-tree: Search
[Figure: search on the same KD-tree (split nodes x=20, y=10, y=20, x=30; leaf regions A to E)]
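A minimal in-memory sketch of a bucket KD-tree with search, assuming point data, leaves that hold at most Bf = 3 points (one page each), and splits at the median coordinate that alternate between x and y by depth. The median rule and all names are assumptions; the tree in the figure may choose its split values differently.

```python
BF = 3  # blocking factor: max points per leaf page (the slide's example value)


class Leaf:
    def __init__(self, points=None):
        self.points = points if points is not None else []  # contents of one page


class Node:
    def __init__(self, axis, value, left, right):
        self.axis, self.value = axis, value  # split on coordinate `axis` at `value`
        self.left, self.right = left, right  # left: coord < value, right: coord >= value


def insert(node, p, depth=0):
    """Insert p; an overflowing leaf is split at the median, alternating x and y."""
    if isinstance(node, Node):
        if p[node.axis] < node.value:
            node.left = insert(node.left, p, depth + 1)
        else:
            node.right = insert(node.right, p, depth + 1)
        return node
    node.points.append(p)
    if len(node.points) <= BF:
        return node
    axis = depth % 2
    pts = sorted(node.points, key=lambda q: q[axis])
    value = pts[len(pts) // 2][axis]
    left = [q for q in pts if q[axis] < value]
    right = [q for q in pts if q[axis] >= value]
    if not left:  # median equals the minimum (duplicates): keep the overfull leaf
        return node
    return Node(axis, value, Leaf(left), Leaf(right))


def range_search(node, lo, hi, out=None):
    """Collect points inside [lo, hi], visiting only subtrees that can intersect it."""
    out = [] if out is None else out
    if isinstance(node, Leaf):
        out += [p for p in node.points
                if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]]
        return out
    if lo[node.axis] < node.value:
        range_search(node.left, lo, hi, out)
    if hi[node.axis] >= node.value:
        range_search(node.right, lo, hi, out)
    return out


pts = [(5, 5), (25, 5), (15, 25), (35, 35), (12, 8), (28, 18)]
root = Leaf()
for p in pts:
    root = insert(root, p)
print(range_search(root, (0, 0), (20, 20)))  # only the left subtree is visited
```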
Weak Points of KD-tree
Only for point objects
Dead space
How to store the tree structure on disk
Blocking problem
Widely used as a main-memory index
Rarely used as a disk-resident index
Unbalanced tree
Zipf's law (or the 80/20 law)
Most events are concentrated
Leads to a highly skewed tree
Quadtree
Extension of the KD-tree:
KD-tree: binary split
Quadtree: 4-way equal split instead
Example: Bf = 3
[Figure: quadtree decomposition of points A to J, and the corresponding tree]
Each leaf node points to a disk page
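A sketch of a bucket (region) quadtree with Bf = 3, assuming integer coordinates inside a square whose side is a power of two; each overflowing leaf is split into four equal quadrants. Names and the minimum cell size are hypothetical. The usage inserts a tight cluster of points, which produces a deep, mostly empty branch and shows how clustered data skews the tree.

```python
BF = 3        # blocking factor from the slide's example
MIN_SIZE = 1  # stop splitting at unit cells (assumes integer coordinates)


class QuadNode:
    """Region quadtree node over the square [x0, x0+size) x [y0, y0+size)."""

    def __init__(self, x0, y0, size):
        self.x0, self.y0, self.size = x0, y0, size
        self.points = []      # used while this node is a leaf (one disk page)
        self.children = None  # four equal sub-quadrants after a split

    def _child_for(self, p):
        half = self.size / 2
        i = 1 if p[0] >= self.x0 + half else 0
        j = 1 if p[1] >= self.y0 + half else 0
        return self.children[2 * j + i]

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        if len(self.points) > BF and self.size > MIN_SIZE:
            half = self.size / 2
            # 4-way equal split, ordered SW, SE, NW, NE to match _child_for
            self.children = [QuadNode(self.x0 + i * half, self.y0 + j * half, half)
                             for j in (0, 1) for i in (0, 1)]
            pts, self.points = self.points, []
            for q in pts:
                self._child_for(q).insert(q)

    def leaves(self):
        if self.children is None:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]

    def depth(self):
        if self.children is None:
            return 0
        return 1 + max(c.depth() for c in self.children)


root = QuadNode(0, 0, 64)
for p in [(1, 1), (2, 2), (3, 3), (60, 60), (4, 4)]:
    root.insert(p)
print(root.depth(), len(root.leaves()))  # 5 clustered points -> depth 4, 13 leaves
```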
Weak Points of Quadtree
Same problems as the KD-tree
Only for point objects
Dead space
How to store the tree structure on disk
In addition, a lack of flexibility
Blocking problem
Widely used as a main-memory index
Rarely used as a disk-resident index
Unbalanced tree
Zipf's law (or the 80/20 law)
Most events are concentrated
Leads to a highly skewed tree
Point Quadtree
A simple variation of the quadtree
The partition point is specified instead of an equal split
More adaptive to the distribution of objects
Less skewed
[Figure: point quadtree with partition points (10,20), (5,25) and (35,10) over objects A to J]
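A sketch of the classic point quadtree, in which each inserted point becomes the partition point of its own node. This is one common way to choose partition points; the slide only says that the partition point is specified instead of splitting equally, so the lecture's selection rule may differ. Names are hypothetical, and the example reuses the partition points shown in the figure.

```python
class PQNode:
    """Point quadtree node: its point is also the partition point."""

    def __init__(self, point):
        self.point = point
        self.children = [None] * 4  # quadrants: 0 SW, 1 SE, 2 NW, 3 NE

    def quadrant(self, p):
        return (1 if p[0] >= self.point[0] else 0) + (2 if p[1] >= self.point[1] else 0)


def pq_insert(root, p):
    """Insert p below the partition point it falls under; return the root."""
    if root is None:
        return PQNode(p)
    node = root
    while True:
        q = node.quadrant(p)
        if node.children[q] is None:
            node.children[q] = PQNode(p)
            return root
        node = node.children[q]


def pq_search(node, p):
    """Exact-match search: follow one quadrant per level."""
    while node is not None:
        if node.point == p:
            return True
        node = node.children[node.quadrant(p)]
    return False


root = None
for p in [(10, 20), (5, 25), (35, 10)]:  # partition points from the figure
    root = pq_insert(root, p)
```

Because the partition points come from the data, dense areas get more splits than sparse ones, which is the adaptivity the slide mentions.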
Linear Quadtree: Space-Filling Curve
A quadtree, but with another representation
Linearization by a space-filling curve
[Figure: three space-filling curves: N-order, Hilbert, column-wise]
Linearize points (or cells) by their Peano key
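Linear Quadtree
Example: N-order curve
Computation of the Peano key: bit interleaving
1. Binary representation of the coordinates: (x, y) = (10, 01)
2. Bit interleaving, taking the x bit and then the y bit at each position, from the most significant bit: 1 0 0 1
Peano key = 1001 (binary) = 9
[Figure: 4 × 4 grid with rows and columns labeled 00, 01, 10, 11]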
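Bit interleaving is short to express in code. This sketch assumes non-negative integer cell coordinates with `bits` bits each and, as in the example, takes the x bit before the y bit at each position; the function name is hypothetical.

```python
def peano_key(x: int, y: int, bits: int) -> int:
    """N-order Peano key of cell (x, y): interleave bits, x bit first, MSB first."""
    key = 0
    for i in range(bits - 1, -1, -1):
        key = (key << 1) | ((x >> i) & 1)  # next bit of x
        key = (key << 1) | ((y >> i) & 1)  # next bit of y
    return key


print(peano_key(0b10, 0b01, bits=2))  # 9, as in the example
```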
MBR Methods
MBR (Minimum Bounding Rectangle)
A two-dimensional geometric simplification of objects
Not the whole space, only the region occupied by objects
[Figure: an object and its MBR with corners (X1min, X2min) and (X1max, X2max)]
R-tree and its variants
R-tree
[Figure: 2-D objects A to K grouped into nested MBRs, and the corresponding R-tree]
Each leaf node points to a disk page
Construction of the R-tree: a sequence of insertions
Upward split
Splitting in R-tree
Split the MBR in case of overflow
Line sweeping: compare Cost-X and Cost-Y
[Figure: a splitting line dividing an overflowing node into two new MBRs]
Cost measures
Area
Perimeter
Overlapping area
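A sketch of the line-sweep split, assuming rectangles given as (xmin, ymin, xmax, ymax) and a minimum fill of m entries per node. For each axis the entries are sorted by their lower coordinate, every admissible split position is evaluated, and the cheaper of Cost-X and Cost-Y wins. The cost used here (overlap first, then total area, then total perimeter) is one assumed way to combine the three measures; the R*-tree, for instance, combines them differently. Function names are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def cover(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def perimeter(r: Rect) -> float:
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def split_cost(g1: List[Rect], g2: List[Rect]):
    b1, b2 = cover(g1), cover(g2)
    return (overlap(b1, b2), area(b1) + area(b2), perimeter(b1) + perimeter(b2))


def sweep_split(rects: List[Rect], m: int):
    """Return (axis, group1, group2) with the lowest cost over both sweep axes."""
    best = None
    for axis in (0, 1):  # 0: sweep along x (Cost-X), 1: along y (Cost-Y)
        s = sorted(rects, key=lambda r: (r[axis], r[axis + 2]))
        for k in range(m, len(s) - m + 1):  # each group keeps at least m entries
            cost = split_cost(s[:k], s[k:])
            if best is None or cost < best[0]:
                best = (cost, axis, s[:k], s[k:])
    _, axis, g1, g2 = best
    return axis, g1, g2


# Two vertical clusters: a split along y separates them without overlap.
rects = [(0, 0, 1, 1), (0, 10, 1, 11), (0, 1, 1, 2), (0, 11, 1, 12)]
axis, g1, g2 = sweep_split(rects, m=2)
```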
R-tree: Query Processing
[Figure: R-tree over objects A to K and a query region W]
Filtering: the MBRs intersecting the query region W give the candidates
Refinement: read each candidate's exact geometry from the database
Sample: http://www.dbnet.ece.ntua.gr/~mario/rtree/
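A sketch of the two phases on a tiny hand-built R-tree, assuming circles as the exact geometry so that refinement is easy to compute. `filter_step` descends only into entries whose MBR intersects W; `refine` then checks each candidate's circle against W. The node layout and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Circle = Tuple[float, float, float]       # exact geometry: (cx, cy, radius)


@dataclass
class Node:
    leaf: bool
    # (MBR, child): child is a Node for internal entries, an object id for leaf entries
    entries: List[Tuple[Rect, Union["Node", str]]] = field(default_factory=list)


def cover(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def make_leaf(objs: Dict[str, Circle]) -> Node:
    return Node(True, [((cx - r, cy - r, cx + r, cy + r), oid)
                       for oid, (cx, cy, r) in objs.items()])


def make_internal(children: List[Node]) -> Node:
    return Node(False, [(cover([e[0] for e in ch.entries]), ch) for ch in children])


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def filter_step(node: Node, w: Rect, out: List[str]) -> List[str]:
    """Descend only into entries whose MBR intersects the query region W."""
    for rect, child in node.entries:
        if intersects(rect, w):
            if node.leaf:
                out.append(child)  # candidate
            else:
                filter_step(child, w, out)
    return out


def refine(cands: List[str], objs: Dict[str, Circle], w: Rect) -> List[str]:
    """Keep candidates whose exact circle really intersects W."""
    keep = []
    for oid in cands:
        cx, cy, r = objs[oid]
        nx, ny = min(max(cx, w[0]), w[2]), min(max(cy, w[1]), w[3])  # closest point of W
        if (cx - nx) ** 2 + (cy - ny) ** 2 <= r * r:
            keep.append(oid)
    return keep


objs = {"a": (2, 2, 1), "b": (5, 5, 1), "c": (20, 20, 1), "d": (24, 24, 1)}
root = make_internal([make_leaf({k: objs[k] for k in "ab"}),
                      make_leaf({k: objs[k] for k in "cd"})])
w = (2.8, 2.8, 3.5, 3.5)
cands = filter_step(root, w, [])  # "a": its MBR touches W
print(refine(cands, objs, w))     # []: the circle itself misses W
```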
Strength of R-tree
For point and non-point objects
Good for non-uniform distributions
Paged tree
Hierarchical structure, but balanced
Less dead space than decomposition methods
Weak Points of R-tree: Overlapping Area
Overlapping causes false matching
[Figure: R-tree with overlapping MBRs A to M and a query region]
False matching: visiting unnecessary nodes
Performance degradation
Weak Points of R-tree: Dead Space
[Figure: R-tree with MBRs A to M and a query region falling in the dead space of node K]
At least one visit to node K, even though it contains nothing in the query region
Weak Points of R-tree: Bad Split
[Figure panels: 50:50 split; good split; bad split]
1. Make the nodes as COMPACT as possible
2. Preserve spatial proximity as much as possible
Improvement of R-tree
Minimize
Overlapping area
Dead space
Or make it more COMPACT
Preserve spatial proximity
Two approaches
Packing (or bulk loading)
Good split or insertion strategies
R*-tree: An Improvement of R-tree
Re-insertion strategy on overflow
[Figure: a node overflowing after a newly inserted object; one object is deleted and re-inserted]
R*-tree: An Improvement of R-tree
Re-insertion strategy on overflow
[Figure: after re-insertion of the object, the nodes are more compact]
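A sketch of how entries might be chosen for re-insertion. In the original R*-tree design, on the first overflow at a level during one insertion, roughly 30% of the node's entries (those whose centers lie farthest from the center of the node's MBR) are removed and inserted again; later overflows at that level cause a split instead. The function below covers only the selection step; its name and the list-of-rectangles input are assumptions.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def center(r: Rect) -> Tuple[float, float]:
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def cover(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def pick_for_reinsert(entries: List[Rect], fraction: float = 0.3):
    """Split an overflowing node's entries into (kept, to_reinsert).

    The entries whose centers are farthest from the center of the node's MBR
    are removed; the caller would delete them and insert them again from the root.
    """
    c = center(cover(entries))
    p = max(1, round(fraction * len(entries)))
    ranked = sorted(entries, key=lambda r: math.dist(center(r), c), reverse=True)
    return ranked[p:], ranked[:p]


entries = [(0, 0, 1, 1), (1, 1, 2, 2), (2, 0, 3, 1), (8, 8, 12, 12)]
kept, again = pick_for_reinsert(entries)
```

Note that the entry chosen here is the small rectangle at the corner, not the large one: the criterion is distance from the node's center, which is what tends to make the remaining node more compact.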
R*-tree: An Improvement of R-tree
R*-tree
Compact
Small overlapping area
Small sum of MBR areas or perimeters
Small dead space
Stable: not much affected by the order of insertions
The most widely used spatial indexing method
Packing R-tree: Improvement of R-tree
Preprocessing to make the R-tree more compact
Hilbert R-tree
STR (Sort-Tile-Recursive)
Uniformization
Instead of sequential insertions
Hilbert Packing
Hilbert curve
A space-filling curve
[Figure: three space-filling curves: N-order, Hilbert, column-wise]
Linearize spatial objects by their Peano key
Hilbert Packing
Sort objects by Hilbert key
Pack them in a round-robin way
Maximize storage utilization
Minimum dead space and sum of MBR areas
Example: Bf = 3
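A sketch of Hilbert packing, assuming each object has already been reduced to the grid cell of its MBR center on an n × n grid (n a power of two). The Hilbert key is computed with the well-known iterative rotate-and-flip conversion; objects are then sorted by key and consecutive runs of Bf objects fill one page. Filling pages in key order is an assumed reading of the slide's packing step, and the function names are hypothetical.

```python
def hilbert_key(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along the Hilbert curve over an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip so the sub-quadrant curve has standard orientation
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_pack(cells, bf: int, n: int):
    """Sort objects (here: grid cells of their centers) by Hilbert key; fill pages of bf."""
    ordered = sorted(cells, key=lambda c: hilbert_key(n, c[0], c[1]))
    return [ordered[i:i + bf] for i in range(0, len(ordered), bf)]


pages = hilbert_pack([(0, 0), (3, 0), (0, 2), (1, 1), (0, 1), (1, 0)], bf=3, n=4)
print(pages)  # two pages of spatially adjacent cells
```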
STR (Sort-Tile-Recursive)
Basic idea: "tile" the data space using $\lceil \sqrt{r/n} \rceil$ vertical slices
r: number of rectangles
n: blocking factor
$P = \lceil r/n \rceil$ (number of leaf-node pages)
Example
Suppose r = 25, n = 3
nTile = 9
nV = 3, nH = 3
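A sketch of STR for the leaf level, assuming rectangles given as (xmin, ymin, xmax, ymax) and ordering by rectangle centers: compute P = ⌈r/n⌉ pages and S = ⌈√P⌉ vertical slices, sort by x, cut into slices of S·n rectangles, then sort each slice by y and cut it into pages of n. Upper levels would be built by applying the same procedure to the page MBRs. Function names are hypothetical.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def center_x(b: Rect) -> float:
    return (b[0] + b[2]) / 2


def center_y(b: Rect) -> float:
    return (b[1] + b[3]) / 2


def str_pack(rects: List[Rect], n: int):
    """Sort-Tile-Recursive packing of the leaf level; returns (P, S, pages)."""
    r = len(rects)
    P = math.ceil(r / n)          # number of leaf pages
    S = math.ceil(math.sqrt(P))   # number of vertical slices
    by_x = sorted(rects, key=center_x)
    pages = []
    for start in range(0, r, S * n):  # one vertical slice holds S*n rectangles
        slab = sorted(by_x[start:start + S * n], key=center_y)
        pages += [slab[j:j + n] for j in range(0, len(slab), n)]  # tile the slice
    return P, S, pages


# The slide's example sizes: r = 25 rectangles, n = 3.
rects = [(x, y, x + 1, y + 1) for x in range(5) for y in range(5)]
P, S, pages = str_pack(rects, 3)
print(P, S, len(pages))  # 9 3 9
```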
Comparison: Hilbert Packing vs. STR
[Figure panels: Hilbert packing (HP) and STR, for large objects and for points]
Uniformization
Non-uniform distribution
Has a negative effect on performance
But in real applications, data are non-uniform
Uniformization technique
Step 1: transform non-uniform data into uniform data by STR
Step 2: apply an R-tree (or a fixed grid)
Step 3: transform the query region
Strength
High storage utilization
Very simple, with good performance
Uniformization
[Figure: non-equi-width cells mapped to equi-width cells]
1. Area of each cell: identical
2. Number of objects within each cell: almost identical
Uniformization: Example
[Figure panels: original; by Delaunay triangulation; by STR]
Uniformization: Example
[Chart: original data vs. data uniformized by STR, series S1 to S19]
Query Processing by R-tree: Nearest Neighbor
[Figure: a query point, the searching space, and the minimum distances to MBRs in 2-D]
Query Processing by R-tree: Nearest Neighbor
[Figure: branching into MBRs in order of minimum distance, and pruning MBRs that cannot contain a nearer object]
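A sketch of nearest-neighbor search driven by the minimum distance (MINDIST) from the query point to each MBR. It uses the best-first variant with a priority queue rather than depth-first branch-and-bound, and assumes point objects (degenerate MBRs), so MINDIST to a leaf entry is the exact distance. Pruning is implicit: entries farther than the k-th result are never expanded. The nested-list tree layout and all names are hypothetical.

```python
import heapq
import itertools
import math


def mindist(p, r):
    """Minimum distance from point p to rectangle r = (xmin, ymin, xmax, ymax)."""
    dx = max(r[0] - p[0], 0.0, p[0] - r[2])
    dy = max(r[1] - p[1], 0.0, p[1] - r[3])
    return math.hypot(dx, dy)


def nearest(root, q, k=1):
    """Best-first k-NN: always expand the entry with the smallest MINDIST."""
    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    heap = [(0.0, next(tie), root)]
    result = []
    while heap and len(result) < k:
        d, _, item = heapq.heappop(heap)
        if isinstance(item, str):  # an object id: no unexplored entry can be closer
            result.append((item, d))
            continue
        for rect, child in item:   # a node: branch into all of its entries
            heapq.heappush(heap, (mindist(q, rect), next(tie), child))
    return result


# Hypothetical tree: a node is a list of (MBR, child); leaf children are object ids.
leaf1 = [((1, 1, 1, 1), "a"), ((4, 4, 4, 4), "b")]
leaf2 = [((10, 10, 10, 10), "c"), ((12, 9, 12, 9), "d")]
root = [((1, 1, 4, 4), leaf1), ((10, 9, 12, 10), leaf2)]
print(nearest(root, (9, 9), k=2))  # "c" then "d"; leaf1 is never expanded
```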
Transformation to Higher Space
Transformation to a higher dimension
Transform a non-point object into a point object
Allows spatial indexing methods applicable only to point objects (e.g. the grid file) to be reused for non-point objects
Example
[Figure: 1-D intervals A, B and C mapped to points in the (Min, Max) plane; A becomes the point (Amin, Amax)]
Corner Transformation
From 2-D to 4-D
1. Simplification by MBR
2. MBR ((Xmin, Ymin), (Xmax, Ymax)) to point (Xmin, Ymin, Xmax, Ymax)
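A sketch of the corner transformation, assuming MBRs given as ((xmin, ymin), (xmax, ymax)). After each MBR is mapped to a 4-D point, both a containment query and an intersection query become axis-aligned box queries in 4-D, which any point index (e.g. a grid file) could answer; a linear scan stands in for that index here, and the function names are hypothetical.

```python
def to_corner(mbr):
    """MBR ((xmin, ymin), (xmax, ymax)) -> 4-D point (xmin, ymin, xmax, ymax)."""
    (xmin, ymin), (xmax, ymax) = mbr
    return (xmin, ymin, xmax, ymax)


def in_box(p, lo, hi):
    return all(a <= v <= b for v, a, b in zip(p, lo, hi))


def contained_in(points4d, window):
    """MBR inside the window: all four coordinates lie within the window's range."""
    (wx0, wy0), (wx1, wy1) = window
    lo, hi = (wx0, wy0, wx0, wy0), (wx1, wy1, wx1, wy1)
    return [oid for oid, p in points4d.items() if in_box(p, lo, hi)]


def intersecting(points4d, window):
    """MBR meets the window: xmin <= wxmax, xmax >= wxmin, and likewise for y."""
    (wx0, wy0), (wx1, wy1) = window
    inf = float("inf")
    lo, hi = (-inf, -inf, wx0, wy0), (wx1, wy1, inf, inf)
    return [oid for oid, p in points4d.items() if in_box(p, lo, hi)]


objects = {"A": to_corner(((1, 1), (2, 2))),
           "B": to_corner(((3, 3), (8, 8))),
           "C": to_corner(((9, 9), (10, 10)))}
print(contained_in(objects, ((0, 0), (5, 5))))  # ['A']
print(intersecting(objects, ((0, 0), (5, 5))))  # ['A', 'B']
```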
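Query Processing for Corner Transformation: 1-D Example
Query: find contained objects
[Figure: the (Min, Max) plane divided into Regions I to VI with respect to the query window W; interval A is mapped to the point (Amin, Amax)]
Region I: Wmax < Amin
Region II: W ⊂ A
Region III: Amax < Wmin
Region IV: Amin < Wmin ≤ Amax < Wmax
Region V: Wmin < Amin ≤ Wmax < Amax
Region VI: A ⊂ W
Contained objects: the points in Region VI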
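The six regions can be checked with a small classifier, assuming closed intervals (Amin, Amax) and a query window (Wmin, Wmax); how ties on region boundaries are assigned is an assumption. The containment query then keeps exactly the intervals classified as Region VI, which in the (Min, Max) plane is the range query Wmin ≤ Amin, Amax ≤ Wmax. Names are hypothetical.

```python
def region(a, w):
    """Region (I to VI) of interval A relative to query window W in the 1-D example."""
    amin, amax = a
    wmin, wmax = w
    if wmax < amin:
        return "I"    # A entirely to the right of W
    if amax < wmin:
        return "III"  # A entirely to the left of W
    if amin <= wmin and wmax <= amax:
        return "II"   # W inside A
    if wmin <= amin and amax <= wmax:
        return "VI"   # A inside W: the answer to the containment query
    if amin < wmin:
        return "IV"   # A overlaps the left end of W
    return "V"        # A overlaps the right end of W


W = (10, 20)
intervals = {"p": (25, 30), "q": (0, 5), "r": (5, 25),
             "s": (12, 18), "t": (5, 15), "u": (15, 25)}
contained = [oid for oid, a in intervals.items() if region(a, W) == "VI"]
print(contained)  # ['s']
```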
Transformation to Lower Dimension: Linear Quadtree
1. Simplification of geometry
2. Compute the Peano key of the lower-left corner
3. If necessary, divide the object and give a Peano key to each piece
4. Define the size of each piece according to the number of quadrants
5. Insert the pieces into a B-tree
6. Query processing by the B-tree
[Figure: pieces labeled (Peano key, size): (22, 0), (23, 0), (28, 1), (0, 2)]
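A sketch of these steps, assuming the geometry has already been simplified to a rectangle of grid cells on a 2^bits × 2^bits grid, N-order keys with the x bit first, and a piece size given as the quadtree level (a level-L piece covers 2^L × 2^L cells, i.e. the key range [key, key + 4^L)). A sorted Python list stands in for the B-tree, and all names are hypothetical. Under these assumptions, cells x = 1..3, y = 6..7 of an 8 × 8 grid decompose into (22, 0), (23, 0), (28, 1), and cells 0..3 × 0..3 into (0, 2), the same labels as in the figure; the lecture's own objects and size convention may differ.

```python
import bisect


def peano_key(x: int, y: int, bits: int) -> int:
    """N-order Peano key: interleave bits, x bit first, from the most significant bit."""
    key = 0
    for i in range(bits - 1, -1, -1):
        key = (key << 2) | (((x >> i) & 1) << 1) | ((y >> i) & 1)
    return key


def decompose(x0, y0, x1, y1, bits):
    """Split cells [x0..x1] x [y0..y1] into maximal quadtree blocks (key, level)."""
    pieces = []

    def visit(cx, cy, size, level):
        if cx > x1 or cx + size - 1 < x0 or cy > y1 or cy + size - 1 < y0:
            return  # block disjoint from the rectangle
        if x0 <= cx and cx + size - 1 <= x1 and y0 <= cy and cy + size - 1 <= y1:
            pieces.append((peano_key(cx, cy, bits), level))  # fully covered block
            return
        half = size // 2
        for dx, dy in ((0, 0), (0, 1), (1, 0), (1, 1)):
            visit(cx + dx * half, cy + dy * half, half, level - 1)

    visit(0, 0, 1 << bits, bits)
    return sorted(pieces)


class LinearQuadtree:
    """Pieces kept in key order; a sorted list stands in for the B-tree (assumption)."""

    def __init__(self):
        self.entries = []  # sorted (key, level, object id)
        self.max_level = 0

    def insert(self, oid, rect, bits):
        for key, level in decompose(*rect, bits):
            bisect.insort(self.entries, (key, level, oid))
            self.max_level = max(self.max_level, level)

    def window_query(self, rect, bits):
        hits = set()
        for qkey, qlevel in decompose(*rect, bits):
            lo, hi = qkey, qkey + 4 ** qlevel  # key range covered by the query block
            # A stored block that starts before lo may still reach into [lo, hi).
            start = bisect.bisect_left(self.entries, (lo - 4 ** self.max_level + 1,))
            for key, level, oid in self.entries[start:]:
                if key >= hi:
                    break
                if key + 4 ** level > lo:
                    hits.add(oid)
        return sorted(hits)


idx = LinearQuadtree()
idx.insert("obj1", (1, 6, 3, 7), bits=3)
idx.insert("obj2", (0, 0, 3, 3), bits=3)
print(idx.entries)  # (0, 2, 'obj2'), (22, 0, 'obj1'), (23, 0, 'obj1'), (28, 1, 'obj1')
```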