Pointer Range Analysis
Suan Hsi Yong
Susan Horwitz
UNIVERSITY OF WISCONSIN – MADISON
Range Analysis
• Computes the range of possible values for variables in the program

i = 3          i ∈ [3,3]
if(…)          i ∈ [3,3]
   i = 7       i ∈ [7,7]
               i ∈ [3,7]
a[i] = 0
Range Analysis
• Can eliminate unnecessary bounds checks and detect potential out-of-bounds errors

char a[12], i;
i = 3;          i ∈ [3,3]
if(...){
   i = 7;       i ∈ [7,7]
}
                i ∈ [3,7]
a[i] = 0;       (in bounds)
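The merge at the end of the `if`, and the bounds check it enables, can be sketched in C. This is a minimal illustration under stated assumptions. A range is a closed interval [lo, hi]. The merge keeps the smallest interval covering both incoming ranges, which is how [3,3] and [7,7] become [3,7] above. The names `Interval`, `interval_join`, and `index_in_bounds` are hypothetical and do not come from the authors' implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* A closed integer interval [lo, hi] (hypothetical representation). */
typedef struct { long lo, hi; } Interval;

/* Merge point: the smallest interval covering both incoming ranges. */
static Interval interval_join(Interval a, Interval b) {
    Interval r;
    r.lo = a.lo < b.lo ? a.lo : b.lo;
    r.hi = a.hi > b.hi ? a.hi : b.hi;
    return r;
}

/* a[i] on an array of n elements is safe if every possible i lies in [0, n-1]. */
static bool index_in_bounds(Interval i, long n) {
    assert(i.lo <= i.hi);
    return i.lo >= 0 && i.hi < n;
}
```

For the example, joining [3,3] with [7,7] gives [3,7]. `index_in_bounds` on that range with n = 12 returns true, so the check on `a[i]` is redundant.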
Range Analysis
• Can identify non-aliasing array accesses (for parallelization, optimization)

i = 3;          i ∈ [3,3]
j = 7;          j ∈ [7,7]
a[i] = ...;
a[j] = ...;     (disjoint ranges: a[i] and a[j] never refer to the same element)
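The non-aliasing argument reduces to an interval overlap test. Below is a minimal sketch, assuming ranges are closed intervals. `may_alias` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>

/* A closed integer interval [lo, hi] (hypothetical representation). */
typedef struct { long lo, hi; } Interval;

/* a[i] and a[j] can name the same element only if the ranges of i and j overlap. */
static bool may_alias(Interval i, Interval j) {
    assert(i.lo <= i.hi && j.lo <= j.hi);
    return i.lo <= j.hi && j.lo <= i.hi;
}
```

With i ∈ [3,3] and j ∈ [7,7], the ranges do not overlap, so the two stores are independent.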
Range Analysis
• Previous work focused on Fortran and Java
– array accesses restricted to array subscripts
– array types well-defined and restricted
• Our work: extend to C
– pointers and pointer arithmetic: an alternative way to access arrays
– type casting: a pointer of one type can point to an array of a different type
– malloc: what is the type of the allocated array? what counts as an array?
Dataflow Analysis
• Facts: mapping from variables to ranges
{v → r}
• r represents a superset of the possible values of v
• lattice of ranges:
– ⊤ = empty set, ⊥ = universal set
– ⊓ (meet) = set union
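The fact domain can be sketched as follows, assuming a fixed, numbered set of variables and closed intervals. The union of two intervals is not always an interval. The meet below therefore returns the covering interval, a superset of the union that is still a safe over-approximation. `Range`, `Fact`, `NVARS`, `TOP`, `BOTTOM`, and the helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NVARS 2  /* hypothetical: variables are numbered 0 .. NVARS-1 */

/* A range of possible values; 'empty' encodes the top element (empty set). */
typedef struct { bool empty; long lo, hi; } Range;

static const Range TOP    = { true, 0, 0 };
static const Range BOTTOM = { false, LONG_MIN, LONG_MAX };  /* universal set */

/* Meet: TOP is the identity; otherwise take the covering interval,
   which is a superset of the set union. */
static Range range_meet(Range a, Range b) {
    assert(a.empty || a.lo <= a.hi);
    assert(b.empty || b.lo <= b.hi);
    if (a.empty) return b;
    if (b.empty) return a;
    Range r = { false, a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi };
    return r;
}

/* A dataflow fact {v -> r}: one range per variable, combined pointwise. */
typedef struct { Range r[NVARS]; } Fact;

static Fact fact_meet(const Fact *x, const Fact *y) {
    Fact out;
    for (int v = 0; v < NVARS; v++)
        out.r[v] = range_meet(x->r[v], y->r[v]);
    return out;
}
```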
Example with Pointers
char a[12], *p;
p = &a[3];       p ∈ a, [3,3]
if(…){
   p = &a[7];    p ∈ a, [7,7]
}
                 p ∈ a, [3,7]
*p = 0;          (in bounds)
Location-Offset Representation
x, [l,u] represents the range
from &x + l·τ
to &x + u·τ
where τ is the element type of x (offsets count elements of x's own type)
• Cannot handle:
– multiple targets
– mismatched-type arithmetic
Multiple Targets
char a[12], b[8];
char *p;
p = &a[3];       p ∈ a, [3,3]
if(…){
   p = &b[7];    p ∈ b, [7,7]
}
                 p ∈ ⊥   (two targets cannot be represented)
*p = 0;          (not provably in bounds)
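The failure on multiple targets shows up directly in the join for location-offset facts. The sketch below uses hypothetical names and identifies targets by name strings for simplicity. Two facts with the same target merge their offset ranges. Facts with different targets can only be merged to ⊥.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Location-offset fact x, [l,u]; 'unknown' stands for bottom (any address). */
typedef struct {
    bool unknown;
    const char *loc;  /* target variable, e.g. "a" (hypothetical: identified by name) */
    long lo, hi;      /* offsets in elements of the target's own type */
} LocOff;

static LocOff locoff_join(LocOff p, LocOff q) {
    LocOff bottom = { true, NULL, 0, 0 };
    if (p.unknown || q.unknown)
        return bottom;
    assert(p.loc != NULL && q.loc != NULL);
    if (strcmp(p.loc, q.loc) != 0)
        return bottom;  /* two different targets cannot be represented */
    LocOff r = { false, p.loc,
                 p.lo < q.lo ? p.lo : q.lo,
                 p.hi > q.hi ? p.hi : q.hi };
    return r;
}
```

Joining `a, [3,3]` with `a, [7,7]` yields `a, [3,7]`. Joining `a, [3,3]` with `b, [7,7]` loses all information.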
Mismatched-type Arithmetic
int a[2];
char *p, *q;
p = (char*)a;    p ∈ a, [0,0]   (&a + 0·int)
q = p + 6;       q ∈ a, [1,2]   (&a + 6·char = &a + 1.5·int, with sizeof(int) = 4)
*q = 0;          (not provably in bounds)
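The imprecision comes from re-expressing a byte offset in units of x's element type. The sketch below assumes sizeof(int) = 4, as the slide's numbers imply (6 chars = 1.5 ints), and non-negative offsets. The helper names are hypothetical. The range is rounded outward so it still covers the real address.

```c
#include <assert.h>

/* A closed integer interval [lo, hi] (hypothetical representation). */
typedef struct { long lo, hi; } Interval;

/* Re-express a non-negative byte-offset range in units of elements of
   elem_size bytes, rounding outward so every original address is covered. */
static Interval bytes_to_elems(Interval bytes, long elem_size) {
    assert(bytes.lo >= 0 && bytes.lo <= bytes.hi && elem_size > 0);
    Interval r;
    r.lo = bytes.lo / elem_size;                    /* floor */
    r.hi = (bytes.hi + elem_size - 1) / elem_size;  /* ceiling */
    return r;
}
```

Here `q = p + 6` gives bytes [6,6], which becomes [1,2]. Since 2 is not below the array length 2, the dereference cannot be shown to be in bounds.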
Descriptor-Offset Representation
x:τ[n], [l,u] represents the range
from &x + l·τ
to &x + u·τ
where x is treated as an array
with n elements, each of type τ
Descriptor-Offset Representation
• x:τ[n], [l,u] represents the range from &x + l·τ to &x + u·τ
• ?:τ[n], [l,u] represents a pointer to an unknown location of type τ[n], plus [l·τ, u·τ]
• NULL, [l,u] represents the integer range [l,u]
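The three kinds of descriptor-offset facts, and the in-bounds test used later in the analysis (l ≥ 0 and u < n), might be encoded as below. This is a hypothetical encoding, not the authors' data structure. Element sizes are stored as byte counts, and a dereference is assumed to read one element of type τ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three kinds of descriptor-offset facts (hypothetical encoding). */
typedef enum { KNOWN_LOC, UNKNOWN_LOC, INTEGER } Kind;

typedef struct {
    Kind kind;
    const char *loc;  /* x for KNOWN_LOC; NULL otherwise */
    long elem_size;   /* size of tau in bytes (unused for INTEGER) */
    long count;       /* n: x is viewed as tau[n] (unused for INTEGER) */
    long lo, hi;      /* [l,u]: element offsets, or the integer range itself */
} Desc;

/* Byte offsets covered by x:tau[n], [l,u]: from &x + l*tau to &x + u*tau. */
static void desc_byte_range(Desc d, long *lo_bytes, long *hi_bytes) {
    assert(d.kind != INTEGER);
    *lo_bytes = d.lo * d.elem_size;
    *hi_bytes = d.hi * d.elem_size;
}

/* *p is in bounds if p is in x:tau[n], [l,u] (x known or unknown) with
   l >= 0 and u < n, assuming the dereference reads one element of tau. */
static bool desc_deref_in_bounds(Desc d) {
    return d.kind != INTEGER && d.lo >= 0 && d.hi < d.count;
}
```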
Multiple Targets
char a[12], b[8];
char *p;
p = &a[3];       p ∈ a:char[12], [3,3]
if(…){
   p = &b[7];    p ∈ b:char[8], [7,7]
}
                 p ∈ ?:char[8], [3,7]
*p = 0;          (in bounds)
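Below is a possible join for descriptor-offset facts that reproduces the result above. The rule is an assumption consistent with the slide, not a confirmed description of the authors' analysis. Facts with different targets become the unknown location `?`. The element count becomes the smaller of the two, since every possible target has at least that many elements. The offset ranges are merged. For simplicity, the sketch gives up (returns ⊥) when the element types differ. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Descriptor-offset fact x:tau[n], [l,u] (hypothetical encoding). */
typedef struct {
    bool bottom;      /* nothing known about the pointer */
    const char *loc;  /* target name, or NULL for the unknown location "?" */
    long elem_size;   /* size of tau in bytes */
    long count;       /* n */
    long lo, hi;      /* [l,u], in elements of tau */
} Desc;

static long min_l(long a, long b) { return a < b ? a : b; }
static long max_l(long a, long b) { return a > b ? a : b; }

static Desc desc_join(Desc p, Desc q) {
    Desc r = { true, NULL, 0, 0, 0, 0 };
    if (p.bottom || q.bottom || p.elem_size != q.elem_size)
        return r;  /* this sketch does not merge different element types */
    assert(p.lo <= p.hi && q.lo <= q.hi);
    bool same = p.loc != NULL && q.loc != NULL && strcmp(p.loc, q.loc) == 0;
    r.bottom = false;
    r.loc = same ? p.loc : NULL;        /* different targets: unknown location */
    r.elem_size = p.elem_size;
    r.count = min_l(p.count, q.count);  /* every target has at least this many */
    r.lo = min_l(p.lo, q.lo);
    r.hi = max_l(p.hi, q.hi);
    return r;
}
```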
Mismatched-type Arithmetic
int a[2];
char *p, *q;
p = (char*)a;    p ∈ a:int[2], [0,0]
q = p + 6;       q ∈ a:char[8], [6,6]
*q = 0;          (in bounds)
Outline
• Range Representation
– Integer Intervals [l,u]
– Location-Offset x, [l,u]
– Descriptor-Offset x:[], [l,u]
• Pointer Arithmetic
– Well-typed
– Mismatched-typed
• Experiments
• Conclusion
Pointer Arithmetic
• Most challenging aspect of pointer range analysis
• e.g., p = q + i
– must map p to a range representing the possible values of q + i
• complication from type casting:
– a pointer of one type can point to an array of a different type
C Additive Operators
• +ii : int × int → int
• -ii : int × int → int
• +pi : τ* × int → τ*
p +pi i ≡ p + (i · τ)
• -pi : τ* × int → τ*
p -pi i ≡ p - (i · τ)
• -pp : τ* × τ* → int
p -pp q ≡ (p - q) / τ
• (not defined: τ* + τ*, int - τ*)
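The scaling in +pi and -pp is ordinary C semantics, which a short program can confirm. The helper names are hypothetical. The code uses only standard pointer arithmetic within a single array.

```c
#include <assert.h>
#include <stddef.h>

/* p +pi i: the byte distance moved is i * sizeof(*p). */
static ptrdiff_t bytes_moved(int *p, ptrdiff_t i) {
    return (char *)(p + i) - (char *)p;
}

/* p -pp q: the result counts elements, i.e. the byte distance / sizeof(*p). */
static ptrdiff_t elements_between(int *p, int *q) {
    assert(((char *)p - (char *)q) % (ptrdiff_t)sizeof(int) == 0);
    return p - q;
}
```

For `int arr[10]`, `bytes_moved(arr, 3)` equals `3 * sizeof(int)`, and `elements_between(&arr[7], &arr[2])` is 5.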
Well-typed Arithmetic
q = p +pi^char i
before:
p ∈ a:char[9], [4,5]
i ∈ NULL, [2,3]
after:
q ∈ a:char[9], [4,5] +pi^char NULL, [2,3]
  = a:char[9], [4+2, 5+3]
  = a:char[9], [6,8]
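For well-typed arithmetic the offset ranges are simply added. Below is a sketch with a hypothetical helper. It ignores overflow and infinite bounds.

```c
#include <assert.h>

/* A closed integer interval [lo, hi] (hypothetical representation). */
typedef struct { long lo, hi; } Interval;

/* Well-typed x:tau[n], [l1,u1] +pi NULL, [l2,u2]: the element types agree, so
   the offsets add directly, giving x:tau[n], [l1+l2, u1+u2].
   Overflow and infinite bounds are ignored in this sketch. */
static Interval add_offsets(Interval p, Interval i) {
    assert(p.lo <= p.hi && i.lo <= i.hi);
    Interval r = { p.lo + i.lo, p.hi + i.hi };
    return r;
}
```

Adding [2,3] to [4,5] gives [6,8]. That range is within a:char[9], so a dereference of q is in bounds.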
Mismatched-type Arithmetic
a:int[3], [0,1] +pi^char NULL, [3,5]
Perform one of two transformations:
(1) convert the l.h.s. range to be char-typed
⇒ a:char[n'], [l',u'] +pi^char NULL, [3,5]
(2) convert +pi to be int-typed, and adjust the r.h.s.
⇒ a:int[3], [0,1] +pi^int NULL, [l',u']
Transformation (1)
convert l.h.s. to be char-typed (here sizeof(int) = 4)
a:int[3], [0,1] +pi^char NULL, [3,5]
⇒ a:char[12], [0,4] +pi^char NULL, [3,5]
= a:char[12], [3,9]
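With exact sizes, transformation (1) amounts to multiplying the count and the offsets by the size ratio. The sketch assumes sizeof(int) = 4 and sizeof(char) = 1, as on the slide, and uses hypothetical names. Element sizes are passed in as byte counts rather than taken from the host compiler.

```c
#include <assert.h>

/* Descriptor x:tau[n], [l,u]; the location is implicit here (hypothetical). */
typedef struct {
    long elem_size;  /* size of tau in bytes */
    long count;      /* n */
    long lo, hi;     /* [l,u] in elements of tau */
} Desc;

/* Transformation (1), exact sizes: view x as an array of smaller elements of
   new_size bytes. Exact when elem_size is a multiple of new_size. */
static Desc retype_to_smaller(Desc d, long new_size) {
    assert(new_size > 0 && d.elem_size % new_size == 0);
    long k = d.elem_size / new_size;
    Desc r = { new_size, d.count * k, d.lo * k, d.hi * k };
    return r;
}

/* Well-typed +pi once both sides use the same element type. */
static Desc add_range(Desc d, long lo, long hi) {
    assert(lo <= hi);
    d.lo += lo;
    d.hi += hi;
    return d;
}
```

Starting from a:int[3], [0,1], retyping to char gives a:char[12], [0,4]. Adding [3,5] then gives [3,9], which is within the 12 chars.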
Transformation (2)
convert +pi to be int-typed (here sizeof(int) = 4)
a:int[3], [0,1] +pi^char NULL, [3,5]
⇒ a:int[3], [0,1] +pi^int NULL, [0,2]   (char offsets [3,5] round outward to int offsets [0,2])
= a:int[3], [0,3]
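Transformation (2) instead rescales the r.h.s. into int steps, rounding outward, which is where precision is lost. The sketch uses the same assumptions: sizeof(int) = 4, non-negative offsets and hypothetical names.

```c
#include <assert.h>

/* A closed integer interval [lo, hi] (hypothetical representation). */
typedef struct { long lo, hi; } Interval;

/* Transformation (2): express a range of steps of small_size bytes as a range
   of steps of large_size bytes, rounding outward (floor low, ceiling high).
   Assumes non-negative offsets. */
static Interval rescale_to_larger(Interval i, long small_size, long large_size) {
    assert(i.lo >= 0 && i.lo <= i.hi && small_size > 0 && large_size > 0);
    long lo_bytes = i.lo * small_size;
    long hi_bytes = i.hi * small_size;
    Interval r = { lo_bytes / large_size,
                   (hi_bytes + large_size - 1) / large_size };
    return r;
}
```

Rescaling [3,5] chars gives [0,2] ints. Adding that to [0,1] gives [0,3], and offset 3 is outside int[3]. The same access that transformation (1) proves safe is not proven safe here.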
Choice between (1) and (2)
2
• x1:1[1], [l1,u1] +pi NULL, [l2,u2]
• Transformations (1) and (2) may lose
precision
– (1) is better if 1  2,
(2) is better if 1  2
• if exact sizes are known and one is a
multiple of the other, can choose one
that does not lose precision
25
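When exact sizes are known, the choice can be made by a divisibility test. This is a hypothetical sketch. The fallback when neither size divides the other is an arbitrary choice made here, since the slides do not say what happens in that case.

```c
#include <assert.h>

typedef enum { TRANSFORM_1, TRANSFORM_2 } Transform;

/* size1: bytes in tau1 (element type of the l.h.s. descriptor);
   size2: bytes in tau2 (type at which +pi is applied). */
static Transform choose_transform(long size1, long size2) {
    assert(size1 > 0 && size2 > 0);
    if (size1 % size2 == 0)
        return TRANSFORM_1;  /* l.h.s. offsets scale up exactly by size1/size2 */
    if (size2 % size1 == 0)
        return TRANSFORM_2;  /* r.h.s. steps scale up exactly by size2/size1 */
    return TRANSFORM_1;      /* neither divides: both lose precision; arbitrary choice */
}
```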
Portable Transformations
• exact sizes of types may not be known
– e.g., source-level analysis
• can still make safe approximations using portable information, e.g.,
• 1 = sizeof(char) ≤ sizeof(short) ≤ sizeof(int) ≤ sizeof(long)
• sizeof(char) ≤ sizeof(τ) for all non-void types τ
• sizeof(τ[n]) = n · sizeof(τ)
• we assume all pointers are the same size
Transformation (1a)
convert l.h.s. to be char-typed, using only portable size information
a:int[3], [0,1] +pi^char NULL, [3,5]
⇒ a:char[3], [0,+∞] +pi^char NULL, [3,5]
= a:char[3], [3,+∞]
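Transformation (1a) can be sketched using only the portable facts listed earlier. Several choices here are assumptions: +∞ is represented by `LONG_MAX`, offsets are non-negative, and the names are hypothetical. Each element of τ is at least one char. So the array holds at least n chars, and an offset of k elements is at least k chars. Without knowing sizeof(τ), though, there is no finite upper bound unless k = 0.

```c
#include <assert.h>
#include <limits.h>

#define POS_INF LONG_MAX  /* assumed encoding of +infinity */

/* Descriptor x:tau[n], [l,u] with the location and tau left implicit. */
typedef struct { long count, lo, hi; } Desc;

/* Transformation (1a): view x as char-typed without knowing sizeof(tau).
   Only portable facts are used: 1 = sizeof(char) <= sizeof(tau), so x holds at
   least n chars and an offset of k >= 0 elements is at least k chars; with no
   upper bound on sizeof(tau), any offset above 0 has no finite upper bound. */
static Desc to_char_portable(Desc d) {
    assert(d.lo >= 0 && d.lo <= d.hi);
    Desc r = { d.count, d.lo, d.hi == 0 ? 0 : POS_INF };
    return r;
}

/* Addition that keeps +infinity at +infinity (non-negative operands assumed). */
static long add_sat(long a, long b) {
    assert(a >= 0 && b >= 0);
    if (a == POS_INF || b == POS_INF || a > POS_INF - b)
        return POS_INF;
    return a + b;
}
```

Applied to a:int[3], [0,1], this gives a:char[3], [0,+∞]. Adding [3,5] gives [3,+∞], so the dereference cannot be shown to be in bounds without exact sizes.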
Outline
• Range Representation
– Integer Intervals [x,y]
– Location-Offset a, [x,y]
– Descriptor-Offset a:[], [x,y]
• Pointer Arithmetic
– Well-typed
– Mismatched-typed
• Experiments
• Conclusion
Pointer Range Analysis
• Forward dataflow analysis: inter-procedural, context-insensitive
• Infinite descending chains: use widening/narrowing
• Exact sizes not assumed (portable)
• Goal: find in-bounds dereferences:
*p where p ∈ x:τ[n], [l,u] such that l ≥ 0, u < n
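The slides do not specify which widening and narrowing operators are used. As an illustration only, below are the standard interval operators from Cousot and Cousot's abstract interpretation. The sketch represents ±∞ with `LONG_MIN`/`LONG_MAX` and uses hypothetical names. With ⊤ = empty set, growing ranges form descending chains, which is why widening is needed here.

```c
#include <assert.h>
#include <limits.h>

#define NEG_INF LONG_MIN  /* assumed encoding of -infinity */
#define POS_INF LONG_MAX  /* assumed encoding of +infinity */

/* A closed interval [lo, hi], possibly with infinite bounds. */
typedef struct { long lo, hi; } Interval;

/* Widening: a bound that is still growing jumps straight to infinity,
   so the iteration sequence cannot grow forever. */
static Interval widen(Interval old, Interval next) {
    assert(old.lo <= old.hi && next.lo <= next.hi);
    Interval r;
    r.lo = next.lo < old.lo ? NEG_INF : old.lo;
    r.hi = next.hi > old.hi ? POS_INF : old.hi;
    return r;
}

/* Narrowing: only infinite bounds may be replaced, recovering precision. */
static Interval narrow(Interval old, Interval next) {
    Interval r;
    r.lo = old.lo == NEG_INF ? next.lo : old.lo;
    r.hi = old.hi == POS_INF ? next.hi : old.hi;
    return r;
}
```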
[Chart: In-bounds Dereferences (%) per benchmark, grouped by suite (Cyclone, Olden, Spec 95, Spec 2000); y-axis 0 to 50%. Average = 18.9%.]
[Chart: In-bounds Dereferences (%) per benchmark (Cyclone, Olden, Spec 95, Spec 2000): Arrays (avg. 6.8%) vs. Arrays+Pointers (avg. 18.9%); y-axis 0 to 50%.]
[Chart: In-bounds Dereferences (%) per benchmark (Cyclone, Olden, Spec 95, Spec 2000): Loc-Offset (avg. 16.8%) vs. Descr-Offset (avg. 18.9%); y-axis 0 to 50%.]
Other Results
• Results essentially the same when using non-portable type assumptions:
– only 5 more in-bounds dereferences found
• Analysis times (on a 1GHz PII with 500MB RAM):
– mesa (60K LOC, 8 hrs)
– gcc (200K LOC, 45 min)
– gap (71K LOC, 10 min)
– vortex (67K LOC, 4 min)
– most are fast (< 1 sec)
Conclusion
• C complicates range analysis with pointers, pointer arithmetic, and casting
• The Descriptor-Offset representation a:τ[n], [x,y] is better in practice than the intuitive Location-Offset representation a, [x,y]
• Results are almost as precise using portable assumptions
Future Extensions
• Symbolic range representation, constraints between ranges
– to capture correlation between variables
• String manipulation
– track the range of a “string length” attribute
– important in security (to identify potential buffer overruns)
THE END
Related Work
• Generalized Constant Propagation [Verbrugge+’96]
• Symbolic Range Analysis with Linear Programming [Rugina/Rinard’00]
• Value Set Analysis [Balakrishnan/Reps’04]
• String manipulation for Buffer Overrun Detection [Wagner+’00, Dor+’01]
Comparisons
• Location needed for pointer comparison
p ∈ a:char[9], [0,+∞]
q ∈ a:char[9], [5,5]
if(p <= q){
   p ∈ a:char[9], [0,5]
   q ∈ a:char[9], [5,5]
   …
Comparisons
• Location needed for pointer comparison
p ∈ ?:char[9], [0,+∞]
q ∈ ?:char[9], [5,5]
if(p <= q){
   p ∈ ?:char[9], [0,+∞]
   q ∈ ?:char[9], [5,5]
   …
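The refinement on the true branch of `p <= q` can be sketched as follows. This illustration rests on assumptions and uses hypothetical names. It lowers p's upper bound to q's upper bound only when both facts name the same known location, which reproduces both comparison slides. A result with lo > hi would mean the branch cannot be taken.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define POS_INF LONG_MAX  /* assumed encoding of +infinity */

/* Descriptor-offset fact; loc == NULL means the unknown location "?". */
typedef struct {
    const char *loc;
    long count;   /* n */
    long lo, hi;  /* [l,u] */
} Desc;

/* True branch of (p <= q): p cannot lie above q, but that only constrains p's
   offset when both pointers are known to point into the same object. */
static Desc refine_le(Desc p, Desc q) {
    assert(p.lo <= p.hi && q.lo <= q.hi);
    bool same = p.loc != NULL && q.loc != NULL && strcmp(p.loc, q.loc) == 0;
    if (same && q.hi < p.hi)
        p.hi = q.hi;  /* lo > hi afterwards would mean the branch is infeasible */
    return p;
}
```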
[Chart: In-bounds Dereferences (%) per benchmark (Cyclone, Olden, Spec 95, Spec 2000): No Predicates (avg. 14.1%) vs. Handle Predicates (avg. 18.9%); y-axis 0 to 50%.]
Loop Example
for(p = &a[0]; p < &a[10]; ++p){
   *p = ...;     p ∈ a:char[10], [0,9]
}
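One way the analysis can arrive at [0,9] for this loop is to iterate on p's offset with interval widening, followed by one narrowing step. This is an illustrative reconstruction using standard interval operators, not the authors' code. The function names and the parameter `n` (the array length, 10 here) are hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define NEG_INF LONG_MIN  /* assumed encoding of -infinity */
#define POS_INF LONG_MAX  /* assumed encoding of +infinity */

/* Range of p's offset within a:char[n]. */
typedef struct { long lo, hi; } Interval;

static long min_l(long a, long b) { return a < b ? a : b; }
static long max_l(long a, long b) { return a > b ? a : b; }
static long inc_sat(long x) { return (x == POS_INF || x == NEG_INF) ? x : x + 1; }

/* One pass over the loop, given the fact at the loop head: filter by
   p < &a[n], then ++p, then merge with the entry fact p = &a[0]. */
static Interval loop_equation(Interval head, long n, Interval *body) {
    body->lo = head.lo;
    body->hi = min_l(head.hi, n - 1);                          /* p < &a[n] */
    Interval back = { inc_sat(body->lo), inc_sat(body->hi) };  /* ++p */
    Interval entry = { 0, 0 };                                 /* p = &a[0] */
    Interval joined = { min_l(entry.lo, back.lo), max_l(entry.hi, back.hi) };
    return joined;
}

/* Range of p inside the body of: for (p = &a[0]; p < &a[n]; ++p) { *p = ...; } */
static Interval loop_body_range(long n) {
    assert(n > 0);
    Interval head = { 0, 0 }, body, next;
    for (;;) {                                       /* widening phase */
        next = loop_equation(head, n, &body);
        Interval w = { next.lo < head.lo ? NEG_INF : head.lo,
                       next.hi > head.hi ? POS_INF : head.hi };
        if (w.lo == head.lo && w.hi == head.hi)
            break;
        head = w;
    }
    next = loop_equation(head, n, &body);            /* one narrowing step */
    if (head.lo == NEG_INF) head.lo = next.lo;
    if (head.hi == POS_INF) head.hi = next.hi;
    (void)loop_equation(head, n, &body);             /* recompute the body fact */
    return body;
}
```

With n = 10, widening first pushes the loop-head range to [0,+∞]. Narrowing then brings it back to [0,10], and filtering by `p < &a[10]` gives [0,9] inside the body.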
Questions/Answers
• Dataflow fact domain
– variables and structure fields
– arrays: single representative
– malloc objects: single representative per callsite
• Assignment via *p
– use One-level Flow Points-to Analysis
– extension: use results of this analysis
• but: must find fixed-point first?