Software analysis foundation & homework-2

0x00 context-insensitive pointer analysis

This homework will accomplish a context-insensitive pointer analysis, and build a call graph. This call graph will be more precise than call graph build by CHA. This pointer analysis algorithm handles 4 type pointers (local variables[Assign]/instance field[Store and Load]/instance call[]/static field/array index) and static method call.

basic knowledge

Pointer analysis will affect many types of statements, look at this table.

statement type instance
New x = new T()
Assign x = y
Store x.f = y
Load y = x.f
Call r = x.k(a, …)

Some domain and Notation characteristics are filled in below table:

domain name notations explanation
variables $x,y \in V$ nop
fields $f,g\in F$ nop
objects $o_i,o_j \in O$ nop
instance field $o_i.f,o_j.g \in O \times F$ nop
pointers $Pointer = V \cup (O \times F)$ nop
pointers-to relations $pt: Pointer \rightarrow P(O)$ P(O) denotes powerset of O. If we have pt(x), it denotes: many objects have been give to x (x may points to many objects), such as x=k;x=o.l();…, so many objects denotes k,o.l(),…. These objects is pt(x).

After above introduction, we can build rules for different statement types:

image-20240130182513865

Pointer analysis is to propagate points-to information among pointers (variables & fields). For example, in Load scenario, $o_j \in pt(o_i.f)$ will changed to $o_j \in pt(y)$. So the key to implementation is determined: when $pt(x)$ is changed, propagate the changed part to related pointers of $x$.

Pointer flow graph (PFG) of a program is a directed graph that express how objects flow among the pointers in the program. PFG’s nodes is $pointer$, and edges is $pointer \times pointer$. Edge $x \rightarrow y$ represents that the objects pointed by x may flow to y. Look at below table, we can know better:

image-20240130184316878

For example, we can build a call graph for the following program:

image-20240130185239374

With PFG, pointer analysis can be solved by computing transitive closure of the PFG, which means: if $e$ is reachable from $b$ on the PFG, then the objects pointed by $b$ may flow to and also be pointed by $e$.

Then, we may conclude method to implement pointer analysis. (1) Build pointer flow graph (PFG). (2) Propagate points-to information on PFG. But these two steps are not separated, but mutually dependent. The example above seems like do step (2), but it also dependent to $o_i \in pt(c), o_i \in pt(d)$. In conclusion, PFG is dynamically updated during pointer analysis.

The pointer analysis algorithm is displayed below:

image-20240130191654851

I think we may feel confused when we first see this algorithm. Let’s understand it. Worklist contains the point-to information to be processed, each worklist entry $[n,pts]$ is a pair of pointer $n$ and points-to set $pts$, which means that pts should be propagated to pt(n). AddEdge (red box) is build edge between two pointers, and if $s \rightarrow t$, we will know that $s$’s pointer-to should passed to $t$.

And as we get an item from worklist, we want to calculate what we need to pass (differential propagation), that means $\Delta=pts-pt(n)$. Then, we will call Propagate (green box). In this function, we need do: (1) rich $pt(n)$ using $\delta$. (2) for $n$’s successors, we need add successors and $\Delta$ into worklist. We know that it has 4 types of statements, black box will handle New and Assign type, while other code will handle Store and Load types.

In the above pages, we discuss local variables/instance field’s pointer analysis. Then, we discuss pointer analysis in the presence of method invocation. We know that if we want to do pointer analysis, we need call graph. Even in inter-procedural pointer analysis, we also need call graph. There are two ways to build call graph, for example, we have code b = a.bar().

(1) CHA. It resolves call targets based on declared type $a$, this way is not imprecise, introduce spurious call graph edges and points-to relations.

(2) Pointer analysis. It resolves call targets based on $pt(a)$, this way is more precise than CHA, this call graph is also dynamic build.

The graph below shows Call rule:

image-20240130221520243

As shown above, dotted line ($x\rightarrow m_{this}$) is not added to edges. Why? Assume $pt(x)={new\quad A, new\quad B, new\quad C}; x.foo()$. If we add edges, we need to add 3 edge (for A/B/C), which will introduce spurious points-to relations. If we combine above algorithm with inter-procedural pointer analysis, we need build call graph to do pointer analysis, algorithm will change to:

image-20240201184026425

In this homework, we may need to analysis more types statement: static related, array index and static method call. For static related, the rule is:

image-20240201192709926

For array index, we don’t distinguish array index (location). If $o_i$ represents an array object, then we use $o_i[*]$ to represent a pointer that points to all objects in array. The rule is:

image-20240201193341092

Static method handling is similar to instance method, except: (1) we needn’t to do $dispatch$ to resolve callee method. (2) we needn’t to pass receiver object. The rule is:

image-20240201193803565

I pass this section because we can see in link1.

code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
class Solver {
// ...

private void addReachable(JMethod method) {
// TODO - finish me
if (!callGraph.contains(method)) {
// add m to RM, S = S merges with S_m
callGraph.addReachableMethod(method);

List<Stmt> methodStmt = method.getIR().getStmts();
for (Stmt s: methodStmt) {
if (s instanceof New) {
((New)s).accept(stmtProcessor);
} else if (s instanceof Copy) {
((Copy)s).accept(stmtProcessor);
} else if (s instanceof LoadField) {
((LoadField)s).accept(stmtProcessor);
} else if (s instanceof StoreField) {
((StoreField)s).accept(stmtProcessor);
} else if (s instanceof Invoke) {
((Invoke)s).accept(stmtProcessor);
}
}
}
}


private class StmtProcessor implements StmtVisitor<Void> {
@Override
public Void visit(New stmt) {
// x = new T();
Optional<LValue> tmp1 = stmt.getDef();
if (tmp1.isPresent()) {
LValue tmp2 = tmp1.get();
if (tmp2 instanceof Var) {
workList.addEntry(pointerFlowGraph.getVarPtr((Var)tmp2), new PointsToSet(heapModel.getObj((New)stmt)));
}
}
return null;
}

@Override
public Void visit(Copy stmt) {
// x = y;
Var tmp1 = stmt.getLValue();
Var tmp2 = stmt.getRValue();
addPFGEdge(pointerFlowGraph.getVarPtr(tmp2), pointerFlowGraph.getVarPtr(tmp1));
return null;
}

@Override
public Void visit(LoadField stmt) {
// x = T.a;
FieldRef tmp1 = stmt.getFieldRef();
if (tmp1.isStatic()) {
JField tmp2 = tmp1.resolve();
Var tmp3 = stmt.getLValue();
addPFGEdge(pointerFlowGraph.getStaticField(tmp2), pointerFlowGraph.getVarPtr(tmp3));
}
return null;
}

@Override
public Void visit(StoreField stmt) {
// T.a = x;
FieldRef tmp1 = stmt.getFieldRef();
if (tmp1.isStatic()) {
JField tmp2 = tmp1.resolve();
Var tmp3 = stmt.getRValue();
addPFGEdge(pointerFlowGraph.getVarPtr(tmp3), pointerFlowGraph.getStaticField(tmp2));
}
return null;
}

@Override
public Void visit(Invoke stmt) {
// x = T.b(b1, b2, ...);
if (stmt.isStatic()) {
JMethod m = resolveCallee(null, stmt);
if (!callGraph.getCalleesOf(stmt).contains(m)) {
Edge<Invoke, JMethod> e = new Edge<>(getCallKind((stmt).getInvokeExp()), stmt, m);
callGraph.addEdge(e);
addReachable(m);

// argument
List<Var> args_source = stmt.getInvokeExp().getArgs();
List<Var> args_target = m.getIR().getParams();
assert args_source.size() == args_target.size();
for (int ii = 0; ii < args_source.size(); ii++) {
Var source = args_source.get(ii);
Var target = args_target.get(ii);
addPFGEdge(pointerFlowGraph.getVarPtr(source), pointerFlowGraph.getVarPtr(target));
}

// return
if (stmt.getLValue() != null) {
List<Var> ret_vars = m.getIR().getReturnVars();
for (Var v: ret_vars) {
addPFGEdge(pointerFlowGraph.getVarPtr(v), pointerFlowGraph.getVarPtr(stmt.getLValue()));
}
}
}
}
return null;
}
}

private void addPFGEdge(Pointer source, Pointer target) {
// TODO - finish me
if (!pointerFlowGraph.getSuccsOf(source).contains(target)) {
pointerFlowGraph.addEdge(source, target);
if (!source.getPointsToSet().isEmpty()) {
workList.addEntry(target, source.getPointsToSet());
}
}
}

private void analyze() {
// TODO - finish me
while (!workList.isEmpty()){
//remove
var entry = workList.pollEntry();
//propagate
PointsToSet subSet = propagate(entry.pointer(),entry.pointsToSet());

//load store
if(entry.pointer() instanceof VarPtr varPtr){
Var var = varPtr.getVar();
for(Obj obj : subSet){
//x.f = y
for(StoreField storeField : var.getStoreFields()){
if(!storeField.isStatic())
addPFGEdge(pointerFlowGraph.getVarPtr(storeField.getRValue()),pointerFlowGraph.getInstanceField(obj,storeField.getFieldRef().resolve()));
}
//y = x.f
for(LoadField loadField : var.getLoadFields()){
if(!loadField.isStatic())
addPFGEdge(pointerFlowGraph.getInstanceField(obj,loadField.getFieldRef().resolve()),pointerFlowGraph.getVarPtr(loadField.getLValue()));
}
//store array x[i] = y
for(StoreArray storeArray : var.getStoreArrays()){
addPFGEdge(pointerFlowGraph.getVarPtr(storeArray.getRValue()),pointerFlowGraph.getArrayIndex(obj));
}
//load array y = x[i]
for(LoadArray loadArray : var.getLoadArrays()){
addPFGEdge(pointerFlowGraph.getArrayIndex(obj), pointerFlowGraph.getVarPtr(loadArray.getLValue()));
}
//call (not static)
processCall(var,obj);
}
}

}
}

private PointsToSet propagate(Pointer pointer, PointsToSet pointsToSet) {
// TODO - finish me

// get diff
PointsToSet diff = new PointsToSet();
if (!pointsToSet.isEmpty()) {
PointsToSet tmp1 = pointer.getPointsToSet();
for (Obj i : pointsToSet) {
if (!tmp1.contains(i)) {
diff.addObject(i);
}
}
}

if (!diff.isEmpty()) {
PointsToSet tmp1 = pointer.getPointsToSet();
for (Obj i : diff) {
tmp1.addObject(i);
}
Set<Pointer> tmp2 = pointerFlowGraph.getSuccsOf(pointer);
for (Pointer i: tmp2) {
workList.addEntry(i, diff);
}
}
return diff;
}

private void processCall(Var var, Obj recv) {
// TODO - finish me

for(Invoke i : var.getInvokes()) {
JMethod m = resolveCallee(recv, i);
workList.addEntry(pointerFlowGraph.getVarPtr(m.getIR().getVar(0)), new PointsToSet(recv));
if (!callGraph.getCalleesOf(i).contains(m)) {
Edge<Invoke, JMethod> e = new Edge<>(getCallKind((i).getInvokeExp()), i, m);
callGraph.addEdge(e);
addReachable(m);

// argument
List<Var> args_source = i.getInvokeExp().getArgs();
List<Var> args_target = m.getIR().getParams();
assert args_source.size() == args_target.size();
for (int ii = 0; ii < args_source.size(); ii++) {
Var source = args_source.get(ii);
Var target = args_target.get(ii);
addPFGEdge(pointerFlowGraph.getVarPtr(source), pointerFlowGraph.getVarPtr(target));
}

// return
if (i.getLValue() != null) {
List<Var> ret_vars = m.getIR().getReturnVars();
for (Var v: ret_vars) {
addPFGEdge(pointerFlowGraph.getVarPtr(v), pointerFlowGraph.getVarPtr(i.getLValue()));
}
}
}
}
}

public static CallKind getCallKind(InvokeExp invokeExp) {
if (invokeExp instanceof InvokeVirtual) {
return CallKind.VIRTUAL;
} else if (invokeExp instanceof InvokeInterface) {
return CallKind.INTERFACE;
} else if (invokeExp instanceof InvokeSpecial) {
return CallKind.SPECIAL;
} else if (invokeExp instanceof InvokeStatic) {
return CallKind.STATIC;
} else if (invokeExp instanceof InvokeDynamic) {
return CallKind.DYNAMIC;
} else {
throw new AnalysisException("Cannot handle InvokeExp: " + invokeExp);
}
}
}

image-20240208155318581

留言

© 2024 wd-z711

⬆︎TOP