This homework will accomplish a context-insensitive pointer analysis, and build a call graph. This call graph will be more precise than call graph build by CHA. This pointer analysis algorithm handles 4 type pointers (local variables[Assign]/instance field[Store and Load]/instance call[]/static field/array index) and static method call.
basic knowledge
Pointer analysis will affect many types of statements, look at this table.
statement type
instance
New
x = new T()
Assign
x = y
Store
x.f = y
Load
y = x.f
Call
r = x.k(a, …)
Some domain and Notation characteristics are filled in below table:
domain name
notations
explanation
variables
$x,y \in V$
nop
fields
$f,g\in F$
nop
objects
$o_i,o_j \in O$
nop
instance field
$o_i.f,o_j.g \in O \times F$
nop
pointers
$Pointer = V \cup (O \times F)$
nop
pointers-to relations
$pt: Pointer \rightarrow P(O)$
P(O) denotes powerset of O. If we have pt(x), it denotes: many objects have been give to x (x may points to many objects), such as x=k;x=o.l();…, so many objects denotes k,o.l(),…. These objects is pt(x).
After above introduction, we can build rules for different statement types:
Pointer analysis is to propagate points-to information among pointers (variables & fields). For example, in Load scenario, $o_j \in pt(o_i.f)$ will changed to $o_j \in pt(y)$. So the key to implementation is determined: when $pt(x)$ is changed, propagate the changed part to related pointers of $x$.
Pointer flow graph (PFG) of a program is a directed graph that express how objects flow among the pointers in the program. PFG’s nodes is $pointer$, and edges is $pointer \times pointer$. Edge $x \rightarrow y$ represents that the objects pointed by x may flow to y. Look at below table, we can know better:
For example, we can build a call graph for the following program:
With PFG, pointer analysis can be solved by computing transitive closure of the PFG, which means: if $e$ is reachable from $b$ on the PFG, then the objects pointed by $b$ may flow to and also be pointed by $e$.
Then, we may conclude method to implement pointer analysis. (1) Build pointer flow graph (PFG). (2) Propagate points-to information on PFG. But these two steps are not separated, but mutually dependent. The example above seems like do step (2), but it also dependent to $o_i \in pt(c), o_i \in pt(d)$. In conclusion, PFG is dynamically updated during pointer analysis.
The pointer analysis algorithm is displayed below:
I think we may feel confused when we first see this algorithm. Let’s understand it. Worklist contains the point-to information to be processed, each worklist entry $[n,pts]$ is a pair of pointer $n$ and points-to set $pts$, which means that pts should be propagated to pt(n). AddEdge (red box) is build edge between two pointers, and if $s \rightarrow t$, we will know that $s$’s pointer-to should passed to $t$.
And as we get an item from worklist, we want to calculate what we need to pass (differential propagation), that means $\Delta=pts-pt(n)$. Then, we will call Propagate (green box). In this function, we need do: (1) rich $pt(n)$ using $\delta$. (2) for $n$’s successors, we need add successors and $\Delta$ into worklist. We know that it has 4 types of statements, black box will handle New and Assign type, while other code will handle Store and Load types.
In the above pages, we discuss local variables/instance field’s pointer analysis. Then, we discuss pointer analysis in the presence of method invocation. We know that if we want to do pointer analysis, we need call graph. Even in inter-procedural pointer analysis, we also need call graph. There are two ways to build call graph, for example, we have code b = a.bar().
(1) CHA. It resolves call targets based on declared type $a$, this way is not imprecise, introduce spurious call graph edges and points-to relations.
(2) Pointer analysis. It resolves call targets based on $pt(a)$, this way is more precise than CHA, this call graph is also dynamic build.
The graph below shows Call rule:
As shown above, dotted line ($x\rightarrow m_{this}$) is not added to edges. Why? Assume $pt(x)={new\quad A, new\quad B, new\quad C}; x.foo()$. If we add edges, we need to add 3 edge (for A/B/C), which will introduce spurious points-to relations. If we combine above algorithm with inter-procedural pointer analysis, we need build call graph to do pointer analysis, algorithm will change to:
In this homework, we may need to analysis more types statement: static related, array index and static method call. For static related, the rule is:
For array index, we don’t distinguish array index (location). If $o_i$ represents an array object, then we use $o_i[*]$ to represent a pointer that points to all objects in array. The rule is:
Static method handling is similar to instance method, except: (1) we needn’t to do $dispatch$ to resolve callee method. (2) we needn’t to pass receiver object. The rule is:
privatevoidaddReachable(JMethod method) { // TODO - finish me if (!callGraph.contains(method)) { // add m to RM, S = S merges with S_m callGraph.addReachableMethod(method);
List<Stmt> methodStmt = method.getIR().getStmts(); for (Stmt s: methodStmt) { if (s instanceof New) { ((New)s).accept(stmtProcessor); } elseif (s instanceof Copy) { ((Copy)s).accept(stmtProcessor); } elseif (s instanceof LoadField) { ((LoadField)s).accept(stmtProcessor); } elseif (s instanceof StoreField) { ((StoreField)s).accept(stmtProcessor); } elseif (s instanceof Invoke) { ((Invoke)s).accept(stmtProcessor); } } } }
privateclassStmtProcessorimplementsStmtVisitor<Void> { @Override public Void visit(New stmt) { // x = new T(); Optional<LValue> tmp1 = stmt.getDef(); if (tmp1.isPresent()) { LValuetmp2= tmp1.get(); if (tmp2 instanceof Var) { workList.addEntry(pointerFlowGraph.getVarPtr((Var)tmp2), newPointsToSet(heapModel.getObj((New)stmt))); } } returnnull; }
@Override public Void visit(Copy stmt) { // x = y; Vartmp1= stmt.getLValue(); Vartmp2= stmt.getRValue(); addPFGEdge(pointerFlowGraph.getVarPtr(tmp2), pointerFlowGraph.getVarPtr(tmp1)); returnnull; }
@Override public Void visit(LoadField stmt) { // x = T.a; FieldReftmp1= stmt.getFieldRef(); if (tmp1.isStatic()) { JFieldtmp2= tmp1.resolve(); Vartmp3= stmt.getLValue(); addPFGEdge(pointerFlowGraph.getStaticField(tmp2), pointerFlowGraph.getVarPtr(tmp3)); } returnnull; }
privatevoidaddPFGEdge(Pointer source, Pointer target) { // TODO - finish me if (!pointerFlowGraph.getSuccsOf(source).contains(target)) { pointerFlowGraph.addEdge(source, target); if (!source.getPointsToSet().isEmpty()) { workList.addEntry(target, source.getPointsToSet()); } } }
privatevoidanalyze() { // TODO - finish me while (!workList.isEmpty()){ //remove varentry= workList.pollEntry(); //propagate PointsToSetsubSet= propagate(entry.pointer(),entry.pointsToSet());
private PointsToSet propagate(Pointer pointer, PointsToSet pointsToSet) { // TODO - finish me
// get diff PointsToSetdiff=newPointsToSet(); if (!pointsToSet.isEmpty()) { PointsToSettmp1= pointer.getPointsToSet(); for (Obj i : pointsToSet) { if (!tmp1.contains(i)) { diff.addObject(i); } } }
if (!diff.isEmpty()) { PointsToSettmp1= pointer.getPointsToSet(); for (Obj i : diff) { tmp1.addObject(i); } Set<Pointer> tmp2 = pointerFlowGraph.getSuccsOf(pointer); for (Pointer i: tmp2) { workList.addEntry(i, diff); } } return diff; }
privatevoidprocessCall(Var var, Obj recv) { // TODO - finish me
for(Invoke i : var.getInvokes()) { JMethodm= resolveCallee(recv, i); workList.addEntry(pointerFlowGraph.getVarPtr(m.getIR().getVar(0)), newPointsToSet(recv)); if (!callGraph.getCalleesOf(i).contains(m)) { Edge<Invoke, JMethod> e = newEdge<>(getCallKind((i).getInvokeExp()), i, m); callGraph.addEdge(e); addReachable(m);