Software analysis foundation & homework-2

0x00 context-insensitive pointer analysis

This homework will accomplish a context-insensitive pointer analysis, and build a call graph. This call graph will be more precise than call graph build by CHA. This pointer analysis algorithm handles 4 type pointers (local variables[Assign]/instance field[Store and Load]/instance call[]/static field/array index) and static method call.

basic knowledge

Pointer analysis will affect many types of statements, look at this table.

statement type	instance
New	x = new T()
Assign	x = y
Store	x.f = y
Load	y = x.f
Call	r = x.k(a, …)

Some domain and Notation characteristics are filled in below table:

domain name	notations	explanation
variables	$x,y \in V$	nop
fields	$f,g\in F$	nop
objects	$o_i,o_j \in O$	nop
instance field	$o_i.f,o_j.g \in O \times F$	nop
pointers	$Pointer = V \cup (O \times F)$	nop
pointers-to relations	$pt: Pointer \rightarrow P(O)$	P(O) denotes powerset of O. If we have pt(x), it denotes: many objects have been give to x (x may points to many objects), such as x=k;x=o.l();…, so many objects denotes k,o.l(),…. These objects is pt(x).

After above introduction, we can build rules for different statement types:

Pointer analysis is to propagate points-to information among pointers (variables & fields). For example, in Load scenario, $o_j \in pt(o_i.f)$ will changed to $o_j \in pt(y)$. So the key to implementation is determined: when $pt(x)$ is changed, propagate the changed part to related pointers of $x$.

Pointer flow graph (PFG) of a program is a directed graph that express how objects flow among the pointers in the program. PFG’s nodes is $pointer$, and edges is $pointer \times pointer$. Edge $x \rightarrow y$ represents that the objects pointed by x may flow to y. Look at below table, we can know better:

For example, we can build a call graph for the following program:

With PFG, pointer analysis can be solved by computing transitive closure of the PFG, which means: if $e$ is reachable from $b$ on the PFG, then the objects pointed by $b$ may flow to and also be pointed by $e$.

Then, we may conclude method to implement pointer analysis. (1) Build pointer flow graph (PFG). (2) Propagate points-to information on PFG. But these two steps are not separated, but mutually dependent. The example above seems like do step (2), but it also dependent to $o_i \in pt(c), o_i \in pt(d)$. In conclusion, PFG is dynamically updated during pointer analysis.

The pointer analysis algorithm is displayed below:

I think we may feel confused when we first see this algorithm. Let’s understand it. Worklist contains the point-to information to be processed, each worklist entry $[n,pts]$ is a pair of pointer $n$ and points-to set $pts$, which means that pts should be propagated to pt(n). AddEdge (red box) is build edge between two pointers, and if $s \rightarrow t$, we will know that $s$’s pointer-to should passed to $t$.

And as we get an item from worklist, we want to calculate what we need to pass (differential propagation), that means $\Delta=pts-pt(n)$. Then, we will call Propagate (green box). In this function, we need do: (1) rich $pt(n)$ using $\delta$. (2) for $n$’s successors, we need add successors and $\Delta$ into worklist. We know that it has 4 types of statements, black box will handle New and Assign type, while other code will handle Store and Load types.

In the above pages, we discuss local variables/instance field’s pointer analysis. Then, we discuss pointer analysis in the presence of method invocation. We know that if we want to do pointer analysis, we need call graph. Even in inter-procedural pointer analysis, we also need call graph. There are two ways to build call graph, for example, we have code b = a.bar().

(1) CHA. It resolves call targets based on declared type $a$, this way is not imprecise, introduce spurious call graph edges and points-to relations.

(2) Pointer analysis. It resolves call targets based on $pt(a)$, this way is more precise than CHA, this call graph is also dynamic build.

The graph below shows Call rule:

As shown above, dotted line ($x\rightarrow m_{this}$) is not added to edges. Why? Assume $pt(x)={new\quad A, new\quad B, new\quad C}; x.foo()$. If we add edges, we need to add 3 edge (for A/B/C), which will introduce spurious points-to relations. If we combine above algorithm with inter-procedural pointer analysis, we need build call graph to do pointer analysis, algorithm will change to:

In this homework, we may need to analysis more types statement: static related, array index and static method call. For static related, the rule is:

For array index, we don’t distinguish array index (location). If $o_i$ represents an array object, then we use $o_i[*]$ to represent a pointer that points to all objects in array. The rule is:

Static method handling is similar to instance method, except: (1) we needn’t to do $dispatch$ to resolve callee method. (2) we needn’t to pass receiver object. The rule is:

I pass this section because we can see in link1.

code

class Solver {
	// ...

    private void addReachable(JMethod method) {
        // TODO - finish me
        if (!callGraph.contains(method)) {
            // add m to RM, S = S merges with S_m
            callGraph.addReachableMethod(method);

            List<Stmt> methodStmt = method.getIR().getStmts();
            for (Stmt s: methodStmt) {
                if (s instanceof New) {
                    ((New)s).accept(stmtProcessor);
                } else if (s instanceof Copy) {
                    ((Copy)s).accept(stmtProcessor);
                } else if (s instanceof LoadField) {
                    ((LoadField)s).accept(stmtProcessor);
                } else if (s instanceof StoreField) {
                    ((StoreField)s).accept(stmtProcessor);
                } else if (s instanceof Invoke) {
                    ((Invoke)s).accept(stmtProcessor);
                }
            }
        }
    }


    private class StmtProcessor implements StmtVisitor<Void> {
        @Override
        public Void visit(New stmt) {
            // x = new T();
            Optional<LValue> tmp1 = stmt.getDef();
            if (tmp1.isPresent()) {
                LValue tmp2 = tmp1.get();
                if (tmp2 instanceof Var) {
                    workList.addEntry(pointerFlowGraph.getVarPtr((Var)tmp2), new PointsToSet(heapModel.getObj((New)stmt)));
                }
            }
            return null;
        }

        @Override
        public Void visit(Copy stmt) {
            // x = y;
            Var tmp1 = stmt.getLValue();
            Var tmp2 = stmt.getRValue();
            addPFGEdge(pointerFlowGraph.getVarPtr(tmp2), pointerFlowGraph.getVarPtr(tmp1));
            return null;
        }

        @Override
        public Void visit(LoadField stmt) {
            // x = T.a;
            FieldRef tmp1 = stmt.getFieldRef();
            if (tmp1.isStatic()) {
                JField tmp2 = tmp1.resolve();
                Var tmp3 = stmt.getLValue();
                addPFGEdge(pointerFlowGraph.getStaticField(tmp2), pointerFlowGraph.getVarPtr(tmp3));
            }
            return null;
        }

        @Override
        public Void visit(StoreField stmt) {
            // T.a = x;
            FieldRef tmp1 = stmt.getFieldRef();
            if (tmp1.isStatic()) {
                JField tmp2 = tmp1.resolve();
                Var tmp3 = stmt.getRValue();
                addPFGEdge(pointerFlowGraph.getVarPtr(tmp3), pointerFlowGraph.getStaticField(tmp2));
            }
            return null;
        }

        @Override
        public Void visit(Invoke stmt) {
            // x = T.b(b1, b2, ...);
            if (stmt.isStatic()) {
                JMethod m = resolveCallee(null, stmt);
                if (!callGraph.getCalleesOf(stmt).contains(m)) {
                    Edge<Invoke, JMethod> e = new Edge<>(getCallKind((stmt).getInvokeExp()), stmt, m);
                    callGraph.addEdge(e);
                    addReachable(m);

                    // argument
                    List<Var> args_source = stmt.getInvokeExp().getArgs();
                    List<Var> args_target = m.getIR().getParams();
                    assert args_source.size() == args_target.size();
                    for (int ii = 0; ii < args_source.size(); ii++) {
                        Var source = args_source.get(ii);
                        Var target = args_target.get(ii);
                        addPFGEdge(pointerFlowGraph.getVarPtr(source), pointerFlowGraph.getVarPtr(target));
                    }

                    // return
                    if (stmt.getLValue() != null) {
                        List<Var> ret_vars = m.getIR().getReturnVars();
                        for (Var v: ret_vars) {
                            addPFGEdge(pointerFlowGraph.getVarPtr(v), pointerFlowGraph.getVarPtr(stmt.getLValue()));
                        }
                    }
                }
            }
            return null;
        }
    }

    private void addPFGEdge(Pointer source, Pointer target) {
        // TODO - finish me
        if (!pointerFlowGraph.getSuccsOf(source).contains(target)) {
            pointerFlowGraph.addEdge(source, target);
            if (!source.getPointsToSet().isEmpty()) {
                workList.addEntry(target, source.getPointsToSet());
            }
        }
    }

    private void analyze() {
        // TODO - finish me
        while (!workList.isEmpty()){
            //remove
            var entry = workList.pollEntry();
            //propagate
            PointsToSet subSet = propagate(entry.pointer(),entry.pointsToSet());

            //load store
            if(entry.pointer() instanceof VarPtr varPtr){
                Var var = varPtr.getVar();
                for(Obj obj : subSet){
                    //x.f = y
                    for(StoreField storeField : var.getStoreFields()){
                        if(!storeField.isStatic())
                            addPFGEdge(pointerFlowGraph.getVarPtr(storeField.getRValue()),pointerFlowGraph.getInstanceField(obj,storeField.getFieldRef().resolve()));
                    }
                    //y = x.f
                    for(LoadField loadField : var.getLoadFields()){
                        if(!loadField.isStatic())
                            addPFGEdge(pointerFlowGraph.getInstanceField(obj,loadField.getFieldRef().resolve()),pointerFlowGraph.getVarPtr(loadField.getLValue()));
                    }
                    //store array x[i] = y
                    for(StoreArray storeArray : var.getStoreArrays()){
                        addPFGEdge(pointerFlowGraph.getVarPtr(storeArray.getRValue()),pointerFlowGraph.getArrayIndex(obj));
                    }
                    //load array y = x[i]
                    for(LoadArray loadArray : var.getLoadArrays()){
                        addPFGEdge(pointerFlowGraph.getArrayIndex(obj), pointerFlowGraph.getVarPtr(loadArray.getLValue()));
                    }
                    //call (not static)
                    processCall(var,obj);
                }
            }

        }
    }

    private PointsToSet propagate(Pointer pointer, PointsToSet pointsToSet) {
        // TODO - finish me

        // get diff
        PointsToSet diff = new PointsToSet();
        if (!pointsToSet.isEmpty()) {
            PointsToSet tmp1 = pointer.getPointsToSet();
            for (Obj i : pointsToSet) {
                if (!tmp1.contains(i)) {
                    diff.addObject(i);
                }
            }
        }

        if (!diff.isEmpty()) {
            PointsToSet tmp1 = pointer.getPointsToSet();
            for (Obj i : diff) {
                tmp1.addObject(i);
            }
            Set<Pointer> tmp2 = pointerFlowGraph.getSuccsOf(pointer);
            for (Pointer i: tmp2) {
                workList.addEntry(i, diff);
            }
        }
        return diff;
    }

    private void processCall(Var var, Obj recv) {
        // TODO - finish me

        for(Invoke i : var.getInvokes()) {
            JMethod m = resolveCallee(recv, i);
            workList.addEntry(pointerFlowGraph.getVarPtr(m.getIR().getVar(0)), new PointsToSet(recv));
            if (!callGraph.getCalleesOf(i).contains(m)) {
                Edge<Invoke, JMethod> e = new Edge<>(getCallKind((i).getInvokeExp()), i, m);
                callGraph.addEdge(e);
                addReachable(m);

                // argument
                List<Var> args_source = i.getInvokeExp().getArgs();
                List<Var> args_target = m.getIR().getParams();
                assert args_source.size() == args_target.size();
                for (int ii = 0; ii < args_source.size(); ii++) {
                    Var source = args_source.get(ii);
                    Var target = args_target.get(ii);
                    addPFGEdge(pointerFlowGraph.getVarPtr(source), pointerFlowGraph.getVarPtr(target));
                }

                // return
                if (i.getLValue() != null) {
                    List<Var> ret_vars = m.getIR().getReturnVars();
                    for (Var v: ret_vars) {
                        addPFGEdge(pointerFlowGraph.getVarPtr(v), pointerFlowGraph.getVarPtr(i.getLValue()));
                    }
                }
            }
        }
    }

    public static CallKind getCallKind(InvokeExp invokeExp) {
        if (invokeExp instanceof InvokeVirtual) {
            return CallKind.VIRTUAL;
        } else if (invokeExp instanceof InvokeInterface) {
            return CallKind.INTERFACE;
        } else if (invokeExp instanceof InvokeSpecial) {
            return CallKind.SPECIAL;
        } else if (invokeExp instanceof InvokeStatic) {
            return CallKind.STATIC;
        } else if (invokeExp instanceof InvokeDynamic) {
            return CallKind.DYNAMIC;
        } else {
            throw new AnalysisException("Cannot handle InvokeExp: " + invokeExp);
        }
    }
}

留言

作者: wd-z711
文章链接: https://wd-2711.tech/
版权声明: 本博客所有文章除特别声明外，均采用 CC BY-NC-ND 4.0 许可协议。转载请注明出处!

⬆︎TOP

tai_e-static-analysis-with-homework-2

Software analysis foundation & homework-2

0x00 context-insensitive pointer analysis

basic knowledge

related class

code

留言