NJU Static Program Analysis 05: Data Flow Analysis III

Abstract

Understand the functional view of iterative algorithm
The definitions of lattice and complete lattice
Understand the fixed-point theorem
How to summarize may and must analyses in lattices (in next lecture)
The relation between MOP and the solution produced by the iterative algorithm (in next lecture)
Constant propagation analysis (in next lecture)
Worklist algorithm (in next lecture)

Notes

In this lecture, we will learn about some mathematical, formal and theoretical foundations about data flow analysis. To start with, recalling the following concepts is necessary:

Partial Order

A poset is defined as a pair ((P, sqsubseteq)), where (P) is a set, and (sqsubseteq) is a binary relation that defines a partial ordering over (P) which has the properties of Reflexivity, Antisymmetry and Transitivity.
Upper and Lower Bounds

Given a poset ((P, sqsubseteq)) and its subset (S, S subseteq P), we can say that:
- an upper bound of (S) is defined as a (u, u in P) satisfying (forall x in S, x sqsubseteq u).
- a lower bound of (S) is defined as a (l, l in P) satisfying (forall x in S, l sqsubseteq x).
- the least upper bound(lub, or join) of (S), written (sqcup S) satisfy that for every upper bound (u) of (S), (sqcup S sqsubseteq u).
- the greatest lower bound(glb, or meet) of (S), written (sqcap S), satisfy that for every lower bound (l) of (S), (l sqsubseteq sqcap S).
Note that not every poset has the lub or glb, and if a poset has, it will be unique.
The proof can be simple. Firstly for example for a poset ((P, subseteq)), where (P = left{ left{a, b,c ight}, left{a, c ight}, left{a ight}, left{c ight} ight}), the poset do not have the glb. Secondly, for a poset (P) assume (exist sqcap P_1 ot= sqcap P_2), then we have (sqcap P_1 sqsubseteq sqcap P_2) and (sqcap P_1 sqsubseteq sqcap P_2), which indicating a contradiction that (sqcap P_1 = sqcap P_2).

What's more, if (S = {a, b}), then (sqcap S) can be written as (a sqcap b), (sqcup S) can be written as (a sqcup b).
Lattice

Given a poset ((P, sqsubseteq)), (forall~ a, b in P, exist~ a sqcup b, a sqcap b), we call (P) a lattice.
- Semilattice
  
  Given a poset ((P, sqsubseteq)), (forall~a,b in P),
  - if only (a sqcup b) exists, then ((P, sqsubseteq)) is called a join semilattice。
  - if only (a sqcap b) exists, then ((P, sqsubseteq)) is called a meet semilattice.
- Complete Lattice
  
  If a lattice ((P, sqsubseteq)) satisfy that (forall S subseteq P, exist~sqcap S, sqcup S), it is called a complete lattice, in which ( op = sqcup P) called top and (ot = sqcap P) called bottom.
  
  ((, subseteq)) is not a complete lattice, considering (N subseteq , ot exist sqcup N (+ infty)). Moreover, every finite lattice is a complete lattice, but not every complete lattice is finite.
  
  For most of our data flow analysis, we focus on the finite complete lattices.
- Product Lattice
  
  Given lattices (L_1 = (P_1, sqsubseteq), L_2 = (P_2, sqsubseteq), dots, L_n = (P_n, sqsubseteq)), if (forall i, exist sqcup L_i, sqcap L_i), then we can have a product lattice (L^n = (P,sqsubseteq)), where (P = P_1 imes P_2 imes dots imes P_n).
  
  Moreover, there are
  - ((x_1,dots,x_n) sqsubseteq (y_1,dots,y_2) Leftrightarrow (x_1 sqsubseteq y_1) and dots and (x_n sqsubseteq y_n))
  - ((x_1,dots,x_n) sqcap (y_1,dots,y_2) = (x_1 sqcap y_1, dots, x_n sqcap y_n))
  - ((x_1,dots,x_n) sqcup (y_1,dots,y_2) = (x_1 sqcup y_1, dots, x_n sqcup y_n))
  If every (L_i) is complete, then (L^n) is complete.

On the abstraction basis of lattice, a data flow analysis can be viewed as a framework ((D,L,F)) consisting:

(D): a direction of data flow, whether forwards or backwards.
(L): a lattice including domain of the values (V) with two operators (sqcap) and (sqcup).
(F): a family of transfer functions (F: V o V).

Add some details to the framework, we can get a formal expression of our iterative algorithm:

Given a CFG with (k) nodes, the iterative algorithm updates ({ m OUT}[n]) for every node (n) in each iteration.
Assume the domain of the values in data flow analysis is (V), then we can define a k-tuple

[({ m OUT}[n_1], { m OUT}[n_2],dots,{ m OUT}[n_k]) ]
as an element of set ((V_1 imes V_2 imes dots imes V_k)) denoted as (V^k), to hold the values of the analysis after each iteration.
Each iteration can be considered as taking an action to map an element of (V^k) to a new element of (V^k), through applying the transfer functions and control-flow handing, abstracted as a function (F:V^k o V^k)
Then the algorithm outputs a series of k-tuples iteratively until a k-tuple is the same as the last one in two consecutive iterations.

In this context, the algorithm would begin with the state (X_0 = (ot, ot, dots, ot)) and iteratively apply a function (F) to the current state. Then we got (X_1 =(v_1^1, v_2^1, dots, v_k^1) o X_2 = (v_1^2, v_2^2, dots, v_k^2) o dots o X_s), where for each (i), (X_{i+1} = F(X_i)). And for the final state, there is (F(X_s) = X_s), that is to say, (X_s) is a fixed point of the function (F).

After all the work done previously, we can eventually start to use mathematical tricks to answer a few important questions:

Is the algorithm guaranteed to terminate or reach the fixed point, or does it always have a solution?

To answer this question, we should inspect the monotonicity of the function (F), as we do in Lecture 03.

A function (F: L o L) is monotonic if (forall~x, y in L, x sqsubseteq y Rightarrow f(x) sqsubseteq f(y)). ((L) is a lattice)
If so, is there only one solution or only one fixed point? If more than one, is our solution the best one(most precise)?

We have a Fixed-Point Theorem that:

Given a complete lattice ((L, sqsubseteq)), if (F: L o L) is monotonic and (L) is finite, then
- the least fixed point of (F) can be found by iterating (F(ot), F(F(ot)), dots, F^k(ot)) until a fixed point is reached.
- the greatest fixed point of (F) can be found by iterating (F( op), F(F( op)), dots, F^k( op)) until a fixed point is reached.
We can give a proof for the least fixed point, that of the greatest is similar.

(P.f.)
1. We firstly prove the existence of a fixed point.
  
  By the definition of (ot) and (F: L o L), we have:
  
  [ot sqsubseteq F(ot) ]
  As (F) is monotonic, we have:
  
  [F(ot) sqsubseteq F(F(ot)) sqsubseteq dots sqsubseteq F^i(ot) ]
  As (L) is finite, for some (k), we have:
  
  [F^{ m Fix} = F^k(ot) = F^{k+1}(ot) ]
  Thus we found the fixed point.
2. Secondly, we prove the fixed point is the least one.
  
  Assume we have another fixed point (x). By the definition of (ot), we have (ot sqsubseteq x).
  
  As (F) is monotonic, using induction, we can easily get
  
  [forall~i,F^i(ot) sqsubseteq F^i(x) ]
  Thus (exist~k,F^k(ot) sqsubseteq F^k(x) = x), which means
  
  [F^{ m Fix} = F^k(ot) sqsubseteq x ]
  So that the fixed point is uniquely the least one.
Using the fixed point theorem, we cal easily describe that the iterative algorithm will finally arrive at a best fixed point on the lattice, if (F) is monotonic.