dissertation/resolvebackground.tex~ at master · hamptonsmith/dissertation · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
%----------------------------------------------------------------------------
\chapter{RESOLVE Background and Specification Engineering}\label{ch:resolveBackground}
%----------------------------------------------------------------------------

This chapter contains necessary background information for understanding our verification technique.  We begin with an overview of the RESOLVE system, and then discuss how we expect to leverage our techniques toward more easily-verified software using specification engineering.


%-----------------------------------------------------------------------------
\section{RESOLVE Verification System}
%-----------------------------------------------------------------------------

The RESOLVE\cite{RESOLVE} Verifying Compiler is an attempt to create a full verification pipeline, beginning with an integrated specification and programming language and continuing through to a back-end prover.  It has been applied in industrial settings\cite{hollingsworth2000experience} and used extensively as a platform for education\cite{leonard2009injecting, sitaraman2009engaging, sitaraman2001formal}.  The focus of RESOLVE is on modular verification and scalability.  These are addressed by ensuring each component is verified in isolation and need not be re-verified regardless of the context of deployment.  This means that encapsulation must be strictly assured, which requires that certain treacherous programmatic constructs such as aliases must be tightly controlled.

As an example, we consider a Stack abstraction.  We are interested in designing a system for complete verification---including constraints like memory, so we will use a bounded stack that takes as a parameter its maximum depth.  Listing \ref{lst:stack} shows part of a RESOLVE concept describing such a component.

\lstinputlisting[language=resolve,caption={A Stack Concept\label{lst:stack}}]{Stack_Template.co}

RESOLVE permits parameterized types.  In this case, \texttt{Entry} is a generic type, permitting \texttt{Stack}s to be created to contain any other RESOLVE type.  \texttt{Max\_Depth} is an integer indicating the maximum number of elements that can ever be on a stack.  The \texttt{evaluates} parameter mode indicates that the parameter is pass-by-value (parameters are ordinarily pass-by-reference.)

Applying modular lessons from the world of programming to mathematics, we store theories in separate files with their own scopes that can be imported individually\cite{smith08}.  The \emph{uses} line includes one such theory---the theory of strings, i.e., finite sequences.

The \texttt{Family} clause introduces a family of abstract types called \texttt{Stack}s, modeled conceptually by strings of \texttt{Entry}s.  The \texttt{exemplar} clause introduces a name to be used for an example stack, which is then used in the \texttt{constraint} clause to indicate that not \emph{all} strings of \texttt{Entry}s are valid stacks, but rather only those of length \texttt{Max\_Depth} and less.  The \texttt{initialization} clause notes that all \texttt{Stack}s are assured to be empty when they are first created, which is to say it specifies the constructor operation.

We then define some operations on \texttt{Stack}s---\texttt{Push()} and \texttt{Pop()}, with their associated pre- and post-conditions, introduced by \texttt{requires} and \texttt{ensures} clauses, respectively.  Parameters to operations are preceded by \emph{parameter passing modes}, which summarize the effect the operation will have on a parameter---a parameter that's in \texttt{alters} mode is passed a meaningful value that might assume  an arbitrary value by the end of the call; an \texttt{updates} parameter is given a meaningful value that will be changed to a new meaningful value as specified in the \texttt{ensures} clause;  the input value of a \texttt{replaces} parameter is ignored and replaced with a meaningful value by the end of the call as specified in the \texttt{ensures} clause.

The \texttt{requires} and \texttt{ensures} clauses are mathematical assertions and their variables refer to the mathematical values of the parameters.  So, while \texttt{Push} operates on programmatic \texttt{Stack}s, \texttt{S} in its \texttt{ensures} clause refers to the mathematical value of the passed stack---a string of \texttt{Entry}s.  Because mathematical variables can have only one, unchanging value, we use the pound sign to indicate the value at the beginning of the call.  Thus, \texttt{\#S} refers to the value of the \texttt{Stack} at the beginning of the call and \texttt{S} refers to its value at the end.  Inside a \texttt{requires} clause, all variables refer to the values of parameters at the beginning of the call.

Angled braces and the ``\texttt{o}'' operator come from \texttt{String\_Theory} and represent the singleton-string constructor and the concatenation operator respectively.

Once a concept has been described, an implementation can be provided in a \emph{realization}.  Listing \ref{lst:linkingrealization} provides a realization of a \texttt{Stack} using an array.  Note that while we provide some syntactic sugar for arrays, a pre-processing step translates all array manipulation into interactions with a normal component and thus all reasoning is accomplished via a specification that models arrays as functions from integers to entries rather than hard-coded or specialized array reasoning.  This contrasts with every other practical programming verification system we are aware of that supports arrays, which ordinarily rely on specialized array decision procedures of the kind discussed in \cite{bradley2006s}. It provides an example of how often-primitive structures can be modelled normally within the framework of RESOLVE's flexible mathematical subsystem\footnote{In addition to this array-based implementation, we have experimented with a linked-structure component in order to provide, in this case, a linked-list based implementation.  The details of this exploration can be found in \cite{kulczyckiPointers}.}.

\lstinputlisting[language=resolve,caption={An Array Realization\label{lst:linkingrealization}}]{Array_Realiz.rb}

We represent our Stack programmatically as a \emph{record} (similar to a \emph{struct} in C) containing an array of contents and a top index.  A \texttt{convention} is a representation invariant that must hold after each method except finalization, and may be assumed before each method except initialization.  In this case, the top may be assumed to be at a valid index or  0, where it is permitted to reside if the Stack is empty.

A \texttt{correspondence} clause provides a relation between the programmatic \texttt{Stack} and its mathematical conceptualization.  In this case, we use a mathematical definition \texttt{Concatenation}, which is to string concatenation what big $\Sigma$ is to integer addition.  Here, this clause states that the mathematical value of a \texttt{Stack} is equal to the reverse of the concatenation of the elements of the \texttt{S.Contents} array from 1 to \texttt{S.Top}.

A procedure is provided for accomplishing both \texttt{Push()} and \texttt{Pop()}, taking advantage of standard integer and array operations.

Using the specifications of these operations, the RESOLVE compiler is able to generate VCs establishing the correctness of this \texttt{Stack} implementation.  The details of this generation process is the topic of \cite{hartonDissertation}.  As an example, this is the VC establishing the convention at the end of a call to \texttt{Push()}:

\lstinputlisting[language=resolve]{Pop_Convention.asrt}

Note that in Given \#10, the length of the result of the concatenation is clearly equal to \texttt{S.Top}, thus we know that \texttt{S.Top + 1} is less than \texttt{Max\_Depth}, satisfying half the goal.  Then, by Given \#9, we see that \texttt{0 <= S.Top}, so clearly \texttt{0 <= (S.Top + 1)}, satisfying the other half.

Once generated, VCs are passed to RESOLVE's integrated minimalist prover.  This prover was built as an extensible prover platform for verification experimentation\cite{smith10} and is discussed more thoroughly in Chapter \ref{ch:prover}.

It is possible to define \emph{extension operations} to a concept.  These are secondary operations that permit useful additional functionality.  As an example, \texttt{Stack\_Template} has an extension operation called \texttt{Flip}, presented here as an enhancement:

\mbox{\lstinputlisting[language=resolve]{Flipping_Capability.en}}

Just as with a concept, enhancements may have multiple associated realizations, for which the VC-generation and proving process is the same.  Consider the realization of \texttt{Flipping\_Capability} given in Listing \ref{lst:flipRealiz}.

\lstinputlisting[language=resolve,caption={A Realization of Flip\label{lst:flipRealiz}}]{Obvious_Flip_Realiz.rb}

This \texttt{Flip} procedure creates a local \texttt{Stack} called \texttt{S\_Flipped}.  It then iteratively pops each element off the original stack and into the local one.  Having done so, the \texttt{:=:} operator is used to exchange the value of \texttt{S\_Flipped} with \texttt{S}.  Using swapping as its basic data movement operator enables RESOLVE to minimize undesirable aliasing\cite{harmsSwapping, pike2009traditional} without resorting to more complex alias-management techniques like object ownership\cite{boyapati2003ownership}.

Notice that the while loop requires programmer-supplied annotations.  The \emph{changing} clause establishes a frame property: only those values explicitly listed may change.  The \emph{maintaining} clause establishes a loop invariant expressing the logic of the loop.  The \emph{decreasing} clause establishes a termination metric to allow us to prove termination.

Such annotations are standard operating procedure for verification systems and can be seen across the board at, for example, the first VSTTE verification competition\cite{klebanovVSTTEExperience}.  Some work has been done on automatically discovering, e.g., loop invariants on the programmer's behalf\cite{ernstInfer, ernst2007daikon}, however for the time being such automatic inferences remain limited, especially in the presence of data abstractions.  Regardless, RESOLVE's explicit invariants do not preclude such a method from being employed to fill them in.

Using RESOLVE's integrated minimalist prover, a component of this research, we are able to fully and mechanically verify this procedure, as will be discussed further in Chapter \ref{ch:proverEvaluation}.


%-----------------------------------------------------------------------------
\section{Specification Engineering}
%-----------------------------------------------------------------------------

In modern efforts to capture the behavior of existing software components and systems, specifications are often quite complicated, even for simple operations.  Consider Listing \ref{lst:jmladd} for the JML spec of the \texttt{add()} operation on a \texttt{List} data structure.

\lstinputlisting[language=JML,caption={JML Specification for \texttt{add()}\label{lst:jmladd}}]{JMLAdd.jml}

The JML specification library seeks to formalize the existing informal specification of the Java Runtime Library, so it is bound by design decisions made agnostic of formal specification.  This contributes to the complexity of the specifications in a number of ways:

\begin{itemize}
	\item The original informal specification provides no guidance about how to deal with collections potentially containing more than \texttt{Integer.MAX\_VALUE} elements.
	\item Language complexities like null elements and integer bounds must be taken into account.
	\item Without a correspondence established at the class level between the abstract value of the structure as a whole and its abstract fields, information that would otherwise be redundant must be encoded.  For example, certainly the fact that \texttt{this.contains(o)} follows from \texttt{\\old(this.theCollection.insert(o))}, but no correspondence exists between these two values.
\end{itemize}

Beyond these concrete issues demonstrated in this particular example, complexity arises from other areas as well:

\begin{itemize}
	\item Capturing alias behavior, including the repeated argument problem.
	\item Using separation logic to describe allowable changes in memory while the method call runs.
	\item Poorly designed APIs.
\end{itemize}

In \cite{filipovic2010blaming} the authors argue for a policy of ``blaming the client'' by suggesting that such complexities are appropriate and the burden of reasoning about them should be left with the client.  However,  because specifications form the basis of the reasoning that becomes VCs, a complex specification such as this one necessarily leads to complex VCs.  In this example, because Java's \texttt{List} is intended as a reusable component, many clients will become victim to this complexity, increasing the difficulty of verifying code over the entire lifetime of this component.  Certainly this is unacceptable in a system that is to scale.

The design of RESOLVE, on the other hand, attempts to minimize such complexities.  There are no nulls to reason about.  While integers are bounded, their bounds and constraints are asserted once in a first-class component and reasoned about at that level.  A class-level correspondence establishes how the actual fields of the representation relate to an abstract value.  RESOLVE is designed to have clean semantics\cite{kul:dis}, minimizing aliasing, marshaling reference behavior through a pointer data structure, and providing a well-defined semantic for repeated arguments.

Though no language can force its users to write good APIs, the constraints and rigor of the language contribute to encapsulated, modular design.  Consider this specification for the comparable \texttt{insert()} operation on the RESOLVE list component:

\begin{lstlisting}[language=RESOLVE]
	ensures P.Prec = #P.Prec and P.Rem = <#New_Entry> o #P.Rem
\end{lstlisting}

Just as these design decisions encourage reuse and simplify \emph{human} reasoning, they correspondingly simplify automated reasoning.  Consider the VC given in Listing \ref{lst:easyVCEg} derived from a stack reversing procedure.

\lstinputlisting[language=RESOLVE,caption={VC for the Inductive Case of Loop Invariant\label{lst:easyVCEg}}]{StackFlipVC1.asrt}

The VC is easily solvable by expanding the variable S'', then applying a few relatively straightforward transformations on strings.  The presence of these sorts of VCs originally caused us to suspect we could get away with a very simple prover---one that first expanded variables, then substituted equivalent expressions until either the goal matched some given or the expression reduced to \texttt{true}.  To test this hypothesis, we built a first version of the automated prover, discussed in the next chapter.