\documentclass{puthesis}
\usepackage{latexsym}
\usepackage{graphicx}
\usepackage{url} % SB
\usepackage{algorithmic}
\usepackage{algorithm}
\usepackage{times}
\usepackage{xcolor}
\usepackage{textcomp}
\usepackage{mathpartir}
\usepackage{semantic}
\usepackage{listings}
\usepackage{lstlangcoq}
\usepackage{caption}
\usepackage{stmaryrd}
\usepackage{placeins}
\usepackage{qtree}
\renewcommand{\lstlistingname}{Figure}
\input{macros}
\lstset{language=Coq,alsoletter=_,basicstyle=\sffamily,mathescape=true,columns=fullflexible}
\hyphenation{Comp-Cert}
\author{Josiah Dodds}
\adviser{Andrew Appel}
\title{Computation Improves Interactive Symbolic Execution}
\abstract{
As software becomes more prevalent throughout our lives, its
correctness is more important than ever before. Verifiable C is an
expressive Hoare logic (higher-order impredicative concurrent
separation logic) for proving functional correctness of C programs.
The program logic is foundational---it is proved sound in Coq
w.r.t. the operational semantics of CompCert Clight. Users apply the
program logic to C programs using semiautomated tactics in Coq, but
these tactics are very slow.
\qquad This thesis shows how to make an efficient (yet still
foundational) symbolic executor based on this separation logic by
using computational reflection in several different ways. Our
execution engine is able to interact gracefully with the user by
reflecting application-specific proof goals back to the user for
interactive proof---necessary in functional correctness proofs where
there is almost always domain-specific reasoning to be done. We use
our ``mostly sound'' type system, computationally efficient finite-map
data structures, and the MirrorCore framework for computationally
reflected logics. Measurements show a 40$\times$
performance improvement.}
%\acknowledgements{Thank you very much.}
%\dedication{To myself.}
\begin{document}
\chapter{Introduction}
The C programming language is one of the most commonly used languages
for writing critical software. C is used for operating systems,
embedded systems, and cryptography, as well as countless libraries
that need to have high performance while being compatible with as many
languages and systems as possible. The world depends on more systems
every day, and many of those systems have C code at their core.
In general, the code we depend on works well, and there is plenty of
noncritical code that can fail without causing harm. But when critical
code fails, it can be catastrophic. Space projects that cost hundreds
of millions of dollars have failed due to problems with software
\cite{polarlander, marsorbit}, radiation device software has caused
overdoses in cancer patients~\cite{therac}, and millions of cars have
been recalled to fix software bugs~\cite{toyotarecall}. Those failures
were caused by accidental bugs; far more systems can fail when a malicious
user attacks them. As an example, a brand of refrigerators was found
to have been taken over to send spam e-mail \cite{fridgenet}.
To believe that a C program will run correctly, we must first
\emph{trust} a number of other components. We need
to fully understand the program itself and believe that it will do
what we want it to. We also need to believe that when
the C code is translated into machine language, the generated machine
language has the same behavior as the original program. Finally we
need to believe that the machine executing the machine language is
doing the correct thing. We call the collection of things we must
trust the \emph{trusted computing base}. In general, the larger the
trusted computing base is, the harder it is to be confident that the
program is correct.
The need for trust follows a chain. It starts with the C program at
the top, the most human understandable part of the chain, and moves
down to the actual execution of the machine. In order to make the
trusted computing base as small as possible, we must examine and
minimize the base of every link in the chain.\footnote{If we cannot
trust the C code or the C compiler, we can review or verify the
assembly-language program; this approach is often used in practice,
but it is difficult and tedious.}
Proof assistants are tools for stating theorems, and then writing and
checking proofs of those theorems. If you want to believe a pen-and-paper
mathematical proof, you first need to understand the statement of
the theorem, which is open to ambiguities based on (often unstated)
assumptions made by the writer, and then believe every single step of
the proof itself, even steps where the proof author says something
vague like ``clearly'' or ``obviously''. This can be an enormous
burden on the person that needs to understand how the proof was done;
new proofs can use new techniques that are impenetrable to anyone but
experts in the specific domain of the proof. This is a real problem,
because often the theorem proved is useful to many people even if the
proof itself is difficult for them to understand. If you want to use a
pen-and-paper proof that you can't understand, you must trust the
process of peer review.
A proof assistant, on the other hand, forces theorem statements to be
written in a well-defined logic, making them completely
unambiguous. The proof assistant often provides tools for building
proofs. These tools construct proofs using well-defined inference rules.
The tools for building proofs don't need to be
trusted. All that needs to be trusted is a part of the proof assistant
that checks the proof when it is created. This is analogous to not
needing to trust the person writing a pen and paper proof, as long as
you can check the proof that they wrote. In summary, if a proof is
stated and proved in a proof assistant, to believe the proof we must
trust:
\begin{enumerate}
\item The statement of the theorem being proved, and
\item the correctness of the checker.
\end{enumerate}
Regarding the correctness of the checker (the kernel of the proof
assistant), the best source of confidence is again peer review. In
this case, though, it is not a few anonymous reviewers
that are looking at an individual proof, it is the entire community
that is interested in the correctness of the proof assistant. This is
an advantage of concentrating trust. The fewer things that need to be
trusted, the more attention they can get.
A verified compiler gives a proof that if it is given a valid C
program, it will generate assembly code with the same behavior as the
C program. This proof is checked by a proof assistant, meaning that
its trusted base is the proof checker and the statement of the
theorem. To be able to use the theorem effectively though, we need to
be able to show that the C program will always have a valid execution.
The verified compiler's theorem tells us that the compiler won't
change the behavior of the program, but this is only really useful to
us if our program is doing the right thing in the first place. It can
be very hard to determine what a C program is doing just by examining
the source code. The C language is relatively
low-level, meaning that a program must specify more details about how
the computation is done than it would in a higher-level language. The high level of
control over the details of execution is a large part of why C
continues to be popular over 40 years after it was created. It allows
programmers to write more efficient programs, but it can make it much
harder to understand what the program is actually doing.
To create the most convincing statement possible, we write a
mathematical specification of what the program should do. This
mathematical model lacks the complexity that a real program needs to
work efficiently on a computer. The specification is a new link in
the chain above the C source code. A \emph{Program Logic} is used to
prove that the result of executing the C program meets the
specification. This program logic can be implemented in a theorem
prover, and proved correct using the same specification of the C
language that the verified compiler uses. This completes the chain of
verification from specification all the way to assembly language.
The combination of the tools mentioned here gives proof that the
assembly code that the machine executes has the same behavior as an
abstract mathematical specification. To believe that the proof is
correct, all we need to trust is the mathematical specification itself
and the theorem prover's proof checker.
There is still a significant amount of work to do if you wish to prove
a program correct. First you must create an accurate specification ---
which can be difficult, especially for complex programs. Then it can
be a very involved process to prove a program meets a specification
using a program logic. Program logics implemented in proof assistants
are often \emph{interactive}. To build a proof in an interactive
program logic, you examine a proof state and then perform an action to
manipulate that state. The action will result in a new proof state,
which you can repeatedly manipulate until the proof is completed.
Program logics that are proved correct in proof assistants can be used
interactively inside of the prover. A sound logic is only useful if it
is applied correctly. In a pen-and-paper program logic, correct
application of the logic is one more thing that must be trusted. This
is not the case in a logic implemented inside of a proof assistant. In
a proof assistant, the application of the logic to a program can also
be checked, removing it from the trusted computing base.
One of the biggest problems that implementations of complex program
logics in proof assistants can run into is that they are slow. This
means that the time between giving input and receiving a new proof
state can be long. Writing a proof in a program logic is already
difficult work requiring intimate knowledge of the program, the
programming language, and the logic. Having to wait between each step
in a proof multiplies the difficulty of writing a proof. This is because proof
development is often experimental in nature. The first step is to make
a good guess at a specification. If the proof gets stuck, either the
specification or the program needs to change. When those change, the
proof will need to start over from the beginning. A slow program logic
can drastically limit the number of manipulations of the program or
the specification that can be done in any given stretch of time. If
the time is long enough, it can become difficult for the person doing the
proof to remember what they were working on when they last made
progress on the proof.
This thesis shows how a proof assistant implementation of a complex
program logic for proving the correctness of C programs can be made
significantly faster without any negative impact on the proving
experience. This results in a logic that is significantly more
intuitive and usable, and is a vital step in making the program logic
a widely usable tool.
Many of the parts discussed so far are existing work: The proof
assistant we have chosen to use is Coq, Leroy's CompCert
\cite{Leroy-Compcert-CACM} is a C compiler implemented and proved
sound in Coq, and Appel et al. \cite{appel14:plcc} have created the
Verified Software Toolchain (VST) to give a proved-correct chain from
specification language to assembly including a specification logic and
a program logic to relate the specification to a CompCert C program.
\paragraph{Contributions}
This thesis discusses the modification of a program logic to improve
usability and speed up the application of the logic. Chapter
\ref{ch:typechecking} presents a type system that operates within the
logic. This typechecker automates many of the steps normally required
when using the logic, giving a minimal goal for the user to prove if
it can't proceed automatically. Chapter \ref{ch:canonical} discusses
how the logic can be made more efficient by organizing the statements
in the logic. This increased organization allows us to use more
efficient data structures, improving performance and simplifying the
statement of our logic rules. Finally, we completely replace the
automation responsible for applying our logic to a program with
\emph{reflective} automation. This type of automation is able to take
full advantage of Coq's fastest features, giving us a resulting proof
system that typically runs at least 40$\times$ faster than the previous ones
and works on all Verifiable-C non-call assignment statements
(\lstinline|x := e|), meaning it makes progress on a single basic
block at a time. The new system is fully compatible with previous
tactics, meaning that in places where it is not usable, existing
tactics can still make progress on the proof.
\chapter{Computation in Coq}
\label{ch:computation}
The Coq proof assistant has a built-in functional programming language
called Gallina with the ability to define and match inductive
data structures. The type boolean, for example, can be defined:
\begin{lstlisting}
Inductive boolean :=
| true
| false.
\end{lstlisting}
So \lstinline|boolean| is a type that can be constructed by either
\lstinline|true| or \lstinline|false|; and when given a
\lstinline|boolean|, a function can decide which of the two it is. Now
we can look at the definition of \lstinline|True| and
\lstinline|False|, which are in the type
\lstinline|Prop| instead of \lstinline|boolean|. Prop is not an
inductively defined type in Coq, so it can't be matched
against. Instead the definitions of \lstinline|True| and \lstinline|False|
are
\begin{lstlisting}
Inductive True : Prop := I : True.
\end{lstlisting}
\begin{lstlisting}
Inductive False : Prop.
\end{lstlisting}
\noindent along with an infinity of other propositions, such as
$\forall x:Z, x>5$. Even though (one might think) this is False, it is not
the same proposition.
\noindent
Gallina programs cannot pattern-match over propositions; they can
compute only over inductive data structures, and Prop is not
inductive. That doesn't mean that it is impossible to reason about
things of type \lstinline|Prop| in Coq. Instead of using Gallina to
operate on \lstinline|Prop|, Coq has tactics, which exist almost
exclusively for that reason. The \emph{tactic language} can parse
(pattern-match on) propositions. A tactic in Coq is a program that
manipulates a proof state while generating a proof object recording
its activity. A Coq proof state is a goal (a \lstinline|Prop| to be
proved), along with a number of quantified variables of any type and a
number of hypotheses of type \lstinline|Prop|.
This design means that tactics don't need to be sound. If they produce
a proof that checks, it doesn't matter how the proof was
created. Tactics might do a lot of work to create a small proof
term. For example, a tactic might perform a backtracking proof search
and take numerous wrong paths before finding a solution. This tactic
will have performance that matches the complexity of the proof
search. A tactic that continuously makes modifications to a proof
state, with each modification recorded to the proof object, is likely
to experience performance worse than what you would expect given the
number of operations performed. Unfortunately, the second case is
reasonably common. In these cases, the overhead of keeping and
modifying the proof object (which can take up substantial amounts of
memory) can lead to very slow tactic performance.
Logics such as VST's verifiable C program logic are in
\lstinline|Prop|. Logics in Prop are said to be shallowly embedded.
Instead of a ``deep embedding''---syntax separated from any semantic
interpretation---a shallow embedding defines each new operator as a
semantic predicate in the language of propositions. Some systems such
as Appel's VeriSmall \cite{appel11:cpp} are deeply embedded, meaning
they define their own syntax, along with a denotation that gives
meaning to that syntax in Coq's logic.
Shallow and deep embeddings have tradeoffs in two areas:
\begin{enumerate}
\item interactivity, or how convenient it is for users to interact
with the logic, generally in the way they would expect to be able to
interact with Coq's logic, and
\item automation, or the ability of the logic writer to provide the
user of the logic with tools to efficiently reason about the logic.
\end{enumerate}
A shallow embedding will naturally take advantage of Coq's proof
automation. A language for creating and combining tactics called Ltac
can be used to write decision procedures about the logic without
requiring soundness proofs for those procedures. A deep embedding,
however, will be much harder to interact with. To have meaning as a
proof goal, a deep embedding will need to be wrapped in its denotation
function. Then every operation on the deeply embedded assertion must
be proved sound with respect to the denotation function. Take Coq's
rewrite tactic as an example. If we wish to write a
symmetry lemma about addition named \lstinline|add_symm|, we use
the \lstinline|Lemma| keyword and give the type of
\lstinline|add_symm|:
\begin{lstlisting}
Lemma add_symm : forall (a b : nat), a + b = b + a.
\end{lstlisting}
\noindent nat in Coq is the inductively defined natural numbers.
If we are able to give a definition for \lstinline|add_symm| with type
\lstinline|forall a b, a + b = b + a|, that definition is a proof of the
fact. Because there are generally difficult dependent types involved
in constructing the proof term correctly, we use tactics to manipulate
a proof state until we have successfully created a proof object.
Now suppose we wish to use the lemma we proved to help solve another goal. In
Coq's interactive proving mode, goals are presented as hypotheses
(things above the line) that you know, and the conclusion that you are
trying to prove (below the line).
\begin{lstlisting}
a : nat
b : nat
c : nat
======================
a + (b + c) = a + (c + b)
\end{lstlisting}
In this case we know that we have Coq variables \lstinline|a|,
\lstinline|b|, and \lstinline|c|, and that they are all natural
numbers. We are trying to prove \lstinline|a + (b + c) = a + (c + b)|.
We can use the tactic \lstinline|rewrite (add_symm b c)|
to transform the left side to match the right side:
\begin{lstlisting}
a : nat
b : nat
c : nat
======================
a + (c + b) = a + (c + b)
\end{lstlisting}
\noindent and then use the \lstinline|reflexivity| tactic, which
checks to see if both sides of an equality are equal, solving the goal
if it discovers that they are. In this case they are syntactically
equal, so the goal is solved. We supply the arguments \lstinline|b|
and \lstinline|c| to the tactic because it is ambiguous where to
rewrite the lemma, and Coq might guess wrong unless we tell it.
If we wish to solve this goal computationally we can create a deep
embedding, along with a denotation function for nat, addition, and
equality:
\begin{lstlisting}
Inductive expr' :=
| num : nat -> expr'
| add : expr' -> expr' -> expr'.
Inductive expr :=
| eq : expr' -> expr' -> expr.
Fixpoint expr'_denote e :=
match e with
| num n => n
| add e1 e2 => expr'_denote e1 + expr'_denote e2
end.
Definition expr_denote e :=
match e with
| eq e1 e2 => expr'_denote e1 = expr'_denote e2
end.
\end{lstlisting}
This syntax mirrors the syntax of a Coq goal: it means the same thing,
but we can compute on it inside of a Gallina program. We define it in
two levels (\lstinline|expr| and \lstinline|expr'|) so that every
\lstinline|expr| has a denotation. This is not a requirement of a
reified syntax, but it simplifies the other steps.
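For example, the denotation of a small reified equality computes back
to an ordinary proposition (a minimal check using only the definitions
above):
\begin{lstlisting}
(* The denotation of a reified equality is an ordinary Prop. *)
Example denote_example :
  expr_denote (eq (add (num 1) (num 2)) (num 3)) = (1 + 2 = 3).
Proof. reflexivity. Qed.
\end{lstlisting}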
When we apply
the denotation function to the syntax and get a \lstinline|Prop| back,
we say that the goal has been reflected. The process of
\emph{computational reflection} mirrors a shallow embedding with a
deep embedding, does work on the deep embedding, and reflects the
result back to a shallow embedding. To perform the work, we can write a
new symmetry lemma, this time on the denotation of the deep embedding:
\begin{lstlisting}
Lemma add_sym' : forall a b,
expr'_denote (add a b) = expr'_denote (add b a).
\end{lstlisting}
To use this we need a \emph{reified} version of our original goal. To
reify the goal, we create a tactic called a reifier. This must be done
using a tactic because tactics have the ability to match
\lstinline|Prop|, while Gallina does not. The following is a
reification of our original goal:
\begin{lstlisting}
a : expr'
b : expr'
c : expr'
======================
expr_denote (eq (add a (add b c)) (add a (add c b)))
\end{lstlisting}
The lemma we wrote doesn't match the syntax of what we need to prove
so we can't use the rewrite tactic. We could unfold the definition of
\lstinline|expr_denote| and \lstinline|expr_denote'| in our goal, but
then we would lose the deep embedding. Instead, if we want this
functionality, we need to write a \emph{Coq function} that does the
rewrite and prove that function sound with respect to the denotation
function (assuming we have an equality function on our expr, which
isn't hard to write):
\begin{lstlisting}
Fixpoint symmetry' e e1 e2 :=
match e with
| add e1' e2' => if (e1 == e1' && e2 == e2')
then add e2' e1'
else add (symmetry' e1' e1 e2)
(symmetry' e2' e1 e2)
| x => x
end.
Definition symmetry e e1 e2 :=
match e with
| eq ex1 ex2 => eq (symmetry' ex1 e1 e2) (symmetry' ex2 e1 e2)
end.
Lemma symmetry_func_sound :
forall e1 e2 e,
expr_denote (e) = expr_denote (symmetry e e1 e2)
...
\end{lstlisting}
Now we can rewrite by \lstinline|(symmetry_func_sound b c)| giving us:
\begin{lstlisting}
a : expr'
b : expr'
c : expr'
======================
expr_denote (symmetry (eq (add a (add b c)) (add a (add c b))) b c)
\end{lstlisting}
and simplify. We simplify by using Coq's \lstinline|simpl| tactic,
which performs $\beta\iota$ reduction. The \lstinline|simpl| tactic uses
heuristics that try to reduce goals no further than necessary, so that
they remain easily readable by a proof writer. In this case, \lstinline|simpl|
will (unseen to the user) first unfold and reduce the symmetry
definition:
\begin{lstlisting}
a : expr'
b : expr'
c : expr'
======================
expr_denote (eq (add a (add c b)) (add a (add c b)))
\end{lstlisting}
\noindent and then the denotation function, reflecting the goal back
to \lstinline|Prop|:
\begin{lstlisting}
a : nat
b : nat
c : nat
======================
a + (c + b) = a + (c + b)
\end{lstlisting}
\noindent where the goal can be solved by reflexivity.
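On a concrete instance (using \lstinline|num| literals rather than
abstract variables, and assuming the elided equality function and the
proof of \lstinline|symmetry_func_sound| have been filled in), the
whole round trip is a two-line proof:
\begin{lstlisting}
Lemma example_reflective :
  expr_denote (eq (add (num 1) (num 2)) (add (num 2) (num 1))).
Proof.
  rewrite (symmetry_func_sound (num 1) (num 2)).
  simpl.  (* the goal reflects back to an ordinary arithmetic equality *)
  reflexivity.
Qed.
\end{lstlisting}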
Why would we ever use a deep embedding when automation is so much
work? The main reason is efficiency. An Ltac tactic adds to the size
of the proof object every time it does an operation. Proof by
computational reflection, on the other hand, can perform as many
operations as it wants without creating any proof object at
all. Instead, the proof object for computational reflection is the
soundness proof for the function that is applied to the deep
embedding. Although the example above did a very small amount of work
in the symmetry function, in general such a function could be an
entire decision procedure for a complex problem. Then we would be able
to do a large amount of work on a deeply embedded proof term while
hardly generating any proof object at all. In general, this will be
substantially more efficient than using Ltac, especially as the size
of the deeply-embedded statements grows.
Another advantage to a deeply embedded logic is that Gallina programs
can be compiled to byte-code using Coq's \lstinline|vm_compute|
tactic --- or even to machine code by way of OCaml in upcoming versions
of Coq --- which allows it to be run significantly more efficiently, at
the cost of adding a virtual machine to the trusted computing base.
The technique of computational reflection in Coq combines the
interactivity of a shallow embedding with the efficiency of a deep
embedding by performing as much automated work as possible on the deep
embedding, and returning a readable shallow embedding goal to the user
when interaction is required.
\section{Computational reflection}
\label{sec:reflection}
Proof by reflection is the technique of using proved-sound computation
on a deep embedding to make progress in a proof. To use reflection on
a shallow embedding, there is first a translation, or
\emph{reification} from the shallow embedding into a deep
embedding. Then a proved-sound function can be evaluated (preferably
using \lstinline|vm_compute| for the best performance) on the deep
embedding; finally, the denotation function can be evaluated, or
\emph{reflected}, resulting in a shallow embedding again. If this is
done all in a single step, the user of a reflective tactic never knows that
reflection is being used. That means that the efficient, proved-sound
decision procedures that can be used on deep embeddings can be used as
tactics, alongside other tactics and Ltac automation that might
already be built up around a logic.
The process of translating from a shallow embedding to a deep
embedding is known as reification. Because Gallina programs can't
match on shallow embeddings, reification must be performed by a
tactic. Reification leaves us with a deep embedding (or reified
expression), which allows us to use proved-sound Gallina functions to
progress the proof state without large proof terms. When the deep
embedding is at a state that requires human work, the denotation
function can be carefully unfolded, or reflected, to return it to a
shallow embedding that is easy for the user to view and work with.
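For the toy \lstinline|expr| language above, a reifier can be sketched
as follows (a minimal illustration: it wraps every non-addition
subterm with \lstinline|num|, and a realistic reifier must handle many
more cases):
\begin{lstlisting}
(* A minimal reification sketch for the toy expr language above. *)
Ltac reify_nat t :=
  match t with
  | ?a + ?b => let ra := reify_nat a in
               let rb := reify_nat b in
               constr:(add ra rb)
  | _       => constr:(num t)
  end.
Ltac reify_goal :=
  match goal with
  | |- ?a = ?b => let ra := reify_nat a in
                  let rb := reify_nat b in
                  change (expr_denote (eq ra rb))
  end.
\end{lstlisting}
Running \lstinline|reify_goal| on the goal
\lstinline|a + (b + c) = a + (c + b)| produces a deeply embedded goal
of the form shown earlier in this chapter, with the variables wrapped
in \lstinline|num|.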
%graphic from prefpo
Deep embeddings are purely syntactic; but even shallowly embedded
languages may have substantial sublanguages that are purely
syntactic---defined by inductive data types rather than semantic
definitions. In many program logics, for example, the programming
language will not be a denotational semantics based on Gallina, but an
operational semantics over syntactic program terms. Because the syntax
is a deep embedding, it is possible to write Coq functions that reason
about program terms, and then prove those functions sound (usually with respect
to the operational semantics). This is a type of reflection that
doesn't require reification at all, making it more efficient and
allowing it to fit cleanly into the shallowly embedded program
logic. One such example in VST is our typechecker, which is used to
show that expressions in the program successfully evaluate given the
state represented by the precondition.
\chapter{Typechecking C Expressions}
\label{ch:typechecking}
A type system is a formalization that assigns types to subexpressions
of a programming language and describes when a program uses those
expressions correctly according to their types. It is often possible to
build a decision procedure that answers whether or not a program is
well typed in a type system. This decision procedure is known as a
typechecker. A typechecker is generally meant to be used as part of
program compilation, which means it should be reasonably fast and
operate with the program as its sole input. It restricts the number
of valid programs while guaranteeing certain errors won't occur.
We have designed a typechecker specifically for efficiently and
conveniently structuring Hoare-logic proofs. The purpose of the
typechecker is to ensure that expressions evaluate in a completely
computational manner. It is a Gallina program that either says an
expression evaluates, or produces an assertion that will guarantee
expression evaluation.
To see why expression evaluation is important to the logic, consider a
naive Hoare assignment rule for the C language.
\begin{mathpar}
\inferrule{}
{\tripleD{}{P[e/x]}{x:=e}{P}}\qquad\mbox{(naive-assignment)}
\end{mathpar}
\FloatBarrier
This rule is not sound with respect to the operational semantics of
C. We need a proof that $e$ evaluates to a value. It could, for
example, be a division by zero, in which case the program would crash
and no sound Hoare triples could hold. The expression $e$ might
typecheck in the C compiler, but can still get stuck in the
operational semantics
A better assignment rule requires $e$ to evaluate:
\vspace{-20pt}
\begin{mathpar}
\inferrule{e \Downarrow v}
{\tripleD{}{P[v/x]}{x:=e}{P}}\mbox{assignment-ex}
\end{mathpar}
\FloatBarrier
This rule is inconvenient to apply because we must use the operational
semantics to show that $v$ exists. In fact, any time that we wish to
talk about the value that results from the evaluation of an
expression, we must add an existential quantifier to our
assertion. Showing that an expression evaluates can require a number
of additional proofs. If our expression is \lstinline|(y / z)|, we
will need to show that our precondition implies: \lstinline|y| and
\lstinline|z| are both initialized, \lstinline|z $\not=$ 0|, and
\lstinline|$\texttildelow$(y = int_min $\wedge$ z =$\,$-1)|. The
latter case causes overflow, which is undefined in the C standard for
division. These requirements will become apparent as we apply the
semantic rules, and some of them may be trivially discharged. Even
so, a proof will be required, even when it is computationally
obvious that the condition holds. A type system is the answer to this
problem, but unfortunately C's type system is unsound, and we
require soundness.
\section{A Sound C Type System}
Instead we change the typechecking game. We don't answer just
``type-checks'' or ``doesn't typecheck'' because our typechecker can produce
verification conditions that say ``typechecks if this assertion
holds''. We create a type checker that computes fully, and gives
simple, minimal preconditions to ensure expression evaluation. The
preconditions may not be decidable, but they are left to the user of
the logic to solve. We define a function \lstinline|typecheck_expr|
(Section \ref{sec:typechecker}) to tell us when expressions evaluate
in a type context $\Delta$.
Now our assignment rules are,
\vspace{-20pt}
\begin{mathpar}
\inferrule{ }
{\tripleD{\Delta}{\mathsf{typecheck\_expr}(e,\Delta)~\wedge~P[e/x]}{x:=e}{P}}\mbox{tc-assignment}
\and
\inferrule{ }
{\tripleD{\Delta}{\mathsf{typecheck\_expr}(e,\Delta)~\wedge~P}{x:=e}{\exists
v. x = \mathsf{eval}(e[v/x]) \wedge P[v/x]}}\mbox{tc-floyd-assignment}
\end{mathpar}
\FloatBarrier
The $\mathsf{typecheck\_expr}$ assertion is not a side condition, as
it is not simply a proposition (\lstinline|Prop| in Coq)
but a separation-logic predicate quantified over an environment
(Section \ref{sec:context}). This rule will be inconvenient to
apply. In general, it is unlikely that the writer of a proof will have
their assertion in the form
$\mathsf{typecheck\_expr}(e,\Delta)~\wedge~P$. Instead they will likely have
a precondition $P$ from which they can prove
$\mathsf{typecheck\_expr}(e,\Delta)$. The Hoare rule of consequence is
useful for this:
\vspace{-20pt}
\begin{mathpar}
\inferrule{P \rightarrow P' ~ \tripleD{\Delta}{P'}{c}{Q'} ~ Q' \rightarrow Q}{\tripleD{\Delta}{P}{c}{Q}}
\end{mathpar}
It allows us to replace a precondition by one we know is implied by
the current precondition. We can use this so that preconditions don't
get littered with $\mathsf{typecheck\_expr}$ assertions that don't add
any information to the precondition. To do this we derive a new rule using the rule of
consequence:
\vspace{-20pt}
\begin{mathpar}
\inferrule{P \longrightarrow \mathsf{typecheck\_expr}(e,\Delta)}
{\tripleD{\Delta}{P}{x:=e}{\exists
v. x = \mathsf{eval}(e[v/x]) \wedge P[v/x]}}\mbox{tc-floyd-assignment'}
\end{mathpar}
\FloatBarrier
We use the Floyd-style forward assignment rule in our final proof
system instead of the Hoare-style weakest-precondition rule. This is
not related to type-checking; separation logic with backward
verification-condition generation gives us magic wands which are best
avoided when possible \cite{berdine05:symbolic}.
This rule asks the user to prove that the typechecking condition
holds, without requiring it to be in the precondition. To apply the
rule you must show that the precondition implies whatever condition
the type checker generates. In this way, our typechecker allows
outside knowledge about the program, including assumptions in the
function's precondition, to be used to decide whether an expression
evaluates. For example, when run on the expression $(y/z)$ the
typechecker computes to the assertion $z \not= 0 \wedge \neg (y =
\mathsf{int\_min} \wedge z = -1)$ where $z$ and $y$ are not the
variables, but the values that result when $z$ and $y$ are evaluated
in some environment. The assertions $\mathsf{initialized}(y)$ and
$\mathsf{initialized}(z)$ may not be produced as proof obligations if
the type-and-initialization context $\Delta$ assures that $y$ and $z$
are initialized.
The single expression above exposed a number of things that might go wrong when
evaluating a C expression. A type system must exclude all such cases
in order to be sound. The following operations are undefined in the C
standard, and \emph{stuck} in CompCert C:
\begin {itemize}
\item shifting an integer value by more than the word size,
\item dividing the minimum int by $-1$ (overflows),
\item subtracting two pointers with different base addresses (i.e.,
from different malloc'ed blocks or from different addressable local
variables),
\item casting a float to an int when the float is out of integer range,
\item dereferencing a null pointer, and
\item using an uninitialized variable.
\end{itemize}
If an expression doesn't do any of these things and typechecks in C's
typechecker, we can expect expression evaluation to succeed. In the
tc-floyd-assignment rule above, we use a function $\mathsf{eval}$ that
has a similar purpose to CompCert's
\lstinline|eval_expr| relation. Defining evaluation as a function in
this manner makes proofs more computational---more efficient to build
and check. Furthermore, with a relation you need a new variable to
describe every value that results from evaluation, because the
evaluation might not succeed. Because we know that the evaluation of an
expression will succeed, we are able to simply create a function
\lstinline|eval_expr| that computes a value from an expression, with no
new variable needed.
We simplify \lstinline|eval_expr| in our program logic---and make it
computational---by leveraging the typechecker's guarantee that
evaluation will not fail. Our total recursive function
\lstinline|eval_expr (e: expr) (rho: environ)|: in environment $\rho$,
expression $e$ evaluates to the value
(\lstinline|eval_expr $e$ $\rho$|). When \lstinline{CompCert.eval_expr} fails, our own
\lstinline{eval_expr} (though it is a total function) can return an
arbitrary value. We can do this because the function will be run on a
program that typechecks---the failure is unreachable in practice. We
then prove the relationship between the two definitions of evaluation
on expressions that typecheck (we state the theorem in English and in
Coq):
\newtheorem{eval_expr_relate}{Theorem}
\begin{eval_expr_relate}
For all logical environments $\rho$ that are well typed with respect to a type
context $\Delta$, if an expression $e$ typechecks with respect to $\Delta$, the
CompCert evaluation relation relates $e$ to the result of the computational
expression evaluation of $e$ in $\rho$.
\end{eval_expr_relate}
\begin{lstlisting}
Lemma eval_expr_relate :
$\forall$ $\Delta$ $\rho$ e m ge ve te, typecheck_environ $\Delta$ $\rho$ -> mkEnviron ge ve te = $\rho$ ->
denote_tc_assert (typecheck_expr $\Delta$ e) $\rho$ ->
Clight.eval_expr ge ve te m e (eval_expr e $\rho$)
\end{lstlisting}
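To give a flavor of the computational evaluator, here is an
illustrative sketch of a few of its cases. This is not VST's exact
definition; helper names such as \lstinline|te_of|, \lstinline|Map.get|,
and \lstinline|eval_binop| follow VST's conventions but should be read
as assumptions here.
\begin{lstlisting}
(* Sketch only: a total evaluator that returns Vundef wherever the
   relational semantics would be stuck.  The typechecker guarantees
   that such cases are unreachable on programs that typecheck. *)
Fixpoint eval_expr (e: expr) ($\rho$: environ) : val :=
  match e with
  | Econst_int i ty => Vint i
  | Etempvar id ty => match Map.get (te_of $\rho$) id with
                      | Some v => v
                      | None => Vundef  (* uninitialized: guarded by the typechecker *)
                      end
  | Ebinop op a1 a2 ty =>
      eval_binop op (typeof a1) (typeof a2) (eval_expr a1 $\rho$) (eval_expr a2 $\rho$)
  | _ => Vundef
  end.
\end{lstlisting}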
Expression evaluation requires an environment, but the assertions of
our Hoare logic are
functions from environments to \lstinline|Prop|. So if we wish to say
``the expression $e$ evaluates to $5$'', we write
\lstinline|fun $\rho$ => eq (eval_expr e $\rho$) 5|. Because Coq does not match or
rewrite under lambda (\lstinline|fun|), assertions of this form hinder
proof automation. Our solution is to follow Bengtson \emph{et al.}
\cite{bengtson12:Charge} in \emph{lifting} \lstinline|eq| over $\rho$:
\lstinline|`eq (eval_expr e) `5|. This produces an equivalent
assertion, but one that we are able to rewrite and match against. The
first backtick lifts \lstinline|eq| from \lstinline{val->val->Prop} to
\lstinline{(environ->val)->(environ->val)->Prop}, and the second
backtick lifts \lstinline{5} from \lstinline{val} to a constant
function in \lstinline{environ->val}.
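The lifting operators themselves are simple pointwise combinators. A
minimal sketch (not VST's actual definitions, which are wrapped in the
backtick notation) looks like this:
\begin{lstlisting}
Section LiftSketch.
  (* environ stands for the program-logic environment type. *)
  Variable environ : Type.
  (* Lift a constant to a constant function over environments. *)
  Definition lift0 {A} (a : A) : environ -> A := fun _ => a.
  (* Lift a binary function pointwise over environments. *)
  Definition lift2 {A B C} (f : A -> B -> C)
             (x : environ -> A) (y : environ -> B) : environ -> C :=
    fun $\rho$ => f (x $\rho$) (y $\rho$).
End LiftSketch.
\end{lstlisting}
With these, \lstinline|`eq (eval_expr e) `5| can be read as
\lstinline|lift2 eq (eval_expr e) (lift0 5)|, which unfolds to exactly
\lstinline|fun $\rho$ => eq (eval_expr e $\rho$) 5|.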
\section{C light}
\label{sec:clight}
Our program logic is for C, but the C programming language has features that are
unfriendly to Hoare logic: \emph{side effects within subexpressions}
make it impossible to simply talk about ``the value of $e$'' and \emph{taking
the address of a local variable} means that one cannot reason straightforwardly about
substituting for a program variable
(as there might be aliasing).
The first passes of CompCert translate \emph{CompCert C} (a refined
and formalized version of C99 \cite{Compcert-compiler-website}) into
\emph{C light}. These passes remove side effects from expressions and
distinguish \emph{nonaddressable} local variables from
\emph{addressable} locals.\footnote{Xavier Leroy added the
\lstinline|SimplLocals| pass to CompCert 1.12 at our request,
pulling nonaddressable locals out of memory in C light. Prior to
1.12, source-level reasoning about local variables (represented as
memory blocks) was much more difficult.} Even though CompCert performs
this translation automatically, we recommend that the user write
side-effect-free expressions directly in their C code, so that the C
light translation will exactly match the original program.
C has pointers and permits pointer dereference in subexpressions:
\begin{lstlisting}[language=C]
d = p->head+q->head
\end{lstlisting}
Traditional Hoare logic is not well suited for pointer-manipulating programs,
so we use a separation logic (Chapter \ref{ch:canonical}), with assertions such as
$(p\!\!\rightarrow \! \! \mathrm{head} \mapsto x) *(q \!\! \rightarrow \! \!
\mathrm{head} \mapsto y)$. Separation logic does not permit pointer-dereference
in subexpressions, so to reason about
\lstinline[language=C]|d = p->head+q->head|
the programmer should factor it into:
\lstinline[language=c]|t = p->head; u = q->head; d=t+u;|
where dereferences occur only at top-level in assignment commands.
Adding these restrictions to C light gives us \emph{Verifiable C}, which is not
a different semantics but a proper sublanguage, enforced by our typechecker.
Some operations, like overflow on integer addition, are undefined in
the C standard but defined in CompCert. The typechecker
permits these cases. In summary, every Clight program is a CompCert C
program, and every CompCert C program can be translated to Clight with
only local transformations. Our typechecker restricts code to a
subset of Clight that is friendly to verification.
\section{Typechecker}
\label{sec:typechecker}
The typechecker produces assertions that, if satisfied, prove that an
expression will always evaluate to a value.
In the C light abstract syntax produced by CompCert from C source code,
every subexpression is syntactically annotated
with a C-language type, accessed by \lstinline{(typeof e)}.
Thus our typing judgment does not need to be of the
form $\Delta \vdash e : \tau$, it can be
$\Delta \vdash e $, meaning that $e$ typechecks according to its
own annotation.
We define a function to typecheck expressions with respect to a type context:
\begin{lstlisting}
Fixpoint typecheck_expr ($\Delta$ : tycontext) (e: expr) : tc_assert :=
match e with
| Econst_int _ (Tint _ _ _) => tc_TT
| Eunop op a ty => tc_andp
(tc_bool (isUnOpResultType op a ty) (op_result_type e))
(typecheck_expr $\Delta$ a)
| Ebinop op a1 a2 ty => tc_andp
(tc_andp (isBinOpResultType op a1 a2 ty)
(typecheck_expr $\Delta$ a1))
(typecheck_expr $\Delta$ a2)
... end.
\end{lstlisting}
\noindent This function traverses expressions emitting \emph{reified}
conditions that ensure that the expressions will evaluate to a value
in a correctly typed environment. The typechecker is actually a
mutually recursive function: one function typechecks r-values and the
other typechecks l-values. Here, we only describe the type-checking
of r-value expressions; the type-checking of l-values is similar.
Although CompCert's operational semantics are written as an inductive
relation in Coq, it also has computational parts. For example,
CompCert \emph{classification} functions help typecheck mathematical
operations by determining the type of the result. The following
function leverages the classification function to typecheck binary
operations:
\begin{lstlisting}
Definition isBinOpResultType op a1 a2 ty : tc_assert :=
match op with
| Oadd => match classify_add (typeof a1) (typeof a2) with
| add_default => tc_FF
| add_case_ii _ $\hspace{.5mm}$ => tc_bool (is_int_type ty)
| add_case_pi _ $\hspace{.5mm}$ _ => tc_andp (tc_isptr a1) (tc_bool (is_pointer_type ty))
... end
... end.
\end{lstlisting}
Most operators in C are overloaded. For example the addition operator
works on both integers and pointers, behaving differently if given two
integers, two pointers, or one of each. Classification functions
determine which of these overloaded semantics of operators should be
used. These semantics are always determined by the types of the
operands. The C light operational semantics uses the constructors
(\lstinline|add_case_ii|, \lstinline|add_case_pi|, and so on) to
choose whether to apply integer-integer add (\lstinline|ii|),
pointer-integer add (\lstinline|pi|), and so on. The typechecker uses
the same constructors (\lstinline{add_case_ii}, \lstinline{add_case_pi}, and so on) to choose
type-checking guards.
Despite the reuse of CompCert's classification code, most of the
typechecker is devoted to binary operations, because almost every
operator in C is overloaded. The typechecker looks at
eight types of operations (shifts, boolean operators, and comparisons
can be grouped together as they have the exact same semantics with
respect to the type returned). Each of these has approximately four
behaviors in the semantics giving a total of around thirty cases that
need to be handled individually for binary operations.
The code above is a good representation of how the typechecker is
implemented. The first step is to match on the syntax. Next, if the expression
is an operation, we use CompCert's classify function to decide which overloaded
behavior to use. From there, we generate the appropriate assertion.
\subsection{Type Context}
\label{sec:context}
Expression evaluation requires an expression and an environment. The
result of expression evaluation depends on the environment:
\begin{lstlisting}
(* Global-variable environment *)
Definition genviron := Map.t block.
(* Addressable local-variable environment *)
Definition venviron := Map.t (block * type).
(* Temporary (nonaddressable local variable) environment *)
Definition tenviron := Map.t val.
Inductive environ : Type :=
mkEnviron: forall (ge: genviron) (ve: venviron) (te: tenviron), environ.
\end{lstlisting}
We use a simple linear map \lstinline|Map.t| (which is equivalent to
\lstinline|(ident -> option val)|) for this environment because in
practice we expect it to always be an abstract quantified variable,
meaning the implementation won't affect performance. The type
\lstinline|block| is CompCert's address into the beginning of a
malloc'ed block.
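Concretely, such a map can be realized as a function type. The
following is only a sketch of the idea, not VST's exact
\lstinline|Map| module:
\begin{lstlisting}
Require Import Coq.PArith.BinPos.  (* positive identifiers and Pos.eqb *)
(* A simple functional map from identifiers to optional values. *)
Module Map.
  Definition t (A : Type) := positive -> option A.
  Definition empty {A} : t A := fun _ => None.
  Definition get {A} (m : t A) (i : positive) : option A := m i.
  Definition set {A} (i : positive) (v : A) (m : t A) : t A :=
    fun j => if Pos.eqb i j then Some v else m j.
End Map.
\end{lstlisting}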
To guarantee that certain expressions will evaluate, we control what
values can appear in environments. A static type context characterizes
the dynamic environments in which we can prove that expressions will
evaluate to defined values.
\begin{lstlisting}
Definition tycontext: Type :=
(PTree.t (type * bool) * (PTree.t type) * type * (PTree.t type) * (PTree.t global_spec)).
\end{lstlisting}
\lstinline|PTree.t($\tau$)| (Section \ref{sec:ptrees}) is CompCert's efficient computational mapping
data-structure (from identifiers to $\tau$) implemented and proved
correct in Coq. The type
\lstinline|type| is not Coq's \lstinline|Type|, but the deep embedding
of C types that appear in Clight programs.
The elements of the type context are
\begin{itemize}
\item a mapping from temp-var names to type and initialization
information (recall that temp-vars are simply nonaddressable---ordinary---local variables),
\item a mapping from (addressable) local variable names to types,
\item a return type for the current function,
\item a mapping from global variable names to types, and
\item a mapping from global variable names to function specifications.
\end{itemize}
The first, second, and fourth items correspond to the three parts of
an environment (\lstinline|environ|), which is made up of temporary
variable mappings, local variable mappings, and global variable
mappings.
The fifth item is specific to proofs and is separated from the fourth
because function specifications can't be computed on in a reasonable
amount of time. Separating these two allows us to do the computations
we discuss in Chapter \ref{ch:reflection}.
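For concreteness, the five components can be projected out of the
tuple as follows. This is only an illustrative sketch; VST provides
named accessors along these lines, but the exact names and definitions
here are assumptions.
\begin{lstlisting}
(* Illustrative projections out of the tycontext tuple. *)
Definition temp_types (D: tycontext) : PTree.t (type * bool) := fst (fst (fst (fst D))).
Definition var_types  (D: tycontext) : PTree.t type := snd (fst (fst (fst D))).
Definition ret_type   (D: tycontext) : type := snd (fst (fst D)).
Definition glob_types (D: tycontext) : PTree.t type := snd (fst D).
Definition glob_specs (D: tycontext) : PTree.t global_spec := snd D.
\end{lstlisting}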
A temporary variable is a (local) variable whose address is not taken
anywhere in the procedure. Unlike local and global variables, these
variables cannot alias---so we can statically determine exactly when
their values are modified. If the typechecker sees that a temporary
variable is initialized, it knows that it will stay initialized. The
initialization analysis is conservative, but if the typechecker is
unsure, it can emit an assertion guard for the user to prove
initialization. In the case of initialization, the typechecker will
never give a result of ``does not typecheck''. It will only say
``yes'' or ``maybe''. It would be possible to occasionally simply say
``no'', but this would require extra information from the type
context, and wouldn't save the user much work.
Calculating initialization automatically is a significant convenience
for the user; proofs in the previous generation of our program logic
were littered with definedness assertions in invariants.