Artifact: Corpse Reviver
This artifact accompanies the paper “Corpse Reviver: Sound and Efficient Gradual Typing via Contract Verification.” Artifact evaluation phase I consists of the Install and Optimizing a program sections. Artifact evaluation phase II consists of the Benchmark and Empirical results sections.
1 Install
You can either run the artifact in a VirtualBox virtual machine that includes all the necessary dependencies or manually install the artifact. We recommend the virtual machine.
1.1 VirtualBox
- Download VirtualBox.
- Download the artifact’s virtual machine image.
- Open VirtualBox, choose File → Import Appliance, select the downloaded ova file, and start the virtual machine. The username and password are both scvcr.
- Open a terminal window and change to the artifact directory.
$ cd ~/corpse-reviver/corpse-reviver-artifact
1.2 Manual
- Install Racket 7.8.
- Run the following commands to download and install SCV-CR:
$ git clone https://github.com/camoy/corpse-reviver
$ raco pkg install corpse-reviver/corpse-reviver-artifact \
corpse-reviver/corpse-reviver-benchmark \
corpse-reviver/corpse-reviver
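- Optionally, confirm the installation by listing the installed packages. raco pkg show prints every installed package; the three corpse-reviver packages should appear in its output:
$ raco pkg show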
- Change to the artifact directory.
$ cd corpse-reviver/corpse-reviver-artifact
2 Optimizing a program
You can write your own gradually typed program and see how well SCV-CR can optimize it. Within the artifact directory there is an examples/ subdirectory with some files. Change to this directory.
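$ cd examples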
A file called data.rkt has the following code:
#lang typed/racket/base

(provide random-adders)

(: random-adders : (-> Natural (Listof (-> Integer Integer))))
(define (random-adders n)
  (for/list ([i (in-range n)])
    (λ ([x : Integer])
      (+ (random 10) x))))
This typed module provides a function called random-adders that generates a list of random “adders,” functions that add their argument to a fresh random number each time they are called.
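To get a feel for random-adders, you can try it from the Racket REPL. This is a minimal sketch; your results will differ, since the numbers are random:
$ racket
> (require "data.rkt")
> (map (λ (f) (f 100)) (random-adders 3))
'(103 100 107)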
A file called main.rkt has the following code:
#lang racket/base

(require "data.rkt")

(define iterations 1000)
(define n 5000)

(time
 (for ([i (in-range iterations)])
   (for/sum ([f (in-list (random-adders n))])
     (f (random 10)))))
This untyped module generates a list of 5,000 adders from data.rkt and computes a sum with them. It does this for 1,000 iterations and measures how long it takes. Because main.rkt is untyped and data.rkt is typed, every value crossing the boundary between them is guarded by a contract generated from the type of random-adders, so each of the five million adder calls is checked at run time.
Run the program and use Racket’s profiler to see the cost of contracts due to gradual typing. On our machine, the program takes about 10 seconds with about 67% of that time due to contract checking.
$ raco make data.rkt main.rkt
$ racket main.rkt
cpu time: 9398 real time: 9412 gc time: 874
$ raco contract-profile main.rkt
Running time is 66.66% contracts
8014/12022 ms

(-> natural? (listof (-> exact-integer? any))) 8013.5 ms
data.rkt:3:9
random-adders 8013.5 ms
$ raco decompile main.rkt > unopt.out
Run SCV-CR on the modules and see the performance difference. The raco scv-cr command creates optimized bytecode by removing contracts that are statically proven safe. It can be used as a drop-in replacement for raco make. In this simple example, SCV-CR proves that the contract cannot be violated and removes it automatically.
$ raco scv-cr data.rkt main.rkt
$ racket main.rkt
cpu time: 1256 real time: 1258 gc time: 28
$ raco contract-profile main.rkt
cpu time: 1406 real time: 1410 gc time: 8
Running time is 0% contracts
0/2340 ms
$ raco decompile main.rkt > opt.out
For further confirmation that the contract was removed, you can compare the decompiled outputs in unopt.out and opt.out.
$ diff opt.out unopt.out | grep "random-adders '5000" -A 2
< (let ((local67 (random-adders '5000)))
---
> (let ((local73 (lifted/12.1 '5000)))
On top is the optimized version, which calls random-adders directly. The bottom, by contrast, calls lifted/12.1 instead. If you chase down this definition, you will find it is the contract-wrapped version of random-adders.
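One way to chase it down is to search the decompiled output for the lifted identifier. This is only a sketch; the exact identifier (here lifted/12.1) varies between compilations:
$ grep -n "lifted/12.1" unopt.out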
If SCV is unable to prove that a contract won’t be violated, then that contract is not removed. A file called main-imprecise.rkt has the following code:
#lang racket/base (require "data.rkt") (define iterations 1000) (define n 5000) (time (for ([i (in-range iterations)]) (for/sum ([f (in-list (random-adders n))]) (f (if (string<=? "a" "b") (random 10) "")))))
The only difference between main.rkt and main-imprecise.rkt is that the imprecise version contains a conditional that would violate the type of random-adders if the string “a” were not lexicographically smaller than the string “b”. Since it is, the offending branch is never executed and the contract check is unnecessary. However, SCV cannot prove this, so the contract is not removed.
$ raco scv-cr data.rkt main-imprecise.rkt
$ racket main-imprecise.rkt
cpu time: 8041 real time: 8081 gc time: 550
$ raco contract-profile main-imprecise.rkt
cpu time: 9037 real time: 9172 gc time: 522
Running time is 72.02% contracts
7034/9767 ms

(-> natural? (listof (-> exact-integer? any))) 7034 ms
data.rkt:3:9
random-adders 7034 ms
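For contrast, consider a hypothetical repair (not part of the artifact) in which both branches of the conditional produce an integer. The argument to f then always satisfies the contract, so we would expect SCV-CR to verify and remove it, just as it does for main.rkt:
;; Hypothetical variant of main-imprecise.rkt: the else branch
;; yields 0 instead of "", so f always receives an Integer.
(time
 (for ([i (in-range iterations)])
   (for/sum ([f (in-list (random-adders n))])
     (f (if (string<=? "a" "b") (random 10) 0)))))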
See the SCV-CR documentation for the full list of options to raco scv-cr.
3 Benchmark
Execute the following commands to run the entire benchmark suite:
$ cd corpse-reviver/corpse-reviver-artifact
$ raco scv-cr-benchmark -c 5 -b -i 5 -S 2 -R 2 -o data/baseline
$ raco scv-cr-benchmark -r -o data/opt
These commands will create a series of JSON-formatted measurement files in the data/baseline and data/opt directories.
We have found that on a decent laptop, the benchmarks with these parameters terminate in about 14 hours. If you want to run the benchmarks overnight, combine the last two commands with &&, as shown below.
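$ raco scv-cr-benchmark -c 5 -b -i 5 -S 2 -R 2 -o data/baseline && \
      raco scv-cr-benchmark -r -o data/opt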
The suggested flags will sample from larger benchmarks instead of exhaustively measuring every configuration. Different parameter choices yield different benchmarking times. Changing -R from 2 to 1 will halve benchmark completion time; we don’t recommend going any lower than that. If you want to run the benchmarks for longer, increasing any one of -S, -R, or -c will produce more accurate results. See the benchmarking documentation for more information.
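For instance, a faster baseline run changes only the -R flag:
$ raco scv-cr-benchmark -c 5 -b -i 5 -S 2 -R 1 -o data/baseline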
After this is done, you can generate this page with your benchmark results.
$ raco setup corpse-reviver-artifact
$ raco docs T:corpse-reviver-artifact
When raco docs opens a browser, select the first result. The page you’re currently reading should appear, but the figures and claims in Empirical results will be generated from your data instead of ours (available in the author_data directory).
4 Empirical results
Compare the figures and claims generated from your benchmarking results to those generated from ours. Depending on your choice of parameters, the results may differ more or less. However, they should generally support the thesis of the paper: that contract verification makes sound gradual typing efficient.
4.1 Figures
Figure 4: Overhead of gradual typing over the whole benchmark suite. The purple curve is Typed Racket and the orange curve is SCV-CR. The log-scaled x-axis indicates slowdown factor compared against the fully-untyped configuration, while the y-axis indicates the percent of configurations incurring that slowdown. Each benchmark is allocated an equal proportion of the y-axis. Higher is better.
Figure 5: Performance lattices for sieve (left) and zombie (right). Each point in the lattice is a configuration of the benchmark, where a white box is an untyped module and a black box is a typed module. The numbers below indicate the slowdown factor for Typed Racket 7.8 on the left and SCV-CR on the right.
Figure 6: Overhead of gradual typing for each benchmark individually. The purple curve is Typed Racket and the orange curve is SCV-CR. Each point (x, y) indicates an x-factor slowdown over the fully-untyped configuration for y% of configurations. The dashed lines between 1 and 2 occur at increments of 0.2 and between 2 and 10 at increments of 2.
Figure 7: Exact time it takes each configuration to execute. The purple curve is Typed Racket and the orange curve is SCV-CR. The x-axis is binned by the number of typed modules in a configuration. The y-axis is time to execute in seconds.
Benchmark     Racket Overhead    SCV-CR Overhead    SCV-CR Analyze    SCV-CR Compile
              Max      Mean      Max      Mean      Mean ± 𝜎 (s)      Mean ± 𝜎 (s)
fsm           2.8×     1.7×      1.1×     0.7×      25 ± 3            16 ± 4
gregor        2.1×     1.7×      1.3×     1.1×      879 ± 858         52 ± 11
kcfa          4.3×     2.5×      1.1×     1×        35 ± 3            34 ± 4
lnm           1.1×     0.8×      1×       0.7×      125 ± 16          44 ± 6
morsecode     1.8×     1.2×      1×       0.9×      18 ± 9            10 ± 3
sieve         18.6×    6.4×      1×       1×        5 ± 2             4 ± 2
snake         16.4×    8.2×      1.6×     1×        94 ± 43           22 ± 4
suffixtree    32.1×    14.1×     1×       0.8×      28 ± 12           21 ± 4
synth         73.6×    32.6×     1.4×     1.1×      131 ± 51          34 ± 6
tetris        11.1×    5×        1.3×     1×        190 ± 128         25 ± 6
zombie        59.8×    27.8×     1.3×     1.1×      55 ± 19           15 ± 4
zordoz        3.9×     1.8×      1×       0.9×      361 ± 8           66 ± 8
Figure 8: Maximum and mean overhead for Racket 7.8 and SCV-CR for each benchmark. Red indicates a slowdown ≥3× while green indicates a slowdown ≤1.3×.
4.2 Supporting claims
We evaluate this approach on a dozen existing gradually typed programs previously shown to have prohibitive performance overhead, with a median overhead of 2.1× and up to 73.6× in the worst case. SCV-CR eliminates all overhead in most cases and suffers at most a 1.6× slowdown in the worst case. For example, Typed Racket 7.8 runs 50% of benchmark configurations with less than 2× slowdown, while with SCV-CR, 95% of benchmark configurations have less than 1.2× slowdown compared to the fully-untyped configuration. As figure 4 makes clear, SCV-CR reduces overhead to nearly zero in almost all cases, and completely eliminates all overheads above 1.6×.
The worst overhead incurred by gradual typing with SCV-CR is a slowdown of 1.6×. Only 44% of benchmark configurations without contract verification are within this slowdown factor, and the largest overhead without it reaches 73.6×.
Take the sieve benchmark as an example. The baseline performance begins at 50%, meaning two of the four configurations are within a 1× slowdown of the fully-untyped configuration. From figure 5, these are the fully-untyped configuration itself and the fully-typed configuration. The single step increase in the CDF corresponds to the configuration with 5× overhead. The CDF never reaches 100% since that would occur at 18.6×, beyond the x-axis’s range.
There are no pathological cases, and any path through the lattice of configurations from untyped to typed exhibits at most a 1.6× slowdown.