Authors: Agostino Cortesi, Henrich Lauko, Martina Olliaro, Petr Ročkai
Abstract:

Automatic abstraction is a powerful software verification technique. In this paper, we elaborate an abstract domain for C strings, that is, null-terminated arrays of characters. We describe the abstract semantics of basic string operations and prove their soundness with regards to previously established concrete semantics of those operations. In addition to a selection of string functions from the standard C library, we provide semantics for character access and update, which make it possible to automatically lift arbitrary string-manipulating code into the domain.

The domain we present (called M-String) has two other abstract domains as its parameters: an index (bound) domain and a character domain. Picking different constituent domains allows M-String to be tailored for specific requirements of a given verification task, balancing precision against complexity.

In addition to describing the domain theoretically, we also provide an executable implementation of the abstract operations, along with a tool to automatically lift existing programs into the M-String domain. Using this tool and an explicit-state model checker, we have evaluated the proposed domain experimentally on a few simple but realistic test programs.

Documents:

Installation

DIVINE should work on any reasonably recent Linux distribution. It might also be possible to make it work on macOS, *BSD or even on Windows using Windows Subsystem for Linux, but this is not tested routinely.

Building from source

To build from th source, you will need some prerequisites, namely a C++11 compiler such as reasonably recent GCC (>= 4.9) or Clang (>= 3.2), make, CMake (>= 3.2), libedit, perl, and about 12 GB of disk space and 4 GB of RAM. For the symbolic verification to work, you will also need the Z3 solver, including its libraries.

To build DIVINE, you have to xtract the archive and then run make divine in its main directory. This should build DIVINE into _build.release/tools subdirectory by default. For more details, please refet to the manual.

Usage

Binary distribution contains a main binary of divine. If you have built DIVINE from the source, the built binary resides in _build.release/tools/divine.

To run divine verification with mstring abstraction on source.c use:

divine check --lart abstraction:mstring --symbolic -o ignore:memory -o ignore:control source.c

M-String interface

An abstract string with symbolic bounds can be created using a cunstructor __mstring_sym_val. To get definition of this function, source code needs to include rst/domains.h header.

__mstring_sym_val takes a variadic number of arguments – characters and bounds.

Example: construction of an mstraing of form a^i b^j where i + j < 5, i > 0, j > 0

char * str = __mstring__sym_val(3, 'a', 1, 5, 'b', 1, 5, '\0', 5, 6);

where the first argument is a number of characters including terminating zero. Following arguments consists of interleaved characters and bounds. The first bound is implicitly an interval [0, 1), the second bound is in the interval of [1, 5). The third bound is again in [1, 5), however the implementation guarantees that its greater than previous bound. End of the string buffer is guarded by last bound [5, 6).

Hence the resulting msting contains a sequence of as in range from index 0 up to first bound. Succeeding by bs starting at index of the first bound and ending at the second bound. The rest of the mstring contains zeros up to the last bound. Since bounds are symbolic, the resulting mstring describe all strings of the required form.

The implementation of M-String domain is located in attached sources. It is split into multiple files:

Evaluation

To run benchmarks unzip benchmark files and call python benchmark.py with divine in PATH.

Abstract Operations (sources)

Benchmarks of abstract operations were evaluated on three types of M-Strings:

  1. Word: a string of the form a^i b^j c^k , i + j + k ≤ length.

    states 8 64 1024 4096 LART
    strcmp 163 12.8s 14.7s 17.1s 27.2s 0.85s
    strcpy 36 1.67s 1.63s 2.18s 2.48s 0.51s
    strcat 477 32.2s 33.4s 31.9s 33.4s 0.92s
    strchr 24 0.28s 0.35s 0.53s 1.14s 0.88s
    strlen 26 0.45s 0.46s 0.69s 1.31s 0.86s
  2. Sequence: a string of the form a^i, i ≤ length.

    states 8 64 1024 4096 LART
    strcmp 12 0.55s 0.67s 1.15s 1.79s 0.65s
    strcpy 9 0.36s 0.27s 0.51s 0.83s 0.88s
    strcat 25 2.28s 2.5s 2.79s 3.19s 0.93s
    strchr 6 0.03s 0.08s 0.13s 0.26s 0.56s
    strlen 6 0.09s 0.11s 0.17s 0.33s 0.86s
  3. Alternation: is a string of form a^i b^j a^k b^l , i + j + k + l ≤ length.

    states 8 64 1024 4096 LART
    strcmp 744 63.4s 110s 129s 141s 0.77s
    strcpy 74 3.62s 4.9s 5.3s 4.36s 0.46s
    strcat 2406 208s 218s 220s 205s 0.95s
    strchr 45 0.54s 0.54s 0.92s 1.89s 0.83s
    strlen 53 1.01s 1.21s 1.94s 2.33s 0.82s

Columns in tables denote the size of state space, time of verification for various string sizes and time of transformation in LART.

C Standard Library (sources)

Number in column headers denotes length of input string.


  1. pdclib

  1. μCLibc

  1. musl-libc

Bison Grammars

  1. Numerical expressions (sources)
time 10 states 10 time 20 states 20 time 25 states 25 time 35 states 35
addition 40.2s 416 319s 3548 622s 13000 T -
ones 5.5s 62 8.1s 196 29.7s 402 189s 2186
alternation 708s 105 582s 11000 T - T -

addition is a string with two numbers with between them (the string is of form 123 + 123, where the whole length is bounded by the value from the table)

ones is a simplified version of addition where numbers consist of ones only

alternation is expression that contains numbers with alternating digits (1010)


  1. Simple programming language (sources)
time 10 states 10 time 50 states 50 time 100 states 100 time 1000 states 1000
value 6.58s 38 44.4s 238 90.4s 488 1100s 4988
loop 1.53s 23 3.28s 23 4.88s 23 33.3s 23
wrong 7.34s 82 27.9s 442 67.7s 892 311s 8992

value is a program with value declaration

loop is a program with a single loop

wrong is a program a syntactic error