Model Checking C and C++ Code via LLVM Bitcode ============================================== The traditional "explicit-state" model checking practice is not widely adopted in the programming community, and vice-versa, use of mainstream programming languages is not a common practice in the model checking community. Hence, model checking of systems expressed as C programs comes with some novelties for both kinds of users. First of all, the current main application of DIVINE is verification of safety (and some liveness) properties of asynchronous, shared-memory programs. The typical realisation of such asynchronous parallelism is programming with threads and shared memory. Often for performance and/or familiarity reasons, programming with threads, shared memory and locking is the only viable alternative, even though the approach is fraught with difficulties and presents many pitfalls that can catch even expert programmers unaware -- not to say novices. Sadly, resource locking is inherently non-compositional, hence there is virtually no way to provide a reliable yet powerful abstraction, all that while retaining speed and scalability. Despite all its shortcomings, lock-based programming (or alternatively, lock-free shared memory programming, which is yet more difficult) is becoming more prevalent. Model checking provides a powerful tool to ascertain correctness of programs written with locks around concurrent access to shared memory. Most programmers will agree that bugs that show up rarely and are nearly impossible to reproduce are the worst kind to deal with. Sadly, most concurrency bugs are of this nature, since they arise from subtle interactions of nondeterministically scheduled threads of execution. A test-case may work 99 times out of 100, yet the 100th time die due to an unfathomable invalid memory access. Even sophisticated modern debugging tools like `valgrind` are often powerless in this situation. This is where DIVINE can help, since it systematically and efficiently explores all relevant execution orderings, discovering even the subtlest race conditions, restoring a crucially important property of bugs: reproducibility. If you have ever changed a program and watched the test-case run in a loop for hundreds of iterations, wondering if the bug is really fixed, or it just stubbornly refuses to crash the program... well, DIVINE is the tool you have been waiting for. Of course, there is a catch. Model checking is computationally intensive and memory-hungry. While this is usually not a problem with comparably small unit tests, applying model checking to large programs may not be feasible -- depending on your use-case, and on the amount of memory and time that you have. With the universal LLVM backend, DIVINE can support a wide range of compiled programming languages. However, out of the box, language-specific support is only provided for C and C++. A fairly complete ISO C runtime library is provided as part of DIVINE, with appropriate hooks into DIVINE, as well as an implementation of the `pthread` specification, i.e. the POSIX threading interface. Additionally, an implementation of the standard C++ library is bundled with DIVINE. Besides the standard library, DIVINE also provides an adapted version of the *runtime* library required by C++ programs to implement runtime type identification and exception handling; both are fully supported by DIVINE. If your language interfaces with the C library, the `libc` part of language support can be re-used transparently. However, currently no other platform glue is provided for other languages. Your mileage may vary. Data structure and algorithmic code can be very likely processed with at most trivial additions to the support code, in any language that can be compiled into LLVM bitcode. Compiling Programs ------------------ The first step to take when you want to use DIVINE for C/C++ verification is to compile your program into LLVM bitcode and link it against the DIVINE-provided runtime libraries. When you issue `divine cc program.c`, `divine` will compile the runtime support and your program and link them together automatically, using a builtin clang-based compiler. The `cc` subcommand accepts a wide array of traditional C compiler flags like `-I`, `-std`, `-W` and so on. Alternatively, you can pass a single-file C or C++ program directly to other `divine` commands, like `verify` or `sim`, in which case the program will be compiled and linked transparently. Limitations ----------- When DIVINE interprets your program, it does so in a very strictly controlled environment, so that every step is fully reproducible. Hence, no real IO is allowed; the program can only see its own memory. While the `pthread` interface is fully supported, the `fork` system call is not: DIVINE cannot verify multi-process systems at this time. Likewise, C++ exceptions are well-supported but the C functions `setjmp` and `longjmp` are not yet implemented. The interpreter simulates an "in order" CPU architecture, hence any possible detrimental effects of instruction and memory access reordering won't be detected. Again, an improvement in this area is planned for a future release. State Space of a Program ------------------------ Internally, DIVINE constructs the state space of your program -- an oriented graph, whose vertices represent various states your program can reach, i.e. the values of all its mapped memory locations, the values in all machine registers, and so on. DIVINE normally doesn't store (or construct) all the possible states -- only a relevant subset. In a multi-threaded program, it often happens that more than one thread can run at once, i.e. if your program is in a given state, the next state it will get into is determined by chance -- it all hinges on which thread gets to run. Hence, some (in fact, most) states in a parallel program can proceed to multiple different configurations, depending on a scheduling choice. In such cases, DIVINE has to explore both such "variant" successors to determine the behaviour of the program in all possible scenarios. You can visualise the state space DIVINE explores by using `divine draw` -- this will show how the "future" of your program branches through various configurations, and how it converges back into a common point -- or, if its behaviour had changed depending on scheduling, diverges into multiple different outcomes. Non-Deterministic Choice ------------------------ Scheduling is not the only source of non-determinism in typical programs. Interactions with the environment can have different outcomes, even if the internal state of the program is identical. A typical example would be memory allocation: the same call in the same context could either succeed or fail, depending on the conditions outside the control of the program. In addition to failures, the normal input of the program (files, network, user input on the terminal) are all instances of non-determinism due to external influences. The sum of all such external behaviours that affect the outcome of the program is called the environment (this includes the scheduling of threads done by the operating system and/or CPU). When testing, the environment is controlled to some degree: the inputs are usually fixed, but eg. thread scheduling is (basically) random -- it changes with every test execution. Resource exhaustion can be simulated in a test environment to some degree. In a model checker, the environment is controlled much more strictly: when a test case is run through a model checker, it will come out the same every time. ω-Regular Properties and LTL ----------------------------