#44 accepted defect

Multi-Threaded Performance Doesn't Scale

Reported by:	John Lång	Owned by:	mornfall
Priority:	minor	Milestone:	5.0
Component:	other	Keywords:
Cc:

Description

On my Debian laptop, I noticed that running divine check on my model took 30s for one thread, 23s for two threads, and 21s for four threads. The serial part in the beginning took around 10 seconds. I'm currently running divine check on a bigger model on a virtual machine (Also running Debian 9) with 32 CPUs and the performance, in terms of Mips and states per second, doesn't seem to be basically any better than with a single thread. It looks like all the CPU cores do produce heat, but they're spending most of the time in kernel space. Could there be a race for locks going on between the model checking threads or something?

My models are part of the ADAPRO project in GitLab?: https://gitlab.com/jllang/adapro
I wrote this nice BASH script for running DIVINE on them. It requires some extra utilities, like bc, pgrep, and gnuplot, though.

Change History (2)

comment:1 Changed 6 years ago by mornfall

Milestone:	future → 5.0

Milestone renamed

comment:2 Changed 5 years ago by mornfall

Status:	new → accepted

Part of the issue is caused by boolean values represented using a single bit in LLVM, which causes lots of undefined bits to crop up in stack frames and that in turn hits slow paths in the code. Adding a special case for i1 values helps with scalability somewhat (but doesn't get us all the way). To make it clear, this hack is *not* merged, we want to fix the problem differently (but it'll still take some time). In the meantime, here's some numbers.

Current Code

 4 threads = 3.8 mips

With the i1 hack

 1 thread  = 2.2 mips
 4 threads = 6.5 mips
 8 threads = 6.8 mips
16 threads = 4.7 mips

Note: See TracTickets for help on using tickets.