Nuclear thread pool
Java developers know that beneath the classes of the *.concurrent.* package lies a lower-level atomic Compare-And-Swap (CAS) mechanism.
Let’s take a look at the internal implementation behind the ReentrantLock.tryLock() method:
final boolean tryLock() {
    Thread current = Thread.currentThread();
    int c = getState();
    if (c == 0) {
        if (compareAndSetState(0, 1)) {
            setExclusiveOwnerThread(current);
            return true;
        }
    } else ... // re-entrant case omitted
}

Indeed, in the main branch that returns true we see the compareAndSetState() method, and if we follow the call chain down, we end up in compareAndSetInt() of the Unsafe class.
I won’t walk through every lock in the *.concurrent.* package, but there are plenty of other places where CAS is used; you can find them easily.
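To make the mechanism concrete, here is a minimal sketch of the classic CAS retry loop, written in Kotlin on top of AtomicInteger rather than Unsafe; the saturating-add function is my own illustration, not code from the JDK sources.

import java.util.concurrent.atomic.AtomicInteger

// A hand-rolled CAS retry loop: read the current value, compute the new one,
// and retry until compareAndSet() succeeds, i.e. until no other thread raced us.
fun addSaturating(counter: AtomicInteger, delta: Int, max: Int): Int {
    while (true) {
        val current = counter.get()
        val next = minOf(current + delta, max)
        if (counter.compareAndSet(current, next)) {
            return next
        }
        // Lost the race to another thread; loop and retry with a fresh value.
    }
}

Conceptually, the same loop sits underneath AtomicInteger.getAndIncrement(), the lock-state transitions of AQS-based locks, and the node linking in the lock-free queues discussed below.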
Now let’s take a step up the abstraction ladder and look at two collections: LinkedBlockingQueue and ConcurrentLinkedQueue.
In LinkedBlockingQueue, the offer() and take() methods use ReentrantLock. Are there collections that don’t use locks at all?
Yes, sure. In ConcurrentLinkedQueue, offer() and poll() are built on the atomic compareAndSet() method of the VarHandle class (which does the same job as Unsafe.compareAndSet()). That suggests that ReentrantLock, as a locking wrapper around CAS, probably carries some performance overhead compared with its lock-free equivalent. The question is: how do we measure this overhead?
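Before measuring, a quick side-by-side look at the two queues helps. The snippet below is only an illustration of the API difference and is not part of the benchmark; the point is that the blocking queue takes and releases a lock on every call, while the lock-free queue relies on CAS alone.

import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.LinkedBlockingQueue

fun main() {
    val blocking = LinkedBlockingQueue<String>()
    blocking.offer("task")             // acquires and releases the internal putLock
    val a = blocking.take()            // acquires takeLock; blocks while the queue is empty

    val lockFree = ConcurrentLinkedQueue<String>()
    lockFree.offer("task")             // links the new node with a CAS, no lock
    val b = lockFree.poll()            // unlinks the head with a CAS; returns null if empty

    println("$a / $b")
}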
I’ve implemented two thread pools: one backed by LinkedBlockingQueue and one backed by ConcurrentLinkedQueue.
Both pools hold a list of PoolRunnable worker threads that constantly monitor a private val workingQueue and pick up any task submitted via the execute() method; a rough sketch of the lock-free variant follows below.
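For orientation, here is a minimal sketch of what the ConcurrentLinkedQueue-backed pool could look like. The names (ConcurrentLinkedQueuePool, PoolRunnable, workingQueue, execute(), stop()) follow the article, but the body is my own simplified reconstruction, not the author’s exact code; see the full sources linked after the benchmark.

import java.util.Queue
import java.util.concurrent.ConcurrentLinkedQueue

class ConcurrentLinkedQueuePool(threads: Int) {
    private val workingQueue: Queue<Runnable> = ConcurrentLinkedQueue()
    @Volatile private var running = true

    // Each worker busy-polls the queue and runs whatever task it manages to grab.
    private inner class PoolRunnable : Runnable {
        override fun run() {
            while (running || workingQueue.isNotEmpty()) {
                workingQueue.poll()?.run()
            }
        }
    }

    private val workers = List(threads) { Thread(PoolRunnable()).apply { start() } }

    fun execute(task: Runnable) {
        workingQueue.offer(task)       // lock-free enqueue via CAS
    }

    fun stop() {
        running = false                // workers drain the remaining tasks and exit
        workers.forEach { it.join() }
    }
}

The blocking variant would look the same, except that workingQueue is a LinkedBlockingQueue, so the workers can block in take() instead of spinning on poll().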
In order to highlight the performance impact of the locking mechanism, I’ll submit a large number of lightweight tasks. Each task just increments a counter and that’s it. This should produce a huge number of lock/unlock operations in LinkedBlockingQueue’s offer()/take(), so the overhead becomes visible.
@Benchmark
fun atomic() {
    val concurrentLinkedQueuePool = ConcurrentLinkedQueuePool(10)
    val counter = AtomicInteger()
    // Submit numOfTasks trivial tasks; each one only bumps the counter.
    for (i in 0 until numOfTasks) {
        val t = Runnable {
            counter.getAndIncrement()
        }
        concurrentLinkedQueuePool.execute(t)
    }
    Thread.sleep(1)
    concurrentLinkedQueuePool.stop()
}

Full code of the benchmark is here.
Let’s run the benchmark and see the numbers.
main summary:

Benchmark                     (numOfTasks)  Mode  Cnt      Score  ...  Units
ThreadPoolBenchMark.atomic       100000000  avgt    3  15677.358  ...  ms/op
ThreadPoolBenchMark.blocking     100000000  avgt    3  21440.026  ...  ms/op
The hypothesis was correct: the atomic implementation takes roughly 25% less time per operation (~15.7 s vs ~21.4 s).
So this difference is worth keeping in mind when choosing between locking and lock-free building blocks in multithreaded code.
Stay atomic!