Nuclear thread pool
Java developers know that beneath the classes of the java.util.concurrent package lies a lower-level atomic compare-and-swap (CAS) mechanism. Let’s take a look at the ReentrantLock.tryLock() method:
final boolean tryLock() {
    Thread current = Thread.currentThread();
    int c = getState();
    if (c == 0) {
        if (compareAndSetState(0, 1)) {
            setExclusiveOwnerThread(current);
            return true;
        }
    } else if (getExclusiveOwnerThread() == current) {
        if (++c < 0) // overflow
            throw new Error("Maximum lock count exceeded");
        setState(c);
        return true;
    }
    return false;
}
Indeed, in the main branch, the one that returns true, we see the compareAndSetState() method, and if we follow the call stack we end up in compareAndSetInt() from the Unsafe class.
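To see what a CAS actually does before we go further, here is a minimal sketch of my own (not from the article or the JDK sources) using AtomicInteger.compareAndSet(), which bottoms out in the same Unsafe.compareAndSetInt() intrinsic: read the current value, try to swap it for a new one, and retry if another thread won the race.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger(0);

        // Classic CAS retry loop: no lock is taken, yet no update is lost,
        // because compareAndSet only succeeds if the value is still `current`.
        Runnable increment = () -> {
            for (int i = 0; i < 100_000; i++) {
                int current;
                do {
                    current = counter.get();
                } while (!counter.compareAndSet(current, current + 1));
            }
        };

        Thread t1 = new Thread(increment);
        Thread t2 = new Thread(increment);
        t1.start(); t2.start();
        t1.join(); t2.join();

        // Both threads' increments survive the race.
        System.out.println(counter.get()); // prints 200000
    }
}
```

(In real code you would just call counter.getAndIncrement(), which hides the same retry loop.)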
I won’t go through all the locks in the java.util.concurrent package, but there are plenty of other places where CAS is used; you can find them easily.
Now let’s take a step up and look at the LinkedBlockingQueue and ConcurrentLinkedQueue collections.
In LinkedBlockingQueue, the offer() and take() methods use ReentrantLock. Are there collections that don’t use locks at all?
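To make the lock-based pattern concrete, here is a deliberately simplified sketch of my own (not the JDK implementation, and without the condition-variable blocking the real class adds): a plain deque guarded by a ReentrantLock, so that every offer and every poll pays a lock/unlock pair.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified analogue of a lock-guarded queue.
public class LockedQueue<T> {
    private final Deque<T> items = new ArrayDeque<>();
    private final ReentrantLock lock = new ReentrantLock();

    public void offer(T item) {
        lock.lock();            // every offer pays a lock/unlock pair
        try {
            items.addLast(item);
        } finally {
            lock.unlock();
        }
    }

    public T poll() {
        lock.lock();            // ...and so does every poll
        try {
            return items.pollFirst();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        LockedQueue<Integer> q = new LockedQueue<>();
        q.offer(1);
        q.offer(2);
        System.out.println(q.poll() + " " + q.poll()); // prints 1 2
    }
}
```

The real LinkedBlockingQueue is smarter (it uses separate put and take locks), but the cost structure is the same: each operation goes through lock()/unlock().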
Yes, sure. In ConcurrentLinkedQueue, the offer() and poll() methods are built on the atomic compareAndSet() method of the VarHandle class (which does the same thing as Unsafe.compareAndSet()). That means that ReentrantLock, as a locking wrapper, probably has some performance cost compared with its atomic equivalent. The question is: how do we measure this cost?
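The VarHandle primitive that ConcurrentLinkedQueue relies on internally can also be driven directly. Here is a minimal sketch of my own (class and field names are mine, not from the queue’s source) showing a VarHandle bound to a volatile field and a compareAndSet() call that succeeds only when the expected value matches:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class VarHandleCasDemo {
    private volatile int state = 0;

    // A VarHandle bound to the `state` field, like the node-pointer
    // handles inside ConcurrentLinkedQueue.
    private static final VarHandle STATE;
    static {
        try {
            STATE = MethodHandles.lookup()
                    .findVarHandle(VarHandleCasDemo.class, "state", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) {
        VarHandleCasDemo obj = new VarHandleCasDemo();
        // Succeeds: state is 0, so it is swapped to 1.
        System.out.println(STATE.compareAndSet(obj, 0, 1)); // prints true
        // Fails: state is now 1, not the expected 0.
        System.out.println(STATE.compareAndSet(obj, 0, 1)); // prints false
        System.out.println(obj.state); // prints 1
    }
}
```

This is exactly the single-instruction optimistic update that replaces a lock/unlock pair in the lock-free queue.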
I’ve implemented two thread pools: one backed by LinkedBlockingQueue and one backed by ConcurrentLinkedQueue. Both pools contain a list of PoolRunnable worker threads that constantly monitor a private val workingQueue and, whenever a new task is submitted via the execute() method, take it into work.
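The article’s pools are written in Kotlin; as a rough Java sketch of the same design (the class name, spin-loop details, and shutdown logic here are my own, not the article’s code), each worker spins on a shared queue and runs whatever execute() has submitted:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the pool design described above: worker threads
// busy-poll a shared work queue and run each submitted task.
public class SpinningPool {
    private final Queue<Runnable> workingQueue = new ConcurrentLinkedQueue<>();
    private volatile boolean running = true;

    public SpinningPool(int workers) {
        for (int i = 0; i < workers; i++) {
            new Thread(() -> {
                // Keep draining until stop() is called AND the queue is empty.
                while (running || !workingQueue.isEmpty()) {
                    Runnable task = workingQueue.poll(); // lock-free CAS under the hood
                    if (task != null) task.run();
                }
            }).start();
        }
    }

    public void execute(Runnable task) { workingQueue.offer(task); }

    public void stop() { running = false; }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(1000);
        SpinningPool pool = new SpinningPool(4);
        for (int i = 0; i < 1000; i++) {
            pool.execute(() -> { counter.getAndIncrement(); done.countDown(); });
        }
        done.await();  // wait until every task has actually run
        pool.stop();
        System.out.println(counter.get()); // prints 1000
    }
}
```

A busy-polling worker like this burns CPU while idle; it is fine for a benchmark harness whose whole point is to stress the queue, but a production pool would park idle threads instead.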
In order to highlight the performance impact of the locking mechanism, I’ll submit a large number of lightweight tasks. Each task increments a counter, and that’s it. This approach should cause a large number of lock/unlock operations in LinkedBlockingQueue’s offer()/take() methods, so the performance impact will become visible.
@Benchmark
fun atomic() {
    val concurrentLinkedQueuePool = ConcurrentLinkedQueuePool(10)
    val counter = AtomicInteger()
    for (i in 0 until numOfTasks) {
        val t = Runnable {
            counter.getAndIncrement()
        }
        concurrentLinkedQueuePool.execute(t)
    }
    Thread.sleep(1)
    concurrentLinkedQueuePool.stop()
}
The full code of the benchmark is here.
Let’s run the benchmark and see the numbers.
main summary:
Benchmark (numOfTasks) Mode Cnt Score ... Units
ThreadPoolBenchMark.atomic 100000000 avgt 3 15677.358 ... ms/op
ThreadPoolBenchMark.blocking 100000000 avgt 3 21440.026 ... ms/op
The hypothesis was correct: the atomic implementation is faster by roughly 27% (15677 ms/op vs 21440 ms/op).
So I think this is worth taking into consideration when solving multithreading problems.
Stay atomic!