Java developers know that beneath the classes of the java.util.concurrent package there is a lower-level atomic Compare-And-Swap (CAS) mechanism.
    Let’s take a look at the ReentrantLock.tryLock() method:

final boolean tryLock() {
    Thread current = Thread.currentThread();
    int c = getState();
    if (c == 0) {
        if (compareAndSetState(0, 1)) {
            setExclusiveOwnerThread(current);
            return true;
        }
    } else if (getExclusiveOwnerThread() == current) {
        if (++c < 0) // overflow
            throw new Error("Maximum lock count exceeded");
        setState(c);
        return true;
    }
    return false;
}

    Indeed, in the main branch, which returns true, we see the compareAndSetState() method, and if we follow the call stack, we end up in compareAndSetInt() from the Unsafe class. I won’t go through every lock in the java.util.concurrent package, but there are plenty of other places where CAS is used; you can find them easily.
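To make the primitive concrete, here is a minimal, self-contained sketch of the same compare-and-set semantics using AtomicInteger (the class and helper names are mine for illustration, not JDK internals):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    // Atomically swing the state from `expect` to `update`; returns whether it worked.
    static boolean tryAcquire(AtomicInteger state, int expect, int update) {
        return state.compareAndSet(expect, update);
    }

    public static void main(String[] args) {
        AtomicInteger state = new AtomicInteger(0);
        // A CAS succeeds only while the current value equals the expected one,
        // mirroring compareAndSetState(0, 1) inside tryLock():
        System.out.println(tryAcquire(state, 0, 1)); // prints "true"  (0 -> 1)
        System.out.println(tryAcquire(state, 0, 1)); // prints "false" (state is already 1)
    }
}
```

The second call fails without blocking, which is exactly why tryLock() can return false immediately instead of waiting.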
    Now let’s take a step up and look at the LinkedBlockingQueue and ConcurrentLinkedQueue collections. In LinkedBlockingQueue, the offer() and take() methods use a ReentrantLock. Are there collections that do not use locks?
    Yes, of course. In ConcurrentLinkedQueue, the offer() and poll() methods are based on the atomic compareAndSet() method from the VarHandle class (which does the same thing as Unsafe.compareAndSet()). That means ReentrantLock, as a locking wrapper, probably has some performance overhead compared with its atomic equivalent. The question is: how do we measure this overhead?
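As an illustration of that VarHandle-based CAS, here is a minimal sketch. The class and field names are hypothetical; ConcurrentLinkedQueue applies the same primitive internally to swing its head/tail node pointers without locking.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class VarHandleCasDemo {
    private volatile int head = 0;

    private static final VarHandle HEAD;
    static {
        try {
            // Obtain a VarHandle for the `head` field, as concurrent collections do.
            HEAD = MethodHandles.lookup()
                    .findVarHandle(VarHandleCasDemo.class, "head", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public boolean casHead(int expect, int update) {
        return HEAD.compareAndSet(this, expect, update);
    }

    public int head() {
        return head;
    }

    public static void main(String[] args) {
        VarHandleCasDemo demo = new VarHandleCasDemo();
        System.out.println(demo.casHead(0, 1)); // prints "true"
        System.out.println(demo.casHead(0, 2)); // prints "false": head is already 1
    }
}
```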
    I’ve implemented two thread-pools: a blocking one backed by LinkedBlockingQueue and a lock-free one backed by ConcurrentLinkedQueue.

    Both thread-pools contain a list of PoolRunnable worker threads that constantly monitor a private val workingQueue and, when a new task is submitted via the execute() method, take it into work.
    To highlight the performance impact of the locking mechanism, I’ll submit a large number of lightweight tasks. Each task increments a counter, and that’s it. This approach should cause a large number of lock/unlock operations in LinkedBlockingQueue’s offer()/take() methods, so the performance impact will become visible.
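The lock-free pool's worker loop can be sketched roughly like this. This is my hypothetical reconstruction, not the article's actual PoolRunnable/ConcurrentLinkedQueuePool code, and a real pool would park idle workers instead of busy-spinning:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of a lock-free pool built on ConcurrentLinkedQueue.
public class LockFreePoolSketch {
    private final Queue<Runnable> workingQueue = new ConcurrentLinkedQueue<>();
    private volatile boolean running = true;

    public void execute(Runnable task) {
        workingQueue.offer(task); // lock-free, CAS-based enqueue
    }

    public void stop() {
        running = false;
    }

    // What each worker thread would run.
    public void workerLoop() {
        while (running) {
            Runnable task = workingQueue.poll(); // lock-free, CAS-based dequeue
            if (task != null) {
                task.run();
            }
            // A real pool would park or yield here instead of busy-spinning,
            // and would likely drain remaining tasks on stop().
        }
    }
}
```

The blocking variant would be identical in shape, but with a LinkedBlockingQueue and its lock-guarded offer()/take() calls.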

  @Benchmark
  fun atomic() {
      val concurrentLinkedQueuePool = ConcurrentLinkedQueuePool(10)
      val counter = AtomicInteger()

      for (i in 0 until numOfTasks) {
          val t = Runnable {
              counter.getAndIncrement()
          }
          concurrentLinkedQueuePool.execute(t)
      }
      Thread.sleep(1)
      concurrentLinkedQueuePool.stop()
  }

The full code of the benchmark used here is available here.
Let’s run the benchmark and look at the numbers.

main summary:
Benchmark                   (numOfTasks)  Mode  Cnt      Score  ...  Units
ThreadPoolBenchMark.atomic     100000000  avgt    3  15677.358  ...  ms/op
ThreadPoolBenchMark.blocking   100000000  avgt    3  21440.026  ...  ms/op

    The hypothesis was correct: the atomic implementation is faster by roughly 27% (15677 ms/op vs 21440 ms/op).
So I think this is worth taking into consideration when solving multithreading problems.
    Stay atomic!