
Threading Strategy: A Prelude to Liveness, Performance, and Testing


Before we dive into the technical trenches of Part III—covering liveness issues, performance bottlenecks, and concurrency testing—it’s worth stepping back to ask: What thread model are we designing for?

Many concurrency problems don’t arise in a vacuum—they stem from assumptions about how threads are used, scheduled, and managed. Java and Python, though often compared as general-purpose languages, take surprisingly different routes when it comes to threading strategy. And yet, in modern frameworks, we’re starting to see a convergence toward reactive or event-loop-based designs.

Reactive Isn’t Single-Threaded (Even When It Looks That Way)

One common misconception about reactive or non-blocking systems is that they are “single-threaded.” In truth, frameworks like Netty (Java), Vert.x (Java), Node.js (JavaScript), and asyncio (Python) follow a different pattern: event-loop for dispatch, thread pool for work.



In Netty, for example, IO events are handled by an event loop thread, but heavy computation is offloaded to worker threads. Spring WebFlux adopts a similar model through the Reactor library, supporting declarative non-blocking pipelines while still using multiple threads under the hood.

Likewise in Python, asyncio programs often appear single-threaded but can delegate work to ThreadPoolExecutor or ProcessPoolExecutor. A web server like FastAPI can handle thousands of requests using async def endpoints—meanwhile, background tasks are still executed by thread pools.

In all these cases, you’ll find an event loop doing lightweight scheduling, and a separate layer—often configurable—for concurrent work. This design allows high concurrency with controlled parallelism.
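This split can be mimicked in plain Java with two executors. The sketch below is illustrative only, not any framework's actual API; the class and method names are invented, and a single-threaded executor stands in for the event loop.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Minimal sketch of "event loop for dispatch, thread pool for work".
public class EventLoopSketch {
    // A single-threaded executor plays the role of the event loop.
    private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();
    // A separate, configurable pool handles blocking or CPU-heavy work.
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    public CompletableFuture<String> handleEvent(String request) {
        return CompletableFuture
                // Lightweight dispatch stays on the event loop thread...
                .supplyAsync(() -> request.trim(), eventLoop)
                // ...while the heavy step is offloaded to the worker pool.
                .thenApplyAsync(String::toUpperCase, workers);
    }

    public void shutdown() {
        eventLoop.shutdown();
        workers.shutdown();
    }
}
```

The key property is that the event loop thread never blocks: it only schedules, and all potentially slow work lands on the worker pool.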

So... What’s the Strategy?

The threading strategy we choose should be intentional. Here are a few guiding questions:

  • Do we use blocking APIs? If so, can we tolerate context switches and thread-per-request models?
  • Can parts of our system benefit from backpressure-aware flows (like in Reactive Streams)?
  • Are our threads CPU-bound or IO-bound? How do we divide them?
  • Is our concurrency model implicit (managed by a framework) or explicit (we control every thread)?

Reactive frameworks give us tools for managing these trade-offs. They’re not magic. They are highly structured strategies to reduce contention, increase scalability, and defer blocking to well-defined edges of the system.

From Strategy to Reality

Part III of Java Concurrency in Practice focuses on what happens when our strategy meets reality: when threads block, when deadlocks sneak in, when performance plateaus, and when we try to test all this. It’s about keeping systems alive, responsive, and observable under real workloads.

But first, we have to think clearly about how our code will run, where the threads come from, and what work they’re allowed to do.

Final Thoughts

Every concurrency bug has a context—and that context is shaped by the thread strategy we adopt. Whether you're writing a microservice with Spring WebFlux, a data API with FastAPI, or building a custom actor system, the principles of liveness, performance, and testing begin with the threads that quietly power it all.

In the next chapters, we’ll study what goes wrong when these threads misbehave—and how to recover or prevent it.

Java Concurrency in Practice with Examples

These Java study materials are organized around Brian Goetz's book Java Concurrency in Practice, with Java 7 and 8 features added. Runnable Java examples are provided for each chapter.

Chapter 1 Introduction

I Fundamentals


Chapter 2 Thread Safety

Chapter 3 Sharing Objects

Chapter 4 Composing Objects

Chapter 5 Building blocks

II Structuring Concurrent Applications

Chapter 6 Task Execution

Chapter 7 Cancellation and Shutdown

Chapter 8 Applying Thread Pools

Chapter 9 GUI Applications

III Liveness, Performance, and Testing

Chapter 10 Avoiding Liveness Hazards

Chapter 11 Performance and Scalability

Chapter 12 Testing Concurrent Programs

Concurrency in Java 8 (lambda and parallel Stream)

Use parallel Streams, including reduction, decomposition, merging, pipelines, and performance considerations

Use CompletableFuture to build asynchronous applications, pipelining asynchronous tasks
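As a small illustration of CompletableFuture pipelining (the class name and values are arbitrary, chosen only for the example):

```java
import java.util.concurrent.CompletableFuture;

public class PipelineDemo {
    // Two independent async stages are combined, then a further
    // transformation is pipelined onto the combined result.
    static int pipeline() {
        CompletableFuture<Integer> price = CompletableFuture.supplyAsync(() -> 100);
        CompletableFuture<Integer> qty   = CompletableFuture.supplyAsync(() -> 2);
        return price
                .thenCombine(qty, (p, q) -> p * q) // runs when both complete
                .thenApply(total -> total + 5)     // chained, still non-blocking
                .join();                           // block only at the very edge
    }

    public static void main(String[] args) {
        System.out.println(pipeline()); // prints 205
    }
}
```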

Chapter 11: Performance and Scalability


Performance is not just about making your code run fast—it's about doing the right work with minimal resource contention and ensuring the system scales gracefully under load. Chapter 11 explores how concurrent programs behave under increased demand and how to design systems that maintain performance as they scale.

Performance vs. Scalability

Performance refers to how quickly a task completes. Scalability refers to how well the system performs as the workload or number of threads increases.

A program might perform well with one or two threads but degrade rapidly when concurrency increases. This is where understanding Amdahl’s Law becomes crucial.

Amdahl's Law

Amdahl's Law describes the theoretical maximum speedup of a program when only part of it can be parallelized. It shows the limits of performance gains from adding more threads or processors.


Speedup(N) = 1 / (S + (1 - S) / N)
  • N is the number of processors or threads.
  • S is the fraction of the program that must run serially (non-parallelizable).

If 30% of a task is inherently sequential (S = 0.3), even with 1000 threads, the speedup cannot exceed ~3.3×.

Amdahl's Law Graph

Below is a graph illustrating Amdahl's Law speedup for different serial fractions (S = 0.0, 0.1, 0.3, 0.5, 0.7).


As you can see:

  • With perfect parallelism (S = 0), speedup grows linearly.
  • With even a small serial portion, the speedup plateaus.

From another angle, here is a graph showing maximum utilization under Amdahl’s Law for various serial fractions (S = 0.0, 0.1, 0.3, 0.5, 0.7).


Speedup(N)     = 1 / (S + (1 - S) / N)
Utilization(N) = Speedup(N) / N

Where:

  • N is the number of processors.

  • S is the serial (non-parallelizable) portion of the program.

Utilization tells us how effectively each processor is being used. Even with many processors, utilization drops significantly when S is high.
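Both formulas are easy to check numerically. A small sketch (the class name is ours) that reproduces the ~3.3× ceiling for S = 0.3:

```java
public class AmdahlsLaw {
    // Speedup(N) = 1 / (S + (1 - S) / N)
    static double speedup(double s, int n) {
        return 1.0 / (s + (1.0 - s) / n);
    }

    // Utilization(N) = Speedup(N) / N
    static double utilization(double s, int n) {
        return speedup(s, n) / n;
    }

    public static void main(String[] args) {
        // With S = 0.3 and 1000 threads, speedup is capped near 1/0.3 ≈ 3.33x...
        System.out.printf("speedup     = %.2f%n", speedup(0.3, 1000));
        // ...while each of the 1000 processors is almost entirely idle.
        System.out.printf("utilization = %.4f%n", utilization(0.3, 1000));
    }
}
```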

Common Bottlenecks in Concurrent Applications

  • Lock contention: Threads blocking on synchronized code.
  • Context switching: Frequent switching between threads wastes CPU time.
  • Memory synchronization: Shared data updates require barriers that slow performance.
  • Coarse-grained locking: Large critical sections reduce parallelism.

Strategies to Improve Performance

1. Reduce Lock Scope


// Bad: the lock is held for the entire (possibly slow) iteration
synchronized (list) {
    for (Item item : list) {
        process(item);
    }
}

// Better: hold the lock only long enough to copy, then process
// the snapshot without blocking other threads
List<Item> snapshot;
synchronized (list) {
    snapshot = new ArrayList<>(list);
}
for (Item item : snapshot) {
    process(item);
}

2. Use Concurrent Collections

Use high-throughput alternatives like ConcurrentHashMap and ConcurrentLinkedQueue instead of synchronizing entire data structures.
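For example, a concurrent word counter can rely on ConcurrentHashMap.merge, which is atomic per key, so no external lock is needed and threads updating different keys do not contend. (The class name here is ours, for illustration.)

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WordCount {
    private final Map<String, Integer> counts = new ConcurrentHashMap<>();

    // merge() performs an atomic read-modify-write for this key only,
    // instead of locking the whole map.
    public void record(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    public int count(String word) {
        return counts.getOrDefault(word, 0);
    }
}
```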

3. Favor Atomic Variables


// Backed by a lock-free compare-and-swap; typically cheaper than
// synchronized for a simple counter under contention
AtomicInteger counter = new AtomicInteger();

void increment() {
    counter.incrementAndGet();
}

4. Partition Workload

Split tasks into smaller units that operate independently, reducing lock contention and improving CPU cache efficiency.
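A sketch of this idea (class and method names are ours): sum the range [0, n) by giving each task its own slice, so there is no shared accumulator to contend on, and merge the partial results at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.LongStream;

public class PartitionedSum {
    static long sum(long n, int parts) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            long chunk = n / parts;
            for (int i = 0; i < parts; i++) {
                long lo = i * chunk;
                long hi = (i == parts - 1) ? n : lo + chunk; // last slice takes the remainder
                // Each task sums a private range: no locks, no shared state.
                futures.add(pool.submit(() -> LongStream.range(lo, hi).sum()));
            }
            long total = 0;
            for (Future<Long> f : futures) total += f.get(); // merge partial results
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```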

Scalability Killers

  • Coarse-grained locks: Serializes large sections of code.
  • Shared mutable state: Increases contention and reduces concurrency.
  • Excessive thread count: Leads to overhead, cache invalidation, and thread thrashing.

Benchmarking and Profiling Tools

  • JVisualVM
  • JConsole
  • Java Flight Recorder (JFR)
  • perf (Linux), dtrace, or Instruments (macOS)

Final Thoughts

Concurrency isn't just about making things faster. It's about doing things in parallel while avoiding pitfalls like contention and bottlenecks. Understanding the limits of parallelism through Amdahl’s Law, using fine-grained synchronization, and leveraging concurrent utilities can help you build applications that not only perform well, but also scale as your workload grows.

Next up: Chapter 12 – Testing Concurrent Programs, where we look at how to verify the correctness and performance of concurrent code.





Deadlock Analysis with Thread Dumps


In this section, we demonstrate a real deadlock scenario using the LeftRightDeadlock program. We reproduce the issue on macOS, capture a thread dump using kill -3, and analyze the output to understand how the deadlock occurs.

Source Code: LeftRightDeadlock.java


public class LeftRightDeadlock {
    private final Object left = new Object();
    private final Object right = new Object();

    public void leftRight() {
        synchronized (left) {
            synchronized (right) {
                doSomething();
            }
        }
    }

    public void rightLeft() {
        synchronized (right) {
            synchronized (left) {
                doSomethingElse();
            }
        }
    }

    void doSomething() {
        // Busy-work loop: keeps the locks held briefly, widening the
        // window in which another thread can acquire them in the
        // opposite order
        for (int i = 0; i < 10000; i++) {
            int j = i * i;
        }
    }

    void doSomethingElse() {
        for (int i = 0; i < 10000; i++) {
            int j = i * i;
        }
    }

    public static void main(String... args) {
        LeftRightDeadlock lrdl = new LeftRightDeadlock();
        for (int i = 0; i < 1000; i++) {
            new Thread(() -> {
                lrdl.leftRight();
                lrdl.rightLeft();
            }).start();
        }
    }
}

Terminal Commands to Reproduce the Deadlock


# Compile the program
javac LeftRightDeadlock.java

# Run the program and redirect output to a file
java LeftRightDeadlock > threaddump.out 2>&1 &

# Find the Java process
ps -ef | grep java

Example output:


501 36829   601   0 12:34AM ttys000    0:00.27 /usr/bin/java LeftRightDeadlock

# Send SIGQUIT to capture thread dump
kill -3 36829

Thread Dump Output (Excerpt)


# Search the dump for the deadlock report
grep -A 300 "Found one Java-level deadlock:" threaddump.out
Found one Java-level deadlock:
=============================
"Thread-2":
  waiting to lock monitor 0x00006000008e4750 (object 0x000000070ff665a8, a java.lang.Object),
  which is held by "Thread-3"

"Thread-3":
  waiting to lock monitor 0x00006000008ec410 (object 0x000000070ff665b8, a java.lang.Object),
  which is held by "Thread-7"

"Thread-7":
  waiting to lock monitor 0x00006000008e4750 (object 0x000000070ff665a8, a java.lang.Object),
  which is held by "Thread-3"

Java stack information for the threads listed above:
===================================================
"Thread-2":
	at LeftRightDeadlock.leftRight(LeftRightDeadlock.java:7)
	- waiting to lock <0x000000070ff665a8> (a java.lang.Object)
	at LeftRightDeadlock.lambda$main$0(LeftRightDeadlock.java:37)
	at LeftRightDeadlock$$Lambda$1/0x0000000800000a30.run(Unknown Source)
	at java.lang.Thread.run(java.base@17.0.14/Thread.java:840)
"Thread-3":
	at LeftRightDeadlock.leftRight(LeftRightDeadlock.java:8)
	- waiting to lock <0x000000070ff665b8> (a java.lang.Object)
	- locked <0x000000070ff665a8> (a java.lang.Object)
	at LeftRightDeadlock.lambda$main$0(LeftRightDeadlock.java:37)
	at LeftRightDeadlock$$Lambda$1/0x0000000800000a30.run(Unknown Source)
	at java.lang.Thread.run(java.base@17.0.14/Thread.java:840)
"Thread-7":
	at LeftRightDeadlock.rightLeft(LeftRightDeadlock.java:16)
	- waiting to lock <0x000000070ff665a8> (a java.lang.Object)
	- locked <0x000000070ff665b8> (a java.lang.Object)
	at LeftRightDeadlock.lambda$main$0(LeftRightDeadlock.java:38)
	at LeftRightDeadlock$$Lambda$1/0x0000000800000a30.run(Unknown Source)
	at java.lang.Thread.run(java.base@17.0.14/Thread.java:840)

Found 1 deadlock.

Heap
 garbage-first heap   total 262144K, used 2949K [0x0000000700000000, 0x0000000800000000)
  region size 2048K, 2 young (4096K), 0 survivors (0K)
 Metaspace       used 4459K, committed 4672K, reserved 1114112K
  class space    used 343K, committed 448K, reserved 1048576K

Deadlock Analysis

The thread dump reveals a circular wait condition:

  • Thread-2 is waiting to lock left, which is held by Thread-3.
  • Thread-3 is waiting to lock right, which is held by Thread-7.
  • Thread-7 is waiting to lock left, which is held by Thread-3.

All threads are stuck waiting on each other, forming a cycle. The root cause is that leftRight() and rightLeft() methods acquire the same two locks but in opposite order, making deadlock inevitable when multiple threads interleave.

How to Fix

To eliminate the deadlock, use a consistent lock acquisition order throughout your code. For example, always acquire left before right, regardless of context. This ensures that no circular wait condition can occur.
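When the two lock objects are method parameters and a static ordering isn't obvious, the ordering can be induced dynamically, following the idea JCIP uses for its transferMoney example. The sketch below (class and method names are ours) orders by identity hash code, with a global tie-breaking lock for the rare collision:

```java
public class OrderedLocking {
    // Tie-breaking lock for the rare case of equal identity hash codes.
    private static final Object tieLock = new Object();

    static void withBothLocks(Object a, Object b, Runnable action) {
        int ha = System.identityHashCode(a);
        int hb = System.identityHashCode(b);
        if (ha < hb) {
            // Every caller nests in the same global order, so no cycle can form.
            synchronized (a) { synchronized (b) { action.run(); } }
        } else if (ha > hb) {
            synchronized (b) { synchronized (a) { action.run(); } }
        } else {
            // Hash collision: serialize through the tie lock to keep ordering total.
            synchronized (tieLock) {
                synchronized (a) { synchronized (b) { action.run(); } }
            }
        }
    }
}
```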

Alternatively, refactor critical sections to avoid nested locking, or use explicit locks such as ReentrantLock with tryLock and timeouts, so a thread that cannot get both locks backs off instead of blocking forever.
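A sketch of the tryLock approach (the class name is ours, and real code should add randomized backoff before retrying to avoid livelock):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class TryLockLeftRight {
    private final ReentrantLock left = new ReentrantLock();
    private final ReentrantLock right = new ReentrantLock();

    // Try to take both locks with a timeout; on failure release
    // everything and retry, so a circular wait can never persist.
    public boolean leftRight() throws InterruptedException {
        while (true) {
            if (left.tryLock(10, TimeUnit.MILLISECONDS)) {
                try {
                    if (right.tryLock(10, TimeUnit.MILLISECONDS)) {
                        try {
                            // doSomething();
                            return true;
                        } finally {
                            right.unlock();
                        }
                    }
                } finally {
                    left.unlock();
                }
            }
            // Could not get both locks: loop and retry
            // (insert randomized backoff here in production code).
        }
    }
}
```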