Chapter 11: Performance and Scalability

Performance is not just about making your code run fast—it's about doing the right work with minimal resource contention and ensuring the system scales gracefully under load. Chapter 11 explores how concurrent programs behave under increased demand and how to design systems that maintain performance as they scale.

Performance vs. Scalability

Performance refers to how quickly a task completes. Scalability refers to how well the system performs as the workload or number of threads increases.

A program might perform well with one or two threads but degrade rapidly when concurrency increases. This is where understanding Amdahl’s Law becomes crucial.

Amdahl's Law

Amdahl's Law describes the theoretical maximum speedup of a program when only part of it can be parallelized. It shows the limits of performance gains from adding more threads or processors.


Speedup(N) = 1 / (S + (1 - S) / N)
  • N is the number of processors or threads.
  • S is the fraction of the program that must run serially (non-parallelizable).

If 30% of a task is inherently sequential (S = 0.3), even with 1000 threads, the speedup cannot exceed ~3.3×.
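To make that limit concrete, here is a minimal sketch that evaluates the formula for a few processor counts. The class and method names are illustrative, not from any library:

```java
public class AmdahlDemo {
    // Speedup(N) = 1 / (S + (1 - S) / N)
    static double speedup(double s, int n) {
        return 1.0 / (s + (1.0 - s) / n);
    }

    public static void main(String[] args) {
        double s = 0.3; // 30% of the task is inherently sequential
        for (int n : new int[] {1, 2, 10, 100, 1000}) {
            System.out.printf("N=%4d  speedup=%.2f%n", n, speedup(s, n));
        }
        // No matter how large N grows, speedup stays below 1/S (about 3.33 here)
    }
}
```

Running this shows the speedup climbing quickly at first, then flattening as N grows: the serial fraction dominates.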

Amdahl's Law Graph

Below is a graph illustrating Amdahl’s Law speedup for different serial fractions (S = 0.0, 0.1, 0.3, 0.5, 0.7).


As you can see:

  • With perfect parallelism (S = 0), speedup grows linearly.
  • With even a small serial portion, the speedup plateaus.

From another angle, here is a graph showing maximum utilization under Amdahl’s Law for various serial fractions (S = 0.0, 0.1, 0.3, 0.5, 0.7).


Speedup(N)     = 1 / (S + (1 - S) / N)
Utilization(N) = Speedup(N) / N

Where:

  • N is the number of processors.

  • S is the serial (non-parallelizable) portion of the program.

Utilization tells us how effectively each processor is being used. Even with many processors, utilization drops significantly when S is high.
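The same kind of quick sketch makes the utilization drop-off visible. Again, the names here are illustrative:

```java
public class UtilizationDemo {
    // Speedup(N) = 1 / (S + (1 - S) / N)
    static double speedup(double s, int n) {
        return 1.0 / (s + (1.0 - s) / n);
    }

    // Utilization(N) = Speedup(N) / N
    static double utilization(double s, int n) {
        return speedup(s, n) / n;
    }

    public static void main(String[] args) {
        double s = 0.1; // even a modest 10% serial fraction
        for (int n : new int[] {1, 8, 64, 512}) {
            System.out.printf("N=%3d  utilization=%.1f%%%n", n, 100 * utilization(s, n));
        }
        // Per-processor utilization collapses as N grows: most cores sit idle
        // waiting on the serial portion
    }
}
```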

Common Bottlenecks in Concurrent Applications

  • Lock contention: Threads blocking on synchronized code.
  • Context switching: Frequent switching between threads wastes CPU time.
  • Memory synchronization: Shared data updates require barriers that slow performance.
  • Coarse-grained locking: Large critical sections reduce parallelism.

Strategies to Improve Performance

1. Reduce Lock Scope


// Bad: holds the lock on list for the entire (possibly slow) iteration
synchronized (list) {
    for (Item item : list) {
        process(item);
    }
}

// Better: copy under the lock, then process the snapshot without holding it
List<Item> snapshot;
synchronized (list) {
    snapshot = new ArrayList<>(list);
}
for (Item item : snapshot) {
    process(item);
}

2. Use Concurrent Collections

Use high-throughput alternatives like ConcurrentHashMap and ConcurrentLinkedQueue instead of synchronizing entire data structures.
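As a minimal sketch, the atomic merge operation on ConcurrentHashMap lets multiple threads update a shared map without any explicit locking. The class and method names below are illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WordCount {
    // Counts words from several threads; merge() is atomic on ConcurrentHashMap,
    // so no synchronized block is needed around the update
    static Map<String, Integer> countConcurrently(String[] words) throws InterruptedException {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[words.length];
        for (int i = 0; i < words.length; i++) {
            final String word = words[i];
            workers[i] = new Thread(() -> counts.merge(word, 1, Integer::sum));
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countConcurrently(new String[] {"lock", "free", "lock", "map"}));
    }
}
```

Compare this with wrapping a plain HashMap in synchronized blocks: the concurrent version allows updates to different keys to proceed in parallel.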

3. Favor Atomic Variables


AtomicInteger counter = new AtomicInteger();

void increment() {
    counter.incrementAndGet(); // atomic read-modify-write; no lock required
}

4. Partition Workload

Split tasks into smaller units that operate independently, reducing lock contention and improving CPU cache efficiency.
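One way to sketch this, assuming a simple array-summing workload and illustrative names: give each task its own disjoint slice of the data, so no locks are needed at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionedSum {
    // Sums the array by splitting it into disjoint slices, one per task.
    // Each task touches only its own range: no shared mutable state, no contention.
    static long parallelSum(long[] data, int parts) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Future<Long>> results = new ArrayList<>();
            int chunk = data.length / parts;
            for (int p = 0; p < parts; p++) {
                final int from = p * chunk;
                final int to = (p == parts - 1) ? data.length : from + chunk;
                results.add(pool.submit(() -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += data[i];
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // combine the independent partial sums
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(parallelSum(data, 4)); // prints 500000500000
    }
}
```

Only the final combine step touches shared state, and it runs on one thread, so the parallel phase is lock-free.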

Scalability Killers

  • Coarse-grained locks: Serializes large sections of code.
  • Shared mutable state: Increases contention and reduces concurrency.
  • Excessive thread count: Leads to overhead, cache invalidation, and thread thrashing.

Benchmarking and Profiling Tools

  • JVisualVM
  • JConsole
  • Java Flight Recorder (JFR)
  • perf (Linux), DTrace, or Instruments (macOS)

Final Thoughts

Concurrency isn't just about making things faster. It's about doing things in parallel while avoiding pitfalls like contention and bottlenecks. Understanding the limits of parallelism through Amdahl’s Law, using fine-grained synchronization, and leveraging concurrent utilities can help you build applications that not only perform well, but also scale as your workload grows.

Next up: Chapter 12 – Explicit Locks and Conditions, where we dive into lower-level concurrency control.




