Site Search:

Identify potential threading problems among deadlock, starvation, livelock, and race conditions

Back>

There are three types of threading problems: deadlock, starvation and livelock.
race condition
race condition

deadlock

deadlock occurs when two threads are blocked by each other, since none of the threads can make progress in order to unblock the other one, the program hangs.

Here is an example. There are 2 pirates at the entrance of a treasure house and blocked each other's way. Pirate A yelled at pirate B: "I promise I will step back if you step back first." "You wish," pirate B replied "how about you step back first, I will do the same." None of them moves, it was intense.

OCPJP>cat Pirate.java 
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Pirate {
    private Pirate enemy;
    private void setEnemy(Pirate enemy) {
        this.enemy = enemy;
    }
    
    private synchronized void moveBack() {
        try {
            System.out.println(Thread.currentThread().getName() + ": I am watching you, you move first!");
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        synchronized(enemy) {
            System.out.println(Thread.currentThread().getName() + ": I step back");
        }
    }
    
    public static void main(String... args) throws InterruptedException {
        Pirate pirateA = new Pirate();
        Pirate pirateB = new Pirate();
        
        pirateA.setEnemy(pirateB);
        pirateB.setEnemy(pirateA);
        
        ExecutorService exec = Executors.newCachedThreadPool();
        exec.submit(() -> pirateA.moveBack());
        exec.submit(() -> pirateB.moveBack());
        
        exec.shutdown();
        exec.awaitTermination(10, TimeUnit.SECONDS);
    }
}
OCPJP>
OCPJP>javac Pirate.java 
OCPJP>java Pirate
pool-1-thread-1: I am watching you, you move first!
pool-1-thread-2: I am watching you, you move first!
^COCPJP>

In the Pirate example, there are two tasks are submitted to ExecutorService. Upon pool-1-thread-1 starts, the thread hold pirateA (synchronized method use "this" as lock) as the lock and enters moveBack, then goes to sleep; Upon pool-1-thread-2 starts, the thread hold pirateB as the lock and enters moveBack, then goes to sleep. 1 seconds later, pool-1-thread-1 wake up, trying to hold pirateB as lock in order to enter the synchronized(enemy) {} block. However, pool-1-thread-2 is holding the lock pirateB, so pool-1-thread-1 has to wait for pool-1-thread-2 to release the lock. At the mean time, pool-1-thread-2 wake up, trying to hold pirateA as lock in order to enter the synchronized(enemy) {} block. However, pool-1-thread-1 is holding the lock pirateA, so pool-1-thread-2 has to wait for pool-1-thread-1 to release the lock. None of the two threads can make progress, the program hangs.

starvation

starvation is a situation that a thread is unable to make progress due to the other threads constantly taking the resource that the thread is trying to access. In theory, the starving thread still have chance to make progress, however, in reality, the chance is so slim, the starving thread appears to be dead -- a situation called livelock. In practice, the boundary between starvation and livelock is blur, you want neither much more worse nor worst situation.

Ready for a sad story? Once upon a time, there was a very nice puppy, whenever he saw food, he always waits for a while, so that his friends can enjoy first. His greedy and selfish friends, finished the food as fast as they can, seldom leave anything for the nice puppy. Poor puppy is starving. Moral, blame the lazy feeder.

OCPJP>cat Puppy.java 
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class Puppy {
    private int waitTime = 0;
    private static final AtomicBoolean isFoodAvailable = new AtomicBoolean(true);
    
    public Puppy(int waitTime) {
        this.waitTime = waitTime;
    }
    
    private void eat() {

        while (true) {
            try {
                Thread.sleep(this.waitTime);
                if (isFoodAvailable.get()) {
                    // System.out.println(Thread.currentThread().getName() +
                    // ": I finished the food");
                    isFoodAvailable.set(false);
                    break;
                }
            } catch (InterruptedException e) {
                System.out.println(Thread.currentThread().getName() + ": I am still hungry.");
                break;
            }
        }
    }
    
    private static ScheduledExecutorService greedyFriends = Executors.newScheduledThreadPool(10);
    private static ScheduledExecutorService feeder = Executors.newSingleThreadScheduledExecutor();
    public static void main(String[] args) throws InterruptedException {
        feeder.scheduleWithFixedDelay(() -> isFoodAvailable.set(true), 1, 1, TimeUnit.SECONDS);
        int nicer = 213; //smaller the value, smaller the chance for starving Puppy.
        greedyFriends.scheduleAtFixedRate(() -> new Puppy(nicer).eat(), 0, 123, TimeUnit.MILLISECONDS);
        Thread starvingPuppy = new Thread(() -> new Puppy(1000).eat());
        starvingPuppy.start();
        starvingPuppy.join(15000);
        starvingPuppy.interrupt();
        System.out.println("greedyFriends:" + greedyFriends);
        greedyFriends.shutdownNow();
        greedyFriends.awaitTermination(2, TimeUnit.SECONDS);
        
        feeder.shutdown();
        feeder.awaitTermination(2, TimeUnit.SECONDS);
        System.out.println("feeder:"+feeder);
    }
}
OCPJP>javac Puppy.java 
OCPJP>date; java Puppy; date;
Sun Sep 17 07:50:11 EDT 2017
Thread-0: I am still hungry.
greedyFriends:java.util.concurrent.ScheduledThreadPoolExecutor@4a574795[Running, pool size = 10, active threads = 1, queued tasks = 0, completed tasks = 15]
pool-1-thread-6: I am still hungry.
feeder:java.util.concurrent.Executors$DelegatedScheduledExecutorService@f6f4d33
Sun Sep 17 07:50:27 EDT 2017
OCPJP>
OCPJP>date; java Puppy; date;
Sun Sep 17 07:50:37 EDT 2017
greedyFriends:java.util.concurrent.ScheduledThreadPoolExecutor@4a574795[Running, pool size = 2, active threads = 1, queued tasks = 0, completed tasks = 1]
pool-1-thread-1: I am still hungry.
feeder:java.util.concurrent.Executors$DelegatedScheduledExecutorService@f6f4d33
Sun Sep 17 07:50:38 EDT 2017
OCPJP>
OCPJP>date; java Puppy; date;
Sun Sep 17 07:50:41 EDT 2017
Thread-0: I am still hungry.
greedyFriends:java.util.concurrent.ScheduledThreadPoolExecutor@4a574795[Running, pool size = 10, active threads = 1, queued tasks = 0, completed tasks = 15]
pool-1-thread-3: I am still hungry.
feeder:java.util.concurrent.Executors$DelegatedScheduledExecutorService@f6f4d33
Sun Sep 17 07:50:56 EDT 2017
OCPJP>
OCPJP>date; java Puppy; date;
Sun Sep 17 07:51:05 EDT 2017
greedyFriends:java.util.concurrent.ScheduledThreadPoolExecutor@4a574795[Running, pool size = 2, active threads = 1, queued tasks = 0, completed tasks = 1]
pool-1-thread-1: I am still hungry.
feeder:java.util.concurrent.Executors$DelegatedScheduledExecutorService@f6f4d33
Sun Sep 17 07:51:06 EDT 2017
OCPJP>
OCPJP>date; java Puppy; date;
Sun Sep 17 07:51:10 EDT 2017
Thread-0: I am still hungry.
greedyFriends:java.util.concurrent.ScheduledThreadPoolExecutor@4a574795[Running, pool size = 10, active threads = 1, queued tasks = 0, completed tasks = 15]
pool-1-thread-3: I am still hungry.
feeder:java.util.concurrent.Executors$DelegatedScheduledExecutorService@f6f4d33
Sun Sep 17 07:51:25 EDT 2017
OCPJP>


Understand how this starvation happens needs subtle reasoning, the OCPJP won't test math. If you can not follow the reasoning of chance, no worry. Just take away the java syntax. ScheduledExecutorService is on upgrade test and useful in job, so better to be familiar with.

First of all, the starvingPuppy thread spent most of the time sleeping, at the small time window it is awake, how much chance the isFoodAvailable flag is true, so that it can exit the while loop in eat method? It depends on how much chance the flag stays true -- which depends on a few factors: the frequency feeder reset the flag to be true (you!!), the frequency a greedy friend set the flag to be false, how many hungry friends are checking the food, and when/where you run the program -- result in different thread cpu time allocating policy. In the situation we have a slow feeder,  10 faster eating puppy checking the food frequently, the starvingPuppy probably never saw the isFoodAvailable to be  true when it wakes up and check. The starvingPuppy has better chance, at beginning, when there are few (pool size) greedy friends waiting for food.

After delegated tasks to feeder, greedyFriends and started starvingPuppy thread, the main thread wait for the starvingPuppy to exit the while loop normally.

        starvingPuppy.join(15000);
        starvingPuppy.interrupt();


The wait took 15 seconds, then the starvingPuppy is interrupted so that it can exit the while loop by throwing InterruptedException.

The worker threads in greedyFriends are interrupted, the running threads are given 2 seconds (which is much longer than needed) to exit the while loop by throwing InterruptedException.

        greedyFriends.shutdownNow();
        greedyFriends.awaitTermination(2, TimeUnit.SECONDS);


Finally, the feeder is shutdown, and the program exits.

shutdown and shutdownNow has subtle differences:

  • shutdown: no new tasks are accepted, all submitted tasks are allowed to complete.
  • shutdownNow: no new tasks are accepted, no queued task is allocated worker thread to run, all running worker threads are interrupted.
  • awaiTermination: once timed out, all threads are interrupted (message/protocol to the running thread is: you should exit at a time you think proper, ASAP ). The running threads can ignore the message though (comment out break; in the catch), thus keeps running and prevents the program to exit. The only way you can stop that thread is power off the computer :).

livelock

Livelock is a special case of starvation, often as a result of two threads trying to resolve a deadlock, in which threads actively try to acquire a set of locks the other threads hold, by endlessly entering/exiting same set of blocking states. The difference between deadlock and lovelock is: threads in deadlock didn't change state; while threads in livelock endlessly switching between blocking states. The end result is the same -- the program never makes progress.

The story of two gentlemen. Two (robotic) gentlemen blocked the way of each other on a two way road. GentlemanA moves left in order to get out of the way of gentlemanB. GentlemanB does the same (with the same good intention of getting out of the way of gentlemanA). Now they are both on the same side of the road and blocked each other again. GentlemanA then moves right in order to get out of the way of gentlemanB. GentlemanB do the same with the same good intention. Their move end up as another block at the other side of the road. So they both move and again and again...until they both run out of battery...

OCPJP>cat RoboticGentleman.java 
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class RoboticGentleman {
    private volatile boolean isRiverside = true;
    private RoboticGentleman friend;
    
    private void setFriend(RoboticGentleman friend) {
        this.friend = friend;
    }
    
    private boolean polite() {
        return !this.isRiverside;
    }
    
    private void makeProgress() {
        while(true) {
            if(friend.isRiverside == this.isRiverside) {
                System.out.println(Thread.currentThread().getName() + ": Excuse me, I will move");
                try {
                    pause();
                    this.isRiverside = polite();
                    pause();
                } catch (InterruptedException e) {
                    System.out.println(Thread.currentThread().getName() + ": crap, I ran out of battery");
                    break;
                }
            } else {
                System.out.println(Thread.currentThread().getName() + ": thank you.");
                break;
            }
        }
    }
    
    private void pause() throws InterruptedException {
        Thread.sleep(1000);
    }
    
    public static void main(String args[]) throws InterruptedException {
        RoboticGentleman gentlemanA = new RoboticGentleman();
        RoboticGentleman gentlemanB = new RoboticGentleman();
        gentlemanA.setFriend(gentlemanB);
        gentlemanB.setFriend(gentlemanA);
        
        ExecutorService exec = Executors.newCachedThreadPool();
        
        Future<?> robotA = exec.submit(() -> gentlemanA.makeProgress());
        exec.submit(() -> gentlemanB.makeProgress());
        
        try {
            robotA.get(15, TimeUnit.SECONDS);
        } catch (ExecutionException | TimeoutException | InterruptedException e) {
            e.printStackTrace();
        }
        robotA.cancel(true);
        
        exec.shutdown();
        exec.awaitTermination(2, TimeUnit.SECONDS);
    }
}
OCPJP>
OCPJP>javac RoboticGentleman.java 
OCPJP>java RoboticGentleman
pool-1-thread-1: Excuse me, I will move
pool-1-thread-2: Excuse me, I will move
pool-1-thread-1: Excuse me, I will move
pool-1-thread-2: Excuse me, I will move
pool-1-thread-2: Excuse me, I will move
pool-1-thread-1: Excuse me, I will move
pool-1-thread-2: Excuse me, I will move
pool-1-thread-1: Excuse me, I will move
pool-1-thread-2: Excuse me, I will move
pool-1-thread-1: Excuse me, I will move
pool-1-thread-2: Excuse me, I will move
pool-1-thread-1: Excuse me, I will move
pool-1-thread-2: Excuse me, I will move
pool-1-thread-1: Excuse me, I will move
pool-1-thread-2: Excuse me, I will move
pool-1-thread-1: Excuse me, I will move
java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask.get(FutureTask.java:205)
at RoboticGentleman.main(RoboticGentleman.java:55)
pool-1-thread-1: crap, I ran out of battery
pool-1-thread-2: thank you.
OCPJP>

In the RoboticGentleman example, the livelock only happens to the two (robotic) gentlemen trying to do good to each other. The lock breaks as soon as one of them stops the do good action, such as runing out of battery thus taking no action.


In the real world, human being seldom ran into deadlock, starvation and livelock. The two pirates eventually will breaks the deadlock with one of them takes a surprise move, the starvingPuppy will cry for help and the lazy feeder gives him special meal, as the roboticGentleman, you know the simple fix of the program, right? Just comment out the pause or give a random pause time is enough to teach them to be more human like.

race conditions

When a program is not properly synchronized, it runs into race condition -- a phenomenon based on changing, so called check then act -- you saw something at a moment, came back later assuming the condition still holds, alas, no love for you anymore.

OCPJP>cat ChangingNumber.java 
import java.util.stream.Stream;

public class ChangingNumber {
    private int number = 10;
    private void checkThenAct() {
        if(number >= 5) { //running this line takes cpu cycles too
            //the current thread could be suspended before next lines so that the other threads get chance to run.
            Stream.iterate(1, x -> x + 1).limit(Integer.MAX_VALUE - 10).forEach(x -> {}); //waste cpu cycles
            System.out.println(Thread.currentThread().getName() + ": I saw something in treasurebox, not " + number);
        }
    }
    
    public static void main(String[] args) {
        ChangingNumber treasure = new ChangingNumber();
        new Thread(() -> treasure.checkThenAct()).start();
        new Thread(() -> {treasure.number = 0; System.out.println(Thread.currentThread().getName() + ": you get company");}).start();  
    }
}
OCPJP>
OCPJP>javac ChangingNumber.java 
OCPJP>date; java ChangingNumber; date;
Sun Sep 17 07:57:42 EDT 2017
Thread-1: you get company
Thread-0: I saw something in treasurebox, not 0
Sun Sep 17 07:57:54 EDT 2017
OCPJP>

In the program ChangingNumber, a thread executed checkThenAct -- when it reads the value of number in if statement, the number variable is 10, then it goes ahead executing some time consuming code before reading the number variable again. If the first thread is the only one accessing treasure object, the variable number will stay at 10. However, there is another thread also accessing treasure object, which sets the number variable to 0. This example emphasis the fact that running code takes time, execute a single line of code could takes lots of cpu cycle. In multi-threading  code, variable value changes by other threads between the current thread executes this line and next line.

These hidden cpu cycles can manifest themselves in more mysterious ways.

OCPJP>cat CrazyCounter.java 
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Stream;

public class CrazyCounter {
    private int number = 0;
    
    private void checkThenAct() {
        System.out.print(" " + number++);
    }
    
    public static void main(String...args) {
        CrazyCounter counter = new CrazyCounter();
        ExecutorService exec = Executors.newCachedThreadPool();
        //ExecutorService exec = Executors.newSingleThreadExecutor();
        Stream.iterate(0, x -> x+1).limit(10).forEach(x -> exec.submit(() -> counter.checkThenAct()));
        System.out.println(exec);
        exec.shutdown();
    }
}
OCPJP>javac CrazyCounter.java 
OCPJP>java CrazyCounter
 0 4 3 2 1 6 7 5 8 9java.util.concurrent.ThreadPoolExecutor@5674cd4d[Running, pool size = 8, active threads = 0, queued tasks = 0, completed tasks = 10]
OCPJP>java CrazyCounter
 0 3 4 2 1 5 6 7 8 9java.util.concurrent.ThreadPoolExecutor@7adf9f5f[Running, pool size = 6, active threads = 0, queued tasks = 0, completed tasks = 10]
OCPJP>java CrazyCounter
 0 3 2 1 5 0 4 6 7 8java.util.concurrent.ThreadPoolExecutor@5674cd4d[Running, pool size = 8, active threads = 0, queued tasks = 0, completed tasks = 10]
OCPJP>java CrazyCounter
 0 3 4 2 1 5 6 7 8 9java.util.concurrent.ThreadPoolExecutor@85ede7b[Running, pool size = 7, active threads = 0, queued tasks = 0, completed tasks = 10]
OCPJP>java CrazyCounter
 0 3 4 2 0 5 1 6 7 8java.util.concurrent.ThreadPoolExecutor@5674cd4d[Running, pool size = 8, active threads = 0, queued tasks = 0, completed tasks = 10]
OCPJP>java CrazyCounter
 0 4 3 2 6 1 5 7 8 9java.util.concurrent.ThreadPoolExecutor@85ede7b[Running, pool size = 7, active threads = 0, queued tasks = 0, completed tasks = 10]
OCPJP>java CrazyCounter
 0 3 2 1 5 4 6 7 8 9java.util.concurrent.ThreadPoolExecutor@5674cd4d[Running, pool size = 8, active threads = 0, queued tasks = 0, completed tasks = 10]
OCPJP>java CrazyCounter
 0 3 2 1 4 5 6 7 8 9java.util.concurrent.ThreadPoolExecutor@85ede7b[Running, pool size = 7, active threads = 0, queued tasks = 0, completed tasks = 10]
OCPJP>java CrazyCounter
 0 3 2 0 5 1 4 6 7 8java.util.concurrent.ThreadPoolExecutor@5674cd4d[Running, pool size = 8, active threads = 0, queued tasks = 0, completed tasks = 10]
OCPJP>java CrazyCounter
 0 3 2 1 5 0 4 6 7 8java.util.concurrent.ThreadPoolExecutor@63961c42[Running, pool size = 8, active threads = 0, queued tasks = 0, completed tasks = 10]
OCPJP>java CrazyCounter
 0 3 2 1 0 6 5 4 7 8java.util.concurrent.ThreadPoolExecutor@5674cd4d[Running, pool size = 8, active threads = 0, queued tasks = 0, completed tasks = 10]
OCPJP>java CrazyCounter
 0 5 4 3 2 1 7 6 8 9java.util.concurrent.ThreadPoolExecutor@5674cd4d[Running, pool size = 8, active threads = 0, queued tasks = 0, completed tasks = 10]
OCPJP>java CrazyCounter
 0 4 3 2 6 1 5 7 8 9java.util.concurrent.ThreadPoolExecutor@63961c42[Running, pool size = 9, active threads = 0, queued tasks = 0, completed tasks = 10]
OCPJP>java CrazyCounter
 0 3 4 2 1 5 6 7 9 8java.util.concurrent.ThreadPoolExecutor@85ede7b[Running, pool size = 7, active threads = 0, queued tasks = 0, completed tasks = 10]
OCPJP>java CrazyCounter
 0 3 2 1 4 5 6 7 8 9java.util.concurrent.ThreadPoolExecutor@5674cd4d[Running, pool size = 8, active threads = 0, queued tasks = 0, completed tasks = 10]
OCPJP>java CrazyCounter
 0 4 3 5 2 1 6 7 8 9java.util.concurrent.ThreadPoolExecutor@5674cd4d[Running, pool size = 8, active threads = 0, queued tasks = 0, completed tasks = 10]
OCPJP>java CrazyCounter
 0 4 3 2 1 5 6 7 8 9java.util.concurrent.ThreadPoolExecutor@85ede7b[Running, pool size = 7, active threads = 0, queued tasks = 0, completed tasks = 10]
OCPJP>java CrazyCounter
 0 3 2 1 5 4 6 7 8 8java.util.concurrent.ThreadPoolExecutor@85ede7b[Running, pool size = 7, active threads = 0, queued tasks = 0, completed tasks = 10]
OCPJP>java CrazyCounter


If you run CrazyCounter.java long enough, you will see all sorts of race conditions. Most of the time, the counter prints out all the digits from 0 to 9 once. However, sometimes, some digits are skipped while other digits are printed twice (0 3 2 1 0 6 5 4 7 8). The reason is System.out.print(" " + number++); takes many cpu cycles. At moment a, Thread1 reads number as 0. Before it can execute number++ and println, thread1 is suspended and other threads get chance to run. Other threads executed through System.out.print(" " + number++); multiple times, increased the value of number to,  say 5. At moment b, Thread1's execution is re-assumed, it then executes number++, sets number's value back to 1, and println 0 to the output. The detail of he craziness is various on different systems, but they are all related to check then act.

Another hidden race condition of this program is: the main thread is racing with the 10 worker threads. If the main thread calls shutdownNow before the 10 worker threads finish executing, you will see less than 10 digits get print out.

Race condition is nasty, the take away is don't write check then act code in multi-threading program, synchronizing data access for shared objects, which we will cover in the next topic.