Site Search:

DFS and BFS

Feel the Algorithm

You looks puzzled, don't know what that guy is talking about or have no clue how to reach the goal. You are in a maze.

Here are two approaches.

Approach one. You can stop that guy for an explanation for anything you don't understand during the conversation, until you know everything thing about the topic, or... well ... until that guy get so annoyed that he has to drag the topic back to the problem you are facing right now.

Approach two. You hold your questions for a while (in a notebook maybe), let that guy finish, then you ask some questions, listen, then ask some relevant questions, until, you find the way to reach your goal. You may still have a lot of confusing about the topic, but you know enough to get the work done.

Actually the above approaches are so common, we used them everyday.

Breadth first search (BFS) is an efficient approach. We get from point A to point B quick, the key is focus on the most relevant knowledge and know just enough. During BFS explore, we gradually expand our knowledge sphere. We have to keep track of the boarder line between known and unknown, we queue our next questions into a queue or priority queue, so that we can push out our knowledge boarder from that question, later. At the beginning, when we know little, we have few questions, but as we know more, we exposed to more unknown, the knowledge sphere has bigger boarder line, the question queue gets bigger as well. This will continue until we exausted almost all the aspect of the topic, the vertex in the graph get exausted, our known and unknown boarder shrinks,  the question queue gets smaller. We finally cleared all the questions and we know the topic. Many famous algorithms took this approach -- Lazy Prim's MST, Eager Prim's MST, Kruskal's MST, Dijkstra's SPT, Bellman-Ford's SPT. As we can see, when optimization or efficiency is involved, BFS is the favorable approach. BFS is not eager, it is patient. It starts from small, patiently knows just enough, so that each question asked is the most relevant one. However, after the job is done, we looks back -- globally, it happens to be a greedy algorithm -- each step is a global optimized move (most relevant), it ends up finding an optimized property of the graph, such as MST (smallest spanning tree) or SPT (shortest path tree).

Depth first search (DFS), we consume the knowledge web by going deep. The key is marker. We have to do something about the knowledge we have already studied. First all, remember them, so that we know where we were (e.g. reachability). We may want to mark more, for example, what category it belong to so that we can cluster the knowledges (e.g. connected components). We may want to mark the route leading us here, so that later we can tell if we were going in circle (e.g. circle detection). We may also do something to the notes, maybe 2 color notes (e.g. bipartite problem). What makes the DFS special is, the approach has a natural happens-before sequence. The order you visits the graph vertexes created a dependency tree. -- Your starting point is the root of all the later events. Nothing happens, if you don't make the first move. Then you reached second point, the second point now depends on the first point, and the second point itself became the root of the later events. If there is no cycle in the graph, following your path, we just rebuilt a tree without any dependency conflict. It may sounds quite unfathomable, but not -- in your journey, you never faced the problem "hey, I can not get to this because I am lack of that", You just keep moving from this to that, your exploring route builds a perfect dependency tree -- which is a hard problem to solve given many nodes and their links (topological order).

Travel the Map

The first question we want to know about the map is: can we reach B from A?

DFS traversal

Start from A, we can search the graph, invoke a recursive method that visits vertices. If we visited B, then the answer is yes, otherwise, the answer is no.

To visit a vertex:

  1. mark it as visited.
  2. recursively visit all the neighbor vertices that haven't been visited.
With the graph model we build in previous section, add a dfs method is simple.

void dfs(Integer v) {
    marked.add(v);
    for(Integer w : map.get(v)) {
        if(!marked.contains(w)) {
            //do stuff here (pre-order)
            dfs(w);
            //do stuff here (post-order)
        }
    }
}

Notice the dfs can not stop in the middle, return won't exit the traversal. It finishes when all the vertexes reachable from A has been visited. 
Here is an example we do stuff at the pre-order position. We label the edges in the order we visited them, we also color all the edges red before we reach the goal, then color all the edges blue after we reached the goal. 
Another approach is to explore the graph by fanning out on all the directions, visit the neighbors 1 edge away, then neighbors 2 edges away... This approach gives us a bonus of finding the shortest path to reach point B from point A.

To visit vertexes, we put the source vertex on the queue, then perform the following 2 steps until the queue is empty:
  1. Take the next vertex v from the queue and mark it.
  2. Put onto the queue all the immediate neighbors of v.
With the graph model we build in previous section, add a bfs method is kind of simple, you need to pay attention to the first element before the main while loop.

void bfs(Integer s) {
    Queue<Integer> queue = new LinkedList<>();
    marked.add(s);
    //do stuff here
    queue.add(s);
    while(!queue.isEmpty()) {
        Integer v = queue.poll();
        for(Integer w : map.get(v)) {
            if(!marked.contains(w)) {
                marked.add(w);
                //do staff here
                queue.add(w);
                //no difference if you do stuff here
            }
        }
    }
}

Here is a dual example for bfs we did before. We label the edges in the order we visited them, we also color all the edges red before we reach the goal, then color all the edges blue after we reached the goal. 

Digraph

For directed graph, the dfs and bfs traversal are very similar. Here is dual of the above pairs in directed graph.

Find the Path

The second question we want to know about the map is: what is the path leading us from A to B?

In the previous problem, we have visited all the vertexes reachable from point A. Why not record our travel history in a data structure, so that given any point, we can find the previous point leading us here. By repeatedly asking which point leading us here, given any point we visited in the graph, we can traceback to the starting point A.

If the total number of points in a graph is N, a N sized array can record the previous point for all the N points. The index is the current point, the value is the previous point. Or we can use a map, the current point is the key, the previous point is the value.

To construct a path from A to B, we put the previous points into a stack when we trace back from B to A.

Finally, recording the previous point is one extra line besides the normal dfs and bfs flow.

void dfs(Integer v) {
    marked.add(v);
    for(Integer w : neighbors.get(v)) {
        if(!marked.contains(w)) {
            edgeTo[w] = v;
            dfs(w);
        }
    }
}




void bfs(Integer s) {
    Queue<Integer> queue = LinkedList<>();
    marked.add(s);
    queue.add(s);
    while(!queue.isEmpty()) {
        Integer v = queue.poll();
        for(Integer w : neighbors.get(v)) {
            if(!marked.contains(w)) {
                marked.add(w);
                edgeTo[w] = v;
                queue.add(w);
            }
        }
    }
}

Detect Circles

Naturally, dfs can detect circles in a graph during traversal. If we ever meet a previously marked point, and that marked point is not the previous point leading us here, then we just ran into a circle.

So we need to pass in the previous point leading us here.

void dfs(int v, int p) {
    marked.add(v);
    for(int w : neighbors.get(v)) {
        if(!marked.contains(w)) {
            dfs(w, v);
        } else if(w != p) {
            this.hasCircle = true;
        }
    }
}

At the starting point, we need to call dfs(start, start) to start the recursive call.

For Digraph, if we revisited a marked point that is not the previous point leading us here, it doesn't mean we have detected a circle. Digraph is one way path. We could be in the following situation. 
not a circle on digraph
not a circle on directed graph
Only when we revisited a point that is still on the recursive call stack, we can claim that a directed circle is detected. So we need to keep an array or map to record if each point is on stack or not.

void dfs(int v) {
    onStack[v] = true;
    marked.add(v);
    for(int w : neighbors.get(v)) {
        if(this.hasCircle) return;
        if(!marked.contains(w)) {
            dfs(w);
        } else if(onStack[w]) {
            this.hasCircle = true;
        }
    }
    onStack[v] = false;
}
circle on directed graph
circle on directed graph

Global Properties

In the examples we have studied so far, we start from one given point, reaching out as far as we can. There is no guarantee that we have visited each point in the graph. So we can not have conclusion for the global properties of the graph. For example:

3 disconnected parts in a graph
3 disconnected parts in a graph


  • how many connected components are in the graph? 
  • can we color all the edges(or points) with 2 colors (is the graph bipartite)?
  • is there any circles in the graph?
  • can we finish all tasks with their dependency constraint?

To answer these questions, first of all, we have to visit all the points. The code skeleton looks like:
for(int s : neighbors.keySet()) {
    if(!marked.contains(s)) {
        dfs(s);  //or bfs(s) here
        //count++;
    }
}

Connected Components

To calculate connected components, we traversal the graph and associate a component number to each point. The points with the same component number belong to the same connected components.

For directed graph, we also traversal the graph and associate a component number to each point. The points with the same component number belong to the same connected components.

However, the implementation is a little bit harder than undirected graph. For undirected graph, all the points in the same component will be reached in one travel if we start from any point in that component. Since the edge is one way, all the points in the same component may not be reached in one graph traversal starting at random point in the component. In that case, we can use union-find to combine the component numbers assigned during partial traversal of a component.

void dfs(v) {
    marked.add(v);
    if(neighbors.get(v) == null) return;
    for(Integer w : neighbors.get(v)) {
        if(!marked.contains(w)) {
            unionFind.put(w, v); //add w to the union that v belongs to
            dfs(w);
        } else {
            union(unionFind.get(w), unionFind(v));  //combine the unions that w and v belong to
        }
    }
}

Bipartite Graph

If we color the graph's points with 2 colors, then a graph is bipartite if any pair of neighboring points have different colors. 

We can dfs or bfs traversal the graph, color each point with a different color than the point leading to it. If we succeed, the graph is bipartite, otherwise, the graph is not bipartite.
For directed graph, we may need more than one traversal in order to discover all the connected points. So when we found 2 points with the same color, we can not draw quick conclusion. There are many cases to consider.



Circle Detection

In order to get conclusion, we have to traversal the whole graph, make sure we didn't miss any connected components.



For directed graph, the code is even simpler, this is because no matter where we start on a circle, we are guaranteed to traversal the whole circle and we don't have to pass in which point leads us here, when we do the recursive call.