Site Search:

Reduce string to smallest lexicographical order

Problem

Given a string which contains only lowercase letters, remove duplicate letters so that every letter appears once and only once. You must make sure your result is the smallest in lexicographical order among all possible results.

Example 1:

Input: "bcabc"
Output: "abc"
Example 2:

Input: "cbacdcbc"
Output: "acdb"

Solution


The lexicographical order comparison of 2 strings start at index 0, from left to right, compare the characters at the same position, stop at the first position the characters are different.
For example, azzzzzzzz is smaller than baa, because the character a index 0 is different, 'a' is smaller than 'b', the rest of the characters don't matter.
For another examle, aabzzzzz is smaller than aacaa, because the characters at index 0 and 1 are the same, they differ at index 2, and 'b' is smaller than 'c', the rest of the characters don't matter. 

If we scan the character from left to right and decide which one to save and which one to remove. 
For example:
bcabc

we read in b, it ok to keep it, we read in c, bc also makes sense. Now we read in bca, the string doesn't make sense anymore. Keep bc will increase the  lexicographical order, since bca will be larger than a, we should remove bc. Remove them is ok, we won't violate the criteria that each character has to be in the final string, there are another b and c later. Finally we add the rest of bc, abc is the final result, we satisfied the criteria that the lexicographical order is smallest, and we have all the characters.

For another example:
mnzbamnb

we read in mnz, the string makes sense, so we keep them. Now we read in b, the string mnzb doesn't make sense anymore, mnzb is lexicographical larger than b, we should remove mnz. Removing mn is ok, since there are other m and n later. However, remove z is not ok, since z is unique. If we have to keep z, we must keep mn as well because mnz is lexicographical smaller than z. As a result, the unique character z saved all the characters from removing by later characters. We have mnz in the final string, the problem changes to decide what characters in bamnb will be in the final string. mnzba will reduce to mnza, mnzam will be reduced to mnza, because adding a duplicate m increase lexicographical order. mnzan will be reduced to mnza, then we add the b, which is the last appearance of b, therefore unique when we meet it. So the final string is mnzab.


With some observations, we find out that only two kinds of characters left in the final string.
  1. global minimal character. Any character comes before this character will increase the lexicographical order, therefore, should be removed. Unless it is unique.
  2. unique character. Unique character can not be removed no matter how big it is. As a result, a unique character shields all characters before it from getting removed from the final string. The string before it becomes permanent, we only have to consider the characters after it. A duplicate character becomes unique when we reach the last one of its appearance.
With a stack, we can simulate the above process. From left to right of the string, we push the characters into the stack, before a character enter the stack, it needs to pop out the characters larger than it until the stack is empty or a unique character is met. The character then only enters the stack if it is a unique character so far. Finally, we we reach the end of the string, whatever left in the stack is the final result.

import java.util.*;
public class LexicoGraphicalOrder {
  public static String dedupe(String str) {
    Set<Character> onStack = new HashSet<>();
    Map<Character, Integer> lastIndex = new HashMap<>();
    Stack<Character> stack = new Stack<>();
    int N = str.length(); //8
    //cbacdcbc
    //       .
    //stack  acdb
    //onStack acdb
    for(int i = 0; i < N; i++) {  //c 7, b 6, a 2, d 4
      lastIndex.put(str.charAt(i), i);
    }
    for(int i = 0; i < N; i++) {//7
      char currChar = str.charAt(i);  //c
      if(!onStack.contains(currChar)) {
        while(!stack.isEmpty() && stack.peek() - currChar > 0 && lastIndex.get(stack.peek()) > i) {
          Character lastSeen = stack.peek();  //b
          if(lastSeen - currChar > 0 && lastIndex.get(lastSeen) > i) {
            stack.pop();
            onStack.remove(lastSeen);
          }
        }
        stack.push(currChar);
        onStack.add(currChar);
      }
    }
    StringBuilder sb = new StringBuilder();
    for(char c : stack) {
      sb.append(c);
    }
    return sb.toString();
  }

  public static void main(String...args) {
    System.out.println(dedupe("cbacdcbc"));
    System.out.println(dedupe("mnzamn"));
    System.out.println(dedupe("mnzmnedcbamnnnnnmmmm"));
  }
}

The time complexity is O(N). It is not O(N^2) because no matter how large the string is, the queue size won't be larger than 'z'-'a' which is a constant. For the same reason, the space complexity is also O(N).