Site Search:

Regular Expression Matching

Problem:

A regular expression can be empty or contains '.', '*' and charecters a-z, write a function to tell if the regular expression can match the entire input string. The input string can be emtpy, or contains only charecters a-z.

'.' can match any single character.
'*' can match zero or more of the preceding element.

You can assume the regular expression is a valid one.

For example:
Given regular expression "ab." and input string "abc", the result is true.
Given regular expression "ab*" and input string "abbbbb", the result is true.
Given regular expression "ab*" and input string "abc", the result is false.
Given regular expression "ab*x.*" and input string "abxyzz", the result is true.

Solutions:

This is a hard problem, because of the *. However we can start with simple case first. If the regex don't have *, the problem can be solved with a simple recursive function: For regex string p and input string s, if p.charAt(0) matches s.charAt(0), the problem reduces to match p.substring(1) to s.substring(1).

boolean backtrace(String p, String s) {
  ...
  if(s.charAt(0) == p.charAt(0) || p.charAt(0) == '.')
    return backtrace(p.substring(1), s.substring(1));
  ...
}

Now the * makes the condition harder, so we have to modify the simple recursive function.
We need to know what is the character before *. 
One approach is to pass the previous character in the recursive call. 
boolean backtrace(char pre, String p, String s) {...}
then look back one character when meet *.

Another approach is to look forward 2 characters for potential .* or a-z* match.

No matter what approach, dealing with * has two conditions:
  1. * matches nothing, skip *, problem reduces to backtrack(p.substring(1), s)
  2. * matches something, problem reduces to backtrack(p, s.substring(1))
The following code gives 2 methods using the look back and look forward approaches for *.

class Solution {
  public static boolean canMatch(char pre, String p, String s) {
    //System.out.println("pre=" + pre + ", s="+s+", p="+p);
    if(s.isEmpty()) {
      if(p.isEmpty() || p.equals("*") || p.equals(".*"))
        return true;
      else return false;
    }
    if(p.isEmpty())
      return false;
    if(p.charAt(0) == s.charAt(0) || p.charAt(0) == '.') {
      return canMatch(p.charAt(0), p.substring(1), s.substring(1));
    } else {
      //* matches nothing
      if(canMatch(pre, p.substring(1), s))
        return true;
      //* matches something
      if(pre == '.') {
        return canMatch(pre, p, s.substring(1));
      } else {
        if(s.charAt(0) != pre) return false;
        return canMatch(pre, p, s.substring(1));
      }
    }
  }
 
  public static boolean canMatch(String p, String s) {
    //System.out.println("s="+s+", p="+p);
    if(p.isEmpty()) return s.isEmpty();
    if(s.isEmpty()) {
      if(p.equals("*") || p.equals(".*"))
        return true;
      else
        return false;
    }
    boolean match = false;
    if(p.charAt(0) == '.' || p.charAt(0) == s.charAt(0)) {
      match = true;
    }
    if(p.length() > 1 && p.charAt(1) == '*') {
      if(match) {
        //* matches nothing
        if(canMatch(p.substring(2), s.substring(1)))
          return true;
        return canMatch(p, s.substring(1));
      }
      return false;
    } else {
      return match && canMatch(p.substring(1), s.substring(1));
    }
  }
 
  public static void main(String[] args) {
    System.out.println(canMatch('$', "ab.", "abc"));
    System.out.println(canMatch('$', "ab*", "abbbbb"));
    System.out.println(canMatch('$', "ab*", "abc"));
    System.out.println(canMatch('$', "ab*x.*", "abxyzz"));
    System.out.println(canMatch('$', "", ""));
    System.out.println(canMatch('$', ".*", ""));
    System.out.println(canMatch('$', ".*", "x"));
    System.out.println(canMatch('$', "b*", "x"));
    System.out.println();
    System.out.println(canMatch("ab.", "abc"));
    System.out.println(canMatch("ab*", "abbbbb"));
    System.out.println(canMatch("ab*", "abc"));
    System.out.println(canMatch("ab*x.*", "abxyzz"));
    System.out.println(canMatch("", ""));
    System.out.println(canMatch(".*", ""));
    System.out.println(canMatch(".*", "x"));
    System.out.println(canMatch("b*", "x"));
  }
}

The time complexity and space complexity calculation is even harder, it is O((T+P)2
. ,where T is the length of the input string and P is the length of regular expression. Prove it need a paper, lets don't worry about it.