RegularExpressionMatching [source code]

public class RegularExpressionMatching {
static
/******************************************************************************/
class Solution {
    public boolean isMatch(String s, String p) {
        char[] chs = s.toCharArray (), chp = p.toCharArray ();
        boolean[][] dp = new boolean[chs.length + 1][chp.length + 1];
        dp[0][0] = true;
        for (int j = 1; j < dp[0].length; j++) {
            dp[0][j] = j > 1 && chp[j - 1] == '*' && dp[0][j - 2];          // clever use of short-circuit
        }
        for (int i = 1; i < dp.length; i++) {
            for (int j = 1; j < dp[0].length; j++) {
                if (chp[j - 1] == '*')
                    dp[i][j] = dp[i][j - 2] || (chs[i - 1] == chp[j - 2] || chp[j - 2] == '.') && dp[i - 1][j];
                else
                    dp[i][j] = (chs[i - 1] == chp[j - 1] || chp[j - 1] == '.') && dp[i - 1][j - 1];
            }
        }
        return dp[chs.length][chp.length];
    }
}
/******************************************************************************/

    public static void main(String[] args) {
        RegularExpressionMatching.Solution tester = new RegularExpressionMatching.Solution();
        String[] inputs = {
            "aa", "a*", "true",
            "aaa", "aaaa", "false",
            "", ".", "false",
        };
        for (int i = 0; i < inputs.length / 3; i++) {
            String s = inputs[3 * i], p = inputs[3 * i + 1];
            boolean ans = inputs[3 * i + 2].equals ("true");
            System.out.println (Printer.separator ());
            boolean output = tester.isMatch (s, p);
            System.out.printf ("%s and %s -> %s, expected: %b\n", 
                s, p, Printer.wrapColor (output + "", output == ans ? "green" : "red"), ans
            );
        }
    }
}

题目意思倒是看的清楚, 但是没有什么特别好的思路;

笔算几个case, 想想人逻辑, 看看能不能在笨办法当中找到提示;

想满时间, 想不出来;

最后是参考discussion最优解写出来的一个算法; 注意, 以后复习的时候, 除了这个解法(从左到右), 还要记得awice写的这个peel head的也就是从右到左的; 其实两者的思路都是类似的, 但是从右到左的思路的index相对简单一些; 另外, 简单的recursion做法(没有memo的)也要熟悉, 不可能你面试的时候上来就知道一个Bottom-Up的DP, 鬼才信; 最后的速度是34ms (39%).

注意这里的第三个case, 我一开始认为这个应该是true, 还特意加了base case来把这个当做special case处理, 但是后面提交了才知道这个应该是FALSE; 想想其实是合理的;

在理解这个recursive逻辑的时候, 一个关键问题是ask yourself: what does this portion of the PATTERN correponds to in the STRING?

editorial

Approach #1: Recursion [Accepted]

Intuition

If there were no Kleene stars (the * wildcard character for regular expressions), the problem would be easier - we simply check from left to right if each character of the text matches the pattern.

When a star is present, we may need to check many different suffixes of the text and see if they match the rest of the pattern. A recursive solution is a straightforward way to represent this relationship.

Algorithm

Without a Kleene star, our solution would look like this:

def match(text, pattern):  
    if not pattern: return not text  
    first_match = bool(text) and pattern[0] in {text[0], '.'}  
    return first_match and match(text[1:], pattern[1:])

If a star is present in the pattern, it will be in the second position
pattern[1]. Then, we may ignore this part of the pattern, or delete a matching character in the text. If we have a match on the remaining strings after any of these operations, then the initial inputs matched.

class Solution {  
    public boolean isMatch(String text, String pattern) {  
        if (pattern.isEmpty()) return text.isEmpty();  
        boolean first_match = (!text.isEmpty() &&   
                               (pattern.charAt(0) == text.charAt(0) || pattern.charAt(0) == '.'));  

        if (pattern.length() >= 2 && pattern.charAt(1) == '*'){  
            return (isMatch(text, pattern.substring(2)) ||   
                    (first_match && isMatch(text.substring(1), pattern)));  
        } else {  
            return first_match && isMatch(text.substring(1), pattern.substring(1));  
        }  
    }  
}

只看这个代码的话, 确实逻辑非常简练, 他就是在每一个位置提炼出来了几个decision; 这道题之所以能够成为hard, 是因为在一个node的decision情况很多很复杂;

我自己写的时候, 其实也意识到了这个题目应该围绕*来写, 我深知考虑把所有的*的位置先标出来; 但是并没有想到具体的好的实现方法; 他这里的这个做法, 虽然没有明确的去点名*, 但是这种peel head的recursion, 实际上每一个node考虑的就是一个类似a*这样的两人组;

如果有这个两人组, 那么我们可以不匹配这个两人组, 或者我们匹配这个两人组; 如果没有这个两人组, 我们直接就正常让text和pattern一人出一个字母就行了; 逻辑结构其实非常简练;

这里或许可以总结出来一个thinking trigger? 我自己在思考的时候, 就想过标记两人组的位置, 但是后来认为不太可能, 因为这样得到的是一个连长度都未知的list; 这个很不好用; 或者说, 这样一个uncertainty太多的结果, 会让最后的结果穷搜的成分太高, 是我不希望的;

但是我们这里最后转换为用recursion来解决这个问题; 这样的好处就是we don't ask for the pair's position until we encounter them.

这个算法总体来说逻辑是非常的干净的, 不过复杂度好像并不是很优秀; 具体分析在文章里面有, 他给的结果是一个2的指数级别, 但是我个人认为可能是一个3的指数级别, 因为每一个node有3个recursive call;

Approach #2: Dynamic Programming [Accepted]

Intuition

As the problem has an optimal substructure, it is natural to cache intermediate results. We ask the question dp(i, j): does text[i:] and pattern[j:] match? We can describe our answer in terms of answers to questions involving smaller strings.

Algorithm

We proceed with the same recursion as in Approach #1, except because calls will only ever be made to match(text[i:], pattern[j:]), we use
dp(i, j) to handle those calls instead, saving us expensive string-building operations and allowing us to cache the intermediate results.

Top-Down Variation :

enum Result {  
    TRUE, FALSE  
}  

class Solution {  
    Result[][] memo;  

    public boolean isMatch(String text, String pattern) {  
        memo = new Result[text.length() + 1][pattern.length() + 1];  
        return dp(0, 0, text, pattern);  
    }  

    public boolean dp(int i, int j, String text, String pattern) {  
        if (memo[i][j] != null) {  
            return memo[i][j] == Result.TRUE;  
        }  
        boolean ans;  
        if (j == pattern.length()){  
            ans = i == text.length();  
        } else{  
            boolean first_match = (i < text.length() &&   
                                   (pattern.charAt(j) == text.charAt(i) ||  
                                    pattern.charAt(j) == '.'));  

            if (j + 1 < pattern.length() && pattern.charAt(j+1) == '*'){  
                ans = (dp(i, j+2, text, pattern) ||   
                       first_match && dp(i+1, j, text, pattern));  
            } else {  
                ans = first_match && dp(i+1, j+1, text, pattern);  
            }  
        }  
        memo[i][j] = ans ? Result.TRUE : Result.FALSE;  
        return ans;  
    }  
}

Bottom-Up Variation

class Solution {  
    public boolean isMatch(String text, String pattern) {  
        boolean[][] dp = new boolean[text.length() + 1][pattern.length() + 1];  
        dp[text.length()][pattern.length()] = true;  

        for (int i = text.length(); i >= 0; i--){  
            for (int j = pattern.length() - 1; j >= 0; j--){  
                boolean first_match = (i < text.length() &&   
                                       (pattern.charAt(j) == text.charAt(i) ||  
                                        pattern.charAt(j) == '.'));  
                if (j + 1 < pattern.length() && pattern.charAt(j+1) == '*'){  
                    dp[i][j] = dp[i][j+2] || first_match && dp[i+1][j];  
                } else {  
                    dp[i][j] = first_match && dp[i+1][j+1];  
                }  
            }  
        }  
        return dp[0][0];  
    }  
}

注意他这里的Bottom-Up的版本是用倒序来填表的, 是因为他之前recursion的顺序实际上是一个smaller index depends on larger index的过程; 所以从大的index开始填, 然后到小的index的顺序, 是合理的;

discussion最优解:

@xiaohui7 said in My concise recursive and DP solutions with full explanation in C++:

Please refer to [my blog post][1] if you have any comment. Wildcard matching problem can be solved similarly.

    class Solution {  
    public:  
        bool isMatch(string s, string p) {  
            if (p.empty())    return s.empty();  

            if ('*' == p[1])  
                // x* matches empty string or at least one character: x* -> xx*  
                // *s is to ensure s is non-empty  
                return (isMatch(s, p.substr(2)) || !s.empty() && (s[0] == p[0] || '.' == p[0]) && isMatch(s.substr(1), p));  
            else  
                return !s.empty() && (s[0] == p[0] || '.' == p[0]) && isMatch(s.substr(1), p.substr(1));  
        }  
    };  

    class Solution {  
    public:  
        bool isMatch(string s, string p) {  
             // f[i][j]: if s[0..i-1] matches p[0..j-1]  
             // if p[j - 1] != '*'  
             //      f[i][j] = f[i - 1][j - 1] && s[i - 1] == p[j - 1]  
             // if p[j - 1] == '*', denote p[j - 2] with x  
             //      f[i][j] is true iff any of the following is true  
             //      1) "x*" repeats 0 time and matches empty: f[i][j - 2]  
             //      2) "x*" repeats >= 1 times and matches "x*x": s[i - 1] == x && f[i - 1][j]  
             // '.' matches any single character  
            int m = s.size(), n = p.size();  
            vector<vector<bool>> f(m + 1, vector<bool>(n + 1, false));  

            f[0][0] = true;  
            for (int i = 1; i <= m; i++)  
                f[i][0] = false;  
            // p[0.., j - 3, j - 2, j - 1] matches empty iff p[j - 1] is '*' and p[0..j - 3] matches empty  
            for (int j = 1; j <= n; j++)  
                f[0][j] = j > 1 && '*' == p[j - 1] && f[0][j - 2];  

            for (int i = 1; i <= m; i++)  
                for (int j = 1; j <= n; j++)  
                    if (p[j - 1] != '*')  
                        f[i][j] = f[i - 1][j - 1] && (s[i - 1] == p[j - 1] || '.' == p[j - 1]);  
                    else  
                        // p[0] cannot be '*' so no need to check "j > 1" here  
                        f[i][j] = f[i][j - 2] || (s[i - 1] == p[j - 2] || '.' == p[j - 2]) && f[i - 1][j];  

            return f[m][n];  
        }  
    };

The most critical observation is that "x" can either match empty string, or at least one x. In the latter case, it is equivalent to "xx" or "x*x", which are key in the two solutions above. This is the simplest way for me to understand, in contrast with existing recursive or Dynamic Programming solutions, which I never fully comprehend.

[1]: http://xiaohuiliucuriosity.blogspot.com/2014/12/regular-expression-matching.html

2014年的老解法了, 但是思路很清晰, 比如他第一个解法的代码, 我认为一定程度上甚至写的比awice的editorial1还要简练一些;

这题如果真的要总结什么东西的话, 首先, 什么是matching problem?

严格来说这种题目我以前都没有见过; 这里可以看到, 大部分的解法实际上都是用recursion来解决的, 也就是一种类似于DFS的搜索的方法去解决; 我自己在想的时候, 其实一直在思考用纯粹的iteration(不是Bottom-Up DP)来解决, 因为NLP学regex的时候, 就告诉你各种FSA的东西, 然后看起来好像regex matching完全就是应该是一个线性走一遍就能解决的iterative算法; 但是实际上并不是这样; 虽然你的trace看起来是一个很iterative的感觉, 并不代表最后的代码就一定应该用iterative的方式去写; recursion因为其独有的自动性, 在很多时候写代码就是比iteration方便;

当然, 一个比较牵强的讲法是, 之前学426的时候, OCaml写了那么多pattern matching, 按说应该有不少印象了? 自然而然是用peel head recursion来做啊; 当然了, 这个也是开玩笑的讲法, 拿这样的想法来作为一个trigger还是有点太粗暴了;

另外, 格外瞩目的是他下面这个Bottom-Up写法, 写的真的是非常的好; awice最后的Bottom-Up用的是一个倒序的写法, 是因为如果我们仅仅是转换Top-Down recursion得到的一个Bottom-Up, 那么模仿recursion的顺序是自然的, 尤其是观察recursion过程当中体现出来的optimal structure;

但是这个作者选择了正序, 最后导致你看这个Bottom-Up版本的时候, 甚至感觉是在看一个全新的解法, 因为有很多多原来的逻辑的全新解读;

看这个算法的时候也是不知不觉的想到了ED算法; 事实上, 他这里的这个写法, 跟ED的写法还真有不少类似的地方; 比如他表格的index, 就是用的类似NLP上面讲过的, 是dot的index, 虽然看起来有点像一个length量;

另外, 他还是保留了他Top-Down的清晰的逻辑结构: recursive build的时候, 还是根据是否有*来分成两个branch;

另外, 这里也看出来DP填表的时候的初始化的重要性;

首先是他把第一row和col都单独的初始化. 不要怕麻烦, 多点base case会让你的算法更安全; 事实上, 这里first column的初始化过程一点都不trivial;
注意他给[0][0]位置的初始化, 必须是true; 这个很关键, 如果搞错了, first col的初始化就会出错, 我试了的;

下面的一个java改写:

public boolean isMatch(String s, String p) {  
    //     'match' below including .  
    // f(i,j) means s where s.len=i matches p where p.len=j  
    // f(i,j) =  
    //     if (p_j-1 != * ) f(i-1, j-1) and s_i-1 matches p_j-1  
    //     if (p_j-1 == * )  
    //         * matches zero times: f(i,j-2)  
    //         or * matches at least one time: f(i-1,j) and s_i-1 matches p_j-2  

    if (!p.isEmpty() && p.charAt(0) == '*') {  
        return false;   // invalid p  
    }  

    boolean[][] f = new boolean[s.length() + 1][p.length() + 1];  

    // initialize f(0,0)  
    f[0][0] = true;  

    // f(k,0) and f(0,2k-1) where k>=1 are false by default  

    // initialize f(0,2k) where p_2k-1 = * for any k>=1  
    for (int j = 1; j < p.length(); j+=2) {  
        if (p.charAt(j) == '*') {  
            f[0][j+1] = f[0][j-1];  
        }  
    }  

    for (int i = 1; i <= s.length(); i++) {  
        for (int j = 1; j <= p.length(); j++) {  
            if (p.charAt(j - 1) != '*') {  
                f[i][j] = f[i - 1][j - 1] && isCharMatch(s.charAt(i - 1), p.charAt(j - 1));  
            } else {  
                f[i][j] = f[i][j - 2] || f[i - 1][j] && isCharMatch(s.charAt(i - 1), p.charAt(j - 2));  
            }  
        }  
    }  

    return f[s.length()][p.length()];  
}  

// no * in p  
private boolean isCharMatch(char s, char p) {  
    return p == '.' || s == p;  
}

总体差别不大, 不过他把first row的初始化部分改写了一下; 最后的结果是对的, 但是我不觉得他这个改写多有必要: 本身就是Bottom-Up填表, OP原来的写法完全没有额外的cost;

而且这里这个改写之后的初始化的方法, 感觉背后并没有之前那种很明确的逻辑关系了; 当然你可以说总结了规律, 不过之前的那个初始化方式的得来过程其实就已经够复杂了; 在真正面试的时候, 光是初始化一下base case就搞得这么复杂, 是会把自己的心态搞崩的;

上面那个OP的帖子里还提到了一篇文章: https://articles.leetcode.com/regular-expression-matching/

This looks just like a straight forward string matching, isn’t it? Couldn’t we just match the pattern and the input string character by character? The question is, how to match a ‘*’?

A natural way is to use a greedy approach; that is, we attempt to match the previous character as many as we can. Does this work? Let us look at some examples.

s = “abbbc”, p = “abc”
Assume we have matched the first ‘a’ on both s and p. When we see “b” in p, we skip all b’s in s. Since the last ‘c’ matches on both side, they both match.

s = “ac”, p = “abc”
After the first ‘a’, we see that there is no b’s to skip for “b”. We match the last ‘c’ on both side and conclude that they both match.

It seems that being greedy is good. But how about this case?

s = “abbc”, p = “abbbc”
When we see “b” in p, we would have skip all b’s in s. They both should match, but we have no more b’s to match. Therefore, the greedy approach fails in the above case.

One might be tempted to think of a quick workaround. How about counting the number of consecutive b’s in s? If it is smaller or equal to the number of consecutive b’s after “b*” in p, we conclude they both match and continue from there. For the opposite, we conclude there is not a match.

This seem to solve the above problem, but how about this case:
s = “abcbcd”, p = “a.c.d”

Here, “.*” in p means repeat ‘.’ 0 or more times. Since ‘.’ can match any character, it is not clear how many times ‘.’ should be repeated. Should the ‘c’ in p matches the first or second ‘c’ in s? Unfortunately, there is no way to tell without using some kind of exhaustive search.

We need some kind of backtracking mechanism such that when a matching fails, we return to the last successful matching state and attempt to match more characters in s with ‘*’. This approach leads naturally to recursion.

The recursion mainly breaks down elegantly to the following two cases:

If the next character of p is NOT ‘*’, then it must match the current character of s. Continue pattern matching with the next character of both s and p.

If the next character of p is ‘*’, then we do a brute force exhaustive matching of 0, 1, or more repeats of current character of p… Until we could not match any more characters.

bool isMatch(const char *s, const char *p) {  
  assert(s && p);  
  if (*p == '\0') return *s == '\0';  

  // next char is not '*': must match current character  
  if (*(p+1) != '*') {  
    assert(*p != '*');  
    return ((*p == *s) || (*p == '.' && *s != '\0')) && isMatch(s+1, p+1);  
  }  
  // next char is '*'  
  while ((*p == *s) || (*p == '.' && *s != '\0')) {  
    if (isMatch(s, p+2)) return true;  
    s++;  
  }  
  return isMatch(s, p+2);  
}

大概尝试理解了一下这个算法, 感觉跟上面讲的那个算法是差不多的, 反正是依赖recursion; 稍微有点区别的是, 比如pair head对上了之后, 他这里对s的head进行一个traversal, 对所有的每一个, that is in the duplicated run, 都进行一次match;

感觉看到现在的算法都是差不多的, 也没有看到真正的基于DFA的算法; 我认为我自己这个题目完全没有想到思路的一个原因就是过于注重跟自己知道的东西(FSA)进行类比, 最后没有真正的去花时间深入了解题目本身的玄机, 也就不能想到对于.和*的个性化的处理方式;

@monkeyGoCrazy said in Easy DP Java Solution with detailed Explanation:

This Solution use 2D DP. beat 90% solutions, very simple.

Here are some conditions to figure out, then the logic can be very straightforward.

1, If p.charAt(j) == s.charAt(i) :  dp[i][j] = dp[i-1][j-1];  
2, If p.charAt(j) == '.' : dp[i][j] = dp[i-1][j-1];  
3, If p.charAt(j) == '*':   
   here are two sub conditions:  
               1   if p.charAt(j-1) != s.charAt(i) : dp[i][j] = dp[i][j-2]  //in this case, a* only counts as empty  
               2   if p.charAt(i-1) == s.charAt(i) or p.charAt(i-1) == '.':  
                              dp[i][j] = dp[i-1][j]    //in this case, a* counts as multiple a   
                           or dp[i][j] = dp[i][j-1]   // in this case, a* counts as single a  
                           or dp[i][j] = dp[i][j-2]   // in this case, a* counts as empty

Here is the solution

    public boolean isMatch(String s, String p) {  

        if (s == null || p == null) {  
            return false;  
        }  
        boolean[][] dp = new boolean[s.length()+1][p.length()+1];  
        dp[0][0] = true;  
        for (int i = 0; i < p.length(); i++) {  
            if (p.charAt(i) == '*' && dp[0][i-1]) {  
                dp[0][i+1] = true;  
            }  
        }  
        for (int i = 0 ; i < s.length(); i++) {  
            for (int j = 0; j < p.length(); j++) {  
                if (p.charAt(j) == '.') {  
                    dp[i+1][j+1] = dp[i][j];  
                }  
                if (p.charAt(j) == s.charAt(i)) {  
                    dp[i+1][j+1] = dp[i][j];  
                }  
                if (p.charAt(j) == '*') {  
                    if (p.charAt(j-1) != s.charAt(i) && p.charAt(j-1) != '.') {  
                        dp[i+1][j+1] = dp[i+1][j-1];  
                    } else {  
                        dp[i+1][j+1] = (dp[i+1][j] || dp[i][j+1] || dp[i+1][j-1]);  
                    }  
                }  
            }  
        }  
        return dp[s.length()][p.length()];  
    }

跟之前的思路还是类似的, 不过这里有一个新的思路, 就是提啊不是两个一起的判断, 而是单独只判断一个字母; 这里带来的一个问题是, 你看他, 碰到.的时候, 直接就两个都peel掉; 你可能会认为这样有问题, 但是你看他下面对*处理的时候, 回头看的时候会考虑前一个位置是否是., 所以不会出问题;

另外注意看他表格的index的定义, 不是ED那种定义方式, 而是用的好像end at index i这样的定义; 看起来有点直观, 但是最后结果就错了;

但是好像他代码写的有点毛病:

@dolphinwby said in Easy DP Java Solution with detailed Explanation:

Great explanation! But it seems that the solution code cannot be accepted. I revised your code:

public class Solution {  
    public boolean isMatch(String s, String p) {  
        if(s == null || p == null) {  
            return false;  
        }  
        boolean[][] state = new boolean[s.length() + 1][p.length() + 1];  
        state[0][0] = true;  
        // no need to initialize state[i][0] as false  
        // initialize state[0][j]  
        for (int j = 1; j < state[0].length; j++) {  
            if (p.charAt(j - 1) == '*') {  
                if (state[0][j - 1] || (j > 1 && state[0][j - 2])) {  
                    state[0][j] = true;  
                }  
            }   
        }  
        for (int i = 1; i < state.length; i++) {  
            for (int j = 1; j < state[0].length; j++) {  
                if (s.charAt(i - 1) == p.charAt(j - 1) || p.charAt(j - 1) == '.') {  
                    state[i][j] = state[i - 1][j - 1];  
                }  
                if (p.charAt(j - 1) == '*') {  
                    if (s.charAt(i - 1) != p.charAt(j - 2) && p.charAt(j - 2) != '.') {  
                        state[i][j] = state[i][j - 2];  
                    } else {  
                        state[i][j] = state[i - 1][j] || state[i][j - 1] || state[i][j - 2];  
                    }  
                }  
            }  
        }  
        return state[s.length()][p.length()];  
    }  
}

这个人的index最后还是用ED的那种定义方式;

另外原来OP好像还有一个typo:

@y91 said in Easy DP Java Solution with detailed Explanation:

Hi,
in your pseudo code, in line 6.
"p.charAt(i-1)" may give out of bound .. should it be j-1 ?

虽然有这些小毛病, 不过这个OP的解释真的是相当的清晰到位了;

一个小的讲究, 他的这个case:

or dp[i][j] = dp[i][j-1]   // in this case, a* counts as single a

其实是可以归并到上面的multiple的case里面去的; discussion最优解就是这么写的; 因为这个case跟multiple其实是一类, 这里p(或者说j)都没有动;

@localvar said in The shortest AC code.:

a shorter one (14 lines of code) with neatly format:

    class Solution {  
    public:  
        bool isMatch(const char *s, const char *p) {  
            for( char c = *p; c != 0; ++s, c = *p ) {  
              if( *(p+1) != '*' )  
                  p++;  
              else if( isMatch( s, p+2 ) )  
                  return true;  
              if( (*s==0) || ((c!='.') && (c!=*s)) )  
                  return false;  
          }  
          return *s == 0;  
        }  
    };

这个解法写的还是挺neat的; 注意, 他的body里面其实也是在判断是否有*而已: 有的话, 就给s, p+2一次几乎, 如果没有, 该干嘛干嘛. 注意, 如果没有*, 那么p是直接++的; 但是如果有*, 但是没有recurse成功, 这个时候p是保持不动的; 比如, ab, a*两个, 那么这里的isMatch (s, p + 2)就会失败; 这个时候s会++, 下一个循环判断, 就会在第二个if出现FALSE; 总体来说这个算法写的是非常巧妙的, 你可以对比笔算a, a*和ab, a*两个例子就知道了;

这个算法应该就是我一开始拿到题目想写出来的那个iterative的算法了; 不过我当时还是低估了这个题目的难度, 事实证明, 单纯的用iterative来写这题是非常难的, 他这里也还是借助于了recursion. 上面的两篇article里面, 都提到了这题因为有太多的不确定性, 所以必须依赖recursion. 你没有一个deterministic的策略, 往往写纯粹的iteration就很难;

submission最优解:25(99). 给了大量的笔记, 好东西; 下面开始:

看小本本！！！！第12页

基本相似：44. Wildcard Matching，区别; '*' 可以表示：including the empty sequence

time complexity = O(min(s.length, p.length))? 不太确定是min还是max？

思路：DP:

build 2D boolean array, that boolean[i][j] means that whether i length character from s can match j characters from p。从s里去0 ~ 第i个字符，和从p中取0 ~ j个字符，是否match

三种情况：i and j are length, NOT index
1, If p.charAt(j) == s.charAt(i) :  dp[i][j] = dp[i-1][j-1];  
2, If p.charAt(j) == '.' : dp[i][j] = dp[i-1][j-1];  
3, If p.charAt(j) == '*':   
   here are two sub conditions:  
               1   if p.charAt(j-1) != s.charAt(i) : dp[i][j] = dp[i][j-2]  //in this case, a* only counts as empty  
               2   if p.charAt(i-2) == s.charAt(i-1) or p.charAt(j-2) == '.':  
                              dp[i][j] = dp[i-1][j]    //in this case, a* counts as multiple a   
                           or dp[i][j] = dp[i][j-1]   // in this case, a* counts as single a  
                           or dp[i][j] = dp[i][j-2]   // in this case, a* counts as empty

这个好像就是discussion看到的一个解释的最清楚的解法一样的解释, 难道他就是作者?

Time Complexity: Let T, P be the lengths of the text and the pattern respectively. The work for every call to dp(i, j) for i=0, ... ,Ti=0,...,T; j=0, ... ,Pj=0,...,P is done once, and it is O(1) work. Hence, the time complexity is O(TP).

Space Complexity: The only memory we use is the O(TP) boolean entries in our cache. Hence, the space complexity is O(TP).

time = O(mn), space = O(mn), m = s.length(), n = p.length()

class Solution {  
    public boolean isMatch(String s, String p) {  
        //since * match zero or more of the preceding element, so * can't be the head of p, otherwise, index overflow  
       if (s == null || p == null) {  
            return false;       // lann: 这个base case好像不对, 根据我的test3;  
       }  

        boolean[][] dp = new boolean[s.length()+1][p.length()+1];  
        //initial 0 string s length matches 0 string p length  
        dp[0][0] = true;  //remember to initialize dp[0][0]: "" matches "" is true !!!  

        //build dp by get 0 char from s, and get i length char from p, our later results will depend on this  
         for (int i = 1; i <= p.length(); ++i) {  
            if (p.charAt(i - 1) == '*') {  //小本本里的第一种情况  
                dp[0][i] = dp[0][i - 2];   //不用check i-2 是否overflow，因为 * 不能是p的第一个字符  
            }  
        }  

        for (int si = 1; si <= s.length(); ++si) {  
            for (int pi = 1; pi <= p.length(); ++pi) {  
                if (p.charAt(pi - 1) == '.' || p.charAt(pi - 1) == s.charAt(si - 1)) {   // '.'可以match任何char  
                    dp[si][pi] = dp[si - 1][pi - 1];  //depends on previous stage  
                } else if (p.charAt(pi - 1) == '*') {  
                    //p.charAt(pi - 2)shi*之前的那个char，因为pi是长度，pi - 1是 * 的index  
                    // s = abcd|d  
                    // p = abc|d*  
                    if (p.charAt(pi - 2) == s.charAt(si - 1) || p.charAt(pi - 2) == '.') {   
                        dp[si][pi] = dp[si][pi - 2] || dp[si - 1][pi] || dp[si][pi - 1];  //两个箭头的值  
                    } else {  
                        dp[si][pi] = dp[si][pi - 2];   //如果不match，就只等于一个上面一个箭头的值：false？  
                    }  
                }  
            }  
        }  
    return dp[s.length()][p.length()];  
    }  
}

Method2: dfs, O(2^n) time, O(n) space, n is length of p (each part can be matched or not matched)

class Solution {  
    public boolean isMatch(String s, String p) {  
        if (s == null || p == null) {  
            return false;  
        }  
        if (p.length() == 0) {//if no pattern can be used to match s, check whether s is empty too  
            return s.length() == 0;  
        }  
        if (p.length() == 1) {//if p only has one char, check whether s is also one char left, and then try to match them  
            return s.length() == 1 && matchFirstChar(s, p);  
        }  
        if (p.charAt(1) != '*') {//if second char isn't '*',we have to match first char of both s&p,and try to match the rest  
            return matchFirstChar(s, p) && isMatch(s.substring(1), p.substring(1));  
        }  
        return isMatch(s, p.substring(2)) || (matchFirstChar(s, p) && isMatch(s.substring(1), p));  
    }  
    //isMatch(s,p.substring(2)):check if we can skip this 2-char pattern,or pattern is used by last 1st char of s(so skip it)  
    //if pattern shouldn't be skipped(which means it should match more char),continue to try match 1 char with this pattern  

    private boolean matchFirstChar(String s, String p) {  
        return s.length() != 0 && (s.charAt(0) == p.charAt(0) || p.charAt(0) == '.');  
    }  
}

Method3: time = O(mn), space = O(n)

思路:

这道题可以用递归解决，不过时间复杂度是指数级，这里介绍一个用动态规划在平方时间内解决的办法。

解法的核心理念是：从后往前看pattern的每一位，对于pattern的每一位，我们尽可能的把待匹配串string从后往前给匹配上。我们用一个数组match[string.length() + 1]来表示string被匹配的情况，这里如果match[j]是true，而我们pattern正指向第i位，则说明string从第j位到最后都已经被pattern第i位之前的某些部分给成功匹配了，所以我们不用再操心了。match[i]为true的条件是match[i + 1]为true，且string第i个字符也能被成功匹配。

动手画画图, 仔细理解一下他这里对于表格的定义到底是什么意思;

感觉他上面的定义有点问题; match和j的定义都是向后的, 但是实际上, 我认为i的定义其实也是向后的; 不信你可以看下面的第一个trace: p[i] == 'b'的情况. 不然根本讲不通; 或者, 你用aab, aab的例子, 更好理解; 我想到了一个更好的例子: (s, p) = (aaa, aa), 比较在第一个iteration和第二个iteration里面, 为什么match[2]从1变成了0; 注意, i的含义, 是一个从右往左累积的概念; 当p在i这个位置的时候, 你要研究的是p[i:]所有的能否match;

before i = 1:  
0 0 0 1  
 a a a  
 a a  
   ^  

after i = 1:  
0 0 1 0  
 a a a  
 a a  
   ^  

after i = 0:  
0 1 0 0  
 a a a  
 a a  
 ^

注意, 不要脑子里臆想出来一个s和p之间的alignment关系, 没有的, 不存在的, 一个人一个指针, 跑而已; 只要注意match跟s的对位关系就行了;

那我们就可以从后往前开始看pattern的每一位能匹配多少string的字符了：

如果pattern的这一位是*，那我们要用这一位，来从后往前尝试匹配string，因为string后面是已经匹配好的，前面是还没匹配好的，所以从前往后匹配星号可能会导致我们匹配了一些pattern该星号前面的星号应该匹配的部分。而从后往前匹配则不会影响pattern该星号后面星号所匹配的部分，因为已经匹配的部分我们会直接跳过。

如果pattern这一位不是*，那我们则不能匹配多个字符，我们只能匹配一个字符，这时候要对string从前往后匹配，因为如果后面没被匹配，前面也肯定不会被匹配，所以从前向后能保证我们把pattern的这一位匹配到string当前最后面那个还没匹配的字符。这样如果那个字符能被匹配就通过了。

我们举个例子
match:  0 0 0 1  
string:  a a b  
pattern: a * b  
             |
这里我们先看pattern最后一位b能匹配到多少，这里因为b不是星号，所以我们从左往右尝试匹配string，第一个a不行，第二个a也不行，然后到b，这里因为match[3]是true，b也和b相同，所以匹配成功。
match:  0 0 1 1  
string:  a a b  
pattern: a * b  
           |
然后看pattern的这个星号，我们要从后往前匹配string。因为b已经被匹配了，match[2]是true，所以直接跳过。然后到a，发现个pattern中星号前面的字符a相同，所以匹配成功，match[1]也置为true再看string的第一个a，还是可以匹配成功，这样整个string都被匹配成功了。

你看他后面对应这一行的代码你就知道了, 他这里用了一个小技巧, 这里的这个直接跳过对应的是一个用short-circuit or完成的操作;

你可以看到人家这里用来develop用的这个例子, 是非常简单的; 这个也是我自己以前一直讲的一个观点, 在你还在刚开始研究一个算法的时候, 就从最简单最typical的例子开始, 不要老是想着最边缘的例子;

这里还有几个情况，首先，无论刚才那pattern中最后一个b有没有匹配到string中任何一个字符，match[3]也要置为false。这样才能防止pattern最后字母没有匹配上，而pattern前面的部分反而把string的结尾给匹配了。还有如果pattern中是句号的话，那相当于字符相同。

上面的部分都是他的笔记, 下面的是实际上爆机的代码:

class Solution {  
    public boolean isMatch(String s, String p) {  
        boolean[] match = new boolean[s.length() + 1];  
        match[s.length()] = true;  
        for(int i = p.length() - 1; i >=0; i--){  
            if(p.charAt(i) == '*'){  
                // 如果是星号，从后往前匹配  
                for(int j = s.length() - 1; j >= 0; j--){  
                    match[j] = match[j] || (match[j + 1] && (p.charAt(i - 1) == '.' || (p.charAt(i - 1) == s.charAt(j))));   
                }  
                // 记得把i多减一，因为星号是和其前面的字符配合使用的. lann: 配合上面i的header里面的--, 这里就会把*前面的这个字母也给直接跳过了;   
                i--;  
            } else {  
                // 如果不是星号，从前往后匹配  
                for(int j = 0; j < s.length(); j++){  
                    match[j] = match[j + 1] && (p.charAt(i) == '.' || (p.charAt(i) == s.charAt(j)));  
                }  
                // 只要试过了pattern中最后一个字符，就要把match[s.length()]置为false  
                match[s.length()] = false;  
            }  
        }  
        return match[0];  
    }  
}

这个人自己的分析是有点问题的, 看我上面的讨论; 首先, 在是*的情况下, 还是好分析的, 直接就跟awice的写法一样, 无非是重现peel head recursion的逻辑而已; again, 注意他这里这个shorcircuit的应用: 如果short circuit触发, 其实就相当于是pattern的head(记得吗, 我们说了, i的定义也是向后, 对应p[i:]), 比如说是a*整个被无视了, corresponds to empty string in s. 而如果short circuit没有触发, 需要recurse的时候, 其实就是对应我自己的代码里面查询[i - 1][j]的步骤, 对应的是这个head corresponds to multiple a.

对了, 当我说i是向后定义的时候, 假如[i]是*, 那么是要包含*前面的这个位置的字母的;

比较难理解的是, 当不是*的情况下, 他的这个处理; 我尝试把这里的改成倒序, 但是整个就不能AC了; 自己又重新模拟了一下aaa, aa的trace, 发现确实不能随便乱改, 这里这个从左到右的顺序很关键; 首先, DP算法填表的时候, 一定要思考一个问题, 就是依赖顺序, 这里你看到, 是[j]依赖[j+1], 一个向右的依赖顺序; 如果我们在这里iterate j的时候, 从右向左, 就有一个问题, 当你到[j]的时候, 你所依赖的[j+1]刚刚已经被更新过了! 当然你可以用这个来理解; 但是感觉这个还是没有解释深层的逻辑;

首先一点澄清一下, 假如[i]不是, 那么[i+1]肯定也不是, 具体看上面那个branch: 保证了*前面的字母会被跳过;

然后记住我们的s[j] and p[i]其实现在都是各自的head; 所以其实就是都去头, 然后分别进行向后的查询. 那么一个关键的问题来了, 为什么这里不能用倒序? awice的就用了啊; 我后来想通了. 首先, 这里这个算法本身其实最开始的动机就是空间优化; 但是空间优化之后带来一个问题, 更新顺序问题出现了; 因为这里有一个向右依赖, 所以我们要保证从左向右的更新; awice的做法之所以不需要, 是因为所有的row之间是独立的! 所以就算顺序有点不合理, 不会出现depending on a value that has already being changed.

最后一点总结, 研究算法的时候, 首先还是确定it works, 然后有时候感觉还是自己研究trace比依赖别人的笔记更有效; 这个作者这个代码明显是看来的, 解释的很多地方一窍不通.

当然, 如果是discussion上面那种, 大部分都是代码作者自己写的解释, 是可以看一下的; 往往能够让你看代码最看不懂的时候, 醍醐灌顶;

Problem Description

Implement regular expression matching with support for '.' and '*'.

'.' Matches any single character.  
'*' Matches zero or more of the preceding element.  

The matching should cover the entire input string (not partial).  

The function prototype should be:  
bool isMatch(const char *s, const char *p)  

Some examples:  
isMatch("aa","a") → false  
isMatch("aa","aa") → true  
isMatch("aaa","aa") → false  
isMatch("aa", "a*") → true  
isMatch("aa", ".*") → true  
isMatch("ab", ".*") → true  
isMatch("aab", "c*a*b") → true

Difficulty:Hard
Total Accepted:179.6K
Total Submissions:738K
Contributor:LeetCode
Companies
googlefacebookubertwitterairbnb
Related Topics
stringdynamic programmingbacktracking
Similar Questions
Wildcard Matching

LC10 RegularExpressionMatching

RegularExpressionMatching [source code]

editorial

Approach #2: Dynamic Programming [Accepted]

Problem Description

results matching ""

No results matching ""