WordBreakII [source code]
import java.util.*;

public class WordBreakII {
static
/******************************************************************************/
class Solution {
    Map<Integer, List<String>> memo;

    public List<String> wordBreak(String s, List<String> wordDict) {
        memo = new HashMap<> ();
        return helper (s, new HashSet<> (wordDict), 0);
    }

    List<String> helper (String s, Set<String> dict, int start) {
        if (memo.containsKey (start))
            return memo.get (start);
        List<String> res = new LinkedList<> ();
        if (start == s.length ()) {
            res.add ("");
            return res;
        }
        for (int i = start + 1; i <= s.length (); i++) {
            String cur_word = s.substring (start, i);
            if (dict.contains (cur_word)) {
                for (String path : helper (s, dict, i)) {
                    res.add (cur_word + (path.length () > 0 ? " " : "") + path);
                }
            }
        }
        memo.put (start, res);
        return res;
    }
}
/******************************************************************************/
public static void main(String[] args) {
    WordBreakII.Solution tester = new WordBreakII.Solution();
    String[][] inputs = {
        {"catsanddog"}, {"cat", "cats", "and", "sand", "dog"}, {"cats and dog", "cat sand dog"},
        {"leetcode"}, {"leet", "code", "lee", "t"}, {"leet code", "lee t code"},
{"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}, {"a","aa","aaa","aaaa","aaaaa","aaaaaa","aaaaaaa","aaaaaaaa","aaaaaaaaa","aaaaaaaaaa"}, {},
    };
    for (int i = 0; i < inputs.length / 3; i++) {
        System.out.println (Printer.separator ());
        String s = inputs[3 * i][0];
        List<String> wordDict = Arrays.asList (inputs[3 * i + 1]);
        Set<String> output = new HashSet<> (tester.wordBreak (s, wordDict)), ans = new HashSet<> (Arrays.asList (inputs[3 * i + 2]));
        System.out.printf ("%s\n%s\n%s, expected: %s\n",
            s, wordDict, Printer.wrap (output.toString (), output.equals (ans) ? 92 : 91), ans
        );
    }
}
static String printPaths (Map<Integer, List<List<String>>> paths, int N) {
    StringBuilder sb = new StringBuilder ();
    for (int i = 0; i <= N; i++) {
        if (paths.containsKey (i) && paths.get (i).size () > 0) {
            sb.append (String.format ("%d=%s", i, paths.get (i)));
            if (i < N)
                sb.append ("\n");
        }
    }
    return sb.toString ();
}
}
Very similar to Word Break I. As a challenge, let's try trie + DP here.
Since we ultimately have to return words, I'd rather store the full string in each trie node instead of a boolean.
On reflection, DP doesn't seem great for this problem, because the space cost is huge. This is also the kind of DP that can't be dimension-reduced: standing at position i, you may need the results of every later position j.
And each position stores something like a list of strings; whenever a position i matches some later j (i.e. the segment i..j is a dict word), i's result has to take every one of j's results and prepend a word to it. That feels very expensive.
Later I realized I was again making the mistake of manipulating substrings as strings; this problem can also be driven purely by int indexes. That's probably part of why it's rated hard.
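The snippets below use a TrieNode class that never made it into these notes. A minimal sketch consistent with the usage (`cur.links[s.charAt(j - 1)]` and a `word` field that is non-empty only at terminal nodes) could look like this; the 128-slot links array is my assumption:

```java
// Hypothetical TrieNode, reconstructed from how the snippets below use it.
class TrieNode {
    // Indexed directly by raw char value, matching links[s.charAt(...)].
    TrieNode[] links = new TrieNode[128];
    // Non-empty only at nodes that terminate a dictionary word.
    String word = "";

    void insert(String w) {
        TrieNode cur = this;
        for (char c : w.toCharArray()) {
            if (cur.links[c] == null)
                cur.links[c] = new TrieNode();
            cur = cur.links[c];
        }
        cur.word = w;
    }
}
```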
But this approach has a problem; here is the half-written attempt:
public List<String> wordBreak(String s, List<String> wordDict) {
    TrieNode root = new TrieNode ();
    for (String word : wordDict)
        root.insert (word);
    int N = s.length ();
    // next[i] holds the positions j such that s[i..j) is a dict word and
    // the suffix starting at j is breakable; next_pos[i] is its length
    int[][] next = new int[N + 1][N + 1];
    int[] next_pos = new int[N + 1];
    next[N][next_pos[N]++] = -1;
    for (int i = N - 1; i >= 0; i--) {
        TrieNode cur = root;
        for (int j = i + 1; j <= N; j++) {
            cur = cur.links[s.charAt (j - 1)];
            if (cur != null) {
                if (cur.word.length () > 0 && next_pos[j] > 0) {
                    next[i][next_pos[i]++] = j;
                }
            } else {
                break;
            }
        }
    }
    List<String> res = new ArrayList<> ();
    for (int j = 0; j < next_pos[0] - 1; j++) {
        // abandoned here: collecting the paths would still need a DFS plus
        // substring calls, which is exactly what I was trying to avoid
    }
    return res;  // never finished
}
Even if you do finish building a jump table like this, you still have to run a DFS-like pass at the end to collect all the paths, which feels wasteful: that collection step can't avoid substring, which is itself very expensive.
So I changed it to this:
public List<String> wordBreak(String s, List<String> wordDict) {
    TrieNode root = new TrieNode ();
    for (String word : wordDict)
        root.insert (word);
    int N = s.length ();
    List<List<String>> paths = new ArrayList<> ();
    for (int i = 0; i <= N; i++)
        paths.add (new LinkedList<> ());
    paths.get (N).add ("");
    for (int i = N - 1; i >= 0; i--) {
        TrieNode cur = root;
        for (int j = i + 1; j <= N; j++) {
            cur = cur.links[s.charAt (j - 1)];
            if (cur != null) {
                if (cur.word.length () > 0 && paths.get (j).size () > 0) {
                    List<String> paths_i = paths.get (i);
                    for (String path : paths.get (j))
                        paths_i.add (cur.word + (path.length () > 0 ? " " : "") + path);
                }
            } else break;
        }
    }
    return paths.get (0);
}
But this gets broken by test 3: TLE. Not wanting to give up yet, I tried to eliminate the string operations entirely, which means a triple-nested list; no way around it.
Represent each path as a list of words instead:
public List<String> wordBreak(String s, List<String> wordDict) {
    TrieNode root = new TrieNode ();
    for (String word : wordDict)
        root.insert (word.intern ());
    int N = s.length ();
    Map<Integer, List<List<String>>> paths = new HashMap<> ();
    List<String> initial_path = new LinkedList<> ();
    paths.computeIfAbsent (N, num -> new LinkedList<> ()).add (initial_path);
    for (int i = N - 1; i >= 0; i--) {
        TrieNode cur = root;
        for (int j = i + 1; j <= N; j++) {
            cur = cur.links[s.charAt (j - 1)];
            if (cur != null) {
                if (cur.word.length () > 0 && paths.containsKey (j)) {
                    List<List<String>> paths_i = paths.computeIfAbsent (i, num -> new LinkedList<> ());
                    for (List<String> path : paths.get (j)) {
                        List<String> new_path = new LinkedList<> (path);
                        new_path.add (0, cur.word);
                        paths_i.add (new_path);
                    }
                }
            } else break;
        }
    }
    List<String> res = new ArrayList<> ();
    if (!paths.containsKey (0))  // guard: unbreakable input has no entry at 0
        return res;
    for (List<String> path : paths.get (0))
        res.add (String.join (" ", path));
    return res;
}
Still too slow; I was out of ideas.
After reading the editorial, I realized the string operations aren't actually the bottleneck, so I kept tweaking:
public List<String> wordBreak(String s, List<String> wordDict) {
    TrieNode root = new TrieNode ();
    for (String word : wordDict)
        root.insert (word.intern ());
    int N = s.length ();
    LinkedList<String>[] paths = new LinkedList[N + 1];
    for (int i = 0; i <= N; i++)
        paths[i] = new LinkedList<> ();
    paths[N].add ("");
    for (int i = N - 1; i >= 0; i--) {
        TrieNode cur = root;
        for (int j = i + 1; j <= N; j++) {
            cur = cur.links[s.charAt (j - 1)];
            if (cur != null) {
                if (cur.word.length () > 0 && paths[j].size () > 0) {
                    List<String> paths_i = paths[i];
                    for (String path : paths[j])
                        paths_i.add (cur.word + (path.length () > 0 ? " " : "") + path);
                }
            } else break;
        }
    }
    return paths[0];
}
Still no good.
So is the trie actually the problem? Because when I printed a trace, my bottom-up DP starting from the tail accumulates more and more paths at each position as i decreases; yet test 3 has a 'b' inside s, so the final answer should be a failure: the empty set.
Then why don't these recursive solutions run into this problem?
Having finished the editorial: this problem simply does not admit a bottom-up solution; that test case is deliberately designed to break every bottom-up approach.
So why is recursion OK?
See the analysis below.
In the end it did AC anyway, at 21ms (43%).
editorial
Approach #1 Brute Force [Time Limit Exceeded]
Algorithm
The naive approach to solve this problem is to use recursion. To find the solution, we check every possible prefix of the string s in the dictionary of words; if a prefix is found in the dictionary (say s1), then the recursive function is called for the remaining portion of the string. This function returns the prefix s1 appended to the result of the recursive call on the remaining portion (s − s1), provided the remaining portion can form a valid sentence as per the dictionary. Otherwise, an empty list is returned.
public class Solution {
    public List<String> wordBreak(String s, Set<String> wordDict) {
        return word_Break(s, wordDict, 0);
    }

    public List<String> word_Break(String s, Set<String> wordDict, int start) {
        LinkedList<String> res = new LinkedList<>();
        if (start == s.length()) {
            res.add("");
        }
        for (int end = start + 1; end <= s.length(); end++) {
            if (wordDict.contains(s.substring(start, end))) {
                List<String> list = word_Break(s, wordDict, end);
                for (String l : list) {
                    res.add(s.substring(start, end) + (l.equals("") ? "" : " ") + l);
                }
            }
        }
        return res;
    }
}
This is basically the same as my approach, except it only matches mine once the memo is added.
Approach #2 Recursion with memoization [Accepted]
Algorithm
In the previous approach we can see that many subproblems were redundant, i.e. we were calling the recursive function multiple times for the same substring appearing through multiple paths. To avoid this we can use memoization, making use of a hashmap to store the results as key:value pairs. The key is the starting index of the string currently considered, and the value contains all the sentences which can be formed using the substring from that starting index onwards. Thus, if we encounter the same starting index from different function calls, we can return the result directly from the hashmap rather than making redundant function calls.
With memoization many redundant subproblems are avoided, the recursion tree is pruned, and thus the time complexity is reduced by a large factor.
public class Solution {
    public List<String> wordBreak(String s, Set<String> wordDict) {
        return word_Break(s, wordDict, 0);
    }

    HashMap<Integer, List<String>> map = new HashMap<>();

    public List<String> word_Break(String s, Set<String> wordDict, int start) {
        if (map.containsKey(start)) {
            return map.get(start);
        }
        LinkedList<String> res = new LinkedList<>();
        if (start == s.length()) {
            res.add("");
        }
        for (int end = start + 1; end <= s.length(); end++) {
            if (wordDict.contains(s.substring(start, end))) {
                List<String> list = word_Break(s, wordDict, end);
                for (String l : list) {
                    res.add(s.substring(start, end) + (l.equals("") ? "" : " ") + l);
                }
            }
        }
        map.put(start, res);
        return res;
    }
}
I can't accept this: even this naive algorithm got AC. So the trie approach had no real advantage after all?
Approach #3 Using Dynamic Programming [Time Limit Exceeded]:
...
public class Solution {
    public List<String> wordBreak(String s, Set<String> wordDict) {
        LinkedList<String>[] dp = new LinkedList[s.length() + 1];
        LinkedList<String> initial = new LinkedList<>();
        initial.add("");
        dp[0] = initial;
        for (int i = 1; i <= s.length(); i++) {
            LinkedList<String> list = new LinkedList<>();
            for (int j = 0; j < i; j++) {
                if (dp[j].size() > 0 && wordDict.contains(s.substring(j, i))) {
                    for (String l : dp[j]) {
                        list.add(l + (l.equals("") ? "" : " ") + s.substring(j, i));
                    }
                }
            }
            dp[i] = list;
        }
        return dp[s.length()];
    }
}
So it's not just me! This approach also TLEs, even though its complexity is supposedly the same as the recursion above, both N^3. That takes the fun out of it.
Now I find answers to my own question. A bottom-up DP solution is only good when we know where to start and which branches are valuable to the global result. For this problem, a memoized search is a good answer, as we only spend time where needed. In detail, the reason the DP method exceeds the time limit is that it spends too much time on invalid branches: there are many intermediate lists which will never be part of the final result, and at the time we compute them we cannot know whether they will be included. As a result, my suggestion is: if we really want to solve this problem with DP, then instead of spending O(k) time per step, where k is the length of the intermediate list, we could spend O(1) time by storing only the one-degree previous indexes, from which we can rebuild the result string list at the end instead of computing it along the way. This is quite similar to the problem Word Ladder II.
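The suggestion above (store only O(1)-sized index edges during the DP, rebuild strings only once we know position 0 is breakable, as in Word Ladder II) can be sketched like this; the class name IndexDP and all details are mine, not the poster's code:

```java
import java.util.*;

// Sketch: DP pass stores only index edges; sentences are built by a DFS
// afterwards, so unbreakable inputs cost no string work at all.
class IndexDP {
    public List<String> wordBreak(String s, List<String> wordDict) {
        Set<String> dict = new HashSet<>(wordDict);
        int n = s.length();
        // next.get(i): all j such that s[i..j) is a word and s[j..] breaks
        List<List<Integer>> next = new ArrayList<>();
        for (int i = 0; i <= n; i++) next.add(new ArrayList<>());
        boolean[] breakable = new boolean[n + 1];
        breakable[n] = true;
        for (int i = n - 1; i >= 0; i--)
            for (int j = i + 1; j <= n; j++)
                if (breakable[j] && dict.contains(s.substring(i, j))) {
                    next.get(i).add(j);
                    breakable[i] = true;
                }
        List<String> res = new ArrayList<>();
        if (breakable[0]) dfs(s, 0, new StringBuilder(), next, res);
        return res;
    }

    void dfs(String s, int start, StringBuilder path,
             List<List<Integer>> next, List<String> res) {
        if (start == s.length()) {
            res.add(path.toString().trim());
            return;
        }
        int len = path.length();
        for (int j : next.get(start)) {
            path.append(' ').append(s, start, j);
            dfs(s, j, path, next, res);
            path.setLength(len);  // backtrack
        }
    }
}
```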
Another solution is to use BFS from Word Break I. Exact same code, just modifying a little bit.
First, get rid of the visited[] array. Second, add another queue; it goes along with the normal index queue and stores the substrings at each level. At each level, a substring is created by concatenating the string found at the current level with the string from the previous level. Although it TLEs (you have to add a word-breakable check to make it accepted), it's still a learning experience.
public List<String> wordBreak(String s, List<String> wordDict) {
    Set<String> wordDictSet = new HashSet<>(wordDict);
    Queue<Integer> queue = new LinkedList<>();
    Queue<String> queueStr = new LinkedList<>();
    List<String> list = new ArrayList<>();
    queue.add(0);
    queueStr.offer("");
    while (!queue.isEmpty()) {
        int size = queue.size();
        for (int i = 0; i < size; i++) {
            String currentStr = queueStr.poll();
            int start = queue.poll();
            for (int end = start + 1; end <= s.length(); end++) {
                String subStr = s.substring(start, end);
                if (wordDictSet.contains(subStr)) {
                    String newStr = currentStr + subStr;
                    if (end == s.length())
                        list.add(newStr);
                    else {
                        queueStr.add(newStr + " ");
                        queue.add(end);
                    }
                }
            }
        }
    }
    return list;
}
I assumed this BFS approach would pass; turns out BFS can't pass either.
It seems these test cases really were added later on purpose:
The topic has been well discussed in many posts such as here and here. Because the existence of test case 'aaaaa...aaaab', ['a', 'aa', 'aaa', ... 'aaa...aa'], the forward DP solution will cause MLE while the backward DP is just fine, apparently the test case 'baaaaaaaa...aaaa', ['a', 'aa', 'aaa', ..., 'aaa...aa'] should also be included.
Top-down is not always the same as bottom-up.
Also, by this point it's roughly clear why only top-down recursion survives. Picture the animation: top-down recursion still explores one path at a time, accelerated in the middle by the memo. Bottom-up approaches, or BFS, which grow all paths simultaneously, end up spending far too much time maintaining the results (not just extra space; since these are string operations, the time spent maintaining that space is also outrageous). Top-down recursion, like backtracking, extends and retracts one path at a time, maintaining only a single candidate path at any moment, which entirely avoids the cost of maintaining all paths.
This is a valuable lesson: top-down recursion + memo cannot always be replaced by bottom-up table building.
Also, these people deliberately adding some "breakable" pre-check is really just a pointless way of cheating the OJ. I think a core point of this problem is making you realize that top-down can have a unique advantage over bottom-up; that's also where the hardness lies. If you just bolt on an OJ workaround yourself, there's nothing hard left about the problem.
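To make the asymmetry concrete, here is a small instrumented sketch (my own; the MemoCount class and its call counter are purely illustrative): on the pathological 'aaa...b' input, the memoized top-down solver computes each start index at most once, so the "grow all paths" cost of the bottom-up versions never materializes.

```java
import java.util.*;

// Top-down memoized solver with a counter of first-time node computations.
class MemoCount {
    Map<Integer, List<String>> memo = new HashMap<>();
    int calls = 0;  // number of start indexes actually computed

    List<String> helper(String s, Set<String> dict, int start) {
        if (memo.containsKey(start)) return memo.get(start);
        calls++;
        List<String> res = new LinkedList<>();
        if (start == s.length()) res.add("");
        for (int i = start + 1; i <= s.length(); i++) {
            if (dict.contains(s.substring(start, i)))
                for (String path : helper(s, dict, i))
                    res.add(s.substring(start, i) + (path.isEmpty() ? "" : " ") + path);
        }
        memo.put(start, res);
        return res;
    }
}
```

On "aaaaaaaaab" no dict word ever covers the final 'b', so every position's list stays empty and the total work stays polynomial.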
The discussion actually has plenty of people debating this too:
https://leetcode.com/problems/word-break-ii/discuss/44185/Getting-rid-of-TLE
if you are getting TLE despite the correct DP or DFS solution, it might be because the largest test input is like this:
["aa...(lots of 'a').b", "a", "aaaaa", ...and so on]
As you can see, this test case should return empty, as the last character in the input string is b, which is not in the dictionary. So all the work in DP/DFS is a waste. To escape from TLE, just check first whether the input string s is breakable or not... if breakable, then try to break it using your algo.
This line of thinking is really dumb; it's just cheating the OJ after peeking at the test cases.
Actually add the checking of whether the word is breakable is not sufficient. What about replace the end character from b to a? Your algorithm still gets TLE.
Well, the number of solutions in that case would go from 0 to exponential in input length, so any algorithm computing all solutions would get TLE…
Really, how boring.
Explanation
Using DFS directly will lead to TLE, so I just used HashMap to save the previous results to prune duplicated branches, as the following:
public List<String> wordBreak(String s, Set<String> wordDict) {
    return DFS(s, wordDict, new HashMap<String, LinkedList<String>>());
}

// DFS function returns an array including all substrings derived from s.
List<String> DFS(String s, Set<String> wordDict, HashMap<String, LinkedList<String>> map) {
    if (map.containsKey(s))
        return map.get(s);
    LinkedList<String> res = new LinkedList<String>();
    if (s.length() == 0) {
        res.add("");
        return res;
    }
    for (String word : wordDict) {
        if (s.startsWith(word)) {
            List<String> sublist = DFS(s.substring(word.length()), wordDict, map);
            for (String sub : sublist)
                res.add(word + (sub.isEmpty() ? "" : " ") + sub);
        }
    }
    map.put(s, res);
    return res;
}
Plain top-down + memo; nothing special.
I think the wordDict may be large. So It’s better to solve this problem in a formal way. Below is my java code (derive from https://discuss.leetcode.com/topic/12997/11ms-c-solution-concise/2) hope it helps!
public class Solution {
    HashMap<String, List<String>> map = new HashMap<String, List<String>>();

    public List<String> wordBreak(String s, Set<String> wordDict) {
        List<String> res = new ArrayList<String>();
        if (s == null || s.length() == 0) {
            return res;
        }
        if (map.containsKey(s)) {
            return map.get(s);
        }
        if (wordDict.contains(s)) {
            res.add(s);
        }
        for (int i = 1; i < s.length(); i++) {
            String t = s.substring(i);
            if (wordDict.contains(t)) {
                List<String> temp = wordBreak(s.substring(0, i), wordDict);
                if (temp.size() != 0) {
                    for (int j = 0; j < temp.size(); j++) {
                        res.add(temp.get(j) + " " + t);
                    }
                }
            }
        }
        map.put(s, res);
        return res;
    }
}
This is also an idea mentioned in Word Break I: there are two ways to write the inner loop. One iterates over the substrings and checks set contains; the other iterates over every word in the dict and compares it directly against the current input string. The first is generally considered better, especially since you can hope the hash computation beats the equals comparisons.
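The two inner-loop styles side by side, as a sketch (class and method names are mine): both enumerate the dict words matching s at a given position; the first loop scales with the length of s, the second with the size of the dict.

```java
import java.util.*;

class LoopStyles {
    // Style 1: iterate over substrings of s, test membership in the set.
    // O(n) candidates per position, each hashed once.
    static List<String> bySubstring(String s, int start, Set<String> dict) {
        List<String> matches = new ArrayList<>();
        for (int end = start + 1; end <= s.length(); end++) {
            String cand = s.substring(start, end);
            if (dict.contains(cand)) matches.add(cand);
        }
        return matches;
    }

    // Style 2: iterate over the dictionary, test startsWith in place.
    // Cost scales with dict size, painful for huge dictionaries.
    static List<String> byDictWord(String s, int start, Set<String> dict) {
        List<String> matches = new ArrayList<>();
        for (String word : dict)
            if (s.startsWith(word, start)) matches.add(word);
        return matches;
    }
}
```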
@leogogogo Let's start from an example. Assume we have a word "leet" and its length is four (just forget the dictionary). We want the final result. Assume we have a function "p()" that solves this, i.e. "res = p(leet)".
In the brute force method (backtracking), we have the following:
res = Math.min("l" + p("eet"), "le" + p("et"), "lee" + p("t"), "leet");
(for "le" + p("et") the left part is atomic; there is no cut inside "le").
We use the cutting position to label every computation, and get a recursion tree [picture omitted].
But in the memo solution, we don't need to solve subproblems over and over again; we can reuse the results we have worked out before, pruning the above tree [picture omitted; all of the pictures are from "Introduction to Algorithms"].
So we can reduce the runtime from exponential time to polynomial time.
Now that we have a dictionary, we don't even need to call our recursion function letter by letter, so the running time is even less.
(I don't count "substring()" since it is a built-in method.)
If there is something wrong, I'm willing to discuss.

I agree. Although this program works, it is not efficient if my string is composed of just 2 words, for example, and my dictionary contains 1 million words.
Time complexity is O(len(wordDict) ^ len(s / minWordLenInDict)), because there’re len(wordDict) possibilities for each cut
This analysis is a bit off, because it ignores the memo; but the overall framework is still the familiar fanning ^ height pattern. That's the OP's own analysis. So what is the complexity once you account for the memo?
First, how many entries are in the memo? Easy: s has only so many positions, so O(s_length) entries. But as we've seen before, you can't conclude the complexity from the entry count alone; you still have to count the fanning.
I remember that when this principle was explained, this example was used (again, analyze a complex problem by upgrading gradually from a simple, familiar model).
In this problem's context, each node is an input substring (all ending at N but with different starts), and each edge is a dict word.
So maybe the OP's analysis is fine after all? No: the OP concluded exponential, but with this node-based analysis it should be s_length * fanning, where the fanning is dict_size. So it's not exponential but multiplicative (sometimes the fanning has to be analyzed in aggregate, though not here).
So incremental upgrading really is worth doing: think a bit at a time; analogies and simplified examples. Especially simplified examples: I've noticed experts love this approach; to analyze a complexity they don't jump straight into abstraction, but work through a few simple example cases, look at the results, and read off the pattern.
But one thing got overlooked: at a node, once you get a child's result, there's a post-processing step! That's the annoying part; the fanning above doesn't account for it (fanning only explains how many loop iterations each node does). So at this point I actually can't be sure the OP's analysis is wrong.
The naive DP solution will TLE as Recursion solution. From this problem I think we can see one “disadvantage” of DP: Bottom-up DP incurs many useless sub-solutions even if it reduces cost of repeating computation. Since we have no idea which sub-solution would be used by upper level, all sub-problems are computed.
Calling it naive is going too far; is top-down + memo any less naive? Fairer to say the two directions each have their advantages, and this problem happens to expose the difference.
In the worst case the runtime of this algorithm is O(2^n).
Consider the input “aaaaaa”, with wordDict = [“a”, “aa”, “aaa”, “aaaa”, “aaaaa”, “aaaaa”]. Every possible partition is a valid sentence, and there are 2^n-1 such partitions. It should be clear that the algorithm cannot do better than this since it generates all valid sentences. The cost of iterating over cached results will be exponential, as every possible partition will be cached, resulting in the same runtime as regular backtracking. Likewise, the space complexity will also be O(2^n) for the same reason - every partition is stored in memory.
Where this algorithm improves on regular backtracking is in a case like this: “aaaaab”, with wordDict = [“a”, “aa”, “aaa”, “aaaa”, “aaaaa”, “aaaaa”], i.e. the worst case scenario for Word Break I, where no partition is valid due to the last letter ‘b’. In this case there are no cached results, and the runtime improves from O(2^n) to O(n^2).
This person seems to have a point. The approach really is exponential in the end, because even when the downstream results can be pulled straight out of the memo, the post-processing of those results is not negligible.
Since each node is computed only once, each node gets only one fanning opportunity; that's also something summarized before. Each node's lifecycle is: on its first computation, do the fanning, put the result into the memo, return the result. So all the fanning work is incurred exactly once, during the first computation, before the node enters the memo.
So what does one node cost here?
The fanning part is unproblematic: there are s_length nodes, and each node's first computation fans out dict_size ways.
The memo's effect is that each fanned-out branch can be assumed to return its sub-result in O(1).
Take exactly his example:
At aaaaa, the cost is 2^4 + 2^3 + 2^2 + 2^1 + 2^0 (because, e.g., the result for aaaa is O(2^4) in count (count, not time)), so the total complexity really is exponential.
That is quite subtle.
This problem feels like a wolf in sheep's clothing: it looks like a brain-dead recursion problem, but the theoretical analysis, including coming up with the pathological test case yourself, is genuinely hard.
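The 2^(n-1) result count claimed above is easy to sanity-check with a counting DP (my sketch; it counts sentences without materializing them): the number of ways to break 'a'*n over an all-'a' dictionary is the number of compositions of n, i.e. 2^(n-1), so any algorithm that outputs every sentence is necessarily exponential on that input.

```java
import java.util.*;

class SentenceCount {
    // cnt[i] = number of sentences covering s[i..n); cnt[0] is the total.
    static long count(String s, Set<String> dict) {
        int n = s.length();
        long[] cnt = new long[n + 1];
        cnt[n] = 1;  // empty suffix: one way (the empty sentence)
        for (int i = n - 1; i >= 0; i--)
            for (int j = i + 1; j <= n; j++)
                if (dict.contains(s.substring(i, j)))
                    cnt[i] += cnt[j];
        return cnt[0];
    }
}
```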
https://leetcode.com/problems/word-break-ii/discuss/44178/11ms-C++-solution-(concise)
class Solution {
    unordered_map<string, vector<string>> m;

    vector<string> combine(string word, vector<string> prev) {
        for (int i = 0; i < prev.size(); ++i) {
            prev[i] += " " + word;
        }
        return prev;
    }

public:
    vector<string> wordBreak(string s, unordered_set<string>& dict) {
        if (m.count(s)) return m[s]; // take from memory
        vector<string> result;
        if (dict.count(s)) { // the whole string is a word
            result.push_back(s);
        }
        for (int i = 1; i < s.size(); ++i) {
            string word = s.substr(i);
            if (dict.count(word)) {
                string rem = s.substr(0, i);
                vector<string> prev = combine(word, wordBreak(rem, dict));
                result.insert(result.end(), prev.begin(), prev.end());
            }
        }
        m[s] = result; // memoize
        return result;
    }
};
Still the same approach.
Come to think of it, in Word Break I this was the first approach I came up with myself; in this problem I tried DP instead and stepped right into the deepest pit.
Actually no: my own solution to Word Break I used for each word in dict as the outer loop, the weaker style; this one uses for each substring of s, the better style.
https://leetcode.com/problems/word-break-ii/discuss/44179/Slightly-modified-DP-Java-solution
Hi guys!
There's a lot of concern in other posts about the "aaaa...aab" test case that causes TLE when we run through our string not in reverse but from start to end. I've thought a bit about how to add a tiny modification and make the whole thing more effective, not just pass the TLE case.
The approach is the same as before: we loop through all possible prefixes, checking whether each is in the dictionary and caching the results.
But just before jumping into recursion we can also check that the right remainder has a suffix from the dictionary, because if it doesn't then there's no point in splitting the remainder into sub-strings. It's just a linear check, which I think could also be optimized with some caching, but even without that optimization the solution is accepted. And the code also looks quite understandable.
public class Solution {
    private final Map<String, List<String>> cache = new HashMap<>();

    private boolean containsSuffix(Set<String> dict, String str) {
        for (int i = 0; i < str.length(); i++) {
            if (dict.contains(str.substring(i))) return true;
        }
        return false;
    }

    public List<String> wordBreak(String s, Set<String> dict) {
        if (cache.containsKey(s)) return cache.get(s);
        List<String> result = new LinkedList<>();
        if (dict.contains(s)) result.add(s);
        for (int i = 1; i < s.length(); i++) {
            String left = s.substring(0, i), right = s.substring(i);
            if (dict.contains(left) && containsSuffix(dict, right)) {
                for (String ss : wordBreak(right, dict)) {
                    result.add(left + " " + ss);
                }
            }
        }
        cache.put(s, result);
        return result;
    }
}
This is still essentially an OJ cheat: at the pre-leaf position he first checks that right (i.e. the substring about to be recursed into) at least ends with a word in the dict. Note that he didn't implement this check as "for each word in dict, str.endsWith (word)"; instead it's "for each suffix of str, check whether the dict contains it", which avoids the cost scaling with dict size. But there's a problem: substring itself is expensive (it copies), so his containsSuffix actually costs O(N^2) per call.
Besides, his solution is already a top-down + memo approach, so adding this containsSuffix check wasn't necessary at all.
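If one did want the containsSuffix pre-check, the O(N^2) substring cost can at least be bounded by the longest dictionary word; a sketch of mine (class SuffixCheck and the maxWordLen parameter are not the poster's code):

```java
import java.util.*;

class SuffixCheck {
    // Only suffixes no longer than the longest dict word can possibly
    // match, so start the scan at length - maxWordLen instead of 0.
    static boolean containsSuffix(Set<String> dict, String str, int maxWordLen) {
        int from = Math.max(0, str.length() - maxWordLen);
        for (int i = from; i < str.length(); i++)
            if (dict.contains(str.substring(i))) return true;
        return false;
    }
}
```

Each substring examined is now at most maxWordLen characters, so a call costs O(maxWordLen^2) instead of O(N^2).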
submission: 7(100):
class Solution {
    private int min = Integer.MAX_VALUE;
    private int max = 0;

    public List<String> wordBreak(String s, List<String> wordDict) {
        HashSet<String> set = new HashSet<String>();
        for (String word : wordDict) {
            set.add(word);
            int curLen = word.length();
            min = (curLen < min) ? curLen : min;
            max = (curLen > max) ? curLen : max;
        }
        List<String> result = new ArrayList<String>();
        boolean[] invalid = new boolean[s.length()]; // invalid[i]: s[i:] is unbreakable
        separate(s, result, new StringBuilder(), 0, set, invalid);
        return result;
    }

    public boolean separate(String s, List<String> res, StringBuilder tmp, int index, HashSet<String> set, boolean[] invalid) {
        if (index == s.length()) {
            res.add(tmp.toString().trim());
            return true;
        }
        boolean breakable = false;
        int prelen = tmp.length();
        int rightbound = Math.min(s.length(), index + max);
        for (int end = index + 1; end <= rightbound; end++) {
            if (end < s.length() && invalid[end]) {
                continue;
            }
            String cur = s.substring(index, end);
            if (set.contains(cur)) {
                tmp.append(" ").append(cur);
                breakable |= separate(s, res, tmp, end, set, invalid);
                tmp.setLength(prelen); // backtrack
            }
        }
        invalid[index] = !breakable;
        return breakable;
    }
}
I didn't read this especially carefully, but I get the idea. First, this person's approach is a mutation recursion rather than a returned recursion, i.e. a backtracking approach. You might wonder: if it's mutation, why is there a boolean return value? Because it's actually used later. We also know what max is for: bounding the node's inner loop range, a small optimization.
The key is how he implements the memo. Since this is backtracking, the memo doesn't need to record all results of the next node; but some memo is still needed. His trick: record only a boolean marking whether a node (corresponding to a start index) is worth entering at all. That still yields a memo-like pruning effect.
The rest of the code is clear enough; a solid approach overall. I honestly hadn't thought of how to do this problem with mutation at first; the design is quite practiced. As for why the speeds differ so much, it's probably that returned recursion's post-processing is genuinely expensive, while the backtracking version maintains the path with a StringBuilder, so growing and shrinking the path is far more efficient (growing a small path into a big one, which returned recursion does with outright string + concatenation).
Problem Description
Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, add spaces in s to construct a sentence where each word is a valid dictionary word. You may assume the dictionary does not contain duplicate words.
Return all such possible sentences.
For example, given
s = "catsanddog",
dict = ["cat", "cats", "and", "sand", "dog"].
A solution is ["cats and dog", "cat sand dog"].
UPDATE (2017/1/4):
The wordDict parameter had been changed to a list of strings (instead of a set of strings). Please reload the code definition to get the latest changes.
Difficulty: Hard
Total Accepted: 110.4K
Total Submissions: 452.1K
Contributor: LeetCode
Companies: Google, Uber, Twitter, Snapchat, Dropbox
Related Topics: dynamic programming, backtracking
Similar Questions: Word Break, Concatenated Words