WordBreak [source code]


public class WordBreak {
static
/******************************************************************************/
class Solution {
    Map<String, Boolean> map = new HashMap<> ();

    public boolean wordBreak(String s, List<String> wordDict) {
        if (map.containsKey (s))
            return map.get (s);
        boolean res = false;
        for (String word : wordDict) {
            if (s.startsWith (word) && (s.length () == word.length () || wordBreak (s.substring (word.length ()), wordDict))) {
                res = true;
                break;
            }
        }
        map.put (s, res);
        return res;
    }
}
/******************************************************************************/

    public static void main(String[] args) {
        WordBreak.Solution tester = new WordBreak.Solution();
        String[][] inputs = {
            {"leetcode"}, {"leet","code", "lee"}, {"true"},
            {"leetleetcode"}, {"leet","code", "lee"}, {"true"},
        };
        for (int i = 0; i < inputs.length / 3; i++) {
            System.out.println (Printer.separator ());
            String s = inputs[3 * i][0];
            List<String> wordDict = Arrays.asList (inputs[3 * i + 1]);
            boolean ans = inputs[3 * i + 2][0].equals ("true"), output = tester.wordBreak (s, wordDict);
            System.out.printf ("%s and %s\n%s, expected: %b\n",
                s, wordDict, Printer.wrap (output + "", output == ans ? 92 : 91), ans
            );
        }
    }
}

立刻想到的一个testcase:

"leetleetcode"  
["leet","code"]

这个是true的, 所以一个word是可以重复使用的;

完全没有思路, 先从笨办法人逻辑开始想想看;

简单的思路肯定是认为直接匹配成功leet, 然后peel掉, 然后继续匹配就行; 不过一个问题是, 加入dictionary里面比如还有lee这样的, 那么事实上站在这个level, 是可能继续有其他的匹配方式的; 所以这个题目一个直接的想法是一个recursion或者说backtracking; tag里面提到了DP, 那么意思是可能是要加一个memo的操作;

感觉这题还是从tag看到DP的时候, 得到的提示太多了, 最后很快就AC了, 速度是11ms (87%).

The naive approach to solve this problem is to use recursion and backtracking. For finding the solution, we check every possible prefix of that string in the dictionary of words, if it is found in the dictionary, then the recursive function is called for the remaining portion of that string. And, if in some function call it is found that the complete string is in dictionary, then it will return true.

public class Solution {  
    public boolean wordBreak(String s, List<String> wordDict) {  
        return word_Break(s, new HashSet(wordDict), 0);  
    }  
    public boolean word_Break(String s, Set<String> wordDict, int start) {  
        if (start == s.length()) {  
            return true;  
        }  
        for (int end = start + 1; end <= s.length(); end++) {  
            if (wordDict.contains(s.substring(start, end)) && word_Break(s, wordDict, end)) {  
                return true;  
            }  
        }  
        return false;  
    }  
}

他这里base case是放在leaf的位置来判断的, 我上面的写法是在preleaf的位置判断的;

Approach #2 Recursion with memoization [Accepted]

Algorithm

In the previous approach we can see that many subproblems were redundant, i.e we were calling the recursive function multiple times for a particular string. To avoid this we can use memoization method, where an array
memo is used to store the result of the subproblems. Now, when the function is called again for a particular string, value will be fetched and returned using the memo array, if its value has been already evaluated.

With memoization many redundant subproblems are avoided and recursion tree is pruned and thus it reduces the time complexity by a large factor.

public class Solution {  
    public boolean wordBreak(String s, List<String> wordDict) {  
        return word_Break(s, new HashSet(wordDict), 0, new Boolean[s.length()]);  
    }  
    public boolean word_Break(String s, Set<String> wordDict, int start, Boolean[] memo) {  
        if (start == s.length()) {  
            return true;  
        }  
        if (memo[start] != null) {  
            return memo[start];  
        }  
        for (int end = start + 1; end <= s.length(); end++) {  
            if (wordDict.contains(s.substring(start, end)) && word_Break(s, wordDict, end, memo)) {  
                return memo[start] = true;  
            }  
        }  
        return memo[start] = false;  
    }  
}

这里比我的做法好的一点是, 没有用substring来做这个memo; 我这题有点秀逗了, 刚开始看到memo, 就觉得肯定避免不了用substring, 因为既然memo, 那么只能用string来作为key是不是; 但是这题来说, 实际上是可以用sub as start index思路来表达的, 因为当你站在一个substring这个subproblem的时候, 你最后想要知道的, 实际上是要一直匹配到底的, 所以并不存在说start本身不能决定一个substring: 比如只匹配一部分: 这种情况substring做法可能是有点优势;

不过换个角度想想, 这题到底能不能这样做? 比如我走到一个start的位置, 然后我并不是向下继续匹配到底(结束位置), 可能匹配一部分呢? 听起来很好, 你具体怎么做呢?

另外, memo start的做法的另外一个好处是直接一个array就能完成memo, 而不需要用Map;

Approach #3 Using Breadth-First-Search [Accepted]

Algorithm

Another approach is to use Breadth-First-Search. Visualize the string as a tree where each node represents the prefix upto index end. Two nodes are connected only if the substring between the indices linked with those nodes is also a valid string which is present in the dictionary. In order to form such a tree, we start with the first character of the given string (say
s) which acts as the root of the tree being formed and find every possible substring starting with that character which is a part of the dictionary. Further, the ending index (say i) of every such substring is pushed at the back of a queue which will be used for Breadth First Search. Now, we pop an element out from the front of the queue and perform the same process considering the string s(i+1,end) to be the original string and the popped node as the root of the tree this time. This process is continued, for all the nodes appended in the queue during the course of the process. If we are able to obtain the last element of the given string as a node (leaf) of the tree, this implies that the given string can be partitioned into substrings which are all a part of the given dictionary.

public class Solution {  
    public boolean wordBreak(String s, List<String> wordDict) {  
        Set<String> wordDictSet=new HashSet(wordDict);  
        Queue<Integer> queue = new LinkedList<>();  
        int[] visited = new int[s.length()];  
        queue.add(0);  
        while (!queue.isEmpty()) {  
            int start = queue.remove();  
            if (visited[start] == 0) {  
                for (int end = start + 1; end <= s.length(); end++) {  
                    if (wordDictSet.contains(s.substring(start, end))) {  
                        queue.add(end);  
                        if (end == s.length()) {  
                            return true;  
                        }  
                    }  
                }  
                visited[start] = 1;  
            }  
        }  
        return false;  
    }  
}

这个做法和上面有memo的recursion的复杂度都是N^2, 自己想想为什么;

注意这里end, 是一个exclusive的index, 这个选择方法可能理解上容易出小错误, 要稍微警惕一下;

这个BFS实现很有问题, 我给出了优化:

I optimized the BFS solution in editorial from 30% to 95%:

public class Solution {  
    public boolean wordBreak(String s, List<String> wordDict) {  
        int max_len = -1;  
        for (String word : wordDict)  
            max_len = Math.max (max_len, word.length ());  
        Set<String> wordDictSet=new HashSet(wordDict);  
        Queue<Integer> queue = new LinkedList<>();  
        boolean[] visited = new boolean[s.length()];  
        queue.add(0);  
        visited[0] = true;  
        while (!queue.isEmpty()) {  
            int start = queue.remove();  
            for (int end = start + 1; end <= s.length () && end - start <= max_len; end++) {  
                if (wordDictSet.contains(s.substring(start, end)) && !(end < s.length () && visited[end])) {  
                    if (end == s.length()) {  
                        return true;  
                    }  
                    queue.add(end);  
                    visited[end] = true;  
                }  
            }  
        }  
        return false;  
    }  
}

Optimizations:

only traverse max_len rather than to the end of s to find end.
mark visited when enqueue rather than dequeue.

第二条可能不那么容易看出来:

看这个例子, s2的时候, 会把S4放到queue里面去, 那么当我们进行到S3的时候, 这个时候就不应该在S3的位置继续enqueue S4了, 因为S2已经加过了; BFS里面注意mark的位置, 这个也是老生常谈了;

Approach #4 Using Dynamic Programming [Accepted]:

...

public class Solution {  
    public boolean wordBreak(String s, List<String> wordDict) {  
        Set<String> wordDictSet=new HashSet(wordDict);  
        boolean[] dp = new boolean[s.length() + 1];  
        dp[0] = true;  
        for (int i = 1; i <= s.length(); i++) {  
            for (int j = 0; j < i; j++) {  
                if (dp[j] && wordDictSet.contains(s.substring(j, i))) {  
                    dp[i] = true;  
                    break;  
                }  
            }  
        }  
        return dp[s.length()];  
    }  
}

就是Bottom-Up的写法, 没有认真看了; 这题相对有点新意的写法应该是BFS;

https://leetcode.com/problems/word-break/discuss/43790/Java-implementation-using-DP-in-two-ways

public class Solution {  
    public boolean wordBreak(String s, Set<String> dict) {  

        boolean[] f = new boolean[s.length() + 1];  

        f[0] = true;  


        // // First DP  
        // for(int i = 1; i <= s.length(); i++){  
        //     for(String str: dict){  
        //         if(str.length() <= i){  
        //             if(f[i - str.length()]){  
        //                 if(s.substring(i-str.length(), i).equals(str)){  
        //                     f[i] = true;  
        //                     break;  
        //                 }  
        //             }  
        //         }  
        //     }  
        // }  

        //Second DP  
        for(int i=1; i <= s.length(); i++){  
            for(int j=0; j < i; j++){  
                if(f[j] && dict.contains(s.substring(j, i))){  
                    f[i] = true;  
                    break;  
                }  
            }  
        }  

        return f[s.length()];  
    }  
}

真的没有想到这题居然有正宗DP写法; 这个DP算法本身是完全合理的设计, 没有任何问题; 那这里我就有一个问题了, 这帮人到底是怎么想到用DP来做这个题目的? 可以这样认为, 一个s, 作为一个subproblem, 其实可能有很多种通过word组合的方式, 类似的对应了, 有很多的path可以组成这个S; 当你拿到一个S的时候, 你从一头开始, 先选择一个word, 但是这个选择实际上是有很多种的, 选完了之后剩下的subproblem本身也是有很多选择; 这种时候就要想到用DP来去除uncertainty: DP的优势就是, 根本不用去思考这个不确定性, 只要建立了一个recursive structure, 最后就能保证得到想要求的value;

但是对于他的第二种做法, 实际上还有改进空间:

if you do :
for(int j = i-1 ; j >= 0; j–)
in the inner loop, will improve performance a little bit

public class Solution {  
    public class TrieNode {  
        boolean isWord;  
        TrieNode[] c;  

        public TrieNode(){  
            isWord = false;  
            c = new TrieNode[128];  
        }  
    }  

    public void addWord(TrieNode t, String w) {  
        for (int i = 0; i < w.length(); i++) {  
            int j = (int)w.charAt(i);   
            if (t.c[j] == null) t.c[j] = new TrieNode();  
            t = t.c[j];  
        }  
        t.isWord = true;  
    }  

    public boolean wordBreak(String s, Set<String> wordDict) {  
        TrieNode t = new TrieNode(), cur;  
        for (String i : wordDict) addWord(t, i);  
        char[] str = s.toCharArray();  
        int len = str.length;  
        boolean[] f = new boolean[len + 1];  
        f[len] = true;  

        for (int i = len - 1; i >= 0; i--) {  
            //System.out.println(str[i]);  
            cur = t;  
            for (int j = i; cur != null && j < len ; j++) {  
                cur = cur.c[(int)str[j]];  
                if (cur != null && cur.isWord && f[j + 1]) {  
                    f[i] = true;  
                    break;  
                }  
            }  
        }  
        return f[0];  
    }  
}

这个优化还是不错的, 因为无论是OP的第一个做法还是第二个做法, 实际上的过程都是一个prefix match(找到最短的match的一个prefix)来寻找hit, 所以这里用trie真的是可以提升效率的;

关键不是记住一个办法, 而是在实际上确实需要用这个办法的时候, 要能够想到;

f[i] stands for whether subarray(0, i) can be segmented into words from the dictionary. So f[0] means whether subarray(0, 0) (which is an empty string) can be segmented, and of course the answer is yes.
The default value for boolean array is false. Therefore we need to set f[0] to be true.

这个实际上就是一个类似dot表达里面见过的问题; 然后这里base case本身也不难理解: semiring的初始值这里是需要初始化到true的;

public class Solution {  
    public boolean wordBreak(String s, Set<String> wordDict) {  
        if(s == null || s.length() == 0){  
            return true;  
        }  
        int n = s.length();  
        boolean [] dp = new boolean[n+1];  
        dp[0] = true ;  

        int maxLength = 0;  
        for(String t : wordDict){  
            maxLength = Math.max(maxLength, t.length());  
        }  

        for(int i = 1; i <= n; i++){  
            dp[i] = false;  
            for(int j = i-1; j >= Math.max(0, i - maxLength) && !dp[i]; j--){  
                if(dp[j]){  
                    if(wordDict.contains(s.substring(j, i))){  
                        dp[i] = true;  
                    }  
                }  
            }  

        }  

        return dp[n];  


    }  

}

用max_len控制的小优化;

Because after you processed a few words, it is likely that the match word will be found at the end of the finished part of the string, but not a really long word which begins from the beginning of the string

这个是解释为什么j从右到左更好;

一个更加简练的写法:

public class Solution {  
    public boolean wordBreak(String s, Set<String> wordDict) {  
        int n = s.length();  
        boolean[] f = new boolean[n + 1];  
        f[0] = true;  
        for (int i = 1; i <= n; i++)  
            for (int j = 1; j <= i && !f[i]; j++)  
                f[i] = wordDict.contains(s.substring(j - 1, i)) && f[j - 1];  
        return f[n];  
    }  
}

另一个写法:

public boolean wordBreak(String s, List<String> wordDict) {  
    boolean[] dp = new boolean[s.length() + 1];  
    Set<String> set = new HashSet<>();  
    set.addAll(wordDict);  

    dp[0] = true;  
    for (int i = 1; i <= s.length(); i++) {  
        for (int j = i - 1; j >= 0; j--) {  
            dp[i] = dp[j] && set.contains(s.substring(j, i));  
            if(dp[i]) break;  
        }  
    }  
    return dp[s.length()];  
}

k = S.length(), n = dict.size().
The run time complexity for the first DP is k*n and the complexity for the second DP is k^2. The performance has nothing to do with how strings are generated. So I don’t think it will matter whether the strings are randomly generated or not.

这个分析是有问题的, 因为第一个方法里面的equals, 第二个方法里面的contains, 都是需要一个string comparison的操作的, 所以这个cost他没有算进去; 当然了, 这个最后两个方法还能不能说相等, 取决于set的contains是怎么实现的; 假如每一个word的平均长度是l, 那么第一个方法的复杂度是knl(也可以用aggregate的角度来分析, 结果是差不多的); 第二个方法的复杂度就取决于这个hash最后实际上是怎么计算的了;

https://leetcode.com/problems/word-break/discuss/43814/C++-Dynamic-Programming-simple-and-fast-solution-(4ms)-with-optimization

We use a boolean vector dp[]. dp[i] is set to true if a valid word (word sequence) ends there. The optimization is to look from current position i back and only substring and do dictionary look up in case the preceding position j with dp[j] == true is found.

bool wordBreak(string s, unordered_set<string> &dict) {  
        if(dict.size()==0) return false;  

        vector<bool> dp(s.size()+1,false);  
        dp[0]=true;  

        for(int i=1;i<=s.size();i++)  
        {  
            for(int j=i-1;j>=0;j--)  
            {  
                if(dp[j])  
                {  
                    string word = s.substr(j,i-j);  
                    if(dict.find(word)!= dict.end())  
                    {  
                        dp[i]=true;  
                        break; //next i  
                    }  
                }  
            }  
        }  

        return dp[s.size()];  
    }

这个还是类似的区别: 如果j是上一个node, 然后i是当前的node, 这个做法的思路就是先找到一个合格的parent node, 然后比较j->i这个path是否合理; 实际上最后这个算法看起来跟path寻找问题还是非常类似的;

If you run your code in http://www.lintcode.com/en/problem/word-break/, you will get TLE. Actually, we just need to consider the max length of a word in the dictionary in the second loop in DP. Maybe change for(int j = i-1; j >= 0; j--) to for(int j = i-1; j >= max(0, i - maxLen); j--) is better, where maxLen can be pre-calculated. Although for some case it may not be a good job (since we need to consider the complexity of the computation of maxLen), but if the size of the dictionary is not quite that large, it will be a good optimization.

https://leetcode.com/problems/word-break/discuss/43797/A-solution-using-BFS

People have posted elegant solutions using DP. The solution I post below using BFS is no better than those. Just to share some new thoughts.

We can use a graph to represent the possible solutions. The vertices of the graph are simply the positions of the first characters of the words and each edge actually represents a word. For example, the input string is “nightmare”, there are two ways to break it, “night mare” and “nightmare”. The graph would be

0–>5–>9

| _^

The question is simply to check if there is a path from 0 to 9. The most efficient way is traversing the graph using BFS with the help of a queue and a hash set. The hash set is used to keep track of the visited nodes to avoid repeating the same work.

For this problem, the time complexity is O(n^2) and space complexity is O(n), the same with DP. This idea can be used to solve the problem word break II. We can simple construct the graph using BFS, save it into a map and then find all the paths using DFS.

bool wordBreak(string s, unordered_set<string> &dict) {  
    // BFS  
    queue<int> BFS;  
    unordered_set<int> visited;  

    BFS.push(0);  
    while(BFS.size() > 0)  
    {  
        int start = BFS.front();  
        BFS.pop();  
        if(visited.find(start) == visited.end())  
        {  
            visited.insert(start);  
            for(int j=start; j<s.size(); j++)  
            {  
                string word(s, start, j-start+1);  
                if(dict.find(word) != dict.end())  
                {  
                    BFS.push(j+1);  
                    if(j+1 == s.size())  
                        return true;  
                }  
            }  
        }  
    }  

    return false;  
}

如果上一种做法那里能理解为什么这题跟path问题非常类似, 那么想到BFS也是很直接; 虽然并不一定要用这个思路过程;

Nice solution! I think the basic idea here is same as DP:probe based on previous successful match
I made a Java implementation based on your BFS idea

public boolean wordBreak(String s, Set<String> dict) {  
    if (dict.contains(s)) return true;  
    Queue<Integer> queue = new LinkedList<Integer>();  
    queue.offer(0);  
    // use a set to record checked index to avoid repeated work.  
    // This is the key to reduce the running time to O(N^2).  
    Set<Integer> visited = new HashSet<Integer>();  
    visited.add(0);  
    while (!queue.isEmpty()) {  
        int curIdx = queue.poll();  
        for (int i = curIdx+1; i <= s.length(); i++) {  
            if (visited.contains(i)) continue;  
            if (dict.contains(s.substring(curIdx, i))) {  
                if (i == s.length()) return true;  
                queue.offer(i);  
                visited.add(i);  
            }  
        }  
    }  
    return false;  
}

submission: 5(100):

class Solution {  
    public boolean wordBreak(String s, List<String> wordDict) {  
        int maxWordLen = 0;  
        Set<String> dict = new HashSet<>();  
        for (String word : wordDict) {  
            dict.add(word);  
            maxWordLen = Math.max(maxWordLen, word.length());  
        }  
        boolean[] canBreak = new boolean[s.length() + 1];  
        canBreak[0] = true;  
        for (int i = 1; i <= s.length(); i++) {  
            for (int j = i - 1; j >= 0 && i - j <= maxWordLen; j--) {  
                if (canBreak[j] && dict.contains(s.substring(j, i))) {  
                    canBreak[i] = true;  
                    break;  
                }  
            }  
        }          
        return canBreak[s.length()];  
    }  
}

就是加了一个max_len优化的DP做法;

Problem Description

Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, determine if s can be segmented into a space-separated sequence of one or more dictionary words. You may assume the dictionary does not contain duplicate words.

For example, given
s = "leetcode",
dict = ["leet", "code"].

Return true because "leetcode" can be segmented as "leet code".

UPDATE (2017/1/4):
The wordDict parameter had been changed to a list of strings (instead of a set of strings). Please reload the code definition to get the latest changes.

Difficulty:Medium
Total Accepted:201.9K
Total Submissions:643.4K
Contributor:LeetCode
Companies
googlefacebookamazonbloomberguberyahoopocket gemssquarecoupang
Related Topics
dynamic programming
Similar Questions
Word Break II

LC139 WordBreak

WordBreak [source code]

editorial

Approach #1 Brute Force [Time Limit Exceeded]

Approach #2 Recursion with memoization [Accepted]

Approach #3 Using Breadth-First-Search [Accepted]

Approach #4 Using Dynamic Programming [Accepted]:

Problem Description

results matching ""

No results matching ""