WordLadder [source code]


public class WordLadder {
static
/******************************************************************************/
class Solution {
    public int ladderLength(String beginWord, String endWord, List<String> wordList) {
        if (beginWord.length () != endWord.length () || beginWord.length () == 0)
            return 0;
        Map<String, Set<String>> patterns = new HashMap<> ();
        for (String word : wordList) {
            char[] chw = word.toCharArray ();
            for (int i = 0; i < chw.length; i++) {
                char c = chw[i];
                chw[i] = '_';
                String w = new String (chw);
                patterns.computeIfAbsent (w, w_arg -> new HashSet<> ()).add (word);
                chw[i] = c;
            }
        }
        Queue<String> queue = new LinkedList<> ();
        Set<String> visited = new HashSet<> ();
        queue.offer (beginWord);
        int level = 0;
        while (!queue.isEmpty ()) {
            int size = queue.size ();
            level++;
            for (int i = 0; i < size; i++) {
                String s = queue.poll ();
                if (s.equals (endWord))
                    return level;
                if (!visited.add (s))
                    continue;
                char[] chs = s.toCharArray ();
                for (int j = 0; j < chs.length; j++) {
                    char c = chs[j];
                    chs[j] = '_';
                    Set<String> pattern_matches = patterns.getOrDefault (new String (chs), Collections.emptySet ());
                    for (String matched_word : pattern_matches)
                        queue.offer (matched_word);
                    chs[j] = c;
                }
            }
        }
        return 0;
    }
}
/******************************************************************************/

    public static void main(String[] args) {
        WordLadder.Solution tester = new WordLadder.Solution();
        String[][] inputs = {
            {"hit", "cog", "5"}, {"hot","dot","dog","lot","log","cog"},
        };
        for (int i = 0; i < inputs.length / 2; i++) {
            System.out.println (Printer.separator ());
            List<String> wl = Arrays.asList (inputs[2 * i + 1]);
            String bw = inputs[2 * i][0], ew = inputs[2 * i][1];
            int output = tester.ladderLength (bw, ew, wl), ans = Integer.parseInt (inputs[2 * i][2]);
            System.out.printf ("%s TO %s WITH %s -> %s, expected: %d\n", 
                bw, ew, wl, Printer.wrap (output + "", output == ans ? 92 : 91), ans
            );
        }
    }
}

首先, 虽然是一个dictionary的题目, 但是感觉这题并不是和用trie来做, 因为trie完成的是一个prefix查询, 但是这里每一步的这个转换, 位置是可以倒退到之前的index的, 而trie只能从上到下的查询;

最后差一点能够按时写完, 整体的逻辑都是对的, 就是最后在push的时候, 忘记写一行undo, 导致最后要debug; 最后是速度是168ms (31%), 应该有更好的办法;

整体的思路还是比较明确的:

这个图本身应该是描述的比较清楚了, 一些需要重点注意点的地方:

  • 什么位置mark visited? 实际上这题还是正常的就是Poll的时候就mark就行了; 虽然queue里面可能同时存在两个比如lot, 但是最后实际上只有一个会被处理; 第二个被Poll出来的那个(较深的那个)会被直接丢弃;
  • 怎么查询adjacency关系? 这个是这个图上不那么直观能够看出来的; 但是这个技巧实际上也见过, 就是一个pattern lookup的技巧; 一开始对list进行一个基于pattern的预处理就行了;

explicit pruning on trace

另外, 还是关于这个visited的处理, 为了便于在自己写程序的时候看出来这种类型的pruning, 你画图的时候对于这种被prune掉的branch, 最好还是写出来, 然后打叉, 而不是直接自己脑子里过一下, 不必要, 就不写了; 很多时候一个问题的prune策略并不是那么明显的, 把这些信息都记录下来有利于总结出来实际的算法而不是停留在一个单独的trace上面;

注意上面level++的位置: 放在每一个level开始之前, 这样假如这个level hit, 那么当前的level数量就是你想要的结果;


没有editorial;


https://leetcode.com/problems/word-ladder/discuss/40707/Easy-76ms-C++-Solution-using-BFS

Well, this problem has a nice BFS structure.

Let’s see the example in the problem statement.

start = "hit"

end = "cog"

dict = ["hot", "dot", "dog", "lot", "log"]

Since only one letter can be changed at a time, if we start from "hit", we can only change to those words which have only one different letter from it, like "hot". Putting in graph-theoretic terms, we can say that "hot" is a neighbor of "hit".

The idea is simpy to begin from start, then visit its neighbors, then the non-visited neighbors of its neighbors… Well, this is just the typical BFS structure.

To simplify the problem, we insert end into dict. Once we meet end during the BFS, we know we have found the answer. We maintain a variable dist for the current distance of the transformation and update it by dist++ after we finish a round of BFS search (note that it should fit the definition of the distance in the problem statement). Also, to avoid visiting a word for more than once, we erase it from dict once it is visited.

The code is as follows.

这个算法还是比较naive的, 只观察到了BFS这个考点, 但是没有观察到pattern这个考点; 你看他选neighbor的时候, 直接就是穷举的;

另外, 他完成visited的方法是用一个erase, 这种delete操作我记得一般是相当expensive的;

class Solution {  
public:  
    int ladderLength(string beginWord, string endWord, unordered_set<string>& wordDict) {  
        wordDict.insert(endWord);  
        queue<string> toVisit;  
        addNextWords(beginWord, wordDict, toVisit);  
        int dist = 2;  
        while (!toVisit.empty()) {  
            int num = toVisit.size();  
            for (int i = 0; i < num; i++) {  
                string word = toVisit.front();  
                toVisit.pop();  
                if (word == endWord) return dist;  
                addNextWords(word, wordDict, toVisit);  
            }  
            dist++;  
        }  
    }  
private:  
    void addNextWords(string word, unordered_set<string>& wordDict, queue<string>& toVisit) {  
        wordDict.erase(word);  
        for (int p = 0; p < (int)word.length(); p++) {  
            char letter = word[p];  
            for (int k = 0; k < 26; k++) {   
                word[p] = 'a' + k;  
                if (wordDict.find(word) != wordDict.end()) {  
                    toVisit.push(word);  
                    wordDict.erase(word);  
                }  
            }  
            word[p] = letter;  
        }   
    }   
};

The above code can still be speeded up if we also begin from end. Once we meet the same word from start and end, we know we are done. This link provides a nice two-end search solution. I rewrite the code below for better readability. Note that the use of two pointers phead and ptail save a lot of time. At each round of BFS, depending on the relative size of head and tail, we point phead to the smaller set to reduce the running time.

class Solution {  
public:  
    int ladderLength(string beginWord, string endWord, unordered_set<string>& wordDict) {  
        unordered_set<string> head, tail, *phead, *ptail;  
        head.insert(beginWord);  
        tail.insert(endWord);  
        int dist = 2;  
        while (!head.empty() && !tail.empty()) {  
            if (head.size() < tail.size()) {  
                phead = &head;  
                ptail = &tail;  
            }  
            else {  
                phead = &tail;   
                ptail = &head;  
            }  
            unordered_set<string> temp;   
            for (auto itr = phead -> begin(); itr != phead -> end(); itr++) {  
                string word = *itr;  
                wordDict.erase(word);  
                for (int p = 0; p < (int)word.length(); p++) {  
                    char letter = word[p];  
                    for (int k = 0; k < 26; k++) {  
                        word[p] = 'a' + k;  
                        if (ptail -> find(word) != ptail -> end())  
                            return dist;  
                        if (wordDict.find(word) != wordDict.end()) {  
                            temp.insert(word);  
                            wordDict.erase(word);  
                        }  
                    }  
                    word[p] = letter;  
                }  
            }  
            dist++;  
            swap(*phead, temp);  
        }  
        return 0;   
    }  
};

后来想想, 因为是set的erase, 说不定其实并不是特别的expensive, 毕竟set的lookup本身也不是很expensive;

这个算法总体的核心思路还是很清楚的, 但是刚开始看的时候有一个地方没有看懂: 为什么dist就是我们最后想要的结果? 不是两头对称的吗?
实际上, 你自己在脑子里过一下这个算法的trace(最好从最开始的开始), dist实际上并不是一个对称增长的概念, 要么是head这边增长一个, 要么是tail那边++一个, 头尾两边是交替进行的; 所以假如你在phead的时候发现走到了一个被包含在ptail里面的variation上面, 那么就可以直接返回这个时候的dist;

其他的地方应该是不难理解的;

  • ph和pt是两个指针, 相当于是在h和t两个set上面多加了一层indirection, 这样这个交换操作就方便很多;
  • 理解这里head和tail的概念; 用头和尾来理解并不是最准确的方式, 最好的方式我认为是用这一头(当前正在处理的这一头)和**那一头;

@palindrome88 I think the original solution is O(nL), n is the number of the words in the wordList, and L is the length of the each word.
Because, the worst case is that each word needs to enqueue and dequeue. And each time the we need to loop of L times, to check if it exits in the wordList. And the find function is O(1) and to change number is O(1). so the finally time complexity is O(nL).
Please let me know, if I am wrong.

First of all, thanks for your detail comment.
But in your first solution, I think, “wordDict.erase(word);” can be deleted(the first statement in function addNextWords), because it is not necesary since you have deleted the word when it is pushed in the queue, so it just influence the beginWord, we delete the beginWord if it exist in toVisit.

好像有点道理;

Update for New Interface.
N*log(N)

class Solution {  
public:  
    int ladderLength(string beginWord, string endWord, vector<string>& wordList) {  
        unordered_set<string> wordDict(wordList.begin(), wordList.end());  
        //wordDict.insert(endWord);  
        queue<string> toVisit;  
        addNextWords(beginWord, wordDict, toVisit);  
        int dist = 2;  
        while (!toVisit.empty()) {  
            int num = toVisit.size();  
            for (int i = 0; i < num; i++) {  
                string word = toVisit.front();  
                toVisit.pop();  
                if (word == endWord) return dist;  
                addNextWords(word, wordDict, toVisit);  
            }  
            dist++;  
        }  
        return 0;  
    }  
private:  
    void addNextWords(string word, unordered_set<string>& wordDict, queue<string>& toVisit) {  
        wordDict.erase(word);  
        for (int p = 0; p < (int)word.length(); p++) {  
            char letter = word[p];  
            for (int k = 0; k < 26; k++) {   
                word[p] = 'a' + k;  
                if (wordDict.find(word) != wordDict.end()) {  
                    toVisit.push(word);  
                    wordDict.erase(word);  
                }  
            }  
            word[p] = letter;  
        }   
    }   
};

哪来的log? 没有人说过会balanced;


https://leetcode.com/problems/word-ladder/discuss/40711/Two-end-BFS-in-Java-31ms.

他原来的代码接口不对, 而且被一个新的case给break掉了; 这个版本还是很快34(91);

public class Solution {  
    public int ladderLength(String beginWord, String endWord, List<String> _wordList) {  
        Set<String> beginSet = new HashSet<String>(), endSet = new HashSet<String>(), wordList = new HashSet<> ();  
        wordList.addAll (_wordList);  
        if (!wordList.contains(endWord)) return 0;      // deals with the new test case  
        int len = 1;  
        int strLen = beginWord.length();  
        HashSet<String> visited = new HashSet<String>();  
        beginSet.add(beginWord);  
        endSet.add(endWord);  
        while (!beginSet.isEmpty() && !endSet.isEmpty()) {  
            if (beginSet.size() > endSet.size()) {  
                Set<String> set = beginSet;  
                beginSet = endSet;  
                endSet = set;  
            }  

            Set<String> temp = new HashSet<String>();  
            for (String word : beginSet) {  
                char[] chs = word.toCharArray();  

                for (int i = 0; i < chs.length; i++) {  
                    for (char c = 'a'; c <= 'z'; c++) {  
                        char old = chs[i];  
                        chs[i] = c;  
                        String target = String.valueOf(chs);  

                        if (endSet.contains(target)) {  
                            return len + 1;  
                        }  

                        if (!visited.contains(target) && wordList.contains(target)) {  
                            temp.add(target);  
                            visited.add(target);  
                        }  
                        chs[i] = old;  
                    }  
                }  
            }  

            beginSet = temp;  
            len++;  
        }  

        return 0;  
    }  
}

I think the speed up is because of avoid visiting the blue part. But the time complexity is still the same as one-end.

这个有一定道理, 其实就是两个方向同时一个三角形BFS搜索开始, 然后找交汇;

整体思路跟上一个解法是类似的, 不过没有引入用指针交换顺序的做法; 不对, 好像有, 直接就是用普通的swap来做的;

其实java里面也可以用类似之前的那个ph, pt的指针的方式来做的; 用两个指针(不允许改变的)存住实际的head和tail, 然后另外两个指针ph和pt来改变指向head还是tail就行了; 不过他这里的swap的做法其实也是指针操作, 所以最后速度应该是差不多的;

"The idea behind bidirectional search is to run two simultaneous searches—one forward from
the initial state and the other backward from the goal—hoping that the two searches meet in
the middle. The motivation is that b^(d/2) + b^(d/2) is much less than b^d. b is branch factor, d is depth. "
----- section 3.4.6 in Artificial Intelligence - A modern approach by Stuart Russel and Peter Norvig

所以这个优化其实是有说法的; 可以记一下, 面试的时候可以吹一吹;

At last I’m able to understand. I learned a lot from this solution.

  • It’s much faster than one-end search indeed as explained in wiki.
  • BFS isn’t equivalent to Queue. Sometimes Set is more accurate representation for nodes of level. (also handy since we need to check if we meet from two ends)
  • It’s safe to share a visited set for both ends since if they meet same string it terminates early. (vis is useful since we cannot remove word from dict due to bidirectional search)
  • It seems like if(set.add()) is a little slower than if(!contains()) then add() but it’s more concise.

Thank you all for sharing and explaining!
update: the dictList is of List type now. And all transformed words including endWord must be in dictList.

    public int ladderLength(String beginWord, String endWord, List<String> wordList) {  
        Set<String> dict = new HashSet<>(wordList), qs = new HashSet<>(), qe = new HashSet<>(), vis = new HashSet<>();  
        qs.add(beginWord);  
        if (dict.contains(endWord)) qe.add(endWord); // all transformed words must be in dict (including endWord)  
        for (int len = 2; !qs.isEmpty(); len++) {  
            Set<String> nq = new HashSet<>();  
            for (String w : qs) {  
                for (int j = 0; j < w.length(); j++) {  
                    char[] ch = w.toCharArray();  
                    for (char c = 'a'; c <= 'z'; c++) {  
                        if (c == w.charAt(j)) continue; // beginWord and endWord not the same, so bypass itself  
                        ch[j] = c;  
                        String nb = String.valueOf(ch);  
                        if (qe.contains(nb)) return len; // meet from two ends  
                        if (dict.contains(nb) && vis.add(nb)) nq.add(nb); // not meet yet, vis is safe to use  
                    }  
                }  
            }  
            qs = (nq.size() < qe.size()) ? nq : qe; // switch to small one to traverse from other end  
            qe = (qs == nq) ? qe : nq;  
        }  
        return 0;  
    }

对于BFS的理解确实最好不要跟queue一定联系在一起; 其实好像以前也见过类似的题目, BFS问题, 最关键的是你有一个collection, 始终只包含just and nothing more than一个level的内容; 以前见过的一个做法就是直接不停的用类似level and next_level两个collection的update来模拟一个level前进的过程, 有点想1D DP的时候的空间优化常用的技巧;

hi @watera427, i think the intuition is that you are replacing a big search tree in the one-end solution with two smaller trees in the two-end solution. Both solutions have the same TOTAL depth, yet tree width grows exponentially w.r.t. depths, and the two-end solutions results in less nodes being visited.

这个理解方式也不错;

@kleklokin I don’t think that’s the case, since the time complexicity is 26 * O(n), not exponential, n is the length of wordList(only the word in the dict will do a 26-loop, and each word only do it once). The reason that this approach is fast because after each bfs, it always choose the set that has the smaller size, which means it always try to waste less computation to meet the goal(the set that has the maximum size won’t need to generate new words). if you delete the code "if (beginSet.size() > endSet.size()) ", just simply swtich the role of begin and end each time, it will take 80+ms again.

所以这个process smaller的优化一点都不trivial;


https://leetcode.com/problems/word-ladder/discuss/40708/Share-my-two-end-BFS-in-C++-80ms.

//BFS, two-end method  
//traverse the path simultaneously from start node and end node, and merge in the middle  
//the speed will increase (logN/2)^2 times compared with one-end method  
int ladderLength(string start, string end, unordered_set<string> &dict) {  
    unordered_set<string> begSet, endSet, *set1, *set2;  
    begSet.insert(start);  
    endSet.insert(end);  
    int h=1, K=start.size();  
    while(!begSet.empty()&&!endSet.empty()){  
        if(begSet.size()<=endSet.size()){   //Make the size of two sets close for optimization  
            set1=&begSet;   //set1 is the forward set  
            set2=&endSet;   //set2 provides the target node for set1 to search  
        }  
        else{  
            set1=&endSet;  
            set2=&begSet;  
        }  
        unordered_set<string> itmSet;   //intermediate Set  
        h++;  
        for(auto i=set1->begin();i!=set1->end();i++){  
            string cur=*i;  
            for(int k=0;k<K;k++){   //iterate the characters in string cur  
                char temp=cur[k];  
                for(int l=0;l<26;l++){  //try all 26 alphabets  
                    cur[k]='a'+l;  
                    auto f=set2->find(cur);  
                    if(f!=set2->end())return h;  
                    f=dict.find(cur);  
                    if(f!=dict.end()){  
                        itmSet.insert(cur);  
                        dict.erase(f);  
                    }  
                }  
                cur[k]=temp;  
            }  
        }  
        swap(*set1, itmSet);  
    }  
    return 0;  
}

这个好像是第一个想到用两头做法的人;

大概是思路还是类似的;

Hi! The two pointers are unnecessary. Just use ‘swap’ function. This is My solution. It takes 72 ms.

这个就是为什么很多人认为实际上在c++场合下, 直接用swap也比用两个指针更加方便;

但是OP对于加速效率的分析好像有点问题:

I agree with sqrt(N)/2. Say the ladder is of length ‘len’, and each word branches out ‘m’ new words.
From one end, we have to generate O(m^len) branches. Yet from both ends, we are only supposed to generate O(2m^(len/2)) braches. So we are sqrt(m^len)/2 times faster. Correct me if I’m wrong.

注意这里的height不是单纯的取log;

在有hit的情况下, 这个len应该就是beginWord和endWord之间实际的ladder距离, 因为无论怎么搜, 两种做法最后都是这么多层之后就找到了;

在没有hit的情况下, 有区别吗? 感觉好像没有, 都是要遍历wordList里面所有的word;


https://leetcode.com/problems/word-ladder/discuss/40717/Another-accepted-Java-solution-(BFS)

public int ladderLength(String start, String end, Set<String> dict) {  
  // Use queue to help BFS  
  Queue<String> queue = new LinkedList<String>();  
  queue.add(start);  
  queue.add(null);  

  // Mark visited word  
  Set<String> visited = new HashSet<String>();  
  visited.add(start);  

  int level = 1;  

  while (!queue.isEmpty()) {  
    String str = queue.poll();  

    if (str != null) {  
      // Modify str's each character (so word distance is 1)  
      for (int i = 0; i < str.length(); i++) {  
        char[] chars = str.toCharArray();  

        for (char c = 'a'; c <= 'z'; c++) {  
          chars[i] = c;  

          String word = new String(chars);  

          // Found the end word  
          if (word.equals(end)) return level + 1;  

          // Put it to the queue  
          if (dict.contains(word) && !visited.contains(word)) {  
            queue.add(word);  
            visited.add(word);  
          }  
        }  
      }  
    } else {  
      level++;  

      if (!queue.isEmpty()) {   
        queue.add(null);  
      }  
    }  
  }  

  return 0;  
}

大概还是类似的思路; 看来这题用pattern并没有什么特别明显的受益?

Adding and polling null is not necessary, if you can check the queue size in BFS. I modified the codes a bit and it works as well:

这个用Null来delimit一个level的做法以前是见过的, 是一种比较老派的BFS的写法;

后面看到不少java做法里面, 也使用了大量的Set.remove, 看来hash结构的remove好像并没有那么可怕;


submission最优解: 25(99.6):

class Solution {  
    public int ladderLength(String beginWord, String endWord, List < String > wordList) {  
        // edge case  
        if (wordList == null || wordList.size() == 0) {  
            return 0;  
        }  

        Set < String > setTotal = new HashSet < String > (wordList);  
        if (!setTotal.contains(endWord)) {  
            return 0;  
        }  

        setTotal.add(beginWord);  

        Queue < String > qSmall = new ArrayDeque < String > ();  
        Queue < String > qLarge = new ArrayDeque < String > ();  
        Set < String > setVisit = new HashSet < String > ();  

        qSmall.add(beginWord);  
        qLarge.add(endWord);  
        setVisit.add(beginWord);  
        setVisit.add(endWord);  
        int len = 2;  

        // BFS traversal  
        while (!qSmall.isEmpty() && !qLarge.isEmpty()) {  
            // follow narrow bredth path (smaller queue size path)  
            if (qSmall.size() > qLarge.size()) {  
                Queue < String > temp = qSmall;  
                qSmall = qLarge;  
                qLarge = temp;  
            }  


            Set < String > set = new HashSet < String > (qLarge);  
            int size = qSmall.size();  

            for (int i = 0; i < size; i++) {  
                String curr = qSmall.poll();  
                char[] arr = curr.toCharArray();  

                for (int j = 0; j < arr.length; j++) {  
                    char old = arr[j];  

                    for (char c = 'a'; c <= 'z'; c++) {  
                        if (c == old) {  
                            continue;  
                        }  

                        arr[j] = c;  
                        String next = String.valueOf(arr);  
                        if (!setTotal.contains(next)) {  
                            continue;  
                        }  


                        if (set.contains(next)) {  
                            return len;  
                        }  

                        if (setVisit.add(next)) {  
                            qSmall.add(next);  
                        }  
                    }  

                    arr[j] = old;  
                }  
            }  

            len++;  
        }  

        return 0;  
    }  
}

好像就是用visited来代替remove的操作, 然后就是正常的两头做法;

这题学到的一个重要的技巧就是:

Two-End optimization in BFS reachablility


Problem Description

Given two words (beginWord and endWord), and a dictionary's word list, find the length of shortest transformation sequence from beginWord to endWord, such that:

  • Only one letter can be changed at a time.
  • Each transformed word must exist in the word list. Note that beginWord is not a transformed word.

For example,
Given:

beginWord = "hit"  
endWord = "cog"  
wordList = ["hot","dot","dog","lot","log","cog"]

As one shortest transformation is "hit" -> "hot" -> "dot" -> "dog" -> "cog",
return its length 5.

Note:

  • Return 0 if there is no such transformation sequence.
  • All words have the same length.
  • All words contain only lowercase alphabetic characters.
  • You may assume no duplicates in the word list.
  • You may assume beginWord and endWord are non-empty and are not the same.

UPDATE (2017/1/20):

  • The wordList parameter had been changed to a list of strings (instead of a set of strings). Please reload the code definition to get the latest changes.

Difficulty:Medium
Total Accepted:155.6K
Total Submissions:780.8K
Contributor:LeetCode
Companies
facebookamazonlinkedinsnapchatyelp
Related Topics
breadth-first search
Similar Questions
Word Ladder IIMinimum Genetic Mutation

results matching ""

    No results matching ""