ImplementMagicDictionary [source code]


public class ImplementMagicDictionary {
static
/******************************************************************************/
class MagicDictionary {
    TrieNode root;

    /** Initialize your data structure here. */
    public MagicDictionary() {
        root = new TrieNode ();
        root.word = "-";
    }

    /** Build a dictionary through a list of words */
    public void buildDict(String[] dict) {
        for (String word : dict) {
            TrieNode cur = root;
            for (char c : word.toCharArray ())
                cur = cur.links.computeIfAbsent (c, key -> new TrieNode ());
            cur.word = word;
        }
    }

    /** Returns if there is any word in the trie that equals to the given word after modifying exactly one character */
    public boolean search(String word) {
        char[] chs = word.toCharArray ();
        for (int i = 0; i < chs.length; i++) {
            // I is the position where the differing letter is at
            TrieNode cur = root;
            int j = 0;
            for (; j < chs.length; j++) {
                // J is the index to iterate through WORD
                if (j == i) {
                    // We should differ (choose a different letter) here
                    boolean success = false;
                    for (Character c : cur.links.keySet ()) {
                        // only consider differing letters that are ALSO valid
                        if (c != chs[j]) {
                            success = true;
                            cur = cur.links.get (c);
                            break;
                        }
                    }
                    if (!success)
                        break;
                } else {
                    if ((cur = cur.links.get (chs[j])) == null)
                        break;
                }
            }
            /* Failure, if J does not chase through the entire WORD (e.g. can't find a differing letter at some 
                position); or because even though we finished chasing, this differ-by-one string is not actually
                in the dictionary (the terminating node does not store anything) */
            if (j < chs.length)
                continue;
            if (cur.word.length () == 0)
                continue;
            return true;
        }
        return false;
    }

    static class TrieNode {
        String word = "";
        Map<Character, TrieNode> links = new HashMap<> ();
    }
}
/******************************************************************************/

/**
 * Your MagicDictionary object will be instantiated and called as such:
 * MagicDictionary obj = new MagicDictionary();
 * obj.buildDict(dict);
 * boolean param_2 = obj.search(word);
 */

    public static void main(String[] args) {
        String[][] inputs = {
            {"hello", "leetcode"}, {"hello"}, {"false"},
            {"hello", "leetcode"}, {"hhllo"}, {"true"},
            {"hello", "leetcode"}, {"hell"}, {"false"},
            {"hello", "leetcode"}, {"leetcoded"}, {"false"},

            {"hello", "leetcode", "hallo"}, {"hello"}, {"true"},
            {"hello", "leetcode", "hallo"}, {"hhllo"}, {"true"},
            {"hello", "leetcode", "hallo"}, {"hell"}, {"false"},
            {"hello", "leetcode", "hallo"}, {"leetcoded"}, {"false"},

            {"a","b","ab","abc","abcabacbababdbadbfaejfoiawfjaojfaojefaowjfoawjfoawj","abcdefghijawefe","aefawoifjowajfowafjeoawjfaow","cba","cas","aaewfawi","babcda","bcd","awefj"}, {"aa"}, {"true"},
        };
        for (int i = 0; i < inputs.length / 3; i++) {
            System.out.println (Printer.separator ());
            ImplementMagicDictionary.MagicDictionary tester = new ImplementMagicDictionary.MagicDictionary();
            String[] dict = inputs[3 * i];
            tester.buildDict (dict);
            String word = inputs[3 * i + 1][0];
            boolean ans = inputs[3 * i + 2][0].equals ("true");
            boolean output = tester.search (word);
            System.out.printf ("search [%s] in [%s] -> %s, expected: %b\n", 
                word, Printer.array (dict), Printer.wrapColor (output + "", output == ans ? "green" : "red"), ans
            );
        }
    }
}

笨办法好像倒不是很难, 直接扔到一个HashMap里面去, 然后search的时候直接就是对每一个字母进行替换, 然后查询; 因为这个题目, 你注意, 只让你设计get, 但是没有让你设计put, 所以最好的思路应该是能让get的cost降低; 我上面的做法, build的速度很快, 但是get并不是很快, 因为对于word的每一个位置, 最坏情况有25次查询; 常数级别内, 这个算法的速度并不快;

这个题目的tag里面给的提示有trie, 所以说能够设计一个trie的解法让search更快?

想其实还是想写一个trie的做法, 不过实际上完全没有把握说这个trie的解法就比上面的笨办法要好一些, 不过还是写写看;

这题后面吃了一个亏就是审题不够仔细, 这里的所有的不是限定在小写字母的范围内的, 所以用array来存links就不如用Map好了; //是我自己看错了, 这题实际上只要考虑小写字母就行了;

超时了, 不过最后吭哧吭哧的还是做出来了, 速度很一般, 115ms (52%);

既然是用trie的做法, 那么一个需要考虑的地方就是最后你比如类似search的时候, 你应该怎么traverse? 其实这题可以用recursion来算, 然后参数里保存一个类似diff_count的东西, 来记录可以容忍的错误字母的个数; 不过这里我最后还是选择了用循环来做; 最后写起来不得不说其实还是挺有意思的; 很有点当时写Pintos的时候的感觉; 而且这题还锻炼了一下我控制multiple break的能力, 这个题目里面用index和flag来控制multiple break的例子都有, 看起来很舒服;

另外, 这个traverse本身最后的逻辑也有一点弯弯绕, 最后都被几个test给揪出来了, 只能说Google还是你家的Google;

另外, 关于iteration的wrong digit的位置的控制我写的其实也很简单, 其实就是对于每一个位置分别允许error, 然后继续向下search; 感觉这个结构不是特别的快, 但是应该还算ok;

putIfAbsent vs. computeIfAbsent

实际上写的时候, 因为碰巧换成了Map, 正好还见识了一次这两个函数的区别; 之前看awice的答案的时候, 他用过putIfAbsent这个函数, 但是当时没有仔细看, 我这里一开始写的时候用了cur = cur.links.putIfAbsent (c)这样的结构, 但是这个实际上是不对的; awice当时的做法是putIfAbsent之后, 立刻还要跟一个get. 究其原因, 其实是因为putIfAbsent的返回值其实是跟put类似的, 也就是说, 成功的时候, 是返回Null的!

这一点, computeIfAbsent这个函数就不一样, 直接把对应的value返回给你, 而不是类似上面那种反向返回的做法; 应用这个函数, 我的更新可以用cur = cur.links.computeIfAbsent (c, key -> new TrieNode ())这么一行就写掉;

后来我尝试把TrieNode里面的Map改成array, 奇怪的是速度几乎没有得到改善;

我觉得我这个trie算法的一个问题还是没有做到将get的cost转嫁给put, 这个也是下面的editorial采用的策略;

事实上, 在很多数据结构设计的题目里面的核心难点就是怎么完成这个转嫁的过程;

另外, 关于这道题目也是给自己提一个醒: 做题的时候不要给自己要求定的太高; 我刚开始看到这个题目的时候, 一个很快的想法是这个题目的最优解法肯定是一个通过一个聪明的trie的实现来完成的; 也就是说用trie做, 但是trie里面存的东西很有讲究. 这样一个思路方向听上去确实是很厉害的, 但是实际上最后做出来的答案, 无论是我自己的, 还是editorial的, 完全没有做到这种鬼斧神工的操作; 所以做题目, 还是心态放低一点, 不要对题目的期待太高, 以为就一定有什么巧夺天工的解法; 还是老老实实从笨办法开始优化找思路, 这个才是最牢靠的思考方式;

editorial

Approach #1: Brute Force with Bucket-By-Length [Accepted]

Intuition and Algorithm

Call two strings neighbors if exactly one character can be changed in one to make the strings equal (ie. their hamming distance is 1.)

Strings can only be neighbors if their lengths are equal. When searching a new word, let's check only the words that are the same length.

class MagicDictionary {  
    Map<Integer, ArrayList<String>> buckets;  
    public MagicDictionary() {  
        buckets = new HashMap();  
    }  

    public void buildDict(String[] words) {  
        for (String word: words) {  
            buckets.computeIfAbsent(word.length(), x -> new ArrayList()).add(word);  
        }  
    }  

    public boolean search(String word) {  
        if (!buckets.containsKey(word.length())) return false;  
        for (String candidate: buckets.get(word.length())) {  
            int mismatch = 0;  
            for (int i = 0; i < word.length(); ++i) {  
                if (word.charAt(i) != candidate.charAt(i)) {  
                    if (++mismatch > 1) break;  
                }  
            }  
            if (mismatch == 1) return true;  
        }  
        return false;  
    }  
}

大概思考了一下, 这个看似简单的算法实际上速度应该很不错;

而且这个算法本身并不是特别的naive, 我刚开始自己想的笨办法比这个差很多, 是比如拿到一个要search的word, 我是自己针对每一个位置进行mutate (WorstCase 25次), 然后直接HashMap.containsKey这样来检查; 这样就非常的慢, 因为不仅算法很蠢, 而且还额外吃了string操作的cost;

但是他这个算法就是在我的笨办法的基础上进行的优化; 他的优化的一个重要步骤, 就是对于memory organization的优化; 当我们在记录的东西进行了bucket分组之后, 我们就不需要自己mutate出来需要search contains的word, 而是直接可以搜索;

不过这个也有一个问题, 如果是我自己的那个笨办法, 其实是25L^2的速度(L是word的长度), 因为后面的HashMap的contains操作其实是O(1)的, 但是他这个是遍历一个bucket里面的, WorstCase是O(N), 所以最后其实很难说他这个算法就比我的算法要好? 不过这个想法本身还是很有意思的;

Approach #2: Generalized Neighbors [Accepted]

Intuition

Recall in Approach #1 that two words are neighbors if exactly one character can be changed in one word to make the strings equal.

Let's say a word 'apple' has generalized neighbors 'pple', 'aple', 'aple', 'appe', and 'appl*'. When searching for whether a word like 'apply' has a neighbor like 'apple', we only need to know whether they have a common generalized neighbor.

Algorithm

Continuing the above thinking, one issue is that 'apply' is not a neighbor with itself, yet it has the same generalized neighbor 'pply'. To remedy this, we'll count how many sources generated 'pply'. If there are 2 or more, then one of them won't be 'apply'. If there is exactly one, we should check that it wasn't 'apply'. In either case, we can be sure that there was some magic word generating '*pply' that wasn't 'apply'.

public class MagicDictionary {  
    Set<String> words;  
    Map<String, Integer> count;  

    public MagicDictionary() {  
        words = new HashSet();  
        count = new HashMap();  
    }  

    private ArrayList<String> generalizedNeighbors(String word) {  
        ArrayList<String> ans = new ArrayList();  
        char[] ca = word.toCharArray();  
        for (int i = 0; i < word.length(); ++i) {  
            char letter = ca[i];  
            ca[i] = '*';  
            String magic = new String(ca);  
            ans.add(magic);  
            ca[i] = letter;  
        }  
        return ans;  
    }  

    public void buildDict(String[] words) {  
        for (String word: words) {  
            this.words.add(word);  
            for (String nei: generalizedNeighbors(word)) {  
                count.put(nei, count.getOrDefault(nei, 0) + 1);  
            }  
        }  
    }  

    public boolean search(String word) {  
        for (String nei: generalizedNeighbors(word)) {  
            int c = count.getOrDefault(nei, 0);  
            if (c > 1 || c == 1 && !words.contains(word)) return true;  
        }  
        return false;  
    }  
}

pattern and counts

总体来说这个算法不难, 但是很有意思, 人工记录所有的pattern; 当然了, 因为pattern这个东西过于general, 所以他还要为每一个pattern记录count, 然后所有触发了count的word本身都要记录在words里面; 这个算法总体来说还是很有特点的, 我认为这个是一个拓展到其他题目里面的思路; 不算是很先进的算法, 但是有时候临时叫你想你也真未必就想的出来;

awice这题就给了这两个做法, 没有给出trie的做法;

这个是discussion最优解, 和editorial2很类似:

@shawngao said in Easy 14 lines Java solution, HashMap:

For each word in dict, for each char, remove the char and put the rest of the word as key, a pair of index of the removed char and the char as part of value list into a map. e.g.
"hello" -> {"ello":[[0, 'h']], "hllo":[[1, 'e']], "helo":[[2, 'l'],[3, 'l']], "hell":[[4, 'o']]}

During search, generate the keys as in step 1. When we see there's pair of same index but different char in the value array, we know the answer is true. e.g.
"healo" when remove a, key is "helo" and there is a pair [2, 'l'] which has same index but different char. Then the answer is true;

class MagicDictionary {  

    Map<String, List<int[]>> map = new HashMap<>();  
    public MagicDictionary() {  
    }  

    public void buildDict(String[] dict) {  
        for (String s : dict) {  
            for (int i = 0; i < s.length(); i++) {  
                String key = s.substring(0, i) + s.substring(i + 1);  
                int[] pair = new int[] {i, s.charAt(i)};  

                List<int[]> val = map.getOrDefault(key, new ArrayList<int[]>());  
                val.add(pair);  

                map.put(key, val);  
            }  
        }  
    }  

    public boolean search(String word) {  
        for (int i = 0; i < word.length(); i++) {  
            String key = word.substring(0, i) + word.substring(i + 1);  
            if (map.containsKey(key)) {  
                for (int[] pair : map.get(key)) {  
                    if (pair[0] == i && pair[1] != word.charAt(i)) return true;  
                }  
            }  
        }  
        return false;  
    }  
}

总体来说跟editorial2的思路很像, 就是理解了store pattern这个关键的思路, 只不过处理的方式不一样; 严格来说他这里的处理方式比awice的方式更加general一些, 因为储存的信息更多; 但是也因为这个over generalize, 他这个最后的速度是不如awice的解法的: awice的解法大部分时候, 拿到一个pattern之后, 只要检查一下count, 少数情况下才要检查pattern背后对应的word, 而这个作者的写法, 始终是完成一个traverse words of a pattern的操作;

在这个基础上, 有人提出了改进:

@Yang_BU said in Easy 14 lines Java solution, HashMap:

@mzchen Thanks. This method has to be non-repetitive words to build a dictionary.

    Map<String, Character> map;    
    public MagicDictionary() {  
        map = new HashMap<>();  
    }   
    public void buildDict(String[] dict) {  
        for (String d : dict) {  
            StringBuilder sb = new StringBuilder(d);  
            for (int i = 0; i < sb.length(); i++) {   
                sb.setCharAt(i, '*');  
                map.put(sb.toString(), map.containsKey(sb.toString())? '*' : d.charAt(i));  
                sb.setCharAt(i, d.charAt(i));  
            }       
        }  
    }    
    public boolean search(String word) {  
        StringBuilder sb = new StringBuilder(word);  
        for (int i = 0; i < sb.length(); i++) {  
            sb.setCharAt(i, '*');  
            if (map.containsKey(sb.toString()) && word.charAt(i) != map.get(sb.toString())) return true;  
            sb.setCharAt(i, word.charAt(i));  
        }  
        return false;  
    }

这个就是awice处理pattern的方式, 不过细节上有些不同, 这个人的这个解法严格上来说还是跟@shawngao的解法更加相似, 区别则在于, 他把@shawngao放在一个entry里面的东西放到了很多个Map entry里面去, 这样实际上最后是对速度有帮助的;

看了一下, submission里面的trie解法大部分跟我的速度都差不多, 波动差别而已;

Problem Description

Implement a magic directory with buildDict, and search methods.

For the method buildDict, you'll be given a list of non-repetitive words to build a dictionary.

For the method search, you'll be given a word, and judge whether if you modify exactly one character into another character in this word, the modified word is in the dictionary you just built.

Example 1:

Input: buildDict(["hello", "leetcode"]), Output: Null  
Input: search("hello"), Output: False  
Input: search("hhllo"), Output: True  
Input: search("hell"), Output: False  
Input: search("leetcoded"), Output: False

Note:

You may assume that all the inputs are consist of lowercase letters a-z.
For contest purpose, the test data is rather small by now. You could think about highly efficient algorithm after the contest.
Please remember to RESET your class variables declared in class MagicDictionary, as static/class variables are persisted across multiple test cases. Please see here for more details.

Difficulty:Medium
Total Accepted:8.4K
Total Submissions:16.9K
Contributor:今が最高
Companies
google
Related Topics
hash tabletrie
Similar Questions
Implement Trie (Prefix Tree)Longest Word in Dictionary

LC676 ImplementMagicDictionary