GeneralizedAbbreviation [source code]


public class GeneralizedAbbreviation {
static
/******************************************************************************/
class Solution {
    public List<String> generateAbbreviations(String word) {
        List<String> res = new ArrayList<>();
        helper(res, 0, new StringBuilder(), word, 0);
        return res;
    }

    public void helper(List<String> ls, int start, StringBuilder path, String s, int count) {
        int len = path.length();
        if (start == s.length()) {
            if (count > 0)
                path.append(count);
            ls.add(path.toString());
        } else {
            helper(ls, start + 1, path, s, count + 1);
            if (count > 0)
                path.append(count);
            path.append(s.charAt(start));
            helper(ls, start + 1, path, s, 0);
        }
        path.setLength(len);
    }
}
/******************************************************************************/

    public static void main(String[] args) {
        GeneralizedAbbreviation.Solution tester = new GeneralizedAbbreviation.Solution();
    }
}

咋一看, 这个问题是明显有一个tree结构在里面的, 而且好像有点类似之前的加法拆分的思路?

但是加法拆分思路有一个难题好像不好解决, 比如"word", 如果用加法拆分思路来做, 我们可能会做出w11d这样的, 但是这个在题目里面是不允许的, 因为这里两个数字连起来了;

这个题目刚开始的时候感觉是能做出来, 然后尝试用加法拆分的方式, 把"word"给拆分开来, 做了一个tree结构(为了避免duplicate, 一个做法是可以保证在每一个level得到的两半当中, 只有一半进行继续拆分, 另外一半直接保留; 另外, 我们之前有一个加法拆分的题目的时候提到, 如果要消除duplicate, 另外一个思路就是break symmetry, 不过那个题目好像是允许一个break symmetry的操作, 这个题目想了一下好像没有, 也不需要: 不对, 未必不需要: "w|(or|ord)" and "(w|or)|ord");

但是即使这样, 这个traversal结构好像还是不太好直接和题目要求的东西直接套起来;

反正设计backtracking算法, 或者大部分的recursion算法, 一个很好的思路就是先把tree traversal画出来, 你想要一个什么样的DFS过程; 这个题目我的tree traversal画出来了一半, 欠缺的一半是, 怎么处理这个letter to digit的翻译过程, 想了半天没有想到什么很好的办法, 放弃了;

最后这个代码是根据editorial1的写法来写的, 速度是15ms (85%). 刚开始写的时候还有一个错误, 就是这个undo by set length的操作忘记涵盖最上面的leaf base case了; 事实上这个undo的操作是无论当前的iteration是base case还是recursive case, 都要发生的;

仔细想想这个问题, 之所以最后没有做出来, 其实还是在加法拆分这个思路上吊死了; 最后看完了discussion实际上也没有发现类似加法拆分的这种思路; 当然这里的这个backtracking做法也是一个BST, 但是并不是加法拆分那样的BST;

纯粹思考加法拆分的思路的问题是, 好像虽然想到了tree本身什么样, 但是实际上你并没有想到; 你想到的tree只涵盖了split操作, 而没有涵盖abbreviate操作;

要说这个失败有什么教训, 我暂时也说不好, 毕竟我本来这种类比加法拆分来做backtracking的思路也不能说就是大错, 之前有一个题目就是这样最后想到了一个解法的了; TODO: figure out what's wrong;


这个是这个题目的editorial, 写的非常好: https://leetcode.com/problems/generalized-abbreviation/solution/

首先, 他认为abbreviation的个数应该是2^N; 这个计算方法是因为思考针对某一个letter, 是否将其转化为digit; 这个思考方式也暗示了一个他这里思考这个问题的思路: 他们用的不是我的加法拆分的思路, 而是一个letter一个letter的处理, 转换为1之后, 再accumulate这些数字;

editorial1:

The backtracking algorithm enumerates a set of partial candidates that, in principle, could be completed in several choices to give all the possible solutions to the problem. The completion is done incrementally, by extending the candidate in many steps. Abstractly, the partial candidates can be seen as nodes of a tree, the potential search tree. Each partial candidate is the parent of the candidates that derives from it by an extension step; the leaves of the tree are the partial candidates that cannot be extended any further.

In our problem, the partial candidates are incomplete abbreviations that can be extended by one of the two choices:

  1. keep the next character;
  2. abbreviate the next character.
    We extend the potential candidate in a depth-first manner. We backtrack when we reach a leaf node in the search tree. All the leaves in the search tree are valid abbreviations and shall be put into a shared list which will be returned at the end.

可以看到, 他这个算法跟我之前的思考过程还是有一点共同之处的: 要思考

  • tree是什么
  • node是什么
    • furthermore, 用root level recursion的思路, 来分析parent关系: 一个node应该做什么样的recursive calls, 以及对应的, 这个node的children应该是什么样的node;

另外, 我自己尝试用加法拆分的思路来做这个问题的时候, 记得以前我们总结backtracking的种类:

  • pick
  • decide
  • arrange
    我的加法拆分的思路对应的应该是arrange的类型; 但是他这里这个做法, 你可以看他写的那个keep/abbreviate的两个case代表两个recursive calls, 其实就是对应的decision类型: 前面稍微提到过, decision的backtracking很多时候最后的tree看起来很想BST, 当然这个还没有确定, 未必就一定是对的;
public class Solution {  
    public List<String> generateAbbreviations(String word){  
        List<String> ans = new ArrayList<String>();  
        backtrack(ans, new StringBuilder(), word, 0, 0);  
        return ans;  
    }  

    // i is the current position  
    // k is the count of consecutive abbreviated characters  
    private void backtrack(List<String> ans, StringBuilder builder, String word, int i, int k){  
        int len = builder.length(); // keep the length of builder  
        if(i == word.length()){  
            if (k != 0) builder.append(k); // append the last k if non zero  
            ans.add(builder.toString());  
        } else {  
            // the branch that word.charAt(i) is abbreviated  
            backtrack(ans, builder, word, i + 1, k + 1);  

            // the branch that word.charAt(i) is kept  
            if (k != 0) builder.append(k);  
            builder.append(word.charAt(i));  
            backtrack(ans, builder, word, i + 1, 0);  
        }  
        builder.setLength(len); // reset builder to the original state  
    }  
}

这个backtracking写的是非常简练的; 注意他recursion函数写的结构, 上面的是base case, 也就是在leaf的位置才整个处理(这个是很多DFS问题当中的共性: 关键位置在leaf位置); 另外注意这里的leaf位置判断的是len而不是len-1: 超界了才收尾, 而不是到了end就开始收尾了;

另外这里的builder的作用其实就是一个path accum的作用, 这个东西也是很常见了, 也可以说是tag, 不过tag并不如path accum贴切; 另外注意node和参数表的对应性, 这个也是老生常谈, 这里的root call, 给的builder就是一个"", 所以这个tree最后的root的accum是"";

最后还一个就是最后一行用setLength完成的一个undo过程; 这个在builder类的backtracking问题里面见过好几次了;

最后这个复杂度是N2^N: 2^N的tree, leaf位置每一个贡献N, 最后是N2^N;

空间复杂度是O(N), 因为我们其实只需要两部分的空间:

  • builder, 这个是O(N);
  • recursion stack;

In a recursive program, the space of system stack is linear to the maximum recursion depth which is n in our problem.

不要认为只要是recursion就要维护整个tree, 没有这么夸张;

他这个方法我刚开始看完他的思路之后, 一个想法是直接把决定了要abbreviate的char就直接转化成1, 然后最后leaf的位置的时候专门扫一遍, 把相邻的1全部collapse到一起就行了; 但是他这里这个collapse的过程直接在recursion的同时就进行, 应该说是一个更好的思路, 用的就是一个consecutive count的参数, 作为一个保存历史信息的作用;

可以看到这整个backtracking的过程是非常的有条不紊的; 另外, 在backtracking当中一个重要的问题就是no-repetition: remove duplicates; 上面editorial1的做法, 因为有一个指针从左到右的扫描, 而且是decision结构, 所以最后是肯定不会有重复的;


editorial2是一个很类似上面的问题的做法, 有一个intuition:

If we use 0 to represent a character that is not abbreviated and 1 to represent one that is. Then each abbreviation is mapped to an n bit binary number and vice versa.

上面的backtracking我们最后其实所做的就是针对每一个bit, 我们有一个decision, keep or abbreviate, 这样的一个decision是一个binary的decision; 我们最后要做的就是针对所有的N digits, 找出所有的可能的decision的组合; 找到这个组合的一个办法是他们观察到, 如果我们给每一个位置一个boolean来代表是否abbreviate, 我们最后对于每一个result, 我们都能得到一个唯一的N-bits binary number; 所以只要我们最后能够保证iterate这些binary的时候没有重复(no-repetition is always important), 那么我们也可以保证最后得到的abbreviation没有重复;

public class Solution {  
    public List<String> generateAbbreviations(String word) {  
        List<String> ans = new ArrayList<>();  
        for (int x = 0; x < (1 << word.length()); ++x) // loop through all possible x  
            ans.add(abbr(word, x));  
        return ans;  
    }  

    // build the abbreviation for word from number x  
    private String abbr(String word, int x) {  
        StringBuilder builder = new StringBuilder();  
        int k = 0, n = word.length(); // k is the count of consecutive ones in x  
        for (int i = 0; i < n; ++i, x >>= 1) {  
            if ((x & 1) == 0) { // bit is zero, we keep word.charAt(i)  
                if (k != 0) { // we have abbreviated k characters  
                    builder.append(k);  
                    k = 0; // reset the counter k  
                }  
                builder.append(word.charAt(i));  
            }  
            else // bit is one, increase k  
                ++k;  
        }  
        if (k != 0) builder.append(k); //don't forget to append the last k if non zero  
        return builder.toString();  
    }  
}

他这里这个abbr函数其实就是一个我上面一个解法的时候准备做的collapse的过程; 具体的做法也很简单, 整个算法本身跟一个分段算法非常类似, 用k作为一个计算run的长度的count;

这个算法的核心在主函数上面, 可以看到他就是iterate整个0..2^N-1这么多的binary, 这个是一个binary mask的技巧, 适合在decision类backtracking当中完成一个类似穷搜的过程; 这个技巧在前面有一个题目的时候也见到过, 不过那个时候这个算法给出的解法属于比较差, 但是在这一题就显得还不错了;

最后这个算法的复杂度是跟上面的算法一样的;


discussion最优解, 跟editorial1其实是一个意思, 但是代码本身更简洁:

@soymuybien said in Java backtracking solution:

The idea is: for every character, we can keep it or abbreviate it. To keep it, we add it to the current solution and carry on backtracking. To abbreviate it, we omit it in the current solution, but increment the count, which indicates how many characters have we abbreviated. When we reach the end or need to put a character in the current solution, and count is bigger than zero, we add the number into the solution.

   public List<String> generateAbbreviations(String word){  
         List<String> ret = new ArrayList<String>();  
         backtrack(ret, word, 0, "", 0);  

         return ret;  
     }  

     private void backtrack(List<String> ret, String word, int pos, String cur, int count){  
         if(pos==word.length()){  
             if(count > 0) cur += count;  
             ret.add(cur);  
         }  
         else{  
             backtrack(ret, word, pos + 1, cur, count + 1);  
             backtrack(ret, word, pos+1, cur + (count>0 ? count : "") + word.charAt(pos), 0);  
         }  
     }

注意因为他用的是immutable的string而不是StringBuilder, 所以undo操作就自动被省略了, 当然这个也是有代价的, 见仁见智了;

另外一个差不多的:

@yavinci said in Java 14ms beats 100%:

For each char c[i], either abbreviate it or not.

  1. Abbreviate: count accumulate num of abbreviating chars, but don't append it yet.
  2. Not Abbreviate: append accumulated num as well as current char c[i].
  3. In the end append remaining num.
  4. Using StringBuilder can decrease 36.4% time.

This comes to the pattern I find powerful:

int len = path.length(); // decision point  
... backtracking logic ...  
path.setLength(len);     // reset to decision point  

Similarly, check out [remove parentheses][1] and [add operators][2].


     public List<String> generateAbbreviations(String word) {  
         List<String> res = new ArrayList<>();  
         DFS(res, new StringBuilder(), word.toCharArray(), 0, 0);  
         return res;  
     }  

     public void DFS(List<String> res, StringBuilder path, char[] c, int i, int num) {  
         int len = path.length();    
         if(i == c.length) {  
             if(num != 0) path.append(num);  
             res.add(path.toString());  
         } else {  
             DFS(res, path, c, i + 1, num + 1);               // abbr c[i]  

             if(num != 0) path.append(num);                   // not abbr c[i]  
             DFS(res, path.append(c[i]), c, i + 1, 0);          
         }  
         path.setLength(len);   
     }

[1]: https://leetcode.com/discuss/72208/easiest-9ms-java-solution

[2]: https://leetcode.com/discuss/75308/java-simple-and-fast-solution-beats-96-56%25

这个是一个无聊的"造轮子"式的继续加速优化:

@TWiStErRob said in Java 14ms beats 100%:

For fun you can go even faster if you use a char[] path + an int pos:

  • path's size cannot be bigger than word.length()
  • you don't need setLength because pos is a local variable,
    down the stack it has smaller values
  • append(char) becomes path[pos++] = c[i];
  • and you could write your own "append(int)" which is much simpler because 0 < num < 32
    (List cannot hold more than 2^31 elements):

     if (num >= 10) path[pos++] = '0' + num / 10;  
     if (num > 0) path[pos++] = '0' + num % 10;  
    

随便看看就好;


这个是另外一种全新的思路, 可以说是DP, 也可以说是divide&conquer:

public class Solution {  
    public List<String> generateAbbreviations(String word) {  
        List<String> res = new ArrayList<String>();  
        int len = word.length();  
        res.add(len==0 ? "" : String.valueOf(len));  
        for(int i = 0 ; i < len ; i++)  
            for(String right : generateAbbreviations(word.substring(i+1))){  
                String leftNum = i > 0 ? String.valueOf(i) : "";  
                res.add( leftNum + word.substring(i,i + 1) + right );  
            }  
        return res;  
    }  
}

前面看到的几个DFS做法, 几乎全都是Side Effect类型的recursion: 这个也是一个惯例了: 如果想用recursion实现DFS, 那么一般直接用Side Effect式的写法来完成traversal是最常见的做法; 如果是divide&conquer(包括DP思路), 那么一般是设计让recursion有返回值比较多;

这里的recursion的设计就是有返回值, 最后也带来了两个优势:

  • 不需要helper
  • 可以做memo了: 这个以前是总结过的, void的recursion是非常不好做memo的;

看循环的时候如果不知道这个Loop Var i是用来干什么的: again, 直接在body里面找i是怎么用的就行了;

整个算法的思路还是很简单的, 就很类似大部分的OCaml程序的思路, recursion返上来, 只需要思考当前level的head怎么处理就行了;

他这个算法如果不加memo的话, 复杂度是非常可怕的, 整个recursion的深度是N, 每一个level的fanning是两层循环嵌套!

尽管看起来是一个很直白的OCaml思路, 但是真正设计这个算法的时候的一些思路还是要明确: i的作用上面已经讲了; 对于某一个string, 长度为N的给你, 他的做法是, 首先, 我们有一个"N", 这个是肯定有的; 下面就是怎么用一个recursive的结构来处理剩下的字母了: 这里他用了一个类似Generate Parens那个题目里面的思路: 在那个题目里面, 我们认为无论如何arrange, 肯定有一组"("和")", 我们最后需要做的决定就是考虑哪些放在这一组括号的里面, 哪些放在外面; 这个题目里面, 当我们排除了"N"这个result之后, 我们可以确定, 剩下的result for length N肯定至少含有一个字母, 所以我们只要考虑这个字母在哪个位置就行了; 这里的i这个index就是这个守门员字母的位置; 然后为了避免重复, 他这里又impose了一个rule, 就是他规定守门员的左边全部abbreviate, 然后右边正常的recurse;

为什么这样的一个rule就可以避免重复? 刚开始我也没有想通, 但是大概纸上画一画:

a b c d e f  
1 b  
  2 c  
    3 d  
      4 4  
        5 f

可以看到, 这样, 每一个iteration之间最后产生的result string肯定不会有重复! 它们的开头就不一样;

这个是我在帖子下面贴的一个解释:

a b c d e f  
1 b + recurse on rest  
  2 c + recurse on rest  
    3 d + recurse on rest  
      4 e + recurse on rest   
        5 f + recurse on rest

For the case of 1b + recurse(cdef), if recurse(cdef) returns "1d1f", then we just have to append "1b" to the head of that subsolution.

Duplicate is removed obviously as you can see: no two iterations shares the same headings.

Of course you also have to include solutions like "6" for "abcdef" but that can be handled as a trivial base case. Once that is handled, you can rest assured that there will be at least one letter in the other results. The loop iterate on this survivor letter's positin (i) and do the recursion as shown above.

另外这个守门员的思路, 有两点启发:

  • 这个还是符合我们以前总结过的一个enumerate for additional assumptions的一个总结. 因为我们先把"N"这样的enumerate掉了, 所以我们才可以确定"至少有一个守门员"这样的条件;
  • 这个守门员而且保证了不会出现"w1"+"1d"这样组合了之后还需要进一步collapse的问题;

因为这个算法的recursion的参数只有一个string, 所以加memo也是非常的简单:

@piyush121 said in 9 line easy JAVA solution:

Here is the Memoized version of this method which significantly improves its runtime. I also realized that it is similar to String permutation problem.
I don't know whether this kind of solution is acceptable in a real interview.

 public class Solution {  
    Map<String, List<String>> map = new HashMap<>();  
    public List<String> generateAbbreviations(String word) {  
          List<String> res = new ArrayList<>();  
          if(word.length() == 0 ) {  
              res.add("");  
              return res;  
          }  
          if(map.containsKey(word))  
              return map.get(word);  
          res.add(String.valueOf(word.length()));      
          for(int i = 0; i < word.length(); i++) {  
              String s = word.substring(i + 1);  
              String left = i == 0 ? "" : "" + i;  
              String ch = "" + word.charAt(i);  
              for(String right : generateAbbreviations(s)) {  
                  res.add(left + ch + right);  
              }  
          }  
          map.put(word, res);  
          return res;  
      }      
}

这个人真实在面试里面碰到了这一题:

@skylinebaby said in Meet in Google Interview Solution with concise explanation.:

I meet this problem in Google Interview. However, I didn't solve it at that time because I was totally out of mind when I meet with this problem. The interviewer didn't say much about the output and he first ask me how many abbreviation are there with a given word length of n. It took me a long time to guess it was 2^n.
The following solution is what based on the idea that each position has the chance to be abbreviated to 1 or not with a recursion function call. It is quite similar to the question of SubsetsII.

 public class Solution {  
     public List<String> generateAbbreviations(String word) {  
         List<String> res = new LinkedList<>();  
         recurse(res, word, 0);  
         return res;  
     }  
     private void recurse(List<String> res, String word, int pos){  
         if(pos==word.length()){   
           res.add(word);  
           return;  
           }  
         // The current position does not abbreviate to 1 and call the recursion with the next position   

         recurse(res, word, pos+1);  
         String nstring = word.substring(0,pos)+"1"+word.substring(pos+1);  

       // Abbreviate the current position and we have to check the position prior to this position.  
        If the position prior to this position is a number, we have to combine them together.   
       But there is still a little tricky to deal with the output because if the combined output is   
       those 9, 99, 999, then the next position should be pos+1 with recursion call. If not,  
      the next position should remain the same pos.   

         if(pos>0 && Character.isDigit(word.charAt(pos-1))){  
             int count = 0;  

            //count the prior characters which is digits and we should combine them with 1   

             while((pos-count-1)>=0 && Character.isDigit(word.charAt(pos-count-1))){  
                 count++;  
             }  
             int num = Integer.parseInt(word.substring(pos-count, pos));  
             num = num+1;  
             String nnum = num+"";  
             if(nnum.length()>count){  
                 nstring = word.substring(0, pos-count)+nnum+word.substring(pos+1);  
                 recurse(res, nstring, pos+1);  
             }else{  
                 nstring = word.substring(0, pos-count)+nnum+word.substring(pos+1);  
                 recurse(res, nstring, pos);  
             }  
         }else{  
             recurse(res, nstring, pos+1);  
         }  
     }  
 }

这个人的代码还是比较乱的, 而且他没有意识到用consecutive count来处理数字collapse的问题, 而是在每一个iteration还要专门回头去看, 这个就很姜了; 其他的部分其实都差不多;

如果我自己面试的时候碰到这一题, 好像我其实也没有把握能够立刻回答出来2^N这个数字;


这个是另外一个backtracking:

public class Solution {  
    public List<String> generateAbbreviations(String word) {  
        List<String> res = new ArrayList<String>();  
        helper(word, 0, "", res, false);  
        return res;  
    }  
    // isAbbrPrev: false, we can add alphabet or abbreviation(number) next round  
    // isAbbrPrev: true,  we can add alphabet ONLY next round  
    public void helper(String word, int start, String cur, List<String> res, boolean isAbbrPrev) {  
        if (start == word.length()) {  
            res.add(cur);  
            return;  
        }  
        if (isAbbrPrev == false) { // we can add abbreviation (num)  
            for(int end=start+1; end<=word.length(); end++) { // iterate all abbreviations from 'start'  
                helper(word, end, cur + Integer.toString(end-start), res, true);  
            }  
        }  
        helper(word, start+1, cur + word.charAt(start), res, false); // adding one word each time  
    }  
}

这个的思路其实不是上面说过的decision思路, 而是类似上面的divide&conquer的思路; 我再想了一下, 这他妈就是一个divide&conquer吧? 只不过是用recursion而非iteration来写而已;

这个是他自己写的iteration版本的divide&conquer:

public class Solution {  
    public List<String> generateAbbreviations(String word) {  
        Set<String> s = helper(word);  
        List<String> res = new ArrayList<String>(s);  
        return res;  
    }  

    public Set<String> helper(String word) {  
        int length = word.length();  
        Set<String> res = new HashSet<String>();  
        if (length == 0) {  
            res.add("");  
            return res;  
        }  
        res.add(Integer.toString(length));  
        for(int i=0; i<length; i++) {    // we separate String into two parts with word.charAt(i)  
            Set<String> left = helper(word.substring(0,i));  
            Set<String> right = helper(word.substring(i+1, length));  
            for(String strLeft : left) {  
                for(String strRight : right) {  
                    res.add(strLeft + word.charAt(i) + strRight);  
                }  
            }  
        }  
        return res;  
    }  
}

Problem Description

Write a function to generate the generalized abbreviations of a word.

Example:

Given word = "word", return the following list (order does not matter):  
["word", "1ord", "w1rd", "wo1d", "wor1", "2rd", "w2d", "wo2", "1o1d", "1or1", "w1r1", "1o2", "2r1", "3d", "w3", "4"]

Difficulty:Medium
Total Accepted:22.9K
Total Submissions:51.1K
Contributor: LeetCode
Companies
google
Related Topics
backtracking bit manipulation
Similar Questions
Subsets Unique Word Abbreviation Minimum Unique Word Abbreviation

results matching ""

    No results matching ""