GroupAnagrams [source code]


public class GroupAnagrams {
static
/******************************************************************************/
class Solution {
    public List<List<String>> groupAnagrams(String[] strs) {
        Map<String, List<String>> map = new HashMap<> ();
        for (String s : strs) {
            String hash = hash (s);
            map.computeIfAbsent (hash, key -> new ArrayList<> ()).add (s);
        }
        return new ArrayList<> (map.values ());
    }

    String hash (String s) {
        int[] counts = new int[26];
        char[] chs = s.toCharArray ();
        for (char c : chs)
            counts[c - 'a']++;
        StringBuilder sb = new StringBuilder ();
        for (int count : counts)
            sb.append (count + "");
        return sb.toString ();
    }
}
/******************************************************************************/

    public static void main(String[] args) {
        GroupAnagrams.Solution tester = new GroupAnagrams.Solution();
    }
}

group的题目, 感觉除了用HashTable也没有其他什么特别好的思路了;

大概有点思路, 核心就是计算一个hash, 直接开写; 另外, 写的过程当中发现, 这题好像用integer hash不行, 最后得到的数字肯定能够溢出掉int, 就算用的base是10; 而且10本身其实还未必够大, base其实最好还是用string的长度; 所以最后干脆直接用string hash来写算了;

最后很轻松的写出来一次AC, 但是速度是105ms (2%), 这个就有点不对头了, 这题应该是漏掉了点什么东西;

editorial

Approach #1: Categorize by Sorted String [Accepted]

Intuition

Two strings are anagrams if and only if their sorted strings are equal.

Algorithm

Maintain a map ans : {String -> List} where each key
K is a sorted string, and each value is the list of strings from the initial input that when sorted, are equal to
K.

In Java, we will store the key as a string, eg. code. In Python, we will store the key as a hashable tuple, eg. ('c', 'o', 'd', 'e').

class Solution {  
    public List<List<String>> groupAnagrams(String[] strs) {  
        if (strs.length == 0) return new ArrayList();  
        Map<String, List> ans = new HashMap<String, List>();  
        for (String s : strs) {  
            char[] ca = s.toCharArray();  
            Arrays.sort(ca);  
            String key = String.valueOf(ca);  
            if (!ans.containsKey(key)) ans.put(key, new ArrayList());  
            ans.get(key).add(s);  
        }  
        return new ArrayList(ans.values());  
    }  
}

很直接的做法;

Approach #2: Categorize by Count [Accepted]

Intuition

Two strings are anagrams if and only if their character counts (respective number of occurrences of each character) are the same.

Algorithm

We can transform each string
s into a character count,
count, consisting of 26 non-negative integers representing the number of
a's,
b's,
c's, etc. We use these counts as the basis for our hash map.

In Java, the hashable representation of our count will be a string delimited with '#' characters. For example, abbccc will be #1#2#3#0#0#0...#0 where there are 26 entries total. In python, the representation will be a tuple of the counts. For example, abbccc will be (1, 2, 3, 0, 0, ..., 0), where again there are 26 entries total.

class Solution {  
    public List<List<String>> groupAnagrams(String[] strs) {  
        if (strs.length == 0) return new ArrayList();  
        Map<String, List> ans = new HashMap<String, List>();  
        int[] count = new int[26];  
        for (String s : strs) {  
            Arrays.fill(count, 0);  
            for (char c : s.toCharArray()) count[c - 'a']++;  

            StringBuilder sb = new StringBuilder("");  
            for (int i = 0; i < 26; i++) {  
                sb.append('#');  
                sb.append(count[i]);  
            }  
            String key = sb.toString();  
            if (!ans.containsKey(key)) ans.put(key, new ArrayList());  
            ans.get(key).add(s);  
        }  
        return new ArrayList(ans.values());  
    }  
}

这个就是我的做法啊, 为什么我的这么慢;

哦, 他这个也很慢, 那心理平衡了;

@wz366 said in Share my short JAVA solution:

public class Solution {  
    public List<List<String>> groupAnagrams(String[] strs) {  
        if (strs == null || strs.length == 0) return new ArrayList<List<String>>();  
        Map<String, List<String>> map = new HashMap<String, List<String>>();  
        for (String s : strs) {  
            char[] ca = s.toCharArray();  
            Arrays.sort(ca);  
            String keyStr = String.valueOf(ca);  
            if (!map.containsKey(keyStr)) map.put(keyStr, new ArrayList<String>());  
            map.get(keyStr).add(s);  
        }  
        return new ArrayList<List<String>>(map.values());  
    }  
}

没有新鲜感的解法; 不过sort之后的string直接作为hash, 这个点子还是值得学习一下的;

底下好像有争论, 实际上, editorial2这种counts hash的方法的复杂度应该是NK, 而这里这个sort的方法是NKlgK, 但是实际上这个sort的方法要快的多, 不知道什么鬼;

@jdrogin said in Share my short JAVA solution:

@wz366 nice solution, I think the basic idea of forming the grouping key by sorting the characters in each string is the best option. But, there are different approaches to sorting the strings. Seems counting sort is the best option for this sort given that the values are confined to a narrow range. Here's my solution using counting sort. Notice the optimization of re-using the counting map to reduce dynamic memory allocations.

    public IList<IList<string>> GroupAnagrams(string[] strs)   
    {  
        Dictionary<string, IList<string>> map = new Dictionary<string, IList<string>>();  
        int[] countingMap = new int[26];  

        foreach (string s in strs)  
        {  
            string key = SortString(s, countingMap);  
            if (!map.ContainsKey(key)) map[key] = new List<string>();  
            map[key].Add(s);  
        }  
        return map.Values.ToList();  
    }  

    public string SortString(string s, int[] map)  
    {  
        char[] chars = new char[s.Length];  
        foreach (char c in s)  
        {  
            map[c - 'a']++;  
        }  
        int pos = 0;  
        for (int i = 0; i < 26; i++)  
        {  
            while (map[i] > 0)  
            {  
                chars[pos++] = (char)('a' + i);  
                map[i]--;  
            }  
        }  
        return new string(chars);  
    }

这么一说, 好像counts的做法跟sort的做法也没有太大的不同. 那么最后的差别实际上是StringBuilder造成的?

@ISherry said in Java beat 100%!!! use prime number:

public static List<List<String>> groupAnagrams(String[] strs) {   
   int[] prime = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103};//最多10609个z  

        List<List<String>> res = new ArrayList<>();  
        HashMap<Integer, Integer> map = new HashMap<>();  
        for (String s : strs) {  
            int key = 1;  
            for (char c : s.toCharArray()) {  
                key *= prime[c - 'a'];  
            }  
            List<String> t;  
            if (map.containsKey(key)) {  
                t = res.get(map.get(key));  
            } else {  
                t = new ArrayList<>();  
                res.add(t);  
                map.put(key, res.size() - 1);  
            }  
            t.add(s);  
        }  
        return res;  
}

这个解法还是有点意思的, 最后还是成功的用integer hash做出来了; 跟我的想法不同, 我的想法就是单纯的控制每一位的base要高于下一位的总和, 这样来避免collision, 这样的思路难免要导致base快速的增加; 但是他这里的做法, 是直接利用prime本身的性质, 很聪明, 这个也算是学会了一招integer hash overflow的补丁;

不对... 这个做法好像还是会overflow:

@ljj19921026 said in Java beat 100%!!! use prime number:

For the input { "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" } the answer is wrong...

所以prime并没有什么卵用, 看看就行了;

@jianchao.li.fighter said in 10-lines 76ms Easy C++ Solution (Updated Function Signature):

The function signature has been updated to return a more intuitive vector<vector<string>> which treats a single string as a group of anagrams consisting of only itself.

The idea is to use an unordered_map to store those strings that are anagrams. We use the sorted string as the key and the string itself as the value. The strings are stored in a multiset since there may be duplicates. Moreover, multiset will sort them by default as we desire.

The code is as follows.

class Solution {  
public:  
    vector<vector<string>> groupAnagrams(vector<string>& strs) {  
        unordered_map<string, multiset<string>> mp;  
        for (string s : strs) {  
            string t = s;   
            sort(t.begin(), t.end());  
            mp[t].insert(s);  
        }  
        vector<vector<string>> anagrams;  
        for (auto m : mp) {   
            vector<string> anagram(m.second.begin(), m.second.end());  
            anagrams.push_back(anagram);  
        }  
        return anagrams;  
    }  
};

Update: as suggested by yswu1234 in the answer, general sort takes O(nlogn) time. In this problem, since the string only contains lower-case alphabets, we can write a sorting function using counting sort (O(n) time) to speed up the sorting process. I write a string sorting function strSort below and using it to sort the string achieves the overall running time 72ms for this problem.

class Solution {  
public:  
    vector<vector<string>> groupAnagrams(vector<string>& strs) {  
        unordered_map<string, multiset<string>> mp;  
        for (string s : strs) {  
            string t = strSort(s);  
            mp[t].insert(s);  
        }  
        vector<vector<string>> anagrams;  
        for (auto m : mp) {   
            vector<string> anagram(m.second.begin(), m.second.end());  
            anagrams.push_back(anagram);  
        }  
        return anagrams;  
    }  
private:  
    string strSort(string& s) {  
        int count[26] = {0}, n = s.length();  
        for (int i = 0; i < n; i++)  
            count[s[i] - 'a']++;  
        int p = 0;  
        string t(n, 'a');  
        for (int j = 0; j < 26; j++)  
            for (int i = 0; i < count[j]; i++)  
                t[p++] += j;  
        return t;  
    }   
};

还是类似的思路, 大概看看;

submission最优解:22(98):

class Solution {  
    public List<List<String>> groupAnagrams(String[] strs) {  
        List<List<String>> result = new ArrayList<List<String>>();  
        if (strs == null || strs.length == 0){  
            return result;  
        }  
        int[] prime = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103};  


        Map<Integer, List<String>> map = new HashMap<Integer,List<String>>();  
        for (String st: strs) {  
            char[] str = st.toCharArray();  
            int key = 1;  
            for (int i = 0; i < str.length; i++) {  

                key *= prime[str[i]-'a'];   
            }  
            List<String> list;  
            if (map.containsKey(key)){  
                list = map.get(key);  
            } else {  
                list = new ArrayList<String>();  
                result.add(list);  
                map.put(key,list);  
            }  
            list.add(st);  

        }  
        return result;  
    }  
}

丢人, 错误的解法, 大概看看就行了;

另外, 这题主要是学会一个grouping problem的一般思路; group by what?

Problem Description

Given an array of strings, group anagrams together.

For example, given: ["eat", "tea", "tan", "ate", "nat", "bat"],
Return:

[  
  ["ate", "eat","tea"],  
  ["nat","tan"],  
  ["bat"]  
]

Note: All inputs will be in lower-case.

Difficulty:Medium
Total Accepted:181.1K
Total Submissions:481.7K
Contributor:LeetCode
Companies
facebookamazonbloomberguberyelp
Related Topics
hash tablestring
Similar Questions
Valid AnagramGroup Shifted Strings

LC49 GroupAnagrams

GroupAnagrams [source code]

editorial

Approach #1: Categorize by Sorted String [Accepted]

Approach #2: Categorize by Count [Accepted]

Problem Description

results matching ""

No results matching ""