SortCharactersByFrequency [source code]


public class SortCharactersByFrequency {
static
/******************************************************************************/
public class Solution {
    public String frequencySort(String s) {
        int[] counts = new int[256];
        for (char c : s.toCharArray())
            counts[c]++;
        List<Integer>[] b = new List[s.length() + 1];
        for (int i = 0; i < 256; i++) {
            if (counts[i] > 0) {
                if (b[counts[i]] == null)
                    b[counts[i]] = new ArrayList<Integer>();
                b[counts[i]].add(i);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (int i = b.length - 1; i > 0; i--) {
            if (b[i].size() > 0) {
                for (int num : b[i])
                    for (int j = 0; j < i; j--)
                        sb.append((char) num);
            }
        }
        return sb.toString();
    }
}
/******************************************************************************/

    public static void main(String[] args) {
        SortCharactersByFrequency.Solution tester = new SortCharactersByFrequency.Solution();
    }
}

感觉还是一个没什么想法的问题, 先想一个笨办法; 这里提出一个新的要求, 以后笨办法不能光是脑子里动一下就拉倒了, 一定要笔头动一动, 歇下来看看:

脑子里动一动, 有些细节问题被忽略, 有些实现起来的难点部分没有注意到, 最后的整个办法本身可能都是无法实现的;
真正面试的时候这个写笨办法的过程也是要有的, 所以也是一个提前锻炼;

这个问题的笨办法无非是:

count all to a map
collect all entry values to an array, also build a reverse map with count as key;
sort the array. Then get the key from the reverse map one by one;

不过这个办法明显是又臭又长的, 速度也是NlgN, 看了一眼discussion, 全都是O(N)的. 没想法了, 不知道怎么优化;

这里有两个点需要注意一下, 一个就是这个长度之所以多1, 是为了避免后面index和value(count)之间的+1/-1的转换;

List<Integer>[] b = new ArrayList<Integer>[s.length() + 1];

其次, 这个写法是错的, 因为一个array declare的时候, 你也只能使用generic的; 也就是跟上面实际的代码那样写; 不仅不能写ArrayList, 连brace尖括号都不能有; 这个是因为你现在declare的这个object其实就是这个array, 而不是array里面的东西; 你array里面的entry具体是怎么实现的一个list, 无所谓的, 但是你这个array本身这个object, 必须要用generic的方式来定义;

另外一个:

for (int num : b[i])  
    for (int j = 0; j < i; j--)  
        sb.append((char) num);

注意这里, num的type, 必须写int而不能用integer: 否则下面的char的cast会失败. 这个是因为我们虽然知道autoboxing and autounboxing, 但是这个其实只有当你直接在, 比如int和integer之间转换的时候, 才会触发; 你integer到char的时候还想让JVM先自动转换到int, 然后再来搞成char, 这个是不行的, 没有这样的操作;

MLE或者TLE很多时候最可能的原因就是有死循环; 比较常见的原因有, 比如Loop Var的增加还是减少写反了;

最后的速度是11ms (98%), 还可以; 虽然说算法本身也不是自己写的;

这个是discussion里面一个最优解:

The logic is very similar to NO.347 and here we just use a map a count and according to the frequency to put it into the right bucket. Then we go through the bucket to get the most frequently character and append that to the final stringbuilder.

public class Solution {  
    public String frequencySort(String s) {  
        Map<Character, Integer> map = new HashMap<>();  
        for (char c : s.toCharArray()) {  
            if (map.containsKey(c)) {  
                map.put(c, map.get(c) + 1);  
            } else {  
                map.put(c, 1);  
            }  
        }  
        List<Character> [] bucket = new List[s.length() + 1];  
        for (char key : map.keySet()) {  
            int frequency = map.get(key);  
            if (bucket[frequency] == null) {  
                bucket[frequency] = new ArrayList<>();  
            }  
            bucket[frequency].add(key);  
        }  
        StringBuilder sb = new StringBuilder();  
        for (int pos = bucket.length - 1; pos >=0; pos--) {  
            if (bucket[pos] != null) {  
                for (char num : bucket[pos]) {  
                    for (int i = 0; i < pos; i++) {  
                        sb.append(num);  
                    }  
                }  
            }  
        }  
        return sb.toString();  
    }  
}

所以其实整体的思路我的想法是对的, 毕竟这里要找的是一个按照frequency的sort, 所以count是一定要做的. 关键是这里count出来的结果怎么处理;

这里他这个方法注意到一个隐藏条件: frequency的domain是有限的!

所以我们真正sort frequency的时候可以用counting sort, 也就是hashing, 而不是Std的sort, 这样最后就能做到O(N)而不是NlgN了;

我本来的想法是针对第一个Map的value, 建立一个array, 然后一个反向Map, 不过这里想到bucket sort的话, 这两个问题就可以合成一个了(value as domain正好同时完成了两个功能);

这个人还给了一个用PriorityQueue的解法:

public class Solution {  
    public String frequencySort(String s) {  
        Map<Character, Integer> map = new HashMap<>();  
        for (char c : s.toCharArray()) {  
            if (map.containsKey(c)) {  
                map.put(c, map.get(c) + 1);  
            } else {  
                map.put(c, 1);  
            }  
        }  
        PriorityQueue<Map.Entry<Character, Integer>> pq = new PriorityQueue<>(  
            new Comparator<Map.Entry<Character, Integer>>() {  
                @Override  
                public int compare(Map.Entry<Character, Integer> a, Map.Entry<Character, Integer> b) {  
                    return b.getValue() - a.getValue();  
                }  
            }  
        );  
        pq.addAll(map.entrySet());  
        StringBuilder sb = new StringBuilder();  
        while (!pq.isEmpty()) {  
            Map.Entry e = pq.poll();  
            for (int i = 0; i < (int)e.getValue(); i++) {  
                sb.append(e.getKey());  
            }  
        }  
        return sb.toString();  
    }  
}

这个解法还是值得学习一下的, 因为我对PriorityQueue这个东西还是不太熟悉; 不过大概看了一下, 他这里其实还是依赖PriorityQueue做一个sort by value的工作; 这个sort by value (of key-value pair)的东西估计以后还是有可能碰到, 值得学习一下这个方法;

不过要知道这个PriorityQueue的方法其实最后完成的就是一个sort, 只不过是把我笨办法上面描述的那种sort给streamline了一下; 最后的速度其实还是NlgN;

I have a question related to the complexity. That is since the set of character must be in a constant space whose size is either 128 or 256 or... Will we still treat the complexity of insertion into the heap as O(logn)? Or should we treat it as constant?

这个还真是一个想法, 因为这个counts Map最后实际上只可能有这么多的entry, 所以这个PriorityQueue的size其实是不会太大的;

另外注意, 按照他的说法, 其实heap就是PriorityQueue;

You can add a max to save the space.

也就是进一步缩小一下counts的domain. 一个小优化; 可以节省一点的space;

另外一个优化就是其实第一个pass的这个HashMap也是不需要的, 因为我们找的是character, 所以就算没有跟你说只限定于大小写字母, 你还是可以确定这些character的domain是固定的, 所以也可以同样使用bucket sort;

You are correct if we can assume ascii 256. Good call.

public String frequencySort(String s) {  
    if(s.length() < 3)  
        return s;  
    int max = 0;  
    int[] map = new int[256];  
    for(char ch : s.toCharArray()) {  
        map[ch]++;  
        max = Math.max(max,map[ch]);  
    }  
    String[] buckets = new String[max + 1]; // create max buckets  
    for(int i = 0 ; i < 256; i++) { // join chars in the same bucket  
        String str = buckets[map[i]];  
        if(map[i] > 0)  
            buckets[map[i]] = (str == null) ? "" + (char)i : (str + (char) i);  
    }  
    StringBuilder strb = new StringBuilder();  
    for(int i = max; i >= 0; i--) { // create string for each bucket.  
        if(buckets[i] != null)  
            for(char ch : buckets[i].toCharArray())  
                for(int j = 0; j < i; j++)  
                    strb.append(ch);  
    }  
    return strb.toString();  
}

不过有人提出, 这题这个heap的解法其实更优秀:

@piyush121 another thing to consider here is the space complexity when your frequency distribution is almost bimodal; as @nikprashanth pointed out earlier, if you have a string with a single character with a high frequency (ex: 1 million) and the rest of the characters in the single digits, you're instantiating an array (for your buckets) w/ size 1 million. It is in situations like these that the heap-based solutions are space-bound by the number of unique characters (as opposed to their counts which arguably has a higher upper bound). Naturally, the bucket-sort solution only works well when the range of characters is constrained (ex: 7-8bit ASCII) but croaks when supplied UTF-8/UTF-16 strings.

虽然说我们可以得到counts的domain, 但是这个domain本身其实是没有任何的bound的; 如果真的是碰到一个很practical的大case, 确实是很容易被坑的;

discussion好像并没有其他的解法了, 主流解法就是这两种, bucket sort或者heap;

Problem Description

Given a string, sort it in decreasing order based on the frequency of characters.

Example 1:

Input:  
"tree"  

Output:  
"eert"  

Explanation:  
'e' appears twice while 'r' and 't' both appear once.  
So 'e' must appear before both 'r' and 't'. Therefore "eetr" is also a valid answer.

Example 2:

Input:  
"cccaaa"  

Output:  
"cccaaa"  

Explanation:  
Both 'c' and 'a' appear three times, so "aaaccc" is also a valid answer.  
Note that "cacaca" is incorrect, as the same characters must be together.

Example 3:

Input:  
"Aabb"  

Output:  
"bbAa"  

Explanation:  
"bbaA" is also a valid answer, but "Aabb" is incorrect.  
Note that 'A' and 'a' are treated as two different characters.

Difficulty:Medium
Total Accepted:31.7K
Total Submissions:62.5K
Contributor: stickypens
Companies
amazon google
Related Topics
hash table heap
Similar Questions
Top K Frequent Elements

LC451 SortCharactersByFrequency

SortCharactersByFrequency [source code]

Problem Description

results matching ""

No results matching ""