DesignHitCounter [source code]

public class DesignHitCounter {
static
/******************************************************************************/
public class HitCounter {
    int[] counts = new int[300];
    int prev = 1;

    /** Initialize your data structure here. */
    public HitCounter() {

    }

    /** Record a hit.
        @param timestamp - The current timestamp (in seconds granularity). */
    public void hit(int timestamp) {
        if (timestamp != prev) {
            clean(timestamp);
            counts[(timestamp - 1) % 300] = 0;
        }
        counts[(timestamp - 1) % 300]++;
        prev = timestamp;
    }

    /** Return the number of hits in the past 5 minutes.
        @param timestamp - The current timestamp (in seconds granularity). */
    public int getHits(int timestamp) {
        clean(timestamp);
        if (timestamp > prev) {
            counts[(timestamp - 1) % 300] = 0;            
        }
        int sum = 0;
        for (int i : counts)
            sum += i;
        prev = timestamp;
        return sum;
    }

    public void clean(int timestamp) {
        int distance = timestamp - prev > 300 ? 300 : (timestamp - prev) % 300 - 1;
        if (timestamp - prev > 0)
            for (int i = 1; i <= distance; i++)
        counts[(prev - 1 + i) % 300] = 0;
    }
}

/**
 * Your HitCounter object will be instantiated and called as such:
 * HitCounter obj = new HitCounter();
 * obj.hit(timestamp);
 * int param_2 = obj.getHits(timestamp);
 */
/******************************************************************************/

    public static void main(String[] args) {
        DesignHitCounter.HitCounter tester = new DesignHitCounter.HitCounter();
        tester.hit(1);
        System.out.println(tester.getHits(1));
        tester.hit(2);
        System.out.println(tester.getHits(2));
        tester.hit(3);
        System.out.println(tester.getHits(4));
        tester.hit(300);
        System.out.println(tester.getHits(300));
        System.out.println(tester.getHits(301));
        System.out.println(tester.getHits(310));
        System.out.println(tester.getHits(400));
    }
}

感觉这种数据设计的题目很大的一个坑就是overkill. 最优秀的解法, 往往一般就是刚刚好满足题目的需求, 不多不少.

这题瞄了一眼discussion, 好像我整体的思路是对的, 用circular array来做;

design的题目的uncertainty尤其多, 不要怕, 举多个例子, 来一个一个的排查, 看看逻辑到底怎么划分;

吭哧吭哧总算是写完了, 最后的速度是141ms (2%), 非常的差. 不过我这个我感觉应该是实现了Follow-Up的; //重新submit一次之后又编程 97ms (74%) 了, 大题目好像波动性非常大;

总体的思路其实就是针对一个circular array, 然后记录上一次operation(无论是hit还是get, 反正就是算作是一个operation)的时间在prev里面; 后面每一个operation之前, 都要清理一次这个array. 其实我感觉, 因为counts这个array的大小是固定的, 所以这个清理的过程的cost按说应该不是特别大, 不过最后看来我这个解还是比较慢的;

清理的原因当然就是因为我们最终只维护5分钟, 也就是300个hit的存在, 所以每一次最新的operation之前, 我们都要把过期的hit记录全部都清理掉;

说起来很简单, 真正设计这个清理方案的时候还是花了很多时间. 浪费很多时间的一个原因就是因为Premature Optmization: 这个clean函数我当时感觉在hit和get里面都要用到, 所以代码还没有写完就直接先把这个给提取出来了, 结果后来发现在hit和get在某些细节处理, 尤其是prev什么时候更新, 以及timestamp位置的entry的更新的时机的选择上面都有小的差别; 结果最后debug的时候完全没有朝这个方向去想, 浪费了很多时间;

这个题目卡太久的另外一个原因就是其实整体的heuristic还没有想好就开始写代码, 结果越写越乱;

教训:

  • 一定要把heuristic, 尤其是: 你判断什么, 你得到这个判断之后, 你怎么相应的反应; 这些东西想一个大概之后再写代码;
  • 切记不要Premature Optmization, 就算是需要复制粘贴代码, 那么在第一个AC版本写完之前, 这个也是没有毛病的. 就算是实际的面试的时候, 也可以一边复制粘贴一边解释: 我暂时先复制粘贴, 等写完之后我在refactor;

一个小的习惯就是: prev这种东西的更新, 宁可晚, 都不可以早. 这种变量名字你就知道是用来干什么的, 更新的太早就覆盖掉之前好不容易保留下来的数据了;

这题之所以我认为我这个代码完成了Follow-Up是因为我每一个time都有一个bucket, 所以你在一个time有多少个hit都无所谓;

其实这种design的题目, 很多时候难点就是internal data structure的选择和使用; 我这里选择的这个array, 一定程度上跟counting sort或者说hashing的思路有点类似;

其实最怕的就是这种题目, 也不能说是完全不会写的, 好歹还是有一点思路, 直接看答案就是觉得很不甘心. 但是自己写吧, 又吭哧吭哧谢不对, 最是浪费时间就是这种题目了;


这个是discussion最优解, 虽然最后的速度其实不怎么样(重新submit一次之后速度又还可以了, 感觉这种很大的题目, 好像OJ的表现不太稳定):

O(s) s is total seconds in given time interval, in this case 300.
basic ideal is using buckets. 1 bucket for every second because we only need to keep the recent hits info for 300 seconds. hit[] array is wrapped around by mod operation. Each hit bucket is associated with times[] bucket which record current time. If it is not current time, it means it is 300s or 600s... ago and need to reset to 1.

public class HitCounter {  
    private int[] times;  
    private int[] hits;  

    public HitCounter() {  
        times = new int[300];  
        hits = new int[300];  
    }  

    public void hit(int timestamp) {  
        int index = timestamp % 300;  
        if (times[index] != timestamp) {  
            times[index] = timestamp;  
            hits[index] = 1;  
        } else {  
            hits[index]++;  
        }  
    }  

    public int getHits(int timestamp) {  
        int total = 0;  
        for (int i = 0; i < 300; i++) {  
            if (timestamp - times[i] < 300) {  
                total += hits[i];  
            }  
        }  
        return total;  
    }  
}

他的思路跟我其实是类似的, 也是一个bucket的做法, 不同的是, 他用两个bucket, 最后的维护过程比我的简单太多!!! 我发现我对于design题目还是存在一定的误解, 总是认为如果一个structure能够搞定, 就是最聪明的; 而实际上, 往往是middle难度及以上的题目, 哪能那么简单呢. 这个题目, 如果认个怂, 多用一点space, 最后的设计难度大幅度的降低.

use more resorts in Design problems!

另外他这个设计在设计的思路上面其实跟我有差别的. 我设计的时候, 对于这个circular array的理解, 更多的其实还是一个iteration的角度上来理解, 也就是有一个类似滑动窗口的过程在做这个; 但是他这个用的是一个类似mapping的观点, 也就是一个映射的观点: 任何一个timestamp, 其实都可以通过circular映射到300当中的一个index; 这300个index当中的每一个, 保存的其实是the latest hit that maps to this index. 我认为这个好像是我们两种做法的核心区别;

这个帖子里面好像还讨论了一些多线程的东西, 不过有些争论, 暂时就不记在这里了;


这个是discussion上面一个最naive的做法:

In this problem, I use a queue to record the information of all the hits. Each time we call the function getHits( ), we have to delete the elements which hits beyond 5 mins (300). The result would be the length of the queue : )

public class HitCounter {  
    Queue<Integer> q = null;  

    public HitCounter() {  
        q = new LinkedList<Integer>();  
    }  

    public void hit(int timestamp) {  
        q.offer(timestamp);  
    }  

    public int getHits(int timestamp) {  
        while(!q.isEmpty() && timestamp - q.peek() >= 300) {  
            q.poll();  
        }  
        return q.size();  
    }  
}

说到storeing only a fixed size portion of a stream这样的题目, 一个FIFO的queue是最自然的选择;

这个naive的做法是无法处理Follow-Up的, 因为最后这个queue的size会非常的大.

if huge amount of hits happened at the same timestamp, this solution will takes too much memory since each element in queue is a single hit. It's better to map timestamp to the number of hits at this timestamp.

不过这个原来的作者对于time上面其实已经有所顾及: 他故意让resize的过程发生在get当中, 因为Follow-Up给出的条件暗示最后实际上hit的数量要远高于get;


这个是discussion上面一个非常类似我的做法的解:

This solution is based on the idea in this post:
https://nuttynanaus.wordpress.com/2014/03/09/software-engineer-interview-questions/
There are two solutions, the first one we choose 1s as granularity, the other is full accuracy(see the post).
We call move() before hit() and getHits(). move() will take time at most O(N), where N is the length of the array.

public class HitCounter {  
    int N;  
    int[] count;  
    int lastPosition;  
    int lastTime;  
    int sum;  

    public HitCounter() {  
        N = 300;  
        count = new int[N];  
        lastPosition = 0;  
        lastTime = 0;  
        sum = 0;  
    }  

    public void hit(int timestamp) {  
        move(timestamp);  
        count[lastPosition]++;  
        sum++;  
    }  

    public int getHits(int timestamp) {  
        move(timestamp);  
        return sum;  
    }  

    void move(int timestamp){  
        int gap = Math.min(timestamp-lastTime, N);  
        for(int i=0; i<gap;i++){  
            lastPosition = (lastPosition+1)%N;  
            sum -= count[lastPosition];  
            count[lastPosition] = 0;  
        }  
        lastTime = timestamp;  
    }  
}

一个观点:

Since this is a design question, we need to ask interviewer how this class is going to be used?
A working code is not the answer to this question, but how you adjust your program to meet different use cases.

Consider: There are 1000 frequent hit() followed by 1 getHits(). If we only do removal in getHits() function, it will be very time consuming. For me, I prefer to do removal in both hit() and getHits(), so that the program avoids system lag in this case.
This is important when you design a time-critical system.

stefan的意见:

Good point about the lag. Another reason would be space. In two of my solutions I removed in both functions as well, allowing me to use a circular buffer of fixed capacity 300.

不过底下好像有一个喷子跟这个OP喷的厉害;

另外:

One follow up is that, if the duration is much longer than 300 seconds, say 3 days instead, does our method handle space growth efficiently? Is sub-linear growth possible?

所以这个题目其实还是很值得好好思考的;


Problem Description

Design a hit counter which counts the number of hits received in the past 5 minutes.

Each function accepts a timestamp parameter (in seconds granularity) and you may assume that calls are being made to the system in chronological order (ie, the timestamp is monotonically increasing). You may assume that the earliest timestamp starts at 1.

It is possible that several hits arrive roughly at the same time.

Example:

HitCounter counter = new HitCounter();  

// hit at timestamp 1.  
counter.hit(1);  

// hit at timestamp 2.  
counter.hit(2);  

// hit at timestamp 3.  
counter.hit(3);  

// get hits at timestamp 4, should return 3.  
counter.getHits(4);  

// hit at timestamp 300.  
counter.hit(300);  

// get hits at timestamp 300, should return 4.  
counter.getHits(300);  

// get hits at timestamp 301, should return 3.  
counter.getHits(301);

Follow up:
What if the number of hits per second could be very large? Does your design scale?

Credits:
Special thanks to @elmirap for adding this problem and creating all test cases.

Difficulty:Medium
Total Accepted:15.8K
Total Submissions:29.5K
Contributor: LeetCode
Companies
dropbox google
Related Topics
design
Similar Questions
Logger Rate Limiter

results matching ""

    No results matching ""