LinkedListRandoMNode [source code]


public class LinkedListRandoMNode {
static
/******************************************************************************/
public class Solution {
    private ListNode head;
    private Random rd;

    /** @param head The linked list's head.
        Note that the head is guaranteed to be not null, so it contains at least one node. */
    public Solution(ListNode head) {
        this.head = head;
        rd = new Random();
    }

    /** Returns a random node's value. */
    public int getRandom() {
        ListNode cur = head.next;
        int res = head.val;
        int i = 1;
        while (cur != null) {
            int j = rd.nextInt(i + 1);
            if (j < 1)
                res = cur.val;
            i++;
            cur = cur.next;
        }
        return res;
    }
}
/******************************************************************************/

    public static void main(String[] args) {
    }
}

连tag都看不懂是干啥的, 有点不想写了;

笨办法, 直接就是在constructor的时候先查出来长度是多少, 然后每一次get, 生成一个长度范围内的随机数, 然后丢出去就行了; 这个能满足feature, 不过感觉performance不是特别好;

discussion认为在长度已知的情况下, shuffle是比较好的; 不过一次shuffle估计也是O(N)的时间? 另外为什么他们不用我这个直接找到一个随机index的方法? 估计是太trivial了. 毕竟是Google的题目, 你给一个这样的答案, 用random找random, 确实有点说不过去, 自己骗自己;

自己参照着其他人的解释, 写了一个, 最后的速度是142ms (32%), 有点让人失望; 好像是大case的波动, 重新提交了一次, 就是129ms (64%)了;

注意我这里的nextInt的参数应该是i + 1而不是i; 可以从i = 1开始举例看看就知道了; 反正你要保证的是j是一个从0..(k+i-1)里面随机选择一个0..k-1的数字; 如果方便记忆, 可以把k和k + i都记忆成length型数值; 这样就很好理解为什么要用i+1了: 第一个循环是在头两个数里面选一个, 所以参数用i+1=2是很自然的;

当然你也可以把i初始化到2, 不过感觉这样你其实就是把i给设定成了一个length型变量而不是一个index型的变量, 感觉不是很一致也不太符合常规; 反正自己知道怎么算就行了;

but again, 这种变量更新Corner Case的问题, 还是举例子最方便. 拿着几个小符号搞来搞去, 最后搞的其实是你自己;


这个是discussion上面给出的一个关于reservoir ['rezəvwɑ:] sampling的很好的解释:

@WTIFS said in Brief explanation for Reservoir Sampling:

Problem:

  • Choose k entries from n numbers. Make sure each number is selected with the probability of k/n

Basic idea:

  • Choose 1, 2, 3, ..., k first and put them into the reservoir.
  • For k+1, pick it with a probability of k/(k+1), and randomly replace a number in the reservoir.
  • For k+i, pick it with a probability of k/(k+i), and randomly replace a number in the reservoir.
  • Repeat until k+i reaches n

Proof:

  • For k+i, the probability that it is selected and will replace a number in the reservoir is k/(k+i)
  • For a number in the reservoir before (let's say X), the probability that it keeps staying in the reservoir is
    • P(X was in the reservoir last time) × P(X is not replaced by k+i)
    • = P(X was in the reservoir last time) × (1 - P(k+i is selected and replaces X))
    • = k/(k+i-1) × (1 - k/(k+i) × 1/k
    • = k/(k+i)

这里要做的其实就是一个inductive proof: 从p(in) = k/(k+i-1)推导到`p(in) = k / (k + i);

  • When k+i reaches n, the probability of each number staying in the reservoir is k/n

Example

  • Choose 3 numbers from [111, 222, 333, 444]. Make sure each number is selected with a probability of 3/4
  • First, choose [111, 222, 333] as the initial reservior
  • Then choose 444 with a probability of 3/4
  • For 111, it stays with a probability of
    • P(444 is not selected) + P(444 is selected but it replaces 222 or 333)
    • = 1/4 + 3/4*2/3
    • = 3/4
  • The same case with 222 and 333
  • Now all the numbers have the probability of 3/4 to be picked

This Problem

  • This problem is the sp case where k=1

P.S. Thanks for @WKVictor for pointing out my mistake!

上面这个人的解释本来是有问题的, 这个是下面帮忙修正的一个:

@WKVictor said in Brief explanation for Reservoir Sampling:

Your explanation is nice, but there is an artifact in your proof and the example.

P(k+i is not chosen) = 1- k/(k+i) = i/(k+i) instead of 1/(k+i). Then, your final result in the proof is not k/(k+i) any more.

As a matter of fact, the probability of X being kept in reservoir should be
P(X was in the reservoir last time) * (P(k+i is not chosen) + P(k+i is chosen but X is not replaced)) = k/(k+i-1) * (i/(k+i) + k/(k+i) * (k-1)/k) = k/(k+i). In another word, the term P(X was in the reservoir) was neglected in your proof. In your example, i happens to be 1, so the result in the example seems to be correct.

Another easier way of calculating the probability is P(X was in the reservoir last time) * P(X is not replaced this time) = P(X was in the reservoir last time) * (1- P(X is replaced this time)) = k/(k+i-1) * (1 - k/(k+i) * 1/k) = k/(k+i).


G4G上面也有解释: http://www.geeksforgeeks.org/reservoir-sampling/

这个是另外一个非常好的解释: https://gregable.com/2007/10/reservoir-sampling.html. 这个的作者正好还是个Google人;

注意仔细读一下这个blog里面的内容, 因为这个题目本身其实是一个k等于1的特殊情形; 如果k比较大, 那么有一个初始化的步骤, 是不能忘记的, 这个blog里面就有介绍;

另外一个: http://blog.cloudera.com/blog/2013/04/hadoop-stratified-randosampling-algorithm/

这个算法本身的证明是需要用induction的, 可能这个也是为什么Google认为这个算法重要;

这些解释上面也给出了一个这个算法的优势:

  • 针对stream的算法, 也就是不需要知道长度;
  • 如果你在stream算法当中也先单独计算一次长度, 然后再生成随机index, 这样你就要走两个pass. 这个reservoir算法就是为了避免这个问题, 一个pass就能够解决;

如果理解了这个算法的证明, 就可以知道这个算法本身其实还remarkable的: 虽然是一个概率问题, 但是这个算法完全没有一点近似. 就是严格的evenly distributed random from a stream;


这个是discussion上面给出的一个解释:

Start...
When we read the first node head, if the stream ListNode stops here, we can just return the head.val. The possibility is 1/1.

When we read the second node, we can decide if we replace the result r or not. The possibility is 1/2. So we just generate a random number between 0 and 1, and check if it is equal to 1. If it is 1, replace r as the value of the current node, otherwise we don't touch r, so its value is still the value of head.

When we read the third node, now the result r is one of value in the head or second node. We just decide if we replace the value of r as the value of current node(third node). The possibility of replacing it is 1/3, namely the possibility of we don't touch r is 2/3. So we just generate a random number between 0 ~ 2, and if the result is 2 we replace r.

We can continue to do like this until the end of stream ListNode.

public class Solution {  

    ListNode head;  
    Random random;  

    public Solution(ListNode h) {  
        head = h;         
        random = new Random();          
    }  

    public int getRandom() {  

        ListNode c = head;  
        int r = c.val;  
        for(int i=1;c.next != null;i++){  

            c = c.next;  
            if(random.nextInt(i + 1) == i) r = c.val;                          
        }  

        return r;  
    }  
}

reservoir sampling好像如果在这个题目k = 1的情况下, 并不是最优的做法, 不如直接一个2pass. 不过如果k比较大, 那么这个算法就有优势了; Stefan认为这个速度不佳的情况可能是因为random的API的调用次数太多, 导致时间不理想;


according to the wiki https://en.wikipedia.org/wiki/Reservoir_sampling
here is sudo code for k size reservoir:

//S has items to sample, R will contain the result  

ReservoirSample(S[1..n], R[1..k])  
  // fill the reservoir array  
  for i = 1 to k  
      R[i] := S[i]  

  // replace elements with gradually decreasing probability  
  for i = k+1 to n  
    j := random(1, i)   // important: inclusive range  
    if j <= k  
        R[j] := S[i]

you need to remember the range [ 0, i ] should be inclusive.

这个伪代码是非常聪明的, 因为这题其实是需要两个随机数的:

  • 一个随机数计算是否选择k + i的概率;
  • 一个随机数计算被k + i替换的数的index;

这里这两个随机数直接用一个j就完成了; 根据上面对于inefficiency的讨论, 减少对random的API的调用次数是有好处的;

这个是这个人写的代码:

class Solution {  
private:  
    ListNode* head;  
public:  
    Solution(ListNode* head) {  
        this->head = head;  
    }  

    int getRandom() {  
        int res = head->val;  
        ListNode* node = head->next;  
        int i = 2;  
        while(node){  
            int j = rand()%i;  
            if(j==0)  
                res = node->val;  
            i++;  
            node = node->next;  
        }  
        return res;  
    }  
};

看起来有点啰嗦, 实际上没毛病. 注意两点:

  • reservoir的实现实际上是要有一个专门的容器来摆放的. 上面general情况的伪代码用的是一个array. 这里我们这一题因为k等于1, 所以直接一个单独的变量保存就可以了;
  • 注意他这里这个循环的写法. 虽然看起来有点丑, 不过如果是在是while循环想不起来怎么写, 就这么写怎么了嘛! 他在while循环开始之前就把node和i给初始化到第一个iteration的位置, 然后在循环里面重复做同样的更新;

这个是看到的一个类似的JAVA的版本, 而且可以针对任何的K:

public class Solution {  
    private ListNode head;  
    private Random rand;  

    public Solution(ListNode head) {  
        this.head = head;  
        this.rand = new Random();  
    }  

    public int getRandom() {  
        int k = 1;  
        ListNode node = this.head;  
        int res = node.val;  
        int i = 0;  
        ArrayList<Integer> reservoir = new ArrayList<Integer>();  

        while (i < k && node != null) {  
            reservoir.add(node.val);  
            node = node.next;  
            i++;  
        }  
        i++; // i == k  =>  i == k+1  
        while (node != null) {  
            int j = rand.nextInt(i);  
            if (j < k) {  
                reservoir.set(j, node.val);  
            }  
            i++;  
            node = node.next;  
        }  
        return reservoir.get(0);// or return reservoir when k > 1;  
    }  
}

但是好像有点不对?

nextInt(int bound):
Returns a pseudorandom, uniformly distributed int value between 0 (inclusive) and the specified value (exclusive), drawn from this random number generator's sequence.

哦对的. 我们这里的i是0-based, 所以我们rand.nextInt(i)其实是对的, 比如i是k + delta, 那么我们这里找到的其实是0 ~ k + delta - 1, 所以实际上是k + delta个值当中取得的一个随机数. 产生的数字是0..k-1, 这个也正好完整覆盖reservoir里面所有的index;

我尝试把这里的参数i改成i + 1, 结果是错了, 所以这里的做法没有毛病;

注意这里的swap实际上实现起来很方便, 因为不需要做到InPlace, 所以实际上就是一个assignment;


Problem Description

Given a singly linked list, return a random node's value from the linked list. Each node must have the same probability of being chosen.

Follow up:
What if the linked list is extremely large and its length is unknown to you? Could you solve this efficiently without using extra space?

Example:

// Init a singly linked list [1,2,3].  
ListNode head = new ListNode(1);  
head.next = new ListNode(2);  
head.next.next = new ListNode(3);  
Solution solution = new Solution(head);  

// getRandom() should return either 1, 2, or 3 randomly. Each element should have equal probability of returning.  
solution.getRandom();

Difficulty:Medium
Total Accepted:28.2K
Total Submissions:60.2K
Contributor: LeetCode
Companies
google
Related Topics
reservoir sampling
Similar Questions
Random Pick Index

/**

  • Your Solution object will be instantiated and called as such:
  • Solution obj = new Solution(head);
  • int param_1 = obj.getRandom();
    */

results matching ""

    No results matching ""