LinkedListRandoMNode [source code]
public class LinkedListRandoMNode {
static
/******************************************************************************/
public class Solution {
private ListNode head;
private Random rd;
/** @param head The linked list's head.
Note that the head is guaranteed to be not null, so it contains at least one node. */
public Solution(ListNode head) {
this.head = head;
rd = new Random();
}
/** Returns a random node's value. */
public int getRandom() {
ListNode cur = head.next;
int res = head.val;
int i = 1;
while (cur != null) {
int j = rd.nextInt(i + 1);
if (j < 1)
res = cur.val;
i++;
cur = cur.next;
}
return res;
}
}
/******************************************************************************/
public static void main(String[] args) {
}
}
连tag都看不懂是干啥的, 有点不想写了;
笨办法, 直接就是在constructor的时候先查出来长度是多少, 然后每一次get, 生成一个长度范围内的随机数, 然后丢出去就行了; 这个能满足feature, 不过感觉performance不是特别好;
discussion认为在长度已知的情况下, shuffle是比较好的; 不过一次shuffle估计也是O(N)的时间? 另外为什么他们不用我这个直接找到一个随机index的方法? 估计是太trivial了. 毕竟是Google的题目, 你给一个这样的答案, 用random找random, 确实有点说不过去, 自己骗自己;
自己参照着其他人的解释, 写了一个, 最后的速度是142ms (32%), 有点让人失望; 好像是大case的波动, 重新提交了一次, 就是129ms (64%)了;
注意我这里的nextInt的参数应该是i + 1而不是i; 可以从i = 1开始举例看看就知道了; 反正你要保证的是j是一个从0..(k+i-1)里面随机选择一个0..k-1的数字; 如果方便记忆, 可以把k和k + i都记忆成length型数值; 这样就很好理解为什么要用i+1了: 第一个循环是在头两个数里面选一个, 所以参数用i+1=2是很自然的;
当然你也可以把i初始化到2, 不过感觉这样你其实就是把i给设定成了一个length型变量而不是一个index型的变量, 感觉不是很一致也不太符合常规; 反正自己知道怎么算就行了;
but again, 这种变量更新Corner Case的问题, 还是举例子最方便. 拿着几个小符号搞来搞去, 最后搞的其实是你自己;
这个是discussion上面给出的一个关于reservoir ['rezəvwɑ:] sampling的很好的解释:
@WTIFS said in Brief explanation for Reservoir Sampling:
Problem:
- Choose
k
entries fromn
numbers. Make sure each number is selected with the probability ofk/n
Basic idea:
- Choose
1, 2, 3, ..., k
first and put them into the reservoir.- For
k+1
, pick it with a probability ofk/(k+1)
, and randomly replace a number in the reservoir.- For
k+i
, pick it with a probability ofk/(k+i)
, and randomly replace a number in the reservoir.- Repeat until
k+i
reachesn
Proof:
- For
k+i
, the probability that it is selected and will replace a number in the reservoir isk/(k+i)
- For a number in the reservoir before (let's say
X
), the probability that it keeps staying in the reservoir is
P(X was in the reservoir last time)
×P(X is not replaced by k+i)
- =
P(X was in the reservoir last time)
× (1
-P(k+i is selected and replaces X)
)- =
k/(k+i-1)
× (1
-k/(k+i)
×1/k
)- =
k/(k+i)
这里要做的其实就是一个inductive proof: 从p(in) = k/(k+i-1)
推导到`p(in) = k / (k + i);
- When
k+i
reachesn
, the probability of each number staying in the reservoir isk/n
Example
- Choose
3
numbers from[111, 222, 333, 444]
. Make sure each number is selected with a probability of3/4
- First, choose
[111, 222, 333]
as the initial reservior- Then choose
444
with a probability of3/4
- For
111
, it stays with a probability of
P(444 is not selected)
+P(444 is selected but it replaces 222 or 333)
- =
1/4
+3/4
*2/3
- =
3/4
- The same case with
222
and333
- Now all the numbers have the probability of
3/4
to be pickedThis Problem
- This problem is the sp case where
k=1
P.S. Thanks for @WKVictor for pointing out my mistake!
上面这个人的解释本来是有问题的, 这个是下面帮忙修正的一个:
@WKVictor said in Brief explanation for Reservoir Sampling:
Your explanation is nice, but there is an artifact in your proof and the example.
P(k+i is not chosen) = 1- k/(k+i) = i/(k+i)
instead of1/(k+i)
. Then, your final result in the proof is notk/(k+i)
any more.As a matter of fact, the probability of
X
being kept in reservoir should beP(X was in the reservoir last time) * (P(k+i is not chosen) + P(k+i is chosen but X is not replaced)) = k/(k+i-1) * (i/(k+i) + k/(k+i) * (k-1)/k) = k/(k+i).
In another word, the termP(X was in the reservoir)
was neglected in your proof. In your example,i
happens to be 1, so the result in the example seems to be correct.Another easier way of calculating the probability is
P(X was in the reservoir last time) * P(X is not replaced this time) = P(X was in the reservoir last time) * (1- P(X is replaced this time)) = k/(k+i-1) * (1 - k/(k+i) * 1/k) = k/(k+i)
.
G4G上面也有解释: http://www.geeksforgeeks.org/reservoir-sampling/
这个是另外一个非常好的解释: https://gregable.com/2007/10/reservoir-sampling.html. 这个的作者正好还是个Google人;
注意仔细读一下这个blog里面的内容, 因为这个题目本身其实是一个k等于1的特殊情形; 如果k比较大, 那么有一个初始化的步骤, 是不能忘记的, 这个blog里面就有介绍;
另外一个: http://blog.cloudera.com/blog/2013/04/hadoop-stratified-randosampling-algorithm/
这个算法本身的证明是需要用induction的, 可能这个也是为什么Google认为这个算法重要;
这些解释上面也给出了一个这个算法的优势:
- 针对stream的算法, 也就是不需要知道长度;
- 如果你在stream算法当中也先单独计算一次长度, 然后再生成随机index, 这样你就要走两个pass. 这个reservoir算法就是为了避免这个问题, 一个pass就能够解决;
如果理解了这个算法的证明, 就可以知道这个算法本身其实还remarkable的: 虽然是一个概率问题, 但是这个算法完全没有一点近似. 就是严格的evenly distributed random from a stream;
这个是discussion上面给出的一个解释:
Start...
When we read the first node head, if the stream ListNode stops here, we can just return the head.val. The possibility is 1/1.
When we read the second node, we can decide if we replace the result r or not. The possibility is 1/2. So we just generate a random number between 0 and 1, and check if it is equal to 1. If it is 1, replace r as the value of the current node, otherwise we don't touch r, so its value is still the value of head.
When we read the third node, now the result r is one of value in the head or second node. We just decide if we replace the value of r as the value of current node(third node). The possibility of replacing it is 1/3, namely the possibility of we don't touch r is 2/3. So we just generate a random number between 0 ~ 2, and if the result is 2 we replace r.
We can continue to do like this until the end of stream ListNode.
public class Solution {
ListNode head;
Random random;
public Solution(ListNode h) {
head = h;
random = new Random();
}
public int getRandom() {
ListNode c = head;
int r = c.val;
for(int i=1;c.next != null;i++){
c = c.next;
if(random.nextInt(i + 1) == i) r = c.val;
}
return r;
}
}
reservoir sampling好像如果在这个题目k = 1的情况下, 并不是最优的做法, 不如直接一个2pass. 不过如果k比较大, 那么这个算法就有优势了; Stefan认为这个速度不佳的情况可能是因为random的API的调用次数太多, 导致时间不理想;
according to the wiki https://en.wikipedia.org/wiki/Reservoir_sampling
here is sudo code for k size reservoir:
//S has items to sample, R will contain the result
ReservoirSample(S[1..n], R[1..k])
// fill the reservoir array
for i = 1 to k
R[i] := S[i]
// replace elements with gradually decreasing probability
for i = k+1 to n
j := random(1, i) // important: inclusive range
if j <= k
R[j] := S[i]
you need to remember the range [ 0, i ] should be inclusive.
这个伪代码是非常聪明的, 因为这题其实是需要两个随机数的:
- 一个随机数计算是否选择k + i的概率;
- 一个随机数计算被k + i替换的数的index;
这里这两个随机数直接用一个j就完成了; 根据上面对于inefficiency的讨论, 减少对random的API的调用次数是有好处的;
这个是这个人写的代码:
class Solution {
private:
ListNode* head;
public:
Solution(ListNode* head) {
this->head = head;
}
int getRandom() {
int res = head->val;
ListNode* node = head->next;
int i = 2;
while(node){
int j = rand()%i;
if(j==0)
res = node->val;
i++;
node = node->next;
}
return res;
}
};
看起来有点啰嗦, 实际上没毛病. 注意两点:
- reservoir的实现实际上是要有一个专门的容器来摆放的. 上面general情况的伪代码用的是一个array. 这里我们这一题因为k等于1, 所以直接一个单独的变量保存就可以了;
- 注意他这里这个循环的写法. 虽然看起来有点丑, 不过如果是在是while循环想不起来怎么写, 就这么写怎么了嘛! 他在while循环开始之前就把node和i给初始化到第一个iteration的位置, 然后在循环里面重复做同样的更新;
这个是看到的一个类似的JAVA的版本, 而且可以针对任何的K:
public class Solution {
private ListNode head;
private Random rand;
public Solution(ListNode head) {
this.head = head;
this.rand = new Random();
}
public int getRandom() {
int k = 1;
ListNode node = this.head;
int res = node.val;
int i = 0;
ArrayList<Integer> reservoir = new ArrayList<Integer>();
while (i < k && node != null) {
reservoir.add(node.val);
node = node.next;
i++;
}
i++; // i == k => i == k+1
while (node != null) {
int j = rand.nextInt(i);
if (j < k) {
reservoir.set(j, node.val);
}
i++;
node = node.next;
}
return reservoir.get(0);// or return reservoir when k > 1;
}
}
但是好像有点不对?
nextInt(int bound)
:
Returns a pseudorandom, uniformly distributed int value between 0 (inclusive) and the specified value (exclusive), drawn from this random number generator's sequence.
哦对的. 我们这里的i是0-based, 所以我们rand.nextInt(i)
其实是对的, 比如i是k + delta, 那么我们这里找到的其实是0 ~ k + delta - 1, 所以实际上是k + delta个值当中取得的一个随机数. 产生的数字是0..k-1, 这个也正好完整覆盖reservoir里面所有的index;
我尝试把这里的参数i改成i + 1, 结果是错了, 所以这里的做法没有毛病;
注意这里的swap实际上实现起来很方便, 因为不需要做到InPlace, 所以实际上就是一个assignment;
Problem Description
Given a singly linked list, return a random node's value from the linked list. Each node must have the same probability of being chosen.
Follow up:
What if the linked list is extremely large and its length is unknown to you? Could you solve this efficiently without using extra space?
Example:
// Init a singly linked list [1,2,3].
ListNode head = new ListNode(1);
head.next = new ListNode(2);
head.next.next = new ListNode(3);
Solution solution = new Solution(head);
// getRandom() should return either 1, 2, or 3 randomly. Each element should have equal probability of returning.
solution.getRandom();
Difficulty:Medium
Total Accepted:28.2K
Total Submissions:60.2K
Contributor: LeetCode
Companies
google
Related Topics
reservoir sampling
Similar Questions
Random Pick Index
/**
- Your Solution object will be instantiated and called as such:
- Solution obj = new Solution(head);
- int param_1 = obj.getRandom();
*/