KthLargestElementInAnArray [source code]
public class KthLargestElementInAnArray {
static
/****************************************************************************/
class Solution {
public int findKthLargest(int[] nums, int k) {
return findK (nums, nums.length - k + 1, 0, nums.length - 1);
}
int findK (int[] nums, int k, int lo, int hi) { // invariant: k is 1-based
int p = partition (nums, lo, hi);
int res = -1;
if (k == p - lo + 1)
res = nums[p];
else if (k < p - lo + 1)
res = findK (nums, k, lo, p - 1);
else
res = findK (nums, k - (p - lo + 1), p + 1, hi);
return res;
}
int partition (int[] nums, int lo, int hi) {
if (lo == hi)
return lo;
int v = nums[lo];
int i = lo, j = hi + 1;
while (true) {
while (nums[++i] < v) if (i == hi) break;
while (nums[--j] > v) if (j == lo) break;
if (i >= j) break;
swap (nums, i, j);
}
swap (nums, lo, j);
return j;
}
void swap (int[] nums, int i, int j) {
int tmp = nums[i];
nums[i] = nums[j];
nums[j] = tmp;
}
}
/****************************************************************************/
public static void main(String[] args) {
KthLargestElementInAnArray.Solution tester = new KthLargestElementInAnArray.Solution();
int[][] inputs = {
{3,2,1,5,6,4}, {2, 5},
{1}, {1, 1},
};
for (int i = 0; i < inputs.length / 2; i++) {
System.out.println (Printer.separator ());
int[] nums = inputs[2 * i];
String input_str = Printer.array (nums);
int k = inputs[2 * i + 1][0], ans = inputs[2 * i + 1][1], output = tester.findKthLargest (nums, k);
System.out.printf ("%s and %d\n%s\n%d\n",
input_str, k, Printer.wrap (output + "", output == ans ? 92 : 91), ans
);
}
}
}
好像最好是有O(N)算法的, 没有什么印象了;
没什么时间, 随便写写;
首先按照例子, 看到这个k是一个1based, 是一个offset量; 记住这个;
没什么好办法, 好像只能用quicksort那个做法试试?
parititon应该返回一个坐标, 这个也是当时Sedgwick上面的做法;
partition就用的Sedgwick的写法; 注意这个写法, 里面的循环, equal也停的; 然后因为用的是前++, 所以停下来的时候, 比如[i], 已经是在一个greater的数字这里了;
然后就是while true & break的组合拳, 来降低循环写的难度; 还是值得记忆的;
最后一个swap为什么是跟j而不是跟i? 因为这个时候v在前面, 占用的是一个应该less的数字的位置; 而因为前++/--的设定, 出来的时候, 其实j指向的是less的位置, 所以跟j对换;
另外判断长度为1的base case也是我自己加的, 这里居然碰到了; 哎, 前++就真的很烦人;
然后就过了; 速度是42(39), 所以这个肯定不是O(N)解法了; 明天早上起来了解一下;
discuss最优解是一个老外写的, 看起来不错, 明天看一下;
editorial
https://leetcode.com/problems/kth-largest-element-in-an-array/discuss/60294/Solution-explained
This problem is well known and quite often can be found in various text books.
You can take a couple of approaches to actually solve it:
O(N lg N) running time + O(1) memory
The simplest approach is to sort the entire input array and then access the element by it’s index (which is O(1)) operation:
public int findKthLargest(int[] nums, int k) {
final int N = nums.length;
Arrays.sort(nums);
return nums[N - k];
}
O(N lg K) running time + O(K) memory
Other possibility is to use a min oriented priority queue that will store the K-th largest values. The algorithm iterates over the whole input and maintains the size of priority queue.Other possibility is to use a min oriented priority queue that will store the K-th largest values. The algorithm iterates over the whole input and maintains the size of priority queue.
public int findKthLargest(int[] nums, int k) {
final PriorityQueue<Integer> pq = new PriorityQueue<>();
for(int val : nums) {
pq.offer(val);
if(pq.size() > k) {
pq.poll();
}
}
return pq.peek();
}
这个PQ的做法实际上复杂度也是可以接受的, 不过因为形式太简单, 真正面试, 尤其是Google的面试, 肯定是会被追问的;
O(N) best case / O(N^2) worst case running time + O(1) memory
The smart approach for this problem is to use the selection algorithm (based on the partion method - the same one as used in quicksort).
public int findKthLargest(int[] nums, int k) {
k = nums.length - k;
int lo = 0;
int hi = nums.length - 1;
while (lo < hi) {
final int j = partition(nums, lo, hi);
if(j < k) {
lo = j + 1;
} else if (j > k) {
hi = j - 1;
} else {
break;
}
}
return nums[k];
}
private int partition(int[] a, int lo, int hi) {
int i = lo;
int j = hi + 1;
while(true) {
while(i < hi && less(a[++i], a[lo]));
while(j > lo && less(a[lo], a[--j]));
if(i >= j) {
break;
}
exch(a, i, j);
}
exch(a, lo, j);
return j;
}
private void exch(int[] a, int i, int j) {
final int tmp = a[i];
a[i] = a[j];
a[j] = tmp;
}
private boolean less(int v, int w) {
return v < w;
}
他这个partition整体的写法跟Sedgwick的那个写法非常相似, 但是我认为比Sedgwick那个好, 因为他是先判断i, j不出界, 然后再移动i和j; 其他的地方你看他跟sedgwick都是差不多的; 比如初始值都是-1位置, 然后相等的情况内循环也停止; 命名exch和less也都跟sedgwick非常类似; 但是他对内循环的这个小改写, 导致我的base case(判断lo == hi)可以不需要了;
主函数他选择用的是循环, 而不是recursion; 因为是一个类似二分的操作(不是真的二分), 所以这个做法也是完全合理的;
复杂度分析, 我们还是从简单的极端情形来分析先; 假如每次都是对半分, 那么注意这个算法跟quicksort本身的区别; 这个算法我们每一个level是只往一个方向下行的, 而不是跟quicksort那样, 两个branching都要走; 这也是导致为什么复杂度好很多; 至于计算, 就是N (1 + 1/2 + 1/4 + ...), 括号里面应该是有lgN项, 不过计算上面可以简单一些, 就假设有无限项, 这个极限求和很容易算出来是2, 所以说是O(N);
worst case也很好算, 区间是从N, 到N-1, 然后N-2, 这个是完全可能的, 这样最后虽然每次只走一个branching下行, 最后复杂度还是N^2级别的;
O(N) guaranteed running time + O(1) space
So how can we improve the above solution and make it O(N) guaranteed? The answer is quite simple, we can randomize the input, so that even when the worst case input would be provided the algorithm wouldn’t be affected. So all what it is needed to be done is to shuffle the input.
public int findKthLargest(int[] nums, int k) {
shuffle(nums);
k = nums.length - k;
int lo = 0;
int hi = nums.length - 1;
while (lo < hi) {
final int j = partition(nums, lo, hi);
if(j < k) {
lo = j + 1;
} else if (j > k) {
hi = j - 1;
} else {
break;
}
}
return nums[k];
}
private void shuffle(int a[]) {
final Random random = new Random();
for(int ind = 1; ind < a.length; ind++) {
final int r = random.nextInt(ind + 1);
exch(a, ind, r);
}
}
There is also worth mentioning the Blum-Floyd-Pratt-Rivest-Tarjan algorithm that has a guaranteed O(N) running time.
就跟quicksort可以被randomization拯救一样, 这个算法也可以用这个方法;
shuffle这个函数应该还是熟悉的了; 我懒得给我自己的加了;
不过下面还是很多人认为他他说的guaranteed不太严谨; 我觉得面试的时候也可以稍微避免说话太强烈;
@william1971 I don’t see any where mentioned that it’s 100% O(N) guarantee. Your chance of getting the worst case input is 1 / N!. You can do the math.
We need to end this discussion:
Just read this:
http://algs4.cs.princeton.edu/23quicksort/
https://en.wikipedia.org/wiki/Quicksort#Analysis_of_randomized_quicksort
Also if you understand what a permutation is you would know that probability of shuffling leading into worst case scenario input is 1 / N!
强调是一个probablistic guarantee就行了;
PQ做法: 很简单
This is due the fact that the PriorityQueue is implemented as a Binary Heap, which in fact is nothing more then complete binary tree. So both inserting and removing the values through offer() and poll() methods have O(lg K) complexity and altogether since you doing this operation N times the total complexity is O(N lg K)
I have a question for the O(n) solution. From my understanding, this algorithm is based on the quick sort. If there’s a lot of duplication in the array, it will lower the performance. For instance, if there’s only one number in the array, we only can eliminate one number in a recursion. Even randomize the partition can not solve that problem.
So my method to improve that algorithm is to split the array into three parts, which are <, = , > here’s my code. Please tell me your opinion.
public class Solution {
public int findKthLargest(int[] nums, int k) {
return findK(nums,nums.length - k,0,nums.length-1);
}
private int findK(int[] nums,int k, int start, int end){
int parti = nums[start],i=start,m=start;
for(int j=start+1;j<=end;j++){
if(nums[j]>parti)
continue;
if(nums[j]<=parti){
swap(nums,++i,j);
if(nums[j] != parti)
swap(nums,m++,i);
}
}
if(k>=m && k<=i)
return nums[k];
else if(k < m)
return findK(nums,k,start,m-1);
else
return findK(nums,k,i+1,end);
}
private void swap(int[] nums, int a, int b){
int temp = nums[a];
nums[a] = nums[b];
nums[b] = temp;
}
}
这个说法是没问题的, Sedgwick当时书上也说过, 他这个写法的2way partition是有问题的; 后面给出的方法就是三色旗做法的3way partition;
他partition好像是直接在findK里面做的; 没有仔细看了;
严格来说他的说法有问题, 为什么一次只能消除一个? 当duplicate多的是2partition的问题是, swap操作太多了, 但是假如整个数组全都相等, 最后返回的还是一个中间位置的split; 这样最后一个level还是能够砍一半的;
另外如果是我自己实现3partition, 记得最后要把equal分段的上下区间都记录下来, 然后recurse的时候把这一段整个排除掉; 也就是是说最后比如分出来是长度为3,3,3的三个分段, 最后就只能要么往第一个3里面下去, 要么往第三个3里面下去: 中间这个3你的recursion思路要能够排除掉;
I think the Heap solution should be O(K lg N) running time instead of O(N lg K) like you mentioned.
- You build the max heap: O(N)
- Take max element out: O(logN). Do this K times: O(KlogN)
Memory: only O(N), since you can build the heap in place (if given input is an array of size N)
Am I correct?
好像也不是很多人认同他这个观点; 因为最后并没有把所有的N个都扔到heap里面去; heap的size是控制在K的; 虽然不知道java本身是怎么实现的; 如果是自己实现heap, 肯定是先N个全都丢进去, 然后他这里的分析是没问题的;
share my quickSelect solution that beats 99.7% submission
//move the k largest elements to the left part of array
public class Solution {
public int findKthLargest(int[] nums, int k) {
if(nums.length == 1) return nums[0];
int left = 0;
int right = nums.length - 1;
while(left <= right){
int pivotPos = partition(nums, left, right);
if(pivotPos - left + 1 < k){
k = k - (pivotPos - left + 1);//shrink k value
left = pivotPos + 1;//move left to pivotPos + 1
}else if(pivotPos - left + 1 > k){
right = pivotPos - 1;//shrink right by 1 at least
}else{
return nums[pivotPos];
}
}
return 0;
}
//make elements value between [0, leftBound] are all >= pivot
private int partition(int[] array, int left, int right){
int pivotIndex = left + (right - left)/2;
int pivot = array[pivotIndex];
swap(array, pivotIndex, right);
int leftBound = left;
int rightBound = right - 1;
while(leftBound <= rightBound){
if(array[leftBound] >= pivot){
leftBound++;
}else if(array[rightBound] < pivot){
rightBound--;
}else{
swap(array, leftBound++, rightBound--);
}
}
swap(array, leftBound, right);
return leftBound;
}
private void swap(int[] array, int left, int right){
int temp = array[left];
array[left] = array[right];
array[right] = temp;
}
}
没有他说的这么快, 是6(95), 但是还是很神奇; 他这里也没有加shuffle, 为什么比我的快这么多?
好像就是选pivot的时候固定选中间的? 就这能差别这么大? 后面有时间再来看看到底什么鬼;
当然, partition的pivot确实影响最后的速度, 这个不是新鲜事了;
Hey guys, I try to proof O(N) algorithm in worst time.
First see the introduction of algorithm
Below are the steps for implementing algorithm which runs in O(n).
1.If n is small, for example n<6, just="" sort="" and="" return="" the="" k="" smallest="" number.(="" bound="" time-="" 7)="" 2.if="" n="">5, then partition the numbers into groups of 5.(Bound time n/5)
3.Sort the numbers within each group. Select the middle elements (the medians). (Bound time- 7n/5)
4.Call your “Selection” routine recursively to find the median of n/5 medians and call it m. (Bound time-Tn/5)
SELECTION ROUTINE
If k=r, then return m
If k is less than r, then return k th smallest of the set L .(Bound
time T7n/10)
If k is greater than r, then return k-r th smallest of the set R.
5.Compare all n-1 elements with the median of medians m and determine the sets L and R, where L contains all elements m. Clearly, the rank
of m is r=|L|+1 (|L| is the size or cardinality of L). (Bound time- n)
I have also shown how the order is O(n).
T(n)=O(n)+T(n/5)+T(7n/10) We will solve this equation in order to get
the complexity.
We assume that T (n)< Cn
T (n)= an + T (n/5) + T (7n/10)
Cn>=T(n/5) +T(7n/10) + an
C>= Cn/5+ C7n/5 + an
C>= 9C/10 +a
C/10>= a
C>=10a
There is such a constant that exists….so T (n) = O (n)
Why choose the magic number 5 as the partition? Here we go.
Assume the partition number is p. The chosen pivot is not the exact median. We can only know at least 1/2 n/p p/2 numbers are less than the pivot. So the worst case is to find in 1 - 1/2 n/p p/2 = 3n/4.
Then T (n)= a*n + T (n/p) + T (3n/4)
Follow the proof above, we can see that
n/p + 3n/4 < n
1/p + 3/4 < 1
p > 4
so we can choose any partition number > 4, for example 5
没仔细看;
哦, 这个破算法啊; 5bucket那个select算法; 这个跟这题也没有关系啊; 这个算法之前好像还看人实现过一次; 真的算是最难实现的几个基础算法之一了;
@lx5945 because unlike quicksort, you only look at one side
The first layer has O(n) because n is the length of the array, the second layer has O(n/2), the fourth has O(n/4)
Adding them together you actually get O(2n) which is O(n)
是这个意思;
public class Solution {
public int findKthLargest(int[] nums, int k) {
return select(nums, k-1);
}
// Quick select
private int select(int[] nums, int k) {
int left = 0, right = nums.length-1;
while(true) {
if(left == right)
return nums[left];
int pivotIndex = medianOf3(nums, left, right);
pivotIndex = partition(nums, left, right, pivotIndex);
if(pivotIndex == k)
return nums[k];
else if(pivotIndex > k)
right = pivotIndex-1;
else
left = pivotIndex+1;
}
}
//Use median-of-three strategy to choose pivot
private int medianOf3(int[] nums, int left, int right) {
int mid = left + (right - left) / 2;
if(nums[right] > nums[left])
swap(nums, left, right);
if(nums[right] > nums[mid])
swap(nums, right, mid);
if(nums[mid] > nums[left])
swap(nums,left, mid);
return mid;
}
private int partition(int[] nums, int left, int right, int pivotIndex) {
int pivotValue = nums[pivotIndex];
swap(nums, pivotIndex, right);
int index = left;
for(int i = left; i < right; ++i) {
if(nums[i] > pivotValue) {
swap(nums, index, i);
++index;
}
}
swap(nums, right, index);
return index;
}
private void swap(int[] nums, int a, int b) {
int temp = nums[a];
nums[a] = nums[b];
nums[b] = temp;
}
}
这个也是一个6(95)的算法; 优势还是在pivot的选择上面比较有讲究, 并没有去加shuffle;
medianof3算法我好像并没有实现过; 这里看看是怎么回事;
这是什么算法? 好奇怪啊;
哦不对, 这个不是算left..right的median, 而是就是left, right, mid三个当中选择一个的median;
搜索了一下, 这个好像是quicksort里面常用的一个技巧;
This would be better because it reduces the probability of finding "bad" pivots.
算法记忆方式就是你要维护一个从左到右逐渐降低的趋势; 然后先fix left and right, then right and mid, then left and mid;
UNFINISHED
Problem Description
Find the kth largest element in an unsorted array. Note that it is the kth largest element in the sorted order, not the kth distinct element.
For example,
Given [3,2,1,5,6,4] and k = 2, return 5.
Note:
- You may assume k is always valid, 1 ≤ k ≤ array's length.
Credits:
Special thanks to @mithmatt for adding this problem and creating all test cases.
Difficulty:Medium
Total Accepted:207.1K
Total Submissions:508.8K
Contributor:LeetCode
Companies
facebookmicrosoftamazonbloombergapplepocket gems
Related Topics
divide and conquerheap
Similar Questions
Wiggle Sort IITop K Frequent ElementsThird Maximum Number