MaxChunksToMakeSortedII [source code]

public class MaxChunksToMakeSortedII {
static
/******************************************************************************/
class Solution {
    public int maxChunksToSorted(int[] arr) {
        int n = arr.length;
        int[] max = new int[n], min = new int[n];
        max[0] = arr[0];
        min[n - 1] = arr[n - 1];
        for (int i = 1; i < n; i++)
            max[i] = Math.max (max[i - 1], arr[i]);
        for (int i = n - 2; i >= 0; i--)
            min[i] = Math.min (min[i + 1], arr[i]);
        int res = 0;
        for (int i = 0; i < n - 1; i++) if (max[i] <= min[i + 1])
            res++;
        return res + 1;
    }
}
/******************************************************************************/

    public static void main(String[] args) {
        MaxChunksToMakeSortedII.Solution tester = new MaxChunksToMakeSortedII.Solution();
        int[][] inputs = {
            {5,4,3,2,1}, {1},
            {2,1,3,4,4}, {4},
        };
        for (int i = 0; i < inputs.length / 2; i++) {
            int[] arr = inputs[2 * i];
            int ans = inputs[2 * i + 1][0];
            System.out.println (Printer.separator ());
            int output = tester.maxChunksToSorted (arr);
            System.out.printf ("[%s] -> %s, expected: %d\n", 
                Printer.array (arr), Printer.wrapColor (output + "", output == ans ? "green" : "red"), ans
            );
        }
    }
}

A follow-up to the previous problem; I solved it quickly. Runtime was 12ms (NA).

editorial

Approach #1: Sliding Window [Accepted]

Intuition

Let's try to find the smallest left-most chunk.

Algorithm

Notice that if a_1, a_2, …, a_m is a chunk, and a_1, a_2, …, a_n is a chunk (m < n), then a_{m+1}, a_{m+2}, …, a_n is a chunk too. This shows that a greedy approach produces the highest number of chunks.

What does this sentence mean? Why does it tell us that this is greedy?

We know the array arr should end up like expect = sorted(arr). If the count of each value among the first k elements of arr, minus the count of that value among the first k elements of expect, is zero for every value, then the first k elements form a valid chunk. We repeatedly perform this process.
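To make the "first k elements form a valid chunk" condition concrete, here is a minimal brute-force sketch. It is not the editorial's code: the class name PrefixChunkCheck and both method names are made up for illustration, and it simply re-sorts every prefix, so it is far slower than the counting approach.

```java
import java.util.Arrays;

class PrefixChunkCheck {
    // Hypothetical helper: true if the first k elements of arr are a
    // permutation of the first k elements of sorted(arr).
    static boolean isValidPrefix(int[] arr, int k) {
        int[] expect = arr.clone();
        Arrays.sort(expect);
        int[] prefix = Arrays.copyOf(arr, k);
        Arrays.sort(prefix);
        return Arrays.equals(prefix, Arrays.copyOf(expect, k));
    }

    // Every k in [1, n] with a valid prefix is a place to cut,
    // so the number of such k is the number of chunks.
    static int countChunksBruteForce(int[] arr) {
        int chunks = 0;
        for (int k = 1; k <= arr.length; k++) {
            if (isValidPrefix(arr, k)) chunks++;
        }
        return chunks;
    }
}
```

For the two examples in the problem statement, countChunksBruteForce returns 1 for [5,4,3,2,1] and 4 for [2,1,3,4,4].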

We can use a variable nonzero to count the number of values whose current count is non-zero.

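import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class Solution {
    public int maxChunksToSorted(int[] arr) {
        // count[v] = (occurrences of v in arr so far) - (occurrences of v in expect so far)
        Map<Integer, Integer> count = new HashMap<>();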
        int ans = 0, nonzero = 0;  

        int[] expect = arr.clone();  
        Arrays.sort(expect);  

        for (int i = 0; i < arr.length; ++i) {  
            int x = arr[i], y = expect[i];  

            count.put(x, count.getOrDefault(x, 0) + 1);  
            if (count.get(x) == 0) nonzero--;  
            if (count.get(x) == 1) nonzero++;  

            count.put(y, count.getOrDefault(y, 0) - 1);  
            if (count.get(y) == -1) nonzero++;  
            if (count.get(y) == 0) nonzero--;  

            if (nonzero == 0) ans++;  
        }  

        return ans;  
    }  
}

Apart from that sentence, the idea of this algorithm is quite clear. As you walk from index 0, you already know the expected counts, so it is fairly obvious that a sliding window can solve this.

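It was a little hard to follow, so I roughly worked it out by hand. Here is the trace for 5,4,3,2,1,6 (each entry shows the state at the start of iteration i):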

i:(0), nonzero:(0)  
{}  
i:(1), nonzero:(2)  
{1=-1, 5=1}  
i:(2), nonzero:(4)  
{1=-1, 2=-1, 4=1, 5=1}  
i:(3), nonzero:(4)  
{1=-1, 2=-1, 3=0, 4=1, 5=1}  
i:(4), nonzero:(2)  
{1=-1, 2=0, 3=0, 4=0, 5=1}  
i:(5), nonzero:(0)  
{1=0, 2=0, 3=0, 4=0, 5=0}

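Be careful at position [2]; I kept getting this step wrong when working it by hand. Here x and y are both 3, so the count for 3 goes to 1 and straight back to 0, and nonzero ends up unchanged.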

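What nonzero actually tracks is the number of values for which the difference in count between the two arrays is not zero, which means the two frequencies disagree. Once you understand what this variable means, the algorithm is not hard to follow.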

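There are quite a few tricks here. One is a Moore-voting style computation: the way this difference is defined is somewhat like the tallying in Moore voting. On second thought it is not really the same; here it is simply computing a difference.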

novelty counting

Another trick is how the update to nonzero is decided. This is very similar to the novelty count Jason taught in NLP: update the count as usual, then remember to check the result right after each update. It is a neat trick.
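The "update, then check" pattern can be isolated into a small helper. This is a sketch, not the editorial's code: the class DiffCounter and its methods are hypothetical names, and it compares the value before and after the update instead of checking for the specific results 0, 1 and -1, which is equivalent when every update is +1 or -1.

```java
import java.util.HashMap;
import java.util.Map;

class DiffCounter {
    private final Map<Integer, Integer> diff = new HashMap<>();
    private int nonzero = 0; // number of keys whose difference is not 0

    // Applies delta to key, then adjusts nonzero based on the transition.
    void add(int key, int delta) {
        int before = diff.getOrDefault(key, 0);
        int after = before + delta;
        diff.put(key, after);
        if (before == 0 && after != 0) nonzero++; // key became unbalanced
        if (before != 0 && after == 0) nonzero--; // key became balanced
    }

    int nonzeroCount() {
        return nonzero;
    }
}
```

With this helper, each loop iteration of Approach #1 would amount to add(x, +1), add(y, -1), and a check of nonzeroCount() == 0.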

Approach #2: Sorted Count Pairs [Accepted]

Intuition

As in Approach #1, let's try to find the smallest left-most chunk, where we have some expectation expect = sorted(arr).

If the elements were distinct, then it is enough to find the smallest k with max(arr[:k+1]) == expect[k], as this must mean the elements of arr[:k+1] are some permutation of expect[:k+1].

Since the elements are not distinct, this fails; but we can attach to each element its cumulative multiplicity (its occurrence number so far) to make the elements distinct.

Algorithm

Instead of elements x, have counted elements (x, count) where count ranges from 1 to the total number of x present in arr.
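As a small illustration of the counted elements, the sketch below tags each value with its occurrence number. The class and method names are invented for this example, and it renders each pair as a string purely to keep the demo short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CountedElements {
    // Returns each element rendered as "(value,occurrence)", where occurrence
    // is how many times that value has appeared up to and including this position.
    static List<String> counted(int[] arr) {
        Map<Integer, Integer> seen = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (int x : arr) {
            int occurrence = seen.merge(x, 1, Integer::sum); // new count for x
            out.add("(" + x + "," + occurrence + ")");
        }
        return out;
    }
}
```

For arr = [2,1,3,4,4] this gives (2,1), (1,1), (3,1), (4,1), (4,2), so the two 4s become distinct.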

Now cur will be the cumulative maximum of counted[:k+1], where we expect a result of Y = expect[k]. We count the number of times they are equal.

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Solution {
    public int maxChunksToSorted(int[] arr) {
        Map<Integer, Integer> count = new HashMap<>();
        List<Pair> counted = new ArrayList<>();
        for (int x: arr) {
            count.put(x, count.getOrDefault(x, 0) + 1);
            counted.add(new Pair(x, count.get(x)));
        }

        List<Pair> expect = new ArrayList<>(counted);
        Collections.sort(expect, (a, b) -> a.compare(b));  

        Pair cur = counted.get(0);  
        int ans = 0;  
        for (int i = 0; i < arr.length; ++i) {  
            Pair X = counted.get(i), Y = expect.get(i);  
            if (X.compare(cur) > 0) cur = X;  
            if (cur.compare(Y) == 0) ans++;  
        }  

        return ans;  
    }  
}  

class Pair {  
    int val, count;  
    Pair(int v, int c) {  
        val = v; count = c;  
    }  
    int compare(Pair that) {  
        return this.val != that.val ? this.val - that.val : this.count - that.count;  
    }  
}

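awice does not seem to have noticed the optimal way to write this problem, namely the approach at the top that I borrowed from the discussion of the previous problem. Still, his solution shows real skill and uses many techniques.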

Note how he sorts here: the comparator directly calls his own compare function, so there is no need to override compareTo on Pair.

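Also, your instinct might be to represent this Pair directly as an int[]. We have discussed this before: a dedicated class gives more flexibility (you can write your own compareTo, toString, and so on) and is more readable. It also helps in a real interview: the interviewer may not be in the mood to remember which array slot means what, and could well push back on the grounds that you are disregarding maintainability. Writing aggregates this way at work is genuinely problematic, since whoever comes later may have no idea what you meant.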
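For comparison, here is a sketch of what the dedicated class buys you when it does define compareTo and toString. CountedPair is a hypothetical name, not part of the editorial, and it uses Integer.compare rather than subtraction so the ordering does not depend on the value range.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CountedPair implements Comparable<CountedPair> {
    final int val, count;

    CountedPair(int val, int count) {
        this.val = val;
        this.count = count;
    }

    // Order by value first, then by occurrence number.
    @Override
    public int compareTo(CountedPair that) {
        int byVal = Integer.compare(this.val, that.val);
        return byVal != 0 ? byVal : Integer.compare(this.count, that.count);
    }

    @Override
    public String toString() {
        return "(" + val + "," + count + ")";
    }

    // Small demo: sorts three pairs using the natural ordering above.
    static String demo() {
        List<CountedPair> pairs = new ArrayList<>();
        pairs.add(new CountedPair(4, 2));
        pairs.add(new CountedPair(2, 1));
        pairs.add(new CountedPair(4, 1));
        Collections.sort(pairs); // no comparator needed
        return pairs.toString();
    }
}
```

The field names val and count make the intent visible at every use site, which an int[] with slots 0 and 1 does not.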

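His idea of handling duplicates by attaching the cumulative multiplicity (forcibly making the elements distinct) is also interesting. Note that this preprocessing does not actually cost much time: it is a single pass with a hash map.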

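Note how Pair's compare is implemented: it compares val in ascending order first, then count in ascending order, which guarantees that a later occurrence of the same number in the original array is larger. Because the code updates the gatekeeper cur on any strict > (which can come from a duplicate number that differs only in multiplicity), even adjacent duplicates such as 4,4 get split, which is the semantics we want.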

The rest is the same. Both of awice's solutions rely heavily on direct comparison with the sorted result, which is a comparatively naive line of attack.

The best solution from the discussion is still this one, which I have seen once before:

@shawngao said in Java solution, left max and right min.:

Algorithm: Iterate through the array; each time all elements to the left are smaller than (or equal to) all elements to the right, there is a new chunk.
Use two arrays to store the left max and right min to achieve O(n) time complexity. Space complexity is O(n) too.
This algorithm can be used to solve ver1 too.

class Solution {  
    public int maxChunksToSorted(int[] arr) {  
        int n = arr.length;  
        int[] maxOfLeft = new int[n];  
        int[] minOfRight = new int[n];  

        maxOfLeft[0] = arr[0];  
        for (int i = 1; i < n; i++) {  
            maxOfLeft[i] = Math.max(maxOfLeft[i-1], arr[i]);  
        }  

        minOfRight[n - 1] = arr[n - 1];  
        for (int i = n - 2; i >= 0; i--) {  
            minOfRight[i] = Math.min(minOfRight[i + 1], arr[i]);  
        }  

        int res = 0;  
        for (int i = 0; i < n - 1; i++) {  
            if (maxOfLeft[i] <= minOfRight[i + 1]) res++;  
        }  

        return res + 1;  
    }  
}

Here is a very interesting analysis posted under it:

@mark943 said in Java solution, left max and right min.:

I had the exact same thought!

The intuition came by comparing this problem with Problem 763: Partition Labels. For reference, I've also included my solution to the "Partition Labels" problem.

Edit (Explanation): Assume we have an array A of length n. The objective is to split A into as many chunks as possible, such that when we sort all these chunks and concatenate them, the result will be a fully sorted array in ascending order.

Notice that as we traverse A, we can split A at index i into A[:i+1] and A[i+1:] if and only if max(A[:i+1]) <= min(A[i+1:]). This is to maintain the invariant that sorted(A[:i+1]) + sorted(A[i+1:]) == sorted(A).

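This invariant is well defined. Sorting problems can often be reduced to min/max problems to aid understanding, just like the earlier median problem, which used the same idea.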
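The equivalence between the sorted-halves invariant and the max/min condition can be checked directly on small inputs. The following is only an illustrative sketch with made-up names (SplitInvariant, sortedHalvesMatch, maxLeMin); both methods assume 0 <= i < a.length - 1.

```java
import java.util.Arrays;

class SplitInvariant {
    // True if sorting A[:i+1] and A[i+1:] separately yields sorted(A).
    static boolean sortedHalvesMatch(int[] a, int i) {
        int[] whole = a.clone();
        Arrays.sort(whole);
        int[] pieces = a.clone();
        Arrays.sort(pieces, 0, i + 1);        // sort A[:i+1] in place
        Arrays.sort(pieces, i + 1, a.length); // sort A[i+1:] in place
        return Arrays.equals(whole, pieces);
    }

    // True if max(A[:i+1]) <= min(A[i+1:]).
    static boolean maxLeMin(int[] a, int i) {
        int max = Integer.MIN_VALUE, min = Integer.MAX_VALUE;
        for (int j = 0; j <= i; j++) max = Math.max(max, a[j]);
        for (int j = i + 1; j < a.length; j++) min = Math.min(min, a[j]);
        return max <= min;
    }
}
```

For A = [2,1,3,4,4], both methods return false at i = 0 and true at i = 1, 2, 3.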

Therefore, all we need to do is iterate over the array, and every time we encounter max(A[:i+1]) <= min(A[i+1:]), we split the array.

Now, suppose I split at some index i (where 0 <= i < n-1), and then save this index as begin := i. You may be thinking that, on the next iteration, I should be comparing max(A[begin : i+1]) with min(A[i+1:]) instead of max(A[:i+1]) with min(A[i+1:]).

However, notice that both max_val and min_val must be sorted in ascending order (just look at how they're created). So, by definition max(A[begin : i+1]) = max(A[:i+1]), for all begin <= i < n. So, we don't even need the begin pointer, we just compare max(A[:i+1]) with min(A[i+1:]) on every iteration.

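I really had not grasped this point when I looked at this algorithm before! It explains why you can use the max of the subarray so far as the chunk max. Of course, once it is spelled out like this, it turns out to be a fairly trivial statement; still, being able to think of it matters.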

Note that only the max needs this justification; the right-to-left min does not need this argument.
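One way to convince yourself of this point is to run both variants side by side: one keeps the running max over the whole prefix, the other resets the max after every split so that it covers only the current chunk. This is a sketch with hypothetical names (ChunkMaxDemo and its methods), and it assumes a non-empty array.

```java
class ChunkMaxDemo {
    // minRight[i] is the minimum of a[i..n-1]; assumes a.length >= 1.
    static int[] minRight(int[] a) {
        int n = a.length;
        int[] min = new int[n];
        min[n - 1] = a[n - 1];
        for (int i = n - 2; i >= 0; i--) min[i] = Math.min(min[i + 1], a[i]);
        return min;
    }

    // Uses the running max over the whole prefix a[0..i].
    static int countWithPrefixMax(int[] a) {
        int[] min = minRight(a);
        int chunks = 1, max = Integer.MIN_VALUE;
        for (int i = 0; i < a.length - 1; i++) {
            max = Math.max(max, a[i]);
            if (max <= min[i + 1]) chunks++;
        }
        return chunks;
    }

    // Resets the max after every split, so it only covers the current chunk.
    static int countWithChunkMax(int[] a) {
        int[] min = minRight(a);
        int chunks = 1, max = Integer.MIN_VALUE;
        for (int i = 0; i < a.length - 1; i++) {
            max = Math.max(max, a[i]);
            if (max <= min[i + 1]) {
                chunks++;
                max = Integer.MIN_VALUE; // start tracking the next chunk
            }
        }
        return chunks;
    }
}
```

The two agree because, right after a valid split, every remaining element is at least as large as everything to its left, so the chunk-local max and the prefix max coincide from the first element of the new chunk onward.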

An optimisation: rather than do max and min operations on every iteration (which would lead to an O(n^2) algorithm), we perform a "pre-processing pass" over A to compute the max. value up to every index i, and the min. value down to every index i.

import math

class Solution:
    def maxChunksToSorted(self, arr):  
        """  
        :type arr: List[int]  
        :rtype: int  
        """  
        if len(arr) == 1:  
            return 1  

        # `max_val[i]` gives the maximum value in A[:i+1]  
        max_val    = [-math.inf] * len(arr)  
        max_val[0] = arr[0]  

        # `min_val[i]` gives the minimum value in A[i:]  
        min_val     = [math.inf] * len(arr)  
        min_val[-1] = arr[-1]  

        n = len(arr)  
        count = 0  

        for i in range(1, n):  
            max_val[i]         = max(max_val[i-1], arr[i])  
            min_val[n - i - 1] = min(min_val[n - i], arr[n - i - 1])  

        # Get number of chunks  
        count = 1  
        for i in range(n - 1):  
            if max_val[i] <= min_val[i+1]:  
                count += 1  
        return count

class Solution:  
    def partitionLabels(self, s):  
        """  
        :type s: str  
        :rtype: List[int]  
        """  
        res = []  

        # 1. Traverse the string and record the last index of each character  
        last_index = dict()  
        for i in range(len(s)):  
            last_index[s[i]] = i  

        # 2. Keep track of the end index of the current substring using variable `end`.  
        #    Partition once `i` crosses `end`.  
        end = 0  
        start = 0  
        for i in range(len(s)):  
            end = max(end, last_index[s[i]])  
            if i == end:  
                res.append(i + 1 - start)  

                # Update pointer to start tracking next substring  
                start = i + 1  
        return res

Another solution from the discussion:

@votrubac said in C++ 7 lines, O (n * log n) / O(n):

Same as Max Chunks To Make Sorted (ver. 1). We just need to normalize the input array so it contains the indexes. We use the sorting for the normalization, and one trick here is to have a stable sort, so that if we have the same value, the index will be lowest for the value appearing first.

#include <algorithm>
#include <numeric>
#include <vector>
using namespace std;

int maxChunksToSorted(vector<int>& v) {
    vector<int> arr(v.size());  
    iota(arr.begin(), arr.end(), 0);  
    sort(arr.begin(), arr.end(), [&v](int i1, int i2) {return v[i1] == v[i2] ? i1 < i2 : v[i1] < v[i2]; });  

    for (auto i = 0, max_i = 0, ch = 0; i <= arr.size(); ++i) {  
      if (i == arr.size()) return ch;  
      max_i = max(max_i, arr[i]);  
      if (max_i == i) ++ch;  
    }  
}

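This approach is rather interesting, and roughly matches my own thinking when I previously worked this problem from interview reports: reduce it to the ver1 approach by way of an auxiliary "full assoc" array (one index per element). Note the sort here: v is used to help sort arr, which is the auxiliary array.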
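For readers following along in Java, here is a rough port of the same reduction. It is my own sketch rather than votrubac's code: the class name IndexNormalization and the variable names are assumptions, and the explicit tie-break on index mirrors the C++ comparator.

```java
import java.util.Arrays;
import java.util.Comparator;

class IndexNormalization {
    static int maxChunksToSorted(int[] v) {
        int n = v.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;

        // Sort indices by the value they point at; equal values keep
        // their original relative order because ties compare by index.
        Arrays.sort(order, Comparator.<Integer>comparingInt(i -> v[i])
                                     .thenComparingInt(i -> i));

        // Same scan as ver1: a chunk ends wherever the largest index
        // seen so far equals the current position.
        int chunks = 0, maxIdx = 0;
        for (int i = 0; i < n; i++) {
            maxIdx = Math.max(maxIdx, order[i]);
            if (maxIdx == i) chunks++;
        }
        return chunks;
    }
}
```

For [2,1,3,4,4] the sorted index array is [1,0,2,3,4], and the scan counts 4 chunks.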

@lily4ever said in Simple Java Solution with explanation:

In Ver 1, the basic idea is to use max[] array to keep track of the max value until the current position, and compare it to the sorted array. If the max[i] equals the element at index i in the sorted array, then the final count++.
Here the only difference is that we need a variable 'upperLimit' to help determine whether the current point is the correct division position.

For example,
original:     2, 1, 4, 4, 3, 5, 7, 6  
max:          2, 2, 4, 4, 4, 5, 7, 7  
sorted:       1, 2, 3, 4, 4, 5, 6, 7
The chunks are: 2, 1 | 4, 4, 3 | 5 | 7, 6

It needs to be noted that at index 3, although max[i] == sorted[i], this is not the right dividing point. Otherwise, the number after it (3) will be in the wrong chunk.

    public int maxChunksToSorted(int[] arr) {  
        if (arr == null || arr.length == 0) return 0;  

        int[] sorted = arr.clone();  
        Arrays.sort(sorted);  

        int[] max = new int[arr.length];  
        max[0] = arr[0];  
        for (int i = 1; i < arr.length; i++) {  
            max[i] = Math.max(max[i - 1], arr[i]);  
        }  

        int count = 0;  
        int upperLimit = Integer.MAX_VALUE;  
        for (int i = arr.length - 1; i >= 0; i--) {  
            if (max[i] == sorted[i]) {  
                if (sorted[i] > upperLimit) continue;  

                count++;  
                upperLimit = arr[i];  
            }  
        }  

        return count;  
    }

This algorithm feels like a forced adaptation of ver1. Note that his final loop runs in reverse order.

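I roughly worked through it by hand. A few details: for one, his count is updated at the start of each chunk, so there is no tail-end cleanup to handle (though the initial handling also has to be right).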

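After thinking about it for a long time, though, I still cannot fully explain why, once we find max[i] == sorted[i], we additionally have to rule out the case sorted[i] > upperLimit and treat it as a false positive. In other words, the continue here means: not the right position for partition, this position is still within the same chunk.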

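Take the given example: at position [4] we start a new chunk, and then at position [3] we hit this false positive. How do we explain that? Is there a complete theory backing this idea? Of course, if we were to split at [3], a necessary condition is certainly sorted[3] <= arr[4], since the whole must be sorted. But is ruling things out via such a necessary condition a good way to write an algorithm? It does not feel rigorous enough.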

I am watching this thread to see whether he gives an explanation later.

Nothing more from the discussion or submissions for now.

Problem Description

This question is the same as "Max Chunks to Make Sorted" except the integers of the given array are not necessarily distinct, the input array could be up to length 2000, and the elements could be up to 10**8.

Given an array arr of integers (not necessarily distinct), we split the array into some number of "chunks" (partitions), and individually sort each chunk. After concatenating them, the result equals the sorted array.

What is the most number of chunks we could have made?

Example 1:

Input: arr = [5,4,3,2,1]  
Output: 1  
Explanation:  
Splitting into two or more chunks will not return the required result.  
For example, splitting into [5, 4], [3, 2, 1] will result in [4, 5, 1, 2, 3], which isn't sorted.

Example 2:

Input: arr = [2,1,3,4,4]  
Output: 4  
Explanation:  
We can split into two chunks, such as [2, 1], [3, 4, 4].  
However, splitting into [2, 1], [3], [4], [4] is the highest number of chunks possible.

Note:

arr will have length in range [1, 2000].
arr[i] will be an integer in range [0, 10**8].

Difficulty:Hard
Total Accepted:1.2K
Total Submissions:2.8K
Contributor:yuxiang1515
Companies
google
Related Topics
array
Similar Questions
Max Chunks To Make Sorted

Hint 1
Each k for which some permutation of arr[:k] is equal to sorted(arr)[:k] is where we should cut each chunk.

LC779 MaxChunksToMakeSortedII