NumberOfCornerRectangles [source code]


public class NumberOfCornerRectangles {
static
/******************************************************************************/
class Solution {
    public int countCornerRectangles(int[][] grid) {
        int R = grid.length, C = grid[0].length;
        if (R < 2 || C < 2)
            return 0;
        // can't use array, has to use map
        Map<Integer, List<Integer>> rowOnes = new HashMap<> (), colOnes = new HashMap<> ();
        for (int i = 0; i < R; i++) {
            for (int j = 0; j < C; j++)
                if (grid[i][j] == 1) {
                    // each list is ordered;
                    rowOnes.computeIfAbsent (i, key -> new ArrayList<> ()).add (j);
                    colOnes.computeIfAbsent (j, key -> new ArrayList<> ()).add (i);
                }
        }
        int[][] dp = new int[R][C];
        for (int i = 0; i < R; i++) {
            for (int j = 0; j < C; j++) {
                // usually: i==0 && j==0, else if x == 0, else if j==0; but can be combined here
                if (i == 0 || j == 0) {
                    dp[i][j] = 0;
                    continue;
                }
                // Tricky: has to handle overlap
                dp[i][j] = dp[i][j - 1] + dp[i - 1][j] - dp[i - 1][j - 1];
                if (grid[i][j] == 1) {
                    List<Integer> curRowOnes = rowOnes.get (i), curColOnes = colOnes.get (j);
                    // curRowOnes stores Y(j), not X(i)!!!
                    for (int y : curRowOnes) {
                        if (y >= j)
                            break;
                        for (int x : curColOnes) {
                            if (x >= i)
                                break;
                            if (grid[i][y] == 1 && grid[x][j] == 1 && grid[x][y] == 1)
                                dp[i][j]++;
                        }
                    }
                }
            }
        }
        return dp[R - 1][C - 1];
    }
}
/******************************************************************************/

    public static void main(String[] args) {
        NumberOfCornerRectangles.Solution tester = new NumberOfCornerRectangles.Solution();
        int[][][] inputs = {
            {{0,1,0},{1,0,1},{1,0,1},{0,1,0}}, {{1}},   // can you tell immediately what the R and C are for this matrix? 
            {{1, 0, 0, 1, 0},
             {0, 0, 1, 0, 1},
             {0, 0, 0, 1, 0},
             {1, 0, 1, 0, 1}}, {{1}},
             {{1, 1, 1},
             {1, 1, 1},
             {1, 1, 1}}, {{9}},
             {{1, 1, 1, 1}}, {{0}},
        };
        for (int i = 0; i < inputs.length / 2; i++) {
            int[][] grid = inputs[2 * i];
            int ans = inputs[2 * i + 1][0][0];
            System.out.println (Printer.separator ());
            int output = tester.countCornerRectangles (grid);
            System.out.printf ("%s -> %s, expected: %d\n",
                Matrix.printMatrix (grid), Printer.wrapColor (output + "", output == ans ? "green" : "red"), ans
            );
        }
    }
}

tag说是一个DP问题, 但是题目读完一点DP的意思都看不到, 很无奈;

如果强行要说DP的话, 大概知道一点思路; 另外暴露DP问题的一个线索, 很多时候是, 最后的输出只要求一个count类的量, 这种题目最后用DP的可能性非常大; 当然, 也不是全部, 因为就算是DP算法, 一般也都是可以保留back pointer的, 所以就算是一般意义上的输出的问题, 也可以完成;

但是这题我还是决定先用笨办法写写看, 因为虽然好像感觉知道DP的意思, 但是没看出来有什么优势;

在实际写的过程中, 这个函数写了一半:


    void countRect (int xLo, int xHi, int yLo, int yHi, int[][] grid, Set<String> set) {  
        if (grid[xLo][yLo] != 1)  
            return;  
        int xIncre = xHi > xLo ? 1 : -1, yIncre = yHi > yLo ? 1 : -1;  
        for (int j = yLo; j <= yHi; j += yIncre) {  
            if (grid[xLo][j] == 1 && grid[])  
        }  
    }

开始意识到这个题目为什么需要用DP来做了; 这个题目最后的算法跟NLP里面的CKY算法很类似, 你在两个方向上要寻找的都是O(N), 所以如果直接用暴力寻找, 最后的复杂度会很高; CKY的时候那个blow-up会更厉害; 这里的这个还好一点, 是O(N^2); 所以后来想了想还是用DP来写, 又不是不会写;

而且暴力搜索的方法还有一个问题, 就是要从四个顶点对应的四个方向分别搜索一次, 用DP的话, 这个问题也是自动就没有了; //不对, 这个并没有解决; //不对, 可以解决;

另外写着写着发现对题目的理解有错误, 实际上这题不用DP还真写不出来;

最后花了很多时间, 写了上面这个算法, 但是并没有AC, 而是TLE了; 算了, 差不多知道题目意思就行了; 我上面的算法其实就是模仿CKY算法的思路设计的, 最后的复杂度应该是N^4, 按理说应该不是不能接受, 不过估计应该是有更好的算法; (这个算法保存在下面了);

x is col(i), y is row(j)

在设计的过程中, 想到一个问题, 很多时候我们的坐标用[i][j]来表示, 有时候又用[x][y]来表示, 无形中就是暗示了一个x跟i的对应, y跟j的对应; 但是实际上你画图就知道了, 这个跟我们数学上的习惯是不一致的; 数学上比如习惯把x放在col的位置上面; 不过这个东西自己知道以后注意点就行了;
2018-04-23 00:19:29 这个也许就是为什么awice喜欢直接用r和c, 这个意思没有误会;

坏办法 > 没办法

写这个算法的时候, 经历了很多被uncertainty阻挠的过程, 尤其是是否需要四个方向分别跑一遍这个问题, 一度让我觉得无从下手; 这个时候的思路, 按照Guoye的说法, 就是直接硬着头皮写就行了, 虽然看起来写的是一个很不好的办法, 但是写着写着就会发现好像这个uncertainty并没有那么难以解决, 甚至根本就不需要解决;

一个debug的小技巧

当compiler告诉你在这一行dp[i][j] = dp[i][j - 1] + dp[i - 1][j]有超界的时候, 你又不知道具体是哪个, 这个时候一个办法就是直接改写成下面的样子:

dp[i][j] =   
dp[i - 1][j] +   
dp[i][j - 1];

这样再compile之后的行数提示就可以告诉你到底是那个term超界了;

not all dp length needs +1

之前刚写过一个ED的题目, 所以这里刚开始建表的时候也是习惯性的给了[R+1][C+1]的维度, 但是实际上并不是所有的DP表格的维度都需要+1的; 比如这一题就是这样, 直接一个entry就代表ends in this entry的结果就行了;

overlap in dp

这个算法最后一个花了很多时间debug的问题就是overlap的处理; 一般来说在DP问题当中只要subproblem划分的正确, 是完全可以避免overlap的; 但是这题在debug的时候发现我自己写的算法本来是: dp[i][j] = dp[i][j - 1] + dp[i - 1][j], 这个是有overlap的, 当时一度以为这个算法本质上有毛病, 无法解决了; 后来突然想起来, 并不是所有的overlap都是无法调和的, 比如这里的这个overlap实际上就很好解决;

后来思考了一下, 我感觉好像我这个算法是可以修改来躲避这个TLE的, 这个TLE的case是一个很大的sparse的结果; 虽然我这个算法不算好, 但是感觉只要加上对sparse的处理, 可能能过掉这个?

这里是优化之前的代码:

class Solution {  
    public int countCornerRectangles(int[][] grid) {  
        int R = grid.length, C = grid[0].length;  
        if (R < 2 || C < 2)  
            return 0;  
        int[][] dp = new int[R][C];  
        for (int i = 0; i < R; i++) {  
            for (int j = 0; j < C; j++) {  
                // usually: i==0&&j==0, else if i== 0 else if j==0; but can be combined here  
                if (i == 0 || j == 0) {  
                    dp[i][j] = 0;  
                    continue;  
                }  
                // Tricky: has to handle overlap  
                dp[i][j] = dp[i][j - 1] + dp[i - 1][j] - dp[i - 1][j - 1];  
                if (grid[i][j] == 1) {  
                    for (int x = 0; x < i; x++)  
                        for (int y = 0; y < j; y++)  
                            if (grid[i][y] == 1 && grid[x][j] == 1 && grid[x][y] == 1)  
                                dp[i][j]++;  
                }  
            }  
        }  
        return dp[R - 1][C - 1];  
    }  
}

不要随便省略大括号

                if (grid[i][j] == 1) {  
                    List<Integer> curRowOnes = rowOnes.get (i), curColOnes = colOnes.get (j);  
                    for (int x : curRowOnes) //<<<  
                        if (x > i)  
                            break;  
                        for (int y : curColOnes) {  
                            if (y > j)  
                                break;  
                            if (grid[i][y] == 1 && grid[x][j] == 1 && grid[x][y] == 1)  
                                dp[i][j]++;  
                        }  
                        // <<<  
                }

这个代码看起来完全没有问题, 但是就是因为我省略了指出来的位置的两个大括号, 导致在最下面的++的header里面, x根本不visible; 这个东西很玄学, 最好的办法还是保险为见, 不要随便乱省略大括号;

加了sparse之后, 果然就是AC了, 不过速度很丢人: 483ms (NA). 还算可以了, 几经波折, 好歹算是AC了; 而且事实上也是学了很多东西的; 包括sparse的处理, 事实上debug也花了时间, 并不是那么trivial;

2018-04-23 00:15:22 这个题目一个难点在于, 你四个顶点可以在grid里面的任何位置, 当然, 有alignment平行的限制, 但是自由度还是太高了; 最后比较好的做法, 就是限制其中一条边(比如下边, 这个限制的方式就是make it act as the iterator, 这样在每一个iteration里面, 起码我知道我下边是在什么row了);

另外回头来看, 这题的笔记写的是有点含量;

editorial

Approach #1: Count Corners [Accepted]

Intuition

We ask the question: for each additional row, how many more rectangles are added?

For each pair of 1s in the new row (say at new_row[i] and new_row[j]), we could create more rectangles where that pair forms the base. The number of new rectangles is the number of times some previous row had row[i] = row[j] = 1.

Algorithm

Let's maintain a count count[i, j], the number of times we saw row[i] = row[j] = 1. When we process a new row, for every pair new_row[i] = new_row[j] = 1, we add count[i, j] to the answer, then we increment count[i, j].

class Solution {  
    public int countCornerRectangles(int[][] grid) {  
        Map<Integer, Integer> count = new HashMap();  
        int ans = 0;  
        for (int[] row: grid) {  
            for (int c1 = 0; c1 < row.length; ++c1) if (row[c1] == 1) {  
                for (int c2 = c1+1; c2 < row.length; ++c2) if (row[c2] == 1) {  
                    int pos = c1 * 200 + c2;  
                    int c = count.getOrDefault(pos, 0);  
                    ans += c;  
                    count.put(pos, c+1);  
                }  
            }  
        }  
        return ans;  
    }  
}

这个算法总体来说还是很聪明的, 然后awice认为这个算法的复杂度应该是在R*(C^2);

总体来说这个算法的优越性还是对于表格和DP的更加熟练的理解;

他的iteration是以row为单位, 而不是像我这样死板的, 就知道一个entry一个entry的做;
- 2018-04-23 00:26:32 那是怎么想到用row来做的呢? 难道是因为看到要求四边平行, 所以想到? 太牵强了
表格的entry里面存放的内容, 也是非常的有创意; 这个我是真的没有想到;

另外, 他这里选择了用Map而不是一个二维表格, 那么一个问题就来了, 怎么处理double key的问题? 这个问题我在NLP作业的时候也碰到过了, 当时的做法很蠢, 是用的二重map来实现的; 但是这里看到, 其实很简单, 用组合key的方式就行了; 这个方法当时在StackOverflow上面其实看到类似的, 不过那里当时讨论的都是怎么算hash, 然后让HashMap能够自动识别新的key; 实际上自己手动这样算一个组合key, 然后直接将这个组合key, 而不是the owner of the multiple key fields作为HashMap的index就行了;

关于这个解法有一个问题, 我后来想了一下, 这个算法其实不算是DP? 这个其实是一个标准的2sum类算法; 之所以认为这个不是DP, 是因为这里有一个类似update history的过程, 而DP是不可能修改过去的, 这个也是当时463上课的时候的一个重要的教训;

即使是这样, 还是不得不说这一题的迷惑性还是挺强的, 毕竟这个表格的题目, 直接上来就认为是DP的可能性非常的大; 不过最后居然以row为单位是一个2sum算法, 这个是难以想到的;

Approach #2: Heavy and Light Rows [Accepted]

class Solution {  
    public int countCornerRectangles(int[][] grid) {  
        List<List<Integer>> rows = new ArrayList();  
        int N = 0;  
        for (int r = 0; r < grid.length; ++r) {  
            rows.add(new ArrayList());  
            for (int c = 0; c < grid[r].length; ++c)  
                if (grid[r][c] == 1) {  
                    rows.get(r).add(c);  
                    N++;  
                }  
        }  

        int sqrtN = (int) Math.sqrt(N);  
        int ans = 0;  
        Map<Integer, Integer> count = new HashMap();  

        for (int r = 0; r < grid.length; ++r) {  
            if (rows.get(r).size() >= sqrtN) {  
                Set<Integer> target = new HashSet(rows.get(r));  

                for (int r2 = 0; r2 < grid.length; ++r2) {  
                    if (r2 <= r && rows.get(r2).size() >= sqrtN)  
                        continue;  
                    int found = 0;  
                    for (int c2: rows.get(r2))  
                        if (target.contains(c2))  
                            found++;  
                    ans += found * (found - 1) / 2;  
                }  
            } else {  
                for (int i1 = 0; i1 < rows.get(r).size(); ++i1) {  
                    int c1 = rows.get(r).get(i1);  
                    for (int i2 = i1 + 1; i2 < rows.get(r).size(); ++i2) {  
                        int c2 = rows.get(r).get(i2);  
                        int ct = count.getOrDefault(200*c1 + c2, 0);  
                        ans += ct;  
                        count.put(200*c1 + c2, ct + 1);  
                    }  
                }  
            }  
        }  
        return ans;  
    }  
}

上面的editorial1的复杂度是RCC, 这个的复杂度是N(N^(1/2)). 事实上到底这个新算法有没有改进, 还很难说, 因为worstcase这个就是R^(1.5)*C^(1.5). 跟之前的那个RCC比起来谁比较大还不好说;

这个优化的大概原理还是看得懂的, 其实就是对row的density进行一个区分; heavy的row直接用一个稍微暴力一点的方法, 但是实际上最后的速度反而好一些; 每一个heavy row对其他的light row进行依此的全捕捉; 然后用f * (f - 1) / 2的公式来计算可以捕捉到的对应的rectangle的个数; 注意这里:

if (r2 <= r && rows.get(r2).size() >= sqrtN)  
    continue;

这个是因为我们一个row一个row的进行扫描是从上到下, 所以他这里避免让当前的heavy row去捕捉之前的heavy row; 这个类似一个去重的操作; 注意这个操作只要一个方向就行了, 因为我们的iteration就是一个方向的;

另外还要注意他这里的对于sparse的处理, 其实list of lists就行了, 而不用我的那个map of lists; 另外, 我后来想了一下, 为了避免这里的命名的混淆, 比如一个list如果存放的是一个row里面所有1的y, 那么这个list可以叫做类似的ys_of_row这样的, 这样就比较不容易混淆里面装的到底是什么. naming本身确实也是一个学问;

2018-04-23 00:34:44 刚看到算法描述的时候, 很怀疑他这个f怎么快速计算, 毕竟如果heavy的row是full的, 那倒还好; 但是如果不是, 你就无法判断;

然后看了代码, 首先, 他这里是有一个sparse处理的: 这种东西不用白不用的; 其次, 对于heavy row的处理, 有一个去重: 当前的row如果是heavy, 那么上下的light row都可以和我匹配; 当前的row如果是heavy, 我上面的其他heavy, 我就不管了; 用这个order来避免heavy与heavy之间的重复计算;

这个做法其实跟editorial1其实是类似的, 都是固定一条边; 只不过一个是选择底边, 一个是选择heavy row;

当然, 最后f的做法, 其实是一个类似找intersection的过程; 这个过程之所以居然能够线性做完, 就是因为hash结构带来的速度优势; 反正这个做法的技巧性还是比较强的, 有很多engineering角度提速的思考在里面;

这个是discussion另外一个RRC的解法, 相对来说, 比awice的解法要容易想到的多, 虽然我并没有想到:

@jun1013 said in short JAVA AC solution (O(m^2 * n)) with explanation.:

To find an axis-aligned rectangle, my idea is to fix two rows (or two columns) first, then check column by column to find "1" on both rows. Say you find n pairs, then just pick any 2 of those to form an axis-aligned rectangle (calculating how many in total is just high school math, hint: combination).

class Solution {  
    public int countCornerRectangles(int[][] grid) {  
        int ans = 0;  
        for (int i = 0; i < grid.length - 1; i++) {  
            for (int j = i + 1; j < grid.length; j++) {  
                int counter = 0;  
                for (int k = 0; k < grid[0].length; k++) {  
                    if (grid[i][k] == 1 && grid[j][k] == 1) counter++;  
                }  
                if (counter > 0) ans += counter * (counter - 1) / 2;  
            }  
        }  
        return ans;  
    }  
}

需要学习的是他外层两个循环的写法, 如果让你写一个pick 2 rows out of M这样的算法, 你知道这个循环应该这样写吗?

另外这个算法好像其实速度也是334ms, 所以我自己的DP写法其实最后估计也不是特别差;

2018-04-23 00:38:54 有没有想过这个算法为什么快? 这个其实思路跟awice那个是有区别的; 相反, 跟editorial2反而更加类似; 他意识到一个问题, 我们最后要求的, 是一个count, 而不是最后要每一个矩形; 那么你应该有经验, 只是count, 跟实际output出来, 最后能达到的速度, 是有很大的区别的;
他就观察到了, 我们其实对于任何两个row, 只要计算intersection的位置的个数就行了; 当然笨办法就是四个遍历, 但是算intersection, 是可以线性做完的; 最后就把复杂度降低了一个等级;

这个是discussion另一个不太好的方法:

class Solution {  
    public int countCornerRectangles(int[][] grid) {  
        int m = grid.length, n = grid[0].length, cnts = 0;  
        for (int i0 = 0; i0 < m; i0++)  
            for (int j0 = 0; j0 < n; j0++)  
                for (int i = i0 + 1; i < m && grid[i0][j0] != 0; i++)  
                    for (int j = j0 + 1; j < n && grid[i][j0] != 0; j++)  
                        cnts += grid[i][j] * grid[i0][j];  
        return cnts;  
    }  
}

这个就是利用了一个rectangle问题里面的一个常用思路: 一条对角线(及其顶点)决定矩形. 最后这个算法的速度其实是RRCC. 也让我意识到了我自己的DP算法的问题, 我的DP算法最后的速度其实跟暴力解并无两样; 应该有可以优化的地方;

what about DP?

虽然这题对于DP的尝试是失败的, 但是我还是尝试性的总结一下DP的常用思路:

什么表格? 尤其是dimension对应什么? 这个不用太紧张, 因为后面还可能回来重新调整的;
什么entry, 也就是recursive return value是什么? 这个是关键的一步. 到目前位置, 你还没有办法确定你对于这个entry定义的是否正确;
- 这一步的entry的定义, 关键在于准确. 一般来说, 类似tree问题的思路, 这个entry的value的定义, 最好有一定的root involvement. 这个也是我现在思考的问题, 我现在这个DP算法, entry定义的是S[i][j] = number of corner rects in the big rectangle formed by [0][0] and [i][j]. 这个定义是没有root involvement的, 我在想是不是应该换一个定义方式;
如何完成subproblem组合: 尝试根据第二步定义的entry, 来完成一个组合的可能性, 也就是站在一个sub entry, 怎么组合到更大的一个entry? 或者站在一个大entry, 你需要什么小entry? 这个可能不能一次成功, 如果你发现这一步无法完成, 考虑回到第二步重新定义entry;
- Guoye对于DP的原话: 主要就是考虑能否从小一点的推出大一点的. 所以这一点也是他比较看重的;

这题discussion基本都是比较暴力的做法, 跟editorial都没法比; submission暂时看不到;

Problem Description

Given a grid where each entry is only 0 or 1, find the number of corner rectangles.

A corner rectangle is 4 distinct 1s on the grid that form an axis-aligned rectangle. Note that only the corners need to have the value 1. Also, all four 1s used must be distinct.

Example 1:

Input: grid =   
[[1, 0, 0, 1, 0],  
 [0, 0, 1, 0, 1],  
 [0, 0, 0, 1, 0],  
 [1, 0, 1, 0, 1]]  
Output: 1  
Explanation: There is only one corner rectangle, with corners grid[1][2], grid[1][4], grid[3][2], grid[3][4].

Example 2:

Input: grid =   
[[1, 1, 1],  
 [1, 1, 1],  
 [1, 1, 1]]  
Output: 9  
Explanation: There are four 2x2 rectangles, four 2x3 and 3x2 rectangles, and one 3x3 rectangle.

Example 3:

Input: grid =   
[[1, 1, 1, 1]]  
Output: 0  
Explanation: Rectangles must have four distinct corners.

Note:

The number of rows and columns of grid will each be in the range [1, 200].
Each grid[i][j] will be either 0 or 1.
The number of 1s in the grid will be at most 6000.

Difficulty:Medium
Total Accepted:2.6K
Total Submissions:5.3K
Contributor:xinlin5
Companies
facebook
Related Topics
dynamic programming

Hint 1
For each pair of 1s in the new row (say at new_row[i] and new_row[j]), we could create more rectangles where that pair forms the base. The number of new rectangles is the number of times some previous row had row[i] = row[j] = 1.

LC751 NumberOfCornerRectangles