NumberOfConnectedComponentsInAnUndirectedGraph [source code]
import java.util.*;

public class NumberOfConnectedComponentsInAnUndirectedGraph {
static
/******************************************************************************/
public class Solution {
    public int countComponents(int n, int[][] edges) {
        HashMap<Integer, List<Integer>> adjMap = new HashMap<>();
        for (int[] edge : edges) {
            // undirected edge: store it in both directions
            for (int i = 0; i < 2; i++) {
                int node = edge[i], other = edge[1 - i];
                List<Integer> ls = adjMap.get(node);
                if (ls == null) {
                    ls = new ArrayList<Integer>();
                    adjMap.put(node, ls);
                }
                ls.add(other);
            }
        }
        int count = 0;
        boolean[] visited = new boolean[n];
        for (int i = 0; i < n; i++) {
            if (!visited[i]) {
                count++;
                dfs(visited, adjMap, i);
            }
        }
        return count;
    }

    public void dfs(boolean[] visited, Map<Integer, List<Integer>> map, int node) {
        if (visited[node])
            return;
        visited[node] = true;
        List<Integer> ls = map.get(node);
        if (ls == null)     // isolated node: no adjacency entry
            return;
        for (int i : ls) {
            dfs(visited, map, i);
        }
    }
}
/******************************************************************************/
public static void main(String[] args) {
    NumberOfConnectedComponentsInAnUndirectedGraph.Solution tester = new NumberOfConnectedComponentsInAnUndirectedGraph.Solution();
}
}
This problem is very similar to an earlier one; DFS is the simplest approach. Note that the edges here are undirected, so each one is effectively two directed edges. For convenience, both directions can be stored (that is, each endpoint of an edge serves in turn as a key of the adjacency list). The adjacency list itself can be stored in a Map, which came up in the previous problem as well.
Once that idea is in place, the problem becomes simple; handling CC problems with DFS is routine by now. Final speed: 16ms (34%), surprisingly slow.
Two details worth noting here. First, since every edge is bidirectional, the helper really does need a visited base case, which an earlier directed problem did not. Second, the list returned by each get in the helper must be null-checked: a null list cannot be fed directly into a for-each loop, or it throws. I thought the for-each loop would automatically handle a null list, but that recollection is wrong: it always throws a NullPointerException on null.
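A quick check confirms the for-each behavior: the enhanced for loop compiles to a call to iterator() on the loop expression, so iterating a null list fails before the first iteration; it never silently handles null. A minimal sketch (the class name is illustrative):

```java
import java.util.List;

public class NullForEachDemo {
    public static void main(String[] args) {
        List<Integer> ls = null;
        try {
            for (int i : ls) {          // compiles to ls.iterator(), which ...
                System.out.println(i);
            }
        } catch (NullPointerException e) {
            // ... throws NPE when ls is null, before any iteration runs
            System.out.println("NPE as expected");
        }
    }
}
```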
The best solution in the discussion uses UF, and it is also the fastest submission at 2ms:
public int countComponents(int n, int[][] edges) {
    int[] roots = new int[n];
    for (int i = 0; i < n; i++) roots[i] = i;   // initialize: n singleton groups
    for (int[] e : edges) {
        int root1 = find(roots, e[0]);
        int root2 = find(roots, e[1]);
        if (root1 != root2) {
            roots[root1] = root2;               // union
            n--;                                // one fewer component per valid union
        }
    }
    return n;
}

public int find(int[] roots, int id) {
    while (roots[id] != id) {
        roots[id] = roots[roots[id]];           // optional: path compression
        id = roots[id];
    }
    return id;
}
Overall this is handled much like the earlier UF problems:
- initialization as N groups
- path compression
- reduce count
- find method (the idiom is worth memorizing until it is second nature)
With these details in mind, a UF routine is easy to write.
Rank is not handled here. As noted when it came up before, rank is an acceleration technique, so there is still room for a speedup.
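For reference, a sketch of how union by rank could be layered onto this structure; the class and field names here are illustrative, not from any posted solution:

```java
class RankedUF {
    int[] root;
    int[] rank;     // rank is an upper bound on tree height

    RankedUF(int n) {
        root = new int[n];
        rank = new int[n];
        for (int i = 0; i < n; i++) root[i] = i;
    }

    int find(int id) {
        while (root[id] != id) {
            root[id] = root[root[id]];          // halving, as in the posted find
            id = root[id];
        }
        return id;
    }

    // returns true when the two nodes were in different groups
    boolean union(int p, int q) {
        int r1 = find(p), r2 = find(q);
        if (r1 == r2) return false;
        if (rank[r1] < rank[r2]) root[r1] = r2;         // hang shorter tree under taller
        else if (rank[r1] > rank[r2]) root[r2] = r1;
        else { root[r2] = r1; rank[r1]++; }             // equal ranks: height may grow by one
        return true;
    }
}
```

Counting components then starts at n and decrements once per union call that returns true.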
Each time a valid union happens, the number of groups drops by one. Whatever remains at the end is the actual number of groups, i.e., the number of CCs, as explained before.
Someone analyzed the complexity:
The complexity for M quick union + path compression on N objects should be N + MlgN, I think in this problem, M = 2E, N = V, so O(V + 2ElgV), correct me if this is wrong
I don't think this analysis holds. First, M here should be related to E, not 2E. Second, M should normally be much smaller than E, though in the worst case they can be equal. The lgN also looks wrong to me: the tree here is not necessarily binary, so the lg is vague; the actual log could well be something like log_100.
In the best case, O(N + M) is achievable, i.e., each union costs O(1). An edge sequence like (1, 0), (2, 0), (3, 0) ... suffices: the finds inside each union never actually need to traverse a path.
In the worst case, the complexity would supposedly be O(N + M * N^2), e.g. with a sequence like (1, 0), (2, 1), (3, 2) ..., which produces the worst behavior; so either way I think her analysis is wrong (note that each union contains two finds). That is without path compression. // No, that calculation is wrong too: a single union cannot cost N^2.
Recalling the algorithms course, the worst-case cost of a single find should be the maximum rank, or rather the maximum depth. This version has no rank, but if you deliberately chase the worst case, something like a binary-tree shape can apparently still form.
Thinking it over: without path compression, one worst case is (0, 1), (0, 2), (0, 3) ..., which costs 1 + 2 + ... + M over all M unions, so the complexity is O(N + M^2).
One boundary effect worth remembering: if the path length is only 1, say just 1 -> 2 (note that in UF problems the arrows read differently than in an ordinary tree: a -> b means parent[a] = b, i.e., pointing at the parent), then after compression (by halving) the length is still 1; whereas 0 -> 1 -> 2 compresses down to 0 -> 2 <- 1. Roughly a (len + 1) / 2 operation (you can keep verifying from len = 3 upward).
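The halving behavior can be checked directly with the find from the posted solution (the demo class name is mine):

```java
public class HalvingDemo {
    // find with path compression by halving, as in the posted UF solution
    static int find(int[] roots, int id) {
        while (roots[id] != id) {
            roots[id] = roots[roots[id]];   // point at the grandparent
            id = roots[id];
        }
        return id;
    }

    public static void main(String[] args) {
        int[] chain = {1, 2, 3, 4, 4};      // path 0 -> 1 -> 2 -> 3 -> 4, root is 4
        find(chain, 0);
        // path length drops from 4 to 2, i.e. roughly (len + 1) / 2:
        // now 0 -> 2 -> 4, while 1 and 3 keep their old parents
        System.out.println(java.util.Arrays.toString(chain)); // [2, 2, 4, 4, 4]
    }
}
```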
In the end I believe the worst-case complexity is still O(N + M^2). Examples point that way, but I have no proof.
This is Sedgewick's introduction to UF: https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf
Will read it when there is time.
Some people in the discussion build the adjacency list as an array of lists instead, which also works: it is about the same as a Map, and both give fast access. It may in fact be slightly faster than a Map, since every entry is implemented the same way (as a list).
Here is a UF that uses full path compression:
public class Solution {
    public int countComponents(int n, int[][] edges) {
        if (n <= 1) {
            return n;
        }
        int[] roots = new int[n];
        for (int i = 0; i < n; i++) {
            roots[i] = i;
        }
        for (int[] edge : edges) {
            int x = find(roots, edge[0]);
            int y = find(roots, edge[1]);
            if (x != y) {
                roots[x] = y;
                n--;
            }
        }
        return n;
    }

    public int find(int[] roots, int id) {
        int x = id;
        while (roots[id] != id) {
            id = roots[id];             // first pass: locate the root
        }
        while (roots[x] != id) {
            int fa = roots[x];
            roots[x] = id;              // second pass: repoint each node at the root
            x = fa;
        }
        return id;
    }
}
Note how his find is written: the path compression is a dedicated second loop. The code looks like a swap, but it is not; it repoints each node on the walked path directly at the root.
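A small check of what that two-pass find actually does: after a single call, every node on the walked path points straight at the root (the wrapper class name is mine; the find body is the one above):

```java
public class TwoPassDemo {
    // two-pass find with full path compression, as in the solution above
    static int find(int[] roots, int id) {
        int x = id;
        while (roots[id] != id) {
            id = roots[id];                 // pass 1: walk up to the root
        }
        while (roots[x] != id) {
            int fa = roots[x];              // remember the old parent ...
            roots[x] = id;                  // ... repoint x directly at the root ...
            x = fa;                         // ... then continue up the old path
        }
        return id;
    }

    public static void main(String[] args) {
        int[] chain = {1, 2, 3, 4, 4};      // path 0 -> 1 -> 2 -> 3 -> 4
        find(chain, 0);
        // the whole path is flattened: all nodes point straight at root 4
        System.out.println(java.util.Arrays.toString(chain)); // [4, 4, 4, 4, 4]
    }
}
```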
The same person also wrote a DFS version:
public class Solution {
    public int countComponents(int n, int[][] edges) {
        if (n <= 1) {
            return n;
        }
        List<List<Integer>> adjList = new ArrayList<List<Integer>>();
        for (int i = 0; i < n; i++) {
            adjList.add(new ArrayList<Integer>());
        }
        for (int[] edge : edges) {
            adjList.get(edge[0]).add(edge[1]);
            adjList.get(edge[1]).add(edge[0]);
        }
        boolean[] visited = new boolean[n];
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (!visited[i]) {
                count++;
                dfs(visited, i, adjList);
            }
        }
        return count;
    }

    public void dfs(boolean[] visited, int index, List<List<Integer>> adjList) {
        visited[index] = true;
        for (int i : adjList.get(index)) {
            if (!visited[i]) {
                dfs(visited, i, adjList);
            }
        }
    }
}
The code is very clean. Note the handling inside his helper: just as in the main function, it checks visited before recursing, so no separate base case is needed, and the helper stays consistent with the main function, which makes it easier to remember.
This is his BFS:
public class Solution {
    public int countComponents(int n, int[][] edges) {
        if (n <= 1) {
            return n;
        }
        List<List<Integer>> adjList = new ArrayList<List<Integer>>();
        for (int i = 0; i < n; i++) {
            adjList.add(new ArrayList<Integer>());
        }
        for (int[] edge : edges) {
            adjList.get(edge[0]).add(edge[1]);
            adjList.get(edge[1]).add(edge[0]);
        }
        boolean[] visited = new boolean[n];
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (!visited[i]) {
                count++;
                Queue<Integer> queue = new LinkedList<Integer>();
                queue.offer(i);
                while (!queue.isEmpty()) {
                    int index = queue.poll();
                    visited[index] = true;
                    for (int next : adjList.get(index)) {
                        if (!visited[next]) {
                            queue.offer(next);
                        }
                    }
                }
            }
        }
        return count;
    }
}
Someone in the discussion raised a point:
One suggestion to your code (BFS): mark the node as visited when you add it to the queue, not when it's popped. Otherwise you might add nodes multiple times to the queue. The result will still be OK but it's unnecessary computation.
The point is valid: the graph may contain cycles, so a node in the next level can be a shared child of several nodes in the current level and thus be enqueued multiple times. In practice the difference is small: even if a node is enqueued twice within a level, once the first occurrence is polled and processed, the second occurrence is skipped automatically by the visited check.
Still, a sharp catch; I believe CLRS mentions this detail somewhere.
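Applied to the BFS above, the suggested fix only moves the marking point; a sketch (the structure mirrors the posted BFS, the class name is mine):

```java
import java.util.*;

public class BFSCount {
    public int countComponents(int n, int[][] edges) {
        List<List<Integer>> adjList = new ArrayList<>();
        for (int i = 0; i < n; i++) adjList.add(new ArrayList<>());
        for (int[] edge : edges) {
            adjList.get(edge[0]).add(edge[1]);
            adjList.get(edge[1]).add(edge[0]);
        }
        boolean[] visited = new boolean[n];
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (visited[i]) continue;
            count++;
            Queue<Integer> queue = new LinkedList<>();
            queue.offer(i);
            visited[i] = true;                  // mark on enqueue, not on poll
            while (!queue.isEmpty()) {
                int index = queue.poll();
                for (int next : adjList.get(index)) {
                    if (!visited[next]) {
                        visited[next] = true;   // so each node enters the queue once
                        queue.offer(next);
                    }
                }
            }
        }
        return count;
    }
}
```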
This is the fastest submission, essentially template UF code, pasted here for reference:
public class Solution {
    public class UnionFind {
        private int count;
        private int[] id;

        public UnionFind(int n) {
            count = n;
            id = new int[n];
            for (int i = 0; i < n; i++) {
                id[i] = i;
            }
        }

        public int find(int p) {
            while (p != id[p]) {
                id[p] = id[id[p]];      // path compression by halving
                p = id[p];
            }
            return p;
        }

        public void union(int p, int q) {
            int i = find(p);
            int j = find(q);
            if (i == j) {
                return;
            }
            id[i] = j;
            count--;
        }

        public int count() {
            return count;
        }
    }

    public int countComponents(int n, int[][] edges) {
        if (edges == null || edges.length == 0) {
            return n;
        }
        UnionFind uf = new UnionFind(n);
        for (int[] edge : edges) {
            uf.union(edge[0], edge[1]);
        }
        return uf.count();
    }
}
It runs in 3ms.
Problem Description
Given n nodes labeled from 0 to n - 1 and a list of undirected edges (each edge is a pair of nodes), write a function to find the number of connected components in an undirected graph.
Example 1:
     0          3
     |          |
     1 --- 2    4
Given n = 5 and edges = [[0, 1], [1, 2], [3, 4]], return 2.
Example 2:
     0           4
     |           |
     1 --- 2 --- 3
Given n = 5 and edges = [[0, 1], [1, 2], [2, 3], [3, 4]], return 1.
Note:
You can assume that no duplicate edges will appear in edges. Since all edges are undirected, [0, 1] is the same as [1, 0] and thus will not appear together in edges.
Difficulty: Medium
Total Accepted: 26.5K
Total Submissions: 55.3K
Contributor: LeetCode
Companies: Google, Twitter
Related Topics: Depth-First Search, Breadth-First Search, Union Find, Graph
Similar Questions: Number of Islands, Graph Valid Tree, Friend Circles
This is the post I made in the discussion about the complexity:
@Ximin.Z I do not quite agree with your conclusion.
Discussion of Complexity
I am not currently capable of coming up with a formal proof or explanation, but for now, I have two bones to pick with your conclusion:
- `M = 2E` does not seem valid to me: even if each edge triggered a `union` operation, which is a worst case scenario, we would still have `M = E` per your definition of `M`, combined with the code by op.
- the `lg` part does not seem justified or illustrated for me: what is the base of your `lg`? With path compression, the value difference is not asymptotically significant, but it does cause a difference in understanding.
I am only putting two cases here:
Best Case
Consider such a set of edges: (1,0), (2,0), (3,0), (4,0)...
It is easy to verify the cost in such a scenario (note that each edge triggers one valid `union` in this case, and also, path compression does not matter). Also note that each union's contribution to the total cost is the sum of its two `find` calls.

Edge  | Graph    | contribution to cost
------|----------|---------------------
(1,0) | 1->0     | 1 + 1
(2,0) | 2->0<-1  | 1 + 1
(3,0) | ...      | 1 + 1
(4,0) | ...      | 1 + 1

You can verify by drawing the graph on paper yourself; I can't do it here since the graph goes beyond one dimension upon iteration. It is not hard to generalize in this case that the best case performance is O(N + 2M) (constant factor ignored).
Worst Case
First, let's assume no path compression. Consider such a case:

Edge  | Graph          | contribution to cost
------|----------------|---------------------
(0,1) | 0->1           | 1 + 1
(0,2) | 0->1->2        | 1 + 1
(0,3) | 0->1->2->3     | 2 + 1
(0,4) | 0->1->2->3->4  | 3 + 1

It is not hard to generalize in this case that for M unions, the total cost is gonna be O(N + M + sum(1 to M)) = O(N + M^2).
What about turning path compression by halving on?
Consider the case below. You have to draw something on paper to actually see the point of this case; the graph is just impossible to put here.

Edge  | contribution to cost (reminder: two find calls)
------|------------------------------------------------
(0,1) | 1 + 1
(0,2) | 1 + 1
(0,3) | 2 + 1
(0,4) | 2 + 1
(1,5) | 3 + 1
(1,6) | 3 + 1
(0,7) | 4 + 1
(2,8) | 4 + 1

I do not have a theoretical analysis for this case, but it is likely to generalize here that the total cost, with the aid of path compression by halving, would be O(N + M + 2 * sum(1 to M/2)), which would still be O(N + M^2) (the portion of the cost done by the M unions is reduced by half though).
So in conclusion, I do not think it is justified to say that the cost here is O(N + MlgN). I am tempted to say O(N + M^2), but I do not for now have a formal proof.
If we do path compression in the textbook way (every node visited is compressed to the root) rather than by halving, I believe the total cost could be significantly reduced.
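For completeness, the textbook full compression is often written recursively; a sketch (not from the thread, the class name is mine):

```java
public class RecursiveFind {
    // textbook path compression: as the recursion unwinds, every node on
    // the walked path is repointed directly at the root
    static int find(int[] roots, int id) {
        if (roots[id] != id) {
            roots[id] = find(roots, roots[id]);
        }
        return roots[id];
    }

    public static void main(String[] args) {
        int[] chain = {1, 2, 3, 4, 4};      // path 0 -> 1 -> 2 -> 3 -> 4
        find(chain, 0);
        // one call flattens the entire path onto root 4
        System.out.println(java.util.Arrays.toString(chain)); // [4, 4, 4, 4, 4]
    }
}
```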