SimilarStringGroups [source code]
public class SimilarStringGroups {
static
/****************************************************************************/
class Solution {
Map<Integer, List<Integer>> adjlists;
public int numSimilarGroups(String[] A) {
Map<String, Integer> word2int = new HashMap<> ();
for (int i = 0; i < A.length; i++)
word2int.put (A[i], i);
int N = A.length;
adjlists = new HashMap<> ();
for (int i = 0; i < N; i++) {
for (int j = i + 1; j < N; j++) {
if (isSimilar (A[i], A[j])) {
adjlists.computeIfAbsent (i, num -> new ArrayList<> ()).add (j);
adjlists.computeIfAbsent (j, num -> new ArrayList<> ()).add (i);
}
}
}
boolean[] visited = new boolean[N];
int count = 0;
for (int i = 0; i < N; i++) {
if (!visited[i]) {
count++;
dfs (i, visited);
}
}
return count;
}
void dfs (int node, boolean[] visited) {
visited[node] = true;
for (int nei : adjlists.getOrDefault (node, Collections.emptyList ())) {
if (!visited[nei]) {
dfs (nei, visited);
}
}
}
boolean isSimilar (String a, String b) {
if (a.length () != b.length ())
return false;
Character aprev = null, bprev = null;
int mismatches = 0;
for (int i = 0; i < a.length (); i++) {
if (a.charAt (i) != b.charAt (i)) {
if (aprev == null) {
aprev = a.charAt (i);
bprev = b.charAt (i);
} else {
if (mismatches > 1)
return false;
if (!(a.charAt (i) == bprev && b.charAt (i) == aprev)) {
return false;
}
}
mismatches++;
}
}
return true;
}
}
/****************************************************************************/
public static void main(String[] args) {
SimilarStringGroups.Solution tester = new SimilarStringGroups.Solution();
String[][] inputs = {
{"tars","rats","arts","star"}, {"2"},
{"jvhpg","jhvpg","hpvgj","hvpgj","vhgjp"}, {"3"},
};
for (int i = 0; i < inputs.length / 2; i++) {
System.out.println (Printer.separator ());
String[] A = inputs[2 * i];
int ans = Integer.parseInt (inputs[2 * i + 1][0]), output = tester.numSimilarGroups (A);
System.out.printf ("%s\n%s\n%d\n",
Printer.array (A), Printer.wrap (output + "", output == ans ? 92 : 91), ans
);
}
}
}
看最后一个note的意思, 这题估计卡时间复杂度卡的挺厉害的;
笨办法思路很清晰; 先不考虑isSimilar怎么实现, 假设是一个黑箱; 那么用isSimilar可以定义edge, 然后最后CC的实现就是很简单的;
简单思考一下isSimilar你会怎么实现?
感觉扫一遍就行了啊? 两个string, 如果不同的地方是两个, 而且正好可以互换, 那么就是similar, 否则就不是;
自己做一下看看? 感觉可以写写tokenization的样子; 不对, 人家给你的就是一个string数组, 现成的已经可以tokenize好了的;
这题之前瞄了一眼, uwi是用的UF来写的; 题目的tag里面给了DFS, 毕竟就是CC的题目; 那么我用哪个?
看到这里, 大概理解了这题为什么难写; 虽然说我们决定用similar来定义edge, 但是你现在仍然是没有一个真正维护这个graph的结构, 比如一个adjlists, 或者说一个adjmatrix;
在每一个node, 也就是一个word, 难道你直接遍历所有其他的word, 来决定你跟哪些其他word之间有edge吗? 这个复杂度也太高了;
当然或许可以提前preprocess一个adjlists这样的结构? 一个N^2的遍历, 然后更新每一个word的neighbors;
居然很轻松的就AC了, 水hard;
唯一的一个bug就是一开始similar的实现有点问题: 我一开始的实现是发现一旦有两个mismatch而且能够swap, 直接就return true了; 这个是有问题的; 如果后面还有更多的mismatch, 那么这个就不能返回true;
速度是335(NA), 估计很差;
UNFINISHED
editorial
Approach #1: Piecewise [Accepted]
他intuition里面描述的第二种方法, 其实我也是想到了, 但是我认为既然说了给的A是一个distinct的, 最后肯定穷举所有similar的复杂度要超过直接根据现有的这些来对比;
然后他这里第三段说, 他要组合, 我是觉得有点蛋疼的; 如果W是4, 你的意思是你就穷举一共4C2也就是6个可能性, 有必要吗? 给的A肯定长度不超过6的;
Algorithm
We will build some underlying graph with N nodes, where nodes i and j are connected if and only if A[i] is similar to A[j], then look at the number of connected components.
There are a few challenges involved in this problem, but each challenge is relatively straightforward.
- Use a helper function similar(word1, word2) that is true if and only if two given words are similar.
- Enumerate all neighbors of a word, and discover when it is equal to a given word.
- Use either a union-find structure or a depth-first search, to calculate the number of connected components of the underlying graph. We've showcased a union-find structure in this solution, with notes of a depth-first search in the comments.
For more details, see the implementations below.
class Solution {
public int numSimilarGroups(String[] A) {
int N = A.length;
int W = A[0].length();
DSU dsu = new DSU(N);
if (N < W*W) { // If few words, then check for pairwise similarity: O(N^2 W)
for (int i = 0; i < N; ++i)
for (int j = i+1; j < N; ++j)
if (similar(A[i], A[j]))
dsu.union(i, j);
} else { // If short words, check all neighbors: O(N W^3)
Map<String, List<Integer>> buckets = new HashMap();
for (int i = 0; i < N; ++i) {
char[] L = A[i].toCharArray();
for (int j0 = 0; j0 < L.length; ++j0)
for (int j1 = j0 + 1; j1 < L.length; ++j1) {
swap(L, j0, j1);
StringBuilder sb = new StringBuilder();
for (char c: L) sb.append(c);
buckets.computeIfAbsent(sb.toString(),
x-> new ArrayList<Integer>()).add(i);
swap(L, j0, j1);
}
}
for (String word: A) { // plus O(W^4), but W*W <= N.
int i1 = buckets.get(word).get(0);
for (int i2: buckets.get(word))
dsu.union(i1, i2);
}
}
int ans = 0;
for (int i = 0; i < N; ++i)
if (dsu.parent[i] == i) ans++;
return ans;
}
public boolean similar(String word1, String word2) {
int diff = 0;
for (int i = 0; i < word1.length(); ++i)
if (word1.charAt(i) != word2.charAt(i))
diff++
return diff <= 2;
}
public void swap(char[] A, int i, int j) {
char tmp = A[i];
A[i] = A[j];
A[j] = tmp;
}
}
class DSU {
int[] parent;
public DSU(int N) {
parent = new int[N];
for (int i = 0; i < N; ++i)
parent[i] = i;
}
public int find(int x) {
if (parent[x] != x) parent[x] = find(parent[x]);
return parent[x];
}
public void union(int x, int y) {
parent[find(x)] = find(y);
}
}
awice好像特别喜欢这种组合算法, 之前好像就见到过他用过一次了;
他的similar比我实现的好一点: 实际上只要数确保最后有两个就行了; 不需要判断他们是不是真的能够swap. 自己想想为什么;
不过他的实现也不是完美, 我提出的意见:
- N will always be O(W^2) because A is all unique and they are all anagrams.
- in
similar
, you don't have to count all diffs. when diff_count goes from 2 to 3, you can output false. One example isabcde
withbcdea
. something like:
if (word1.charAt(i) != word2.charAt(i) && ++diff == 3) return false;
is better.
想了一下, 我的第一条意见其实是不对的, 他这个组合是有道理的; 因为N虽然说是unique anagrams, 不代表最后就一定是O(W^2)! 因为这里每一个anagram最后就是一个permutation, 是factorial级别的, 凭什么是O(W^2)?
感觉是我自己糊涂了;
但是还是没有仔细看他的第二个方法了, 大概意思懂了, 这个题目最后基本上还是考基本功;
面试的话, 可以把这个组合作为Follow-Up的优化提一下, 比较两个复杂度之类的;
uwi: 5分钟
class Solution {
public int numSimilarGroups(String[] A) {
int n = A.length;
DJSet ds = new DJSet(n);
for(int i = 0;i < n;i++){
for(int j = i+1;j < n;j++){
if(ok(A[i], A[j])){
ds.union(i, j);
}
}
}
return ds.count();
}
public boolean ok(String a, String b)
{
int diff = 0;
for(int i = 0;i < a.length();i++){
if(a.charAt(i) != b.charAt(i)){
diff++;
}
}
return diff <= 2;
}
class DJSet {
public int[] upper;
public DJSet(int n) {
upper = new int[n];
Arrays.fill(upper, -1);
}
public int root(int x) {
return upper[x] < 0 ? x : (upper[x] = root(upper[x]));
}
public boolean equiv(int x, int y) {
return root(x) == root(y);
}
public boolean union(int x, int y) {
x = root(x);
y = root(y);
if (x != y) {
if (upper[y] < upper[x]) {
int d = x;
x = y;
y = d;
}
upper[x] += upper[y];
upper[y] = x;
}
return x == y;
}
public int count() {
int ct = 0;
for (int u : upper)
if (u < 0)
ct++;
return ct;
}
}
}
Problem Description
Two strings X and Y are similar if we can swap two letters (in different positions) of X, so that it equals Y.
For example, "tars" and "rats" are similar (swapping at positions 0 and 2), and "rats" and "arts" are similar, but "star" is not similar to "tars", "rats", or "arts".
Together, these form two connected groups by similarity: {"tars", "rats", "arts"} and {"star"}. Notice that "tars" and "arts" are in the same group even though they are not similar. Formally, each group is such that a word is in the group if and only if it is similar to at least one other word in the group.
We are given a list A of unique strings. Every string in A is an anagram of every other string in A. How many groups are there?
Example 1:
Input: ["tars","rats","arts","star"]
Output: 2
Note:
- A.length <= 2000
- A[i].length <= 1000
- A.length * A[i].length <= 20000
- All words in A consist of lowercase letters only.
- All words in A have the same length and are anagrams of each other.
- The judging time limit has been increased for this question.
Difficulty:Hard
Total Accepted:958
Total Submissions:2.8K
Contributor:awice
Companies
google
Related Topics
depth-first searchunion findgraph