SimilarStringGroups [source code]


public class SimilarStringGroups {
static
/****************************************************************************/
class Solution {
    Map<Integer, List<Integer>> adjlists;

    public int numSimilarGroups(String[] A) {
        Map<String, Integer> word2int = new HashMap<> ();
        for (int i = 0; i < A.length; i++)
            word2int.put (A[i], i);
        int N = A.length;
        adjlists = new HashMap<> ();
        for (int i = 0; i < N; i++) {
            for (int j = i + 1; j < N; j++) {
                if (isSimilar (A[i], A[j])) {
                    adjlists.computeIfAbsent (i, num -> new ArrayList<> ()).add (j);
                    adjlists.computeIfAbsent (j, num -> new ArrayList<> ()).add (i);
                }
            }
        }
        boolean[] visited = new boolean[N];
        int count = 0;
        for (int i = 0; i < N; i++) {
            if (!visited[i]) {
                count++;
                dfs (i, visited);
            }
        }
        return count;
    }

    void dfs (int node, boolean[] visited) {
        visited[node] = true;
        for (int nei : adjlists.getOrDefault (node, Collections.emptyList ())) {
            if (!visited[nei]) {
                dfs (nei, visited);
            }
        }
    }

    boolean isSimilar (String a, String b) {
        if (a.length () != b.length ())
            return false;
        Character aprev = null, bprev = null;
        int mismatches = 0;
        for (int i = 0; i < a.length (); i++) {
            if (a.charAt (i) != b.charAt (i)) {
                if (aprev == null) {
                    aprev = a.charAt (i);
                    bprev = b.charAt (i);
                } else {
                    if (mismatches > 1)
                        return false;
                    if (!(a.charAt (i) == bprev && b.charAt (i) == aprev)) {
                        return false;
                    }
                }
                mismatches++;
            }
        }
        return true;
    }
}
/****************************************************************************/

    public static void main(String[] args) {
        SimilarStringGroups.Solution tester = new SimilarStringGroups.Solution();
        String[][] inputs = {
            {"tars","rats","arts","star"}, {"2"},
            {"jvhpg","jhvpg","hpvgj","hvpgj","vhgjp"}, {"3"},
        };
        for (int i = 0; i < inputs.length / 2; i++) {
            System.out.println (Printer.separator ());
            String[] A = inputs[2 * i];
            int ans = Integer.parseInt (inputs[2 * i + 1][0]), output = tester.numSimilarGroups (A);
            System.out.printf ("%s\n%s\n%d\n", 
                Printer.array (A), Printer.wrap (output + "", output == ans ? 92 : 91), ans
            );
        }
    }
}

看最后一个note的意思, 这题估计卡时间复杂度卡的挺厉害的;

笨办法思路很清晰; 先不考虑isSimilar怎么实现, 假设是一个黑箱; 那么用isSimilar可以定义edge, 然后最后CC的实现就是很简单的;

简单思考一下isSimilar你会怎么实现?

感觉扫一遍就行了啊? 两个string, 如果不同的地方是两个, 而且正好可以互换, 那么就是similar, 否则就不是;

自己做一下看看? 感觉可以写写tokenization的样子; 不对, 人家给你的就是一个string数组, 现成的已经可以tokenize好了的;

这题之前瞄了一眼, uwi是用的UF来写的; 题目的tag里面给了DFS, 毕竟就是CC的题目; 那么我用哪个?

看到这里, 大概理解了这题为什么难写; 虽然说我们决定用similar来定义edge, 但是你现在仍然是没有一个真正维护这个graph的结构, 比如一个adjlists, 或者说一个adjmatrix;

在每一个node, 也就是一个word, 难道你直接遍历所有其他的word, 来决定你跟哪些其他word之间有edge吗? 这个复杂度也太高了;

当然或许可以提前preprocess一个adjlists这样的结构? 一个N^2的遍历, 然后更新每一个word的neighbors;

居然很轻松的就AC了, 水hard;
唯一的一个bug就是一开始similar的实现有点问题: 我一开始的实现是发现一旦有两个mismatch而且能够swap, 直接就return true了; 这个是有问题的; 如果后面还有更多的mismatch, 那么这个就不能返回true;

速度是335(NA), 估计很差;


UNFINISHED


editorial

Approach #1: Piecewise [Accepted]

他intuition里面描述的第二种方法, 其实我也是想到了, 但是我认为既然说了给的A是一个distinct的, 最后肯定穷举所有similar的复杂度要超过直接根据现有的这些来对比;

然后他这里第三段说, 他要组合, 我是觉得有点蛋疼的; 如果W是4, 你的意思是你就穷举一共4C2也就是6个可能性, 有必要吗? 给的A肯定长度不超过6的;

Algorithm

We will build some underlying graph with N nodes, where nodes i and j are connected if and only if A[i] is similar to A[j], then look at the number of connected components.

There are a few challenges involved in this problem, but each challenge is relatively straightforward.

  • Use a helper function similar(word1, word2) that is true if and only if two given words are similar.
  • Enumerate all neighbors of a word, and discover when it is equal to a given word.
  • Use either a union-find structure or a depth-first search, to calculate the number of connected components of the underlying graph. We've showcased a union-find structure in this solution, with notes of a depth-first search in the comments.

For more details, see the implementations below.

class Solution {  
    public int numSimilarGroups(String[] A) {  
        int N = A.length;  
        int W = A[0].length();  
        DSU dsu = new DSU(N);  

        if (N < W*W) { // If few words, then check for pairwise similarity: O(N^2 W)  
            for (int i = 0; i < N; ++i)  
                for (int j = i+1; j < N; ++j)  
                    if (similar(A[i], A[j]))  
                        dsu.union(i, j);  

        } else { // If short words, check all neighbors: O(N W^3)  
            Map<String, List<Integer>> buckets = new HashMap();  
            for (int i = 0; i < N; ++i) {  
                char[] L = A[i].toCharArray();  
                for (int j0 = 0; j0 < L.length; ++j0)  
                    for (int j1 = j0 + 1; j1 < L.length; ++j1) {  
                        swap(L, j0, j1);  
                        StringBuilder sb = new StringBuilder();  
                        for (char c: L) sb.append(c);  
                        buckets.computeIfAbsent(sb.toString(),  
                                x-> new ArrayList<Integer>()).add(i);  
                        swap(L, j0, j1);  
                    }  
            }  

            for (String word: A) { // plus O(W^4), but W*W <= N.  
                int i1 = buckets.get(word).get(0);  
                for (int i2: buckets.get(word))  
                    dsu.union(i1, i2);  
            }  
        }  

        int ans = 0;  
        for (int i = 0; i < N; ++i)  
            if (dsu.parent[i] == i) ans++;  

        return ans;  
    }  

    public boolean similar(String word1, String word2) {  
        int diff = 0;  
        for (int i = 0; i < word1.length(); ++i)  
            if (word1.charAt(i) != word2.charAt(i))  
                diff++  
        return diff <= 2;  
    }  

    public void swap(char[] A, int i, int j) {  
        char tmp = A[i];  
        A[i] = A[j];  
        A[j] = tmp;  
    }  
}  

class DSU {  
    int[] parent;  
    public DSU(int N) {  
        parent = new int[N];  
        for (int i = 0; i < N; ++i)  
            parent[i] = i;  
    }  
    public int find(int x) {  
        if (parent[x] != x) parent[x] = find(parent[x]);  
        return parent[x];  
    }  
    public void union(int x, int y) {  
        parent[find(x)] = find(y);  
    }  
}

awice好像特别喜欢这种组合算法, 之前好像就见到过他用过一次了;

他的similar比我实现的好一点: 实际上只要数确保最后有两个就行了; 不需要判断他们是不是真的能够swap. 自己想想为什么;
不过他的实现也不是完美, 我提出的意见:

  • N will always be O(W^2) because A is all unique and they are all anagrams.
  • in similar, you don't have to count all diffs. when diff_count goes from 2 to 3, you can output false. One example is abcde with bcdea. something like:
if (word1.charAt(i) != word2.charAt(i) && ++diff == 3) return false;

is better.

想了一下, 我的第一条意见其实是不对的, 他这个组合是有道理的; 因为N虽然说是unique anagrams, 不代表最后就一定是O(W^2)! 因为这里每一个anagram最后就是一个permutation, 是factorial级别的, 凭什么是O(W^2)?
感觉是我自己糊涂了;

但是还是没有仔细看他的第二个方法了, 大概意思懂了, 这个题目最后基本上还是考基本功;

面试的话, 可以把这个组合作为Follow-Up的优化提一下, 比较两个复杂度之类的;


uwi: 5分钟

class Solution {  
    public int numSimilarGroups(String[] A) {  
        int n = A.length;  
        DJSet ds = new DJSet(n);  
        for(int i = 0;i < n;i++){  
            for(int j = i+1;j < n;j++){  
                if(ok(A[i], A[j])){  
                    ds.union(i, j);  
                }  
            }  
        }  
        return ds.count();  
    }  

    public boolean ok(String a, String b)  
    {  
        int diff = 0;  
        for(int i = 0;i < a.length();i++){  
            if(a.charAt(i) != b.charAt(i)){  
                diff++;  
            }  
        }  
        return diff <= 2;  
    }  

    class DJSet {  
        public int[] upper;  

        public DJSet(int n) {  
            upper = new int[n];  
            Arrays.fill(upper, -1);  
        }  

        public int root(int x) {  
            return upper[x] < 0 ? x : (upper[x] = root(upper[x]));  
        }  

        public boolean equiv(int x, int y) {  
            return root(x) == root(y);  
        }  

        public boolean union(int x, int y) {  
            x = root(x);  
            y = root(y);  
            if (x != y) {  
                if (upper[y] < upper[x]) {  
                    int d = x;  
                    x = y;  
                    y = d;  
                }  
                upper[x] += upper[y];  
                upper[y] = x;  
            }  
            return x == y;  
        }  

        public int count() {  
            int ct = 0;  
            for (int u : upper)  
                if (u < 0)  
                    ct++;  
            return ct;  
        }  
    }  
}

Problem Description

Two strings X and Y are similar if we can swap two letters (in different positions) of X, so that it equals Y.

For example, "tars" and "rats" are similar (swapping at positions 0 and 2), and "rats" and "arts" are similar, but "star" is not similar to "tars", "rats", or "arts".

Together, these form two connected groups by similarity: {"tars", "rats", "arts"} and {"star"}. Notice that "tars" and "arts" are in the same group even though they are not similar. Formally, each group is such that a word is in the group if and only if it is similar to at least one other word in the group.

We are given a list A of unique strings. Every string in A is an anagram of every other string in A. How many groups are there?

Example 1:

Input: ["tars","rats","arts","star"]  
Output: 2

Note:

  • A.length <= 2000
  • A[i].length <= 1000
  • A.length * A[i].length <= 20000
  • All words in A consist of lowercase letters only.
  • All words in A have the same length and are anagrams of each other.
  • The judging time limit has been increased for this question.

Difficulty:Hard
Total Accepted:958
Total Submissions:2.8K
Contributor:awice
Companies
google
Related Topics
depth-first searchunion findgraph

results matching ""

    No results matching ""