RepeatedSubstringPatternOPT2 [source code]

public class RepeatedSubstringPatternOPT2 {
static
/******************************************************************************/
public class Solution {
    public boolean repeatedSubstringPattern(String str) {
        int l = str.length();
        int[] lps = new int[l];
        int leading = 1;
        int chasing = 0;
        while (leading < l) {
            if(str.charAt(chasing) == str.charAt(leading)) {
                chasing++;
                lps[leading] = chasing;
                leading++;
            } else {
                if (chasing > 0) {
                    chasing = lps[chasing - 1];
                } else {
                    chasing = 0;
                    leading++;
                }
            }
        }
        int lp = lps[l - 1];
        return (lp > 0 && l % (l - lp) == 0 && str.startsWith(str.substring(lp)));
    }
}
/******************************************************************************/

    public static void main(String[] args) {
        RepeatedSubstringPatternOPT2.Solution tester = new RepeatedSubstringPatternOPT2.Solution();
        String[] inputs = {"abab", "aba", "abcabcabcabc"};
        for (String s : inputs) System.out.printf("%s -> %s%n", s, tester.repeatedSubstringPattern(s));
    }
}

这个是一个基于 KMP 的解, 速度是25(83), 还算是不错的, 关键是学习 KMP 在这种一般问题当中拓展应用;

这个是作者自己的解释:
The problem of the first solution is that we do not use the knowledge of failed matching, and the Knuth-Morris-Pratt algorithm is a classic example of how knowledge of failed tries can be use to guide future search.

In fact we only need to compute the pattern table (the lps array, see below) in the Knuth-Morris-Pratt algorithm.

The entry lps[i] is the length of the longest proper prefix that is also a suffix of (s[0], ..., s[i]), or equivalently, length of the longest prefix that is also a proper suffix of (s[0], ..., s[i]). lps[0] is 0, since a single - character string has no proper prefix or proper suffix. Here is a very detailed explanation on the KMP algorithm and how lps is computed dynamically.

After we get lps, we relate the property of the lps table to the property of a string being constructed by joining copies of its substring.

One on hand, if str = (s[0], ..., s[km - 1]) is constructed by joining m copies of its substring substr = (s[0], ..., s[k-1]), and assuming that substr is the finest making blockstr can be boiled down to, meaning str is not constructed by joining copies of any proper substring of substr. Then we must have lps[km - 1] equals (m - 1)k.

On the other hand, assuming that the longest proper prefix prefixStr of string str that is also a suffix, and the remaining string remainderStr obtained by removing prefix from str satisfies the following 3 properties:

  1. remainderStr is a proper substring of str,
  2. |str| is divisiable by |remainderStr|,
  3. remainderStr is a prefix of prefixStr.

We can show by induction that str is constructed by joining copies of remainderStr.
Here is the code. It solve the 100 test cases in 29ms. A great improvement over the native approach! Remember the statement above, since we are going to use it again.

简单来讲, 这个作者就是想到了 lps array 跟这个问题之间的关联, 先制作一个lps, 然后找到最后一项, 通过这个最后一项, 我们其实就找到了repeated pattern的长度(如果有 repeated 的话). 这样就避免了一个针对这个pattern length进行穷搜的过程.

看似还是一个很聪明的思路, 不过最后的速度不是最优的;

这个是另外一个人基于类似的思路写的代码:

bool repeatedSubstringPattern(string str) {  
    int i = 1, j = 0, n = str.size();  
    vector<int> dp(n+1,0);  
    while( i < str.size() ){  
        if( str[i] == str[j] ) dp[++i]=++j;  
        else if( j == 0 ) i++;  
        else j = dp[j];  
    }  
    return dp[n]&&dp[n]%(n-dp[n])==0;  
}

代码看起来简练很多;
这个代码也告诉我们, 最上面代码的str.startsWith(str.substring(lp)) 其实也是可以删除的. 我删除了之后, 还是 AC, 不过速度没有加太多, 只是24ms;

这个是另外一个人的解释:
Same implementation.

My p[i] stands for longest common string length of prefix string and the string ended with position i.

The important point is the last one: len % (len - p[len - 1]) == 0

for a string like below, if p[len-1] = 15, len=20:

[#####~~~~~^^^^^]$$$$$  
#####[~~~~~^^^^^$$$$$]

看到这个图的时候, 就感觉其实这几个解法的核心思路都是很类似的;

by p[len-1] = 15, we know the strings colored red are the same.

so you can infer that:

##### == ~~~~~  

~~~~~ == ^^^^^  

^^^^^ == $$$$$

The whole is repeating as #####

the length of it is 5, which can be completely divided by len.

That's how this final condition works.


这个是最上面的代码的作者受到 mod 穷搜的思路启发之后, 优化加速之后的一个代码:

public class Solution {  
    public boolean repeatedSubstringPattern(String str) {  
        int l = str.length();  
        for(int i = (l + 1) / 2; i < l; i++) {  
            if(l % (l - i) == 0) {  
                String prefix = str.substring(0, i);  
                String remainder = str.substring(i);  
                String suffix = str.substring(l - i);  
                if(str.startsWith(remainder) && suffix.equals(prefix)){  
                    return true;  
                }  
            }  
        }  
        return false;  
    }  
}

这个算法有人提出后面的str.startsWith(remainder)其实是多余的.
仔细看看这个算法, 感觉其实跟那个str + str的解法的核心思路是类似的;


Problem Description

Given a non-empty string check if it can be constructed by taking a substring of it and appending multiple copies of the substring together. You may assume the given string consists of lowercase English letters only and its length will not exceed 10000.

Example 1:
Input: "abab"

Output: True

Explanation: It's the substring "ab" twice.
Example 2:
Input: "aba"

Output: False
Example 3:
Input: "abcabcabcabc"

Output: True

Explanation: It's the substring "abc" four times. (And the substring "abcabc" twice.)
Difficulty:Easy
Category:Algorithms
Acceptance:38.17%
Contributor: YuhanXu
Companies
amazon google
Related Topics
string
Similar Questions
Implement strStr()

results matching ""

    No results matching ""