IsSubsequence [source code]

public class IsSubsequence {
static
/******************************************************************************/
class Solution {
    public boolean isSubsequence(String s, String t) {
        char[] chs = s.toCharArray(), cht = t.toCharArray();
        int i = 0, j = 0;
        while (j < cht.length && i < chs.length) {        //think: T will always be exhausted first? not really
            if (cht[j] == chs[i]) {
                i++;
                j++;
            } else {
                j++;
            }
        }
        return i == chs.length;
    }
}
/******************************************************************************/

    public static void main(String[] args) {
        IsSubsequence.Solution tester = new IsSubsequence.Solution();
        String[] inputs = {
            "abc", "ahbgdc", "true",
            "axc", "ahbgdc", "false",
        };
        for (int i = 0; i < inputs.length / 3; i++) {
            String s = inputs[3 * i];
            String t = inputs[1 + 3 * i];
            boolean ans = inputs[2 + 3 * i].equals("true");
            System.out.println(Printer.separator());
            boolean output = tester.isSubsequence(s, t);
            System.out.println(
                Printer.wrapColor(s + " AND " + t, "magenta") +
                " -> " + output +
                Printer.wrapColor(", expected: " + ans, ans == output ? "green" : "red")
                );
        }
    }
}

这题tag上面给的思路还是很明显的, 题目本身的思路也很清晰; 感觉这个题目用DP好像就有点慢了? 先想想人逻辑, 直接对S一个一个按顺序找就行了, 找到一个, 就在剩下的T的范围里面找其他的; 如果某一个s[i]在T里面有多个位置, 直接找最左边的就行了, 因为假如s是一个sub, 那么选哪一个都无所谓, 但是选最左边的这个s[i]可以保证i后面的char可以有更大的可能性命中. 这个应该就是所谓的greedy方案, 因为这个其实是一个确定搜索, 所以最后的算法其实很直白; 但是这个算法的复杂度可能并不那么优秀, T里面的每一个char都要被扫描一遍, 按照题目的表述的意思来看, 这个可能不是最好的结果;

另外这里这个DP的思路, 其实好像也有点理解; 提到DP, 就要去想怎么设计这个reduce to smaller problems的过程, 这里就很简单, S和T都可以直接缩小; 这个算法, 如果我们每一个iteration的时候掐两头, 最后可能速度是greedy的两倍, 但是其实也不是质变;

至于Binary Search的做法没有什么思路; 这里的两个string都不是sorted的, 不太理解Binary Search怎么使用;

用上面笨办法直接写了一个, 立刻就AC了, 算法本身是很简单的, 最后速度是12ms (91%), 居然还可以接受;


这个是Stefan的一个C版本的小程序:

@StefanPochmann said in 3 lines C:

bool isSubsequence(char* s, char* t) {  
    while (*t)  
        s += *s == *t++;  
    return !*s;  
}

Just go through t and "cross off" the matching characters in s. Then return whether there's nothing left in s.

Sometimes, C makes things easier... here it's helpful that C strings are null terminated and that I can easily pop off a string's first character in constant time just by incrementing the pointer.

算法本身意思是不复杂的, 不过可以顺带看一看c的写法;

这个是下面的一个C++的版本, 附带Follow-Up:

@haiwei624 said in 3 lines C:

@Th111 for your question,
First, here is a C++ version:
O(len(t))

class Solution {  
public:  
    bool isSubsequence(string s, string t) {  
      auto i = s.begin();  
        for(char c : t) i += (*i == c);  
        return i == s.end();  
    }  
};

For the Follow-up, assume we have S1...Sk, above C++ code can be change to a simple follow-up version:
Follow-up, O(len(t) * k)

class Solution {  
public:  
    vector<bool> isSubsequence(vector<string> ss, string t) {  
      vector<string::const_iterator> iters(ss.size());  
      for(int i = 0; i < ss.size(); i++) {  
          iters[i] = ss[i].begin();  
      }  
        for(char c : t) {  
          for(int i = 0; i < ss.size(); i++) {  
              iters[i] += *iters[i] == c;  
          }  
        }  
        vector<bool> ans(ss.size());  
        for(int i = 0; i < ss.size(); i++) {  
          ans[i] = iters[i] == ss[i].end();  
        }  
        return ans;  
    }  
};

If you want to make it faster, you can improve it by avoid unnecessary compare:
Follow-up Faster, O(len(t) + sum(len(s[i])))

class Solution {  
public:  
    vector<bool> isSubsequence(vector<string> ss, string t) {  
      vector<string::const_iterator> iters(ss.size());  
      vector<vector<int> > waitingList(26);  
      for(int i = 0; i < ss.size(); i++) {  
          iters[i] = ss[i].begin();  
          waitingList[*iters[i] - 'a'].push_back(i);  
      }  
        for(char c : t) {  
          vector<int> updateList = waitingList[c - 'a'];  
          waitingList[c - 'a'].clear();  
          for(int i : updateList) {  
              iters[i]++;  
              waitingList[*iters[i] - 'a'].push_back(i);  
          }  
        }  
        vector<bool> ans(ss.size());  
        for(int i = 0; i < ss.size(); i++) {  
          ans[i] = iters[i] == ss[i].end();  
        }  
        return ans;  
    }  
};

上面两个算法都没有认真看, 他这里是最后的这个Follow-Up写的还算好; 这个Follow-Up因为有一个T, 和很多的S, 所以我当时的第一想法其实是预处理T, 做成一个类似dictionary的东西, 或者trie这种, 可以查询的结构;

他这里的做法严格来说不是这样; 他其实是调换了查询方向: 一般理解上, 是拿着S去查询T, 但是他这里其实是用一个T去查询众多的S; 具体的算法思路, 有点像bingo game一样的, T只需要走一遍, 然后所有的S看情况更新自己的当前位置(hit了多少个); 最后统计一下每个S有没有hit完就行了;

另外, 这个算法, 有人进一步给出优化:

@wangxinbo said in 3 lines C:

@haiwei624 Hi, I made several improvement on your code. Hope it is useful.

  1. We should check if an iterator has reached the end of s;
  2. We should consider corner case when s is empty;
  3. Also, we can maintain a counter to record the number of s whose comparison is unfinished. If all s have finished their comparison, we can jump out of the loop. This can be helpful when strings in ss are similar, and t is extremely long, however, the worst case time complexity is still O(len(t) + sum(len(s[i]))). BTW, I think I agree with your time complexity, because every character can and can only be touched once, so that's why, right?
    Here I provide my code, please correct me if anybody find a bug.
vector<bool> isSubsequence(vector<string> ss, string t) {  
  vector<string::const_iterator> iters(ss.size());  
  vector<vector<int> > waitingList(26);  
  int unfinished = ss.size();  
  for(int i = 0; i < ss.size(); i++) {  
      iters[i] = ss[i].begin();  
      if(iters[i] != ss[i].end()) waitingList[*iters[i] - 'a'].push_back(i);  
  }  
  for(char c : t) {  
      vector<int> updateList = waitingList[c - 'a'];  
      waitingList[c - 'a'].clear();  
      for(int i : updateList) {  
          iters[i]++;  
          if(iters[i] != ss[i].end()) waitingList[*iters[i] - 'a'].push_back(i);  
          else unfinished--;  
      }  
      if(unfinished == 0) break;  
  }  
  vector<bool> ans(ss.size());  
  for(int i = 0; i < ss.size(); i++) {  
      ans[i] = iters[i] == ss[i].end();  
  }  
  return ans;  
}

思路是对的, 虽然不是质变型的优化, 不过确实是几个常见的思路; 主要就是一个有序的early exit;


一个类似我的做法, 不过他在header里面只判断一个指针, 另外一个指针用来放在里面完成premature exit:

public class Solution {  
    public boolean isSubsequence(String s, String t) {  
        if (s.length() == 0) return true;  
        int indexS = 0, indexT = 0;  
        while (indexT < t.length()) {  
            if (t.charAt(indexT) == s.charAt(indexS)) {  
                indexS++;  
                if (indexS == s.length()) return true;  
            }  
            indexT++;  
        }  
        return false;  
    }  
}

这个是一个scale到Follow-Up的做法:

    // Follow-up: O(N) time for pre-processing, O(Mlog?) for each S.  
    // Eg-1. s="abc", t="bahbgdca"  
    // idx=[a={1,7}, b={0,3}, c={6}]  
    //  i=0 ('a'): prev=1  
    //  i=1 ('b'): prev=3  
    //  i=2 ('c'): prev=6 (return true)  
    // Eg-2. s="abc", t="bahgdcb"  
    // idx=[a={1}, b={0,6}, c={5}]  
    //  i=0 ('a'): prev=1  
    //  i=1 ('b'): prev=6  
    //  i=2 ('c'): prev=? (return false)  
    public boolean isSubsequence(String s, String t) {  
        List<Integer>[] idx = new List[256]; // Just for clarity  
        for (int i = 0; i < t.length(); i++) {  
            if (idx[t.charAt(i)] == null)  
                idx[t.charAt(i)] = new ArrayList<>();  
            idx[t.charAt(i)].add(i);  
        }  

        int prev = 0;  
        for (int i = 0; i < s.length(); i++) {  
            if (idx[s.charAt(i)] == null) return false; // Note: char of S does NOT exist in T causing NPE  
            int j = Collections.binarySearch(idx[s.charAt(i)], prev);  
            if (j < 0) j = -j - 1;  
            if (j == idx[s.charAt(i)].size()) return false;  
            prev = idx[s.charAt(i)].get(j) + 1;  
        }  
        return true;  
    }

思路总体还是比较清晰的, 这个思路就是跟我刚开始的直觉一样, 先对T进行一个预处理; 这里预处理得到是一个array of list, 其实就是一个dictionary/Map结构, 不过他这里把第一维改成array这样可能能够适当加速, 以及第二维用list而不是array, 是因为可以节省一些空间;

关于Collections.binarySearch:

// Returns index of key in sorted list sorted in  
// ascending order  
public static int binarySearch(List slist, T key)  

// Returns index of key in sorted list sorted in  
// order defined by Comparator c.  
public static int binarySearch(List slist, T key, Comparator c)  

If key is not present, the it returns "(-(insertion point) - 1)".   
The insertion point is defined as the point at which the key   
would be inserted into the list.

另外, 这里比如在第二个char的list里面用Binary Search来搜prev其实也是很直观合理的, 因为这就相当于在检查, 是否可以将前一个char插入到当前的这个第二个char之前的位置, 这个也是在做subseq问题的时候需要检查一个order;

注意这里if (j < 0) j = -j - 1;是对返回值进行一个统一化的处理, 因为我们其实不在乎这个search是否hit; 我们只需要一个hit position or insert position; 事实上, 可能hit吗? 不可能的, 不同的char的list之间肯定是disjoint的;

这个是下面一个人给出的总结:

@wangxinbo said in Binary search solution for follow-up with detailed comments:

@cdai Here is my idea for the follow up, based upon your solution : )

Binary search:

  • record the indexes for each character in t, if s[i] matches t[j], then s[i+1] should match a character in t with index bigger than j. This can be reduced to find the first element larger than a value in an sorted array (find upper bound), which can be achieved using binary search.

Trie:

  • For example, if s1 has been matched, s1[last char] matches t[j]. Now, s2 comes, if s1 is a prefix of s2, i.e., s1 == s2.substr[0, i-1], we can start match s2 from s2[i], right?
  • So, the idea is to create a Trie for all string that have been matched so far. At a node, we record the position in t which matches this char represented by the node. Now, for an incoming string s, we first search the longest prefix in the Trie, find the matching position of the last prefix-char in t, say j. Then, we can start matching the first non-prefix-char of s from j+1.
  • Now, if we have done the preprocessing as stated in the binary search approach, we can be even faster.

但是他的第二点这个trie的做法其实我不是很认同, trie的insert也是需要时间的, 这个人虽然利用一个trie来完成对S之间的similarity的利用, 但是现在每一个S被处理的同时还要完成一个对trie的维护, 这个其实最后就相当于一个时间的转嫁; 而且像他说的这种一个S是另外一个S的完整prefix, 这种情况在S很多的情况下出现的不太可能很多; 大部分情况下不同的S之间可能用Shared prefix, 但是不太可能一个整个都是另外一个S的prefix; 这个就当大概了解一下思路, 并不是特别的有意义;

另外, 最上面那个算法, 如果要改成一个针对Follow-Up的算法也很简单, 直接就把这个dictionary做成global to所有的S就行了; 然后每一个S做的内容就是查询这个dictionary;

复杂度分析:

@jdrogin said in Easy to understand binary search solution:

@WKVictor seems this could be a bit simpler

space complexity is length of t, we need to add an index value to our dictionary foreach element of t. You also have the overhead of the keys for t but this does not changed that fact that this is still O(length of t)

time complexity is 1 binary search through 1 dictionary entry for each character of s. Worst case the dictionary entry has all indexes of t so it's cost would be log of the length of t. Multiply that against length of s and you get O(length of s * log(length of t))


这个是一个快的很奇怪的版本:

@reboot329 said in Java. Only 2ms. Much faster than normal 2 pointers.:

Actually another way of 2 pointers, guess indexOf() performs better.

Tested with other 2-pointers methods in discussion, both took 30ms+.

The one below only takes 2ms.

public class Solution   
{  
    public boolean isSubsequence(String s, String t)   
    {  
        if(t.length() < s.length()) return false;  
        int prev = 0;  
        for(int i = 0; i < s.length();i++)  
        {  
            char tempChar = s.charAt(i);  
            prev = t.indexOf(tempChar,prev);  
            if(prev == -1) return false;  
            prev++;  
        }  

        return true;  
    }  
}

Thanks to @Ankai.Liang who looked into both functions and provided us the answer.

In case you guys do not notice, I post Liang Ankai's answer.

@Ankai.Liang said

Hi, good solution.
I checked the origin code of func "indexOf" and "charAt". These two solution both traversed the char of String one by one to search the first occurrence specific char.
The difference is that indexOf only call once function then traversed in "String.value[]" arr, but we used multiple calling function "charAt" to get the value in "String.value[]" arr.
The time expense of calling function made the difference.

这个算法其实还是普通的2pointer, 但是最后的速度我试了一下, 确实有2(97)这么快;

另外, 关于这里indexOf的用法:

Syntax:  
public int indexOf(int ch, int strt)  
Parameters:  
ch :a character.  
strt : the index to start the search from.

这个是另外一个Binary Search的解法:

 // Follow-up  
 // If we check each sk in this way, then it would be O(kn) time where k is the number of s and t is the length of t.   
 // This is inefficient.   
 // Since there is a lot of s, it would be reasonable to preprocess t to generate something that is easy to search for if a character of s is in t.   
 // Sounds like a HashMap, which is super suitable for search for existing stuff.   
public boolean isSubsequence(String s, String t) {  
    if (s == null || t == null) return false;  

    Map<Character, List<Integer>> map = new HashMap<>(); //<character, index>  

    //preprocess t  
    for (int i = 0; i < t.length(); i++) {  
        char curr = t.charAt(i);  
        if (!map.containsKey(curr)) {  
            map.put(curr, new ArrayList<Integer>());  
        }  
        map.get(curr).add(i);  
    }  

    int prev = -1;  //index of previous character  
    for (int i = 0; i < s.length(); i++) {  
        char c = s.charAt(i);  

        if (map.get(c) == null)  {  
            return false;  
        } else {  
            List<Integer> list = map.get(c);  
            prev = binarySearch(prev, list, 0, list.size() - 1);  
            if (prev == -1) {  
                return false;  
            }  
            prev++;  
        }  
    }  

    return true;  
}  

private int binarySearch(int index, List<Integer> list, int start, int end) {  
    while (start <= end) {  
        int mid = start + (end - start) / 2;  
        if (list.get(mid) < index) {  
            start = mid + 1;  
        } else {  
            end = mid - 1;  
        }  
    }  

    return start == list.size() ? -1 : list.get(start);  
}

整体思路还是类似的, 不过这个人Binary Search是自己写的; 另外注意他这里的Binary Search的写法, 是2branch的写法, 同时注意他的header: if (list.get(mid) < index) {, 这里是没有取等号的, 好像2branch的都喜欢这么写? 另外, 因为这里是闭区间, 所以while的header也是取了等号的;

另外, 就是注意他这里的range定义的是一个闭区间, 好像闭区间是比半开半闭要好理解一些;

这个是评论里面一个分析:

your helper function binarySearch finds the first element of list that is greater than or equal to the index that is passed in.

这里用到一个"first", 这个跟之前那个kth in matrix的问题有点类似, 当时一个人也说Binary Search找到的是"first"还是什么的; 为什么能够保证这个first? 当时无法理解那个Binary Search的一个很大的问题就是不能justify这个first的论断;

这两个Binary Search都有一个特点, 就是都是2branch, 也就是都是收敛型的Binary Search;

大概思考了一下, 好像可以理解为什么这种2branch收敛型的就可以保证找到的是第一个了: 你就从第一个level开始分析, 不需要想的太深; 假如我们的第一个mid就hit了, 会发生什么, 比如array是0,1,2,3,4,5,6, 然后第一个是[3], 直接hit, 这个时候我们下面会进入[0,2]搜索, 但是下面的搜索全都保证一直是lo在上升, 最后得到的一个区间(termination的区间是)[3,2], 这个时候的lo是3, 直接return就行了;

假如在[3]的位置有一个平面呢(脑子里, sorted的, 要有一个visual):

                        6  
                    5  
                4  
    1   2   3   
0

比如, 从[1]到[3]全都是相等的, 全都是hit, 那么进入[0,2]之后, 下一个是[0,1], 然后下一个是[1,1], 然后是[1,0], terminate, 最后return的这个lo, 还真的就是first.

First hit, not just any hit!!!

或许这种收敛型的Binary Search就是可以完成这么一个找first hit, not just any hit!!的作用; 不过具体怎样保证自己最后写出来的是一个收敛型的? 首先2branch这个是很自然的; 开区间闭区间这个其实无所谓, 我建议以后还是写闭区间, 比较直观理解一些, 能够清楚的看到我们的range到底是哪个到哪个; 比较难的地方是这个if的header里面怎么选择, 什么时候向左, 什么时候向右; 一般来说, 好像hit的情况(等于的情况), 是向右的; 然后这样就行了吗?

感觉可能还是有很多细节没有真正的理解, 不过暂时就先这样了;


Problem Description

Given a string s and a string t, check if s is subsequence of t.

You may assume that there is only lower case English letters in both s and t. t is potentially a very long (length ~= 500,000) string, and s is a short string (<=100).

A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (ie, "ace" is a subsequence of "abcde" while "aec" is not).

Example 1:

s = "abc", t = "ahbgdc"  

Return true.

Example 2:

s = "axc", t = "ahbgdc"  

Return false.

Follow up:

  • If there are lots of incoming S, say S1, S2, ... , Sk where k >= 1B, and you want to check one by one to see if T has its subsequence. In this scenario, how would you change your code?

Credits:
Special thanks to @pbrother for adding this problem and creating all test cases.

Difficulty:Medium
Total Accepted:37.2K
Total Submissions:83.5K
Contributor: LeetCode
Companies
pinterest
Related Topics
binary search dynamic programming greedy

results matching ""

    No results matching ""