[Leetcode] Binary search, two pointers -- 392. Is Subsequence

Given a string s and a string t, check if s is subsequence of t.

You may assume that there is only lower case English letters in both s and t. t is potentially a very long (length ~= 500,000) string, and s is a short string (<=100).

A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (ie, "ace" is a subsequence of "abcde" while "aec" is not).

Example 1:
s = "abc", t = "ahbgdc"

Return true.

Example 2:
s = "axc", t = "ahbgdc"

Return false.

Follow up:
If there are lots of incoming S, say S1, S2, ... , Sk where k >= 1B, and you want to check one by one to see if T has its subsequence. In this scenario, how would you change your code?

Solution:

1. 1st naive method, use two pointers;
　 iterate every s' character cs using ponter i ,and every t's character ct using pointer j, judge cs == ct and record the matching prevPos = j
　　if cs == ct, increase i by 1 or else increase j + 1 until every cs is checked
　　m = len(s), n= len(t), worst time complexity o(m*(n-m))

 1         i = 0
 2         prevPos = -1
 3         while (i < len(s)):
 4             flag = False
 5             j = prevPos + 1
 6             while (j < len(t)):
 7                 if s[i] == t[j]:
 8                     flag = True
 9                     prevPos = j
10                     break
11                 j += 1
12             if not flag:
13                 return False
14             else:
15                 i += 1
16         return True

2. 2 nd use binary search (but it has TIME LIMIT EXCEEDED)

　　(1)set up a new list indtLst contain the tuple of element and its index ,
　　(2)then sort the indtLst
　　(3) iterate every cs of s in t and binary search cs in t, record the search current position pos and previous position prevPos

 1         indtLst = sorted([(ct, i) for i, ct in enumerate(t)])
 2         #print (" t: ", t)
 3         i = 0
 4         prevPos = -1
 5         flag = True
 6         while (i < len(s)):
 7             pos = bisect_left(indtLst, (s[i],))
 8             #print (" s111: ", i, s[i], pos, prevPos ,t[pos][1] )
 9             flag = False
10             if pos ==len(indtLst) or indtLst[pos][0] != s[i]:
11                 return False
12             elif indtLst[pos][0] == s[i] and indtLst[pos][1]  > prevPos:
13                 flag = True
14                 prevPos = indtLst[pos][1]
15                 #t.pop(pos)
16                 i += 1
17                 #print (" t222: ", i, s[i], pos, prevPos)
18             elif indtLst[pos][0] == s[i] and indtLst[pos][1]  <= prevPos:         #if there is duplicated element cs in s
19                 #if there are repeated character in "s"
20                 k = pos+1
21                 while k < len(indtLst) and indtLst[k][0] == s[i]:
22                     if indtLst[k][0] == s[i] and indtLst[k][1] > prevPos:
23                         #t.pop(k)
24                         #print ('ddddddddd: ',t[k][1], prevPos, k)
25                         prevPos = indtLst[k][1]
26                         flag = True
27                         i += 1
28                         break
29                     k += 1
30                 if not flag:
31                     return False
32                 #print ('w2233: ', i, s[i], pos, prevPos,k)
33 
34         return True

note: the reason for the TLE problem is (1) the sort of lst and the binary search could also take extra time (2) the multiple duplicated cs search in t also take linear time.

so we may set up a dictionary to store every element ct and its index in t, then the search for cs in t would take o(1) time for non-duplicated cs, o(n) time for duplicated cs.