[LeetCode#30]Substring with Concatenation of All Words

Problem:

https://leetcode.com/problems/substring-with-concatenation-of-all-words/

You are given a string, s, and a list of words, words, that are all of the same length. Find all starting indices of substring(s) in s that are a concatenation of each word in words exactly once and without any intervening characters.

For example, given:
s: "barfoothefoobarman"
words: ["foo", "bar"]

You should return the indices: [0,9].
(order does not matter).

Analysis 1:

This is an elegant and important problem, and the coding techniques behind it are worth mastering.
The idea behind it is powerful, but it requires a solid understanding of string indexing.
---------------------------------------------------------------------
KEY CHARACTERISTIC: all words in the word list share the same length.
----------------------------------------------------------------------
Solution 1 (brute force way):
Brief idea: check every substring that could meet the condition.
How to check: use two hash maps for this purpose.
HashMap<String, Integer> to_find = new HashMap<String, Integer> ();
HashMap<String, Integer> found = new HashMap<String, Integer> ();

to_find is the hash map used for recording the occurrences of each word in the array: words.
Note: the same word could appear multiple times in the array.
for (int i = 0; i < word_num; i++) {
    if (to_find.containsKey(words[i]))
        to_find.put(words[i], to_find.get(words[i]) + 1);
    else
        to_find.put(words[i], 1);
}
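
As a side note, on Java 8 and later the same counting loop can be written more compactly with Map.merge. The sketch below is only an illustrative alternative, not the original solution; the names WordCountSketch and countWords are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

class WordCountSketch {
    // Hypothetical helper: counts how many times each word occurs in words.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> toFind = new HashMap<>();
        for (String w : words) {
            // merge() stores 1 for a new key, otherwise adds 1 to the old count.
            toFind.merge(w, 1, Integer::sum);
        }
        return toFind;
    }

    public static void main(String[] args) {
        // "foo" appears twice, so its required count is 2.
        System.out.println(countWords(new String[] {"foo", "bar", "foo"}));
    }
}
```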

found is the hash map used for recording the occurrences of the words found so far in the current substring.

1.1 How to check the condition by comparing found and to_find.
a. If the word does not appear in to_find, the current substring can be discarded.
if (!to_find.containsKey(stub)) break;

b. If the word occurs more often in the current substring than in the words array, the current substring can be discarded as well.
if (found.containsKey(stub))
    found.put(stub, found.get(stub) + 1);
else
    found.put(stub, 1);
if (found.get(stub) > to_find.get(stub)) break;

1.2 How to index the string and the words?
Indexing into a string is error-prone, so it is worth working out the bounds carefully.
a. The string must be at least as long as "the concatenation of all words in the array".
The length of the concatenation: word_num * word_len
The length of the string: s_len
***The start index of the last available substring must be: s_len - word_num * word_len
The reason:
s[s_len - word_num * word_len, s_len - 1] contains exactly word_num * word_len characters.
Calculation: (s_len - 1) - (s_len - word_num * word_len) + 1 = word_num * word_len
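
To make the bound concrete, here is a small worked check using the example from the problem statement (s_len = 18, word_num = 2, word_len = 3). It is only a sketch; the names LastStartSketch and lastStart are invented for illustration.

```java
class LastStartSketch {
    // Hypothetical helper: the largest start index at which a full
    // concatenation of wordNum words of length wordLen still fits in s.
    // A negative result means s is too short to hold any match.
    static int lastStart(int sLen, int wordNum, int wordLen) {
        return sLen - wordNum * wordLen;
    }

    public static void main(String[] args) {
        String s = "barfoothefoobarman";
        int last = lastStart(s.length(), 2, 3);               // 18 - 2 * 3
        System.out.println(last);                             // prints 12
        // The substring starting at 'last' ends exactly at the last character.
        System.out.println(s.substring(last, last + 2 * 3));  // prints barman
    }
}
```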

b. The start index of the jth word (j counted from 0) in the current substring, which itself starts at index i.
int cur = i + j * word_len;
String stub = s.substring(cur, cur + word_len);
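
Combining the two checks from 1.1 with the indexing from 1.2, the test for a single start index i can be sketched as one helper. This is an illustration under assumptions: the names WindowCheckSketch and matchesAt are hypothetical, and toFind is assumed to hold the word counts built by the counting loop shown earlier.

```java
import java.util.HashMap;
import java.util.Map;

class WindowCheckSketch {
    // Hypothetical helper: true if s[i, i + wordNum * wordLen) is a
    // concatenation of the words counted in toFind. The caller must ensure
    // that i + wordNum * wordLen <= s.length().
    static boolean matchesAt(String s, int i, int wordNum, int wordLen,
                             Map<String, Integer> toFind) {
        Map<String, Integer> found = new HashMap<>();
        for (int j = 0; j < wordNum; j++) {
            int cur = i + j * wordLen;                  // start of the j-th word
            String stub = s.substring(cur, cur + wordLen);
            Integer need = toFind.get(stub);
            if (need == null) return false;             // case a: unknown word
            int seen = found.merge(stub, 1, Integer::sum);
            if (seen > need) return false;              // case b: used too often
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Integer> toFind = new HashMap<>();
        toFind.put("foo", 1);
        toFind.put("bar", 1);
        String s = "barfoothefoobarman";
        System.out.println(matchesAt(s, 0, 2, 3, toFind)); // true:  "bar" + "foo"
        System.out.println(matchesAt(s, 3, 2, 3, toFind)); // false: "the" is not a word
    }
}
```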

Solution 1

public List<Integer> findSubstring(String s, String[] words) {
        ArrayList<Integer> ret = new ArrayList<Integer> ();
        if (s == null || s.length() == 0 || words == null || words.length == 0)
            return ret;
        int s_len = s.length();
        int word_num = words.length;
        int word_len = words[0].length();
        HashMap<String, Integer> to_find = new HashMap<String, Integer> ();
        HashMap<String, Integer> found = new HashMap<String, Integer> ();
        for (int i = 0; i < word_num; i++) {
            if (to_find.containsKey(words[i]))
                to_find.put(words[i], to_find.get(words[i]) + 1);
            else
                to_find.put(words[i], 1);
        }
        for (int i = 0; i <= s_len - word_len * word_num; i++) {
            //do not forget to clear the recording hashmap
            found.clear();
            int j = 0;
            for (j = 0; j < word_num; j++) {
                int cur = i + j * word_len;
                String stub = s.substring(cur, cur + word_len);
                if (!to_find.containsKey(stub)) break;
                if (found.containsKey(stub))
                    found.put(stub, found.get(stub) + 1);
                else 
                    found.put(stub, 1);
                if (found.get(stub) > to_find.get(stub)) break;
            }
            if (j == word_num) ret.add(i);
        }
        return ret;
    }

Analysis 2:

Although the above solution is simple and easy to implement, its time complexity can be as high as O(n^2) in the length of s, because every start index re-checks up to word_num words. There is clearly a lot of repetition in the checking.
For example:
abc def ghi jkl ...
start from index 0: check abc, def, ghi, jkl
start from index 3: check def, ghi, jkl
...
We repeatedly perform the same check over the same subsequence, which is totally unnecessary.
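
A rough count makes the redundancy visible. The sketch below compares an upper bound on the word slices Solution 1 may extract (assuming it never breaks early) with the number of distinct positions at which a word-sized slice can begin. The sizes are made-up numbers chosen for illustration, and the class and method names are hypothetical.

```java
class RedundancySketch {
    // Upper bound on the slices extracted by the brute-force scan:
    // every start index examines up to wordNum words.
    static long bruteForceSlices(int sLen, int wordNum, int wordLen) {
        int starts = sLen - wordNum * wordLen + 1;
        return starts <= 0 ? 0 : (long) starts * wordNum;
    }

    // Number of distinct positions where a word-sized slice can begin.
    static long distinctSlices(int sLen, int wordLen) {
        return Math.max(0, sLen - wordLen + 1);
    }

    public static void main(String[] args) {
        // Illustrative sizes only: a 1000-character string, 100 words of length 3.
        System.out.println(bruteForceSlices(1000, 100, 3)); // prints 70100
        System.out.println(distinctSlices(1000, 3));        // prints 998
    }
}
```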


**************************************************************************************************
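Here, let us introduce the idea of a sliding window, which can solve the problem in linear time in terms of word-sized steps (each of the word_len offsets makes a single pass over s).

A fragile first attempt (it throws away the whole window on any violation, so it can miss matches that overlap the discarded words):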
if (found.get(stub) > to_find.get(stub)) {
    found.clear();
    count = 0;
    start = j;
} else{
    count++;
}
if (count == word_num) {
    ret.add(start);
    found.clear();
    count = 0;
    start = j;
}
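
To see why this logic is fragile, it helps to run it on a small input. The fragment above only shows the inner decisions, so the outer loops and the handling of unknown words in the sketch below are assumptions added to make it runnable; the names FragileWindowDemo and fragile are hypothetical. On "foofoobar" with words ["foo", "bar"] the correct answer is [3], but the full reset on the second "foo" also forgets that this "foo" could start a match.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FragileWindowDemo {
    static List<Integer> fragile(String s, String[] words) {
        List<Integer> ret = new ArrayList<>();
        int wordLen = words[0].length();
        int wordNum = words.length;
        Map<String, Integer> toFind = new HashMap<>();
        for (String w : words) toFind.merge(w, 1, Integer::sum);
        for (int i = 0; i < wordLen; i++) {            // assumed outer loop
            int start = i, count = 0;
            Map<String, Integer> found = new HashMap<>();
            for (int j = i; j + wordLen <= s.length(); j += wordLen) {
                String stub = s.substring(j, j + wordLen);
                if (!toFind.containsKey(stub)) {       // assumed handling
                    found.clear();
                    count = 0;
                    start = j + wordLen;
                    continue;
                }
                found.merge(stub, 1, Integer::sum);
                // From here on: the fragile logic, as in the fragment.
                if (found.get(stub) > toFind.get(stub)) {
                    found.clear();                     // discards the current stub too
                    count = 0;
                    start = j;
                } else {
                    count++;
                }
                if (count == wordNum) {
                    ret.add(start);
                    found.clear();
                    count = 0;
                    start = j;
                }
            }
        }
        return ret;
    }

    public static void main(String[] args) {
        // The match "foo" + "bar" at index 3 is missed.
        System.out.println(fragile("foofoobar", new String[] {"foo", "bar"})); // prints []
    }
}
```

The same sketch still returns [0, 9] on the example from the problem statement, which is what makes this shortcut tempting.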

Solution 2:
The basic idea of the sliding window is to avoid repeatedly testing the same substring.
1. Since the word length is fixed, every word in s starts at an index whose remainder modulo word_len lies in the range [0, word_len-1], and the same holds for its neighbors. It is therefore enough to scan once from each of these word_len offsets.
for (int i = 0; i < word_len; i++) {
    int start = i;
    int count = 0;
    HashMap<String, Integer> found = new HashMap<String, Integer> ();
}
We start from each of these offsets, then scan the string in steps of word_len.
for (int j = i; j <= s.length() - word_len; j += word_len) {
    ...
}
Skill: the last available start index for a single word is s.length() - word_len (the same argument as in Solution 1, with one word instead of word_num words).
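
To see why word_len offsets are enough, the sketch below lists the slice start positions visited for each offset i. Every index from 0 to s_len - word_len is visited exactly once across all offsets. The sizes are illustrative and the names OffsetScanSketch and positions are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetScanSketch {
    // Hypothetical helper: start indices of the word-sized slices visited
    // when scanning from offset i in steps of wordLen.
    static List<Integer> positions(int sLen, int wordLen, int i) {
        List<Integer> out = new ArrayList<>();
        for (int j = i; j <= sLen - wordLen; j += wordLen) {
            out.add(j);
        }
        return out;
    }

    public static void main(String[] args) {
        int sLen = 10, wordLen = 3;   // illustrative sizes
        for (int i = 0; i < wordLen; i++) {
            System.out.println("offset " + i + ": " + positions(sLen, wordLen, i));
        }
        // offset 0: [0, 3, 6]
        // offset 1: [1, 4, 7]
        // offset 2: [2, 5]
    }
}
```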

2. The valid sliding window.
The sliding window is the magic tool used in this solution. The substring inside the window must always be a concatenation of some of the words, with no word used more often than it occurs in the words array. The window has a left side and a right side.
Here, we represent the left side as start, and we use the 'j' in the loop for the right side.

Cases in the sliding window:
2.1 The current stub does not appear in the words array.
Solution: move the left side of the window to the start of the next stub, since no substring that includes the current stub can meet the condition.
if (to_find.containsKey(stub)){
...
} else{
    found.clear();
    count = 0;
    start = j + word_len;
}

2.2 The current stub appears in the words array, but its count in the window exceeds its count in the words array.
Solution: move the left side to the right until the violation is removed.

if (found.get(stub) > to_find.get(stub)) {
    while (found.get(stub) > to_find.get(stub)) {
        String temp = s.substring(start, start + word_len);
        found.put(temp, found.get(temp) - 1);
        start += word_len;
        if (found.get(temp) < to_find.get(temp))
            count--;
    }
}

Note: other innocent stubs may also be removed in this process, and each of them lowers count. The violating stub itself is different: count was not increased when it was added, yet it is part of the next window. Chopping off one earlier copy of it while keeping the new copy leaves the number of valid words unchanged, so count must not be decreased for it. This is exactly what the following check does:

if (found.get(temp) < to_find.get(temp))
    count--;
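
The shrinking step of case 2.2 can be isolated and traced on a tiny input. In the sketch below the window holds "bar", "foo" (count = 2) and a second "foo" has just been added to found; the helper drops words from the left until the violation is gone. The names ShrinkSketch and shrink, and the choice to return the new start and count as an array, are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class ShrinkSketch {
    // Hypothetical helper for case 2.2. Precondition: found already includes
    // the current stub, and found.get(stub) > toFind.get(stub).
    // Returns {newStart, newCount}; found is updated in place.
    static int[] shrink(String s, int start, int count, int wordLen, String stub,
                        Map<String, Integer> found, Map<String, Integer> toFind) {
        while (found.get(stub) > toFind.get(stub)) {
            String temp = s.substring(start, start + wordLen);
            found.put(temp, found.get(temp) - 1);
            start += wordLen;
            // Only a dropped word that is now under its quota lowers count.
            // A dropped copy of stub is replaced by the current stub.
            if (found.get(temp) < toFind.get(temp)) count--;
        }
        return new int[] {start, count};
    }

    public static void main(String[] args) {
        Map<String, Integer> toFind = new HashMap<>();
        toFind.put("foo", 1);
        toFind.put("bar", 1);
        Map<String, Integer> found = new HashMap<>();
        found.put("bar", 1);
        found.put("foo", 2);   // the stub at j = 6 is a second "foo"
        int[] r = shrink("barfoofoo", 0, 2, 3, "foo", found, toFind);
        // "bar" (innocent) and the first "foo" are dropped; only "foo" at 6 remains.
        System.out.println(r[0] + " " + r[1]); // prints 6 1
    }
}
```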
2.3 The current stub appears in the words array, and its count in the window does not exceed its count in the words array.
Solution: increase the count for the sliding window.
if (found.get(stub) > to_find.get(stub)) {
    ...
} else{
    count++;
}
2.4 The current stub appears in the words array, and the current substring (in the sliding window) exactly meets the condition.
Solution: use the count of the window to check whether we have reached this state, then add the start index to the result set.
Note: afterwards we only need to move the left side of the window one word further.
if (count == word_num) {
    ret.add(start);
    String temp = s.substring(start, start + word_len);
    found.put(temp, found.get(temp) - 1);
    start += word_len;
    count--;
}
**********************
The main idea in maintaining the sliding window is to properly adjust the left side, the valid count, and the found hash map.

Solution 2:

public class Solution {
    public List<Integer> findSubstring(String s, String[] words) {
        ArrayList<Integer> ret = new ArrayList<Integer> ();
        if (s == null || s.length() == 0 || words == null || words.length == 0)
            return ret;
        int s_len = s.length();
        int word_num = words.length;
        int word_len = words[0].length();
        HashMap<String, Integer> to_find = new HashMap<String, Integer> ();
        for (int i = 0; i < word_num; i++) {
            if (to_find.containsKey(words[i]))
                to_find.put(words[i], to_find.get(words[i]) + 1);
            else
                to_find.put(words[i], 1);
        }
        for (int i = 0; i < word_len; i++) {
            int start = i;
            int count = 0;
            HashMap<String, Integer> found = new HashMap<String, Integer> ();
            for (int j = i; j <= s.length() - word_len; j += word_len) {
                String stub = s.substring(j, j + word_len);
                if (to_find.containsKey(stub)){
                    if (found.containsKey(stub))
                        found.put(stub, found.get(stub) + 1);
                    else
                        found.put(stub, 1);
                    if (found.get(stub) > to_find.get(stub)) {
                        //chop off the substring from start, including the repeated word
                        while (found.get(stub) > to_find.get(stub)) {
                            String temp = s.substring(start, start + word_len);
                            found.put(temp, found.get(temp) - 1);
                            start += word_len;
                            //the repeated word is still included in the window, so we do not decrease count for it
                            if (found.get(temp) < to_find.get(temp))
                                count--;
                        }
                    } else{
                        count++;
                    }
                    if (count == word_num) {
                        ret.add(start);
                        String temp = s.substring(start, start + word_len);
                        found.put(temp, found.get(temp) - 1);
                        start += word_len;
                        count--;
                    }
                } else{
                    //reset the sliding window's left boundary
                    found.clear();
                    count = 0;
                    start = j + word_len;
                }
            }
        }
        return ret;
    }
}
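
As a sanity check, the two approaches can be compared on small random inputs. The sketch below restates both solutions compactly and reports whether they agree. The test generator (two-letter alphabet, short strings, fixed seed) and all names in CrossCheckSketch are assumptions chosen for illustration. Solution 2 reports indices grouped by offset, so its output is sorted before comparing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class CrossCheckSketch {
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> m = new HashMap<>();
        for (String w : words) m.merge(w, 1, Integer::sum);
        return m;
    }

    // Compact restatement of Solution 1 (brute force).
    static List<Integer> brute(String s, String[] words) {
        List<Integer> ret = new ArrayList<>();
        int n = words.length, len = words[0].length();
        Map<String, Integer> toFind = count(words);
        for (int i = 0; i <= s.length() - n * len; i++) {
            Map<String, Integer> found = new HashMap<>();
            int j = 0;
            for (; j < n; j++) {
                String stub = s.substring(i + j * len, i + (j + 1) * len);
                if (!toFind.containsKey(stub)) break;
                if (found.merge(stub, 1, Integer::sum) > toFind.get(stub)) break;
            }
            if (j == n) ret.add(i);
        }
        return ret;
    }

    // Compact restatement of Solution 2 (sliding window).
    static List<Integer> sliding(String s, String[] words) {
        List<Integer> ret = new ArrayList<>();
        int n = words.length, len = words[0].length();
        Map<String, Integer> toFind = count(words);
        for (int i = 0; i < len; i++) {
            int start = i, cnt = 0;
            Map<String, Integer> found = new HashMap<>();
            for (int j = i; j <= s.length() - len; j += len) {
                String stub = s.substring(j, j + len);
                if (!toFind.containsKey(stub)) {       // case 2.1
                    found.clear();
                    cnt = 0;
                    start = j + len;
                    continue;
                }
                found.merge(stub, 1, Integer::sum);
                if (found.get(stub) > toFind.get(stub)) {
                    while (found.get(stub) > toFind.get(stub)) {   // case 2.2
                        String temp = s.substring(start, start + len);
                        found.put(temp, found.get(temp) - 1);
                        start += len;
                        if (found.get(temp) < toFind.get(temp)) cnt--;
                    }
                } else {
                    cnt++;                             // case 2.3
                }
                if (cnt == n) {                        // case 2.4
                    ret.add(start);
                    String temp = s.substring(start, start + len);
                    found.put(temp, found.get(temp) - 1);
                    start += len;
                    cnt--;
                }
            }
        }
        return ret;
    }

    static String randomWord(Random rnd, int len) {
        StringBuilder sb = new StringBuilder();
        // Tiny alphabet {a, b} so that matches and repeats are frequent.
        for (int k = 0; k < len; k++) sb.append((char) ('a' + rnd.nextInt(2)));
        return sb.toString();
    }

    // Returns true if both solutions agree on 'rounds' random inputs.
    static boolean agree(long seed, int rounds) {
        Random rnd = new Random(seed);
        for (int r = 0; r < rounds; r++) {
            String s = randomWord(rnd, 1 + rnd.nextInt(20));
            int len = 1 + rnd.nextInt(2);
            String[] words = new String[1 + rnd.nextInt(3)];
            for (int w = 0; w < words.length; w++) words[w] = randomWord(rnd, len);
            List<Integer> a = brute(s, words);
            List<Integer> b = sliding(s, words);
            Collections.sort(b);
            if (!a.equals(b)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agree(1L, 1000));
    }
}
```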