LeetCode Algorithm Programming Series, Part 5

1. Problem - Word Break

Given a string s and a dictionary of words dict, determine if s can be segmented into a space-separated sequence of one or more dictionary words.

For example, given
s = "leetcode",
dict = ["leet", "code"].

Return true because "leetcode" can be segmented as "leet code".

Problem explanation:

Given a string s, can it be written as a concatenation of one or more strings from the dictionary dict?

Analysis:

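Backtracking comes to mind first: split the string after its first character, move the split point forward one step at a time, and recursively check the remaining substring each time the prefix is a dictionary word. This repeats a lot of lookups, because the same suffix is examined again and again.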
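To make the repeated work concrete, here is a minimal sketch of the backtracking idea. The name wordBreakNaive and the start parameter are hypothetical, introduced only for this illustration; this is not the solution used in this post.

```cpp
#include <string>
#include <unordered_set>

// Naive backtracking (illustrative sketch): try every prefix of the
// remaining text that is a dictionary word, then recurse on what is left.
// Nothing is cached, so the same suffix can be re-examined many times.
bool wordBreakNaive(const std::string &s, size_t start,
                    const std::unordered_set<std::string> &dict) {
    if (start == s.length()) {
        return true;  // the whole string has been consumed
    }
    for (size_t end = start + 1; end <= s.length(); ++end) {
        if (dict.count(s.substr(start, end - start)) > 0 &&
            wordBreakNaive(s, end, dict)) {
            return true;
        }
    }
    return false;
}
```

Calling wordBreakNaive(s, 0, dict) answers the question for the whole string, but in the worst case the running time grows exponentially with the length of s.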

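Dynamic programming remembers the answer for every prefix, so each prefix is solved only once. This brings the work down to O(n^2) substring lookups for a string of length n.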

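Recurrence: dp[j] = OR over i = 0, 1, ..., j - 1 of (dp[i] AND EleOfSet(i, j, dict))

(dp[j] says whether the first j characters of s can be formed from elements of dict; EleOfSet(i, j, dict) says whether the substring of s from position i up to, but not including, position j is an element of dict. The code swaps the roles of i and j.)

Boundary condition: dp[0] = true, since the empty prefix needs no words.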

Here is the source code. Quite concise, isn't it?

class Solution {
public:

    bool wordBreak(string s, unordered_set<string> &dict) {

        int size = s.length();

        vector<bool> dp(size + 1, false);
        dp[0] = true;

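        // dp[i]: can the first i characters of s be segmented?
        for (int i = 1; i <= size; i ++)
        {
            for (int j = 0;  j < i ; j ++)
            {
                // s[0..j) is segmentable and s[j..i) is a dictionary word
                if (dp[j] && dict.count(s.substr(j, i - j)) > 0)
                {
                    dp[i] = true;
                    break;  // one valid split is enough
                }
            }
        }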

        return dp[size];
    }
};
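To see what the table looks like, the following sketch returns the whole dp vector instead of only its last entry. The free function wordBreakTable is a hypothetical helper written for this illustration; it follows the same loop as the solution above.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Illustrative helper: same DP as above, but returns the full table so
// that individual entries can be inspected.
std::vector<bool> wordBreakTable(const std::string &s,
                                 const std::unordered_set<std::string> &dict) {
    const size_t size = s.length();
    std::vector<bool> dp(size + 1, false);
    dp[0] = true;  // the empty prefix is always segmentable
    for (size_t i = 1; i <= size; ++i) {
        for (size_t j = 0; j < i; ++j) {
            if (dp[j] && dict.count(s.substr(j, i - j)) > 0) {
                dp[i] = true;
                break;
            }
        }
    }
    return dp;
}
```

For s = "leetcode" and dict = ["leet", "code"], the only true entries are dp[0], dp[4] and dp[8]: the empty prefix, "leet", and "leet" followed by "code".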

2. Problem - Distinct Subsequences

Given a string S and a string T, count the number of distinct subsequences of T in S.

A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (ie, "ACE" is a subsequence of "ABCDE" while "AEC" is not).

Here is an example:
S = "rabbbit", T = "rabbit"

Return 3.

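Problem explanation:

Given two strings S and T, count how many times T occurs in S as a subsequence. The matched characters need not be contiguous, but they must appear in S in the same order as in T.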
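To make the definition concrete, here is a brute-force sketch that walks through S and, for each character, either deletes it or keeps it when it matches the next unmatched character of T. The name countBruteForce and the index parameters i and j are hypothetical; the running time is exponential, so this is only meant for checking small cases.

```cpp
#include <string>

// Brute force (illustrative sketch): i is the current position in s,
// j is the number of characters of t matched so far.
int countBruteForce(const std::string &s, const std::string &t,
                    size_t i = 0, size_t j = 0) {
    if (j == t.length()) {
        return 1;  // all of t is matched; the rest of s is deleted
    }
    if (i == s.length()) {
        return 0;  // s is exhausted but t is not fully matched
    }
    int ways = countBruteForce(s, t, i + 1, j);  // delete s[i]
    if (s[i] == t[j]) {
        ways += countBruteForce(s, t, i + 1, j + 1);  // keep s[i]
    }
    return ways;
}
```

For S = "rabbbit" and T = "rabbit" this returns 3, one for each of the three b's in S that can be the deleted one.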

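Approach:

DP comes to mind readily for this problem. After solving it I found plenty of explanations online, but on a first attempt it is still hard to arrive at the recurrence directly.

My first idea traded space for time: scan s from the beginning and cache every candidate string (every partial match of T found so far), using the following formula:

dp[i] = dp[i - 1] + match_num (dp[i]: the number of occurrences of T found within the first i characters of s; match_num: the number of cached candidate strings that become a complete match of T once s[i] is appended)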

class Solution {
public:
    int numDistinct(string s, string t) {
                
        int len = s.length();
        int sub_len = t.length();

        // Cache every partial match: a proper prefix of t that has been
        // formed as a subsequence of the characters of s seen so far
        list<string> possible_strs;
        vector<int> dp(len + 1, 0);
        
        list<string> copy_list;
        for (int i = 0; i < len; i ++)
        {
            char cur_ch = s[i];

            int match_num = 0;
            copy_list.clear();

            // Try to extend every cached partial match with s[i]. The
            // unextended entry stays in the list, because a later
            // character of s may extend it as well.
            for (list<string>::iterator iter = possible_strs.begin(); iter != possible_strs.end(); iter ++)
            {
                int str_len = iter->length();
                if (t[str_len] != cur_ch)
                {
                    continue;
                }

                if (str_len + 1 == sub_len)
                {
                    // s[i] completes a full match of t
                    match_num++;
                }
                else
                {
                    copy_list.push_back(*iter + cur_ch);
                }
            }

            for (list<string>::iterator iter = copy_list.begin(); iter != copy_list.end(); iter ++)
            {
                possible_strs.push_back(*iter);
            }

            // s[i] can also start a new partial match
            if (sub_len > 0 && s[i] == t[0])
            {
                if (sub_len == 1)
                {
                    match_num++;
                }
                else
                {
                    possible_strs.push_back(string(1, s[i]));
                }
            }

            dp[i + 1] = dp[i] + match_num;
        }

        return dp[len];
    }
};

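Unfortunately, the cached list of strings is the bottleneck: the longer T is, the larger the list that has to be cached.

A better approach

This is still DP; the key is finding a recurrence that solves the problem in a fixed amount of space. For DP problems, I personally think it helps to practice more problems of this kind to build up intuition. It is also easy to get stuck: when a one-dimensional table does not work, you either keep staring at it or give up. Try changing perspective and use a two-dimensional table instead; if that still fails, look at how others approached it online. What matters is that you set out to train this skill deliberately, know how each problem exercises your algorithmic ability, and think about how to extend it to related problems.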

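Recurrence (S[i] and T[j] denote the i-th and j-th characters, counting from 1):

If S[i] == T[j], then dp[i][j] = dp[i-1][j] + dp[i-1][j-1]

If S[i] != T[j], then dp[i][j] = dp[i-1][j]

dp[i][j] is the number of times the first j characters of T occur as a subsequence of the first i characters of S. The term dp[i-1][j] counts the matches that skip S[i]; the term dp[i-1][j-1] counts the matches that use S[i] as the j-th character of T.

Boundary conditions:

dp[i][0] = 1, meaning that a prefix of S of any length can be turned into the empty string in exactly one way: by deleting all of its characters. dp[0][j] = 0 for j > 0, since an empty S cannot contain a non-empty T.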

class Solution {
public:
    int numDistinct(string s, string t) {
                
        int len = s.length();
        int sub_len = t.length();

        // Initialize the 2-D table with zeros
        vector<vector<int> > dp(len + 1, vector<int>(sub_len + 1, 0));

        // When the target is the empty string there is exactly one way to
        // reach it: delete every character
        for (int i = 0; i <= len; i ++)
        {
            dp[i][0] = 1;
        }

        for (int i = 0; i < len; i ++)
        {
            for (int j = 0; j < sub_len; j ++)
            {
                if (s[i] == t[j])
                {
                    dp[i + 1][j + 1] = dp[i][j + 1] + dp[i][j];
                }
                else
                {
                    dp[i + 1][j + 1] = dp[i][j + 1];
                }
            }
        }

        return dp[len][sub_len];
    }
};
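Since row i of the table depends only on row i - 1, the two-dimensional table can be collapsed into a single row. The sketch below is one possible way to do that and is not part of the original solution. The name numDistinctRolling is hypothetical, and the unsigned long long element type is an assumption made here so that large intermediate counts wrap around instead of overflowing a signed int.

```cpp
#include <string>
#include <vector>

// Rolling-array sketch: dp[j] holds the number of ways the first j
// characters of t occur as a subsequence of the part of s scanned so far.
unsigned long long numDistinctRolling(const std::string &s,
                                      const std::string &t) {
    const size_t sub_len = t.length();
    std::vector<unsigned long long> dp(sub_len + 1, 0);
    dp[0] = 1;  // the empty target is matched in exactly one way

    for (char ch : s) {
        // Walk j downward so that dp[j - 1] is still the value from the
        // previous row when dp[j] is updated.
        for (size_t j = sub_len; j >= 1; --j) {
            if (ch == t[j - 1]) {
                dp[j] += dp[j - 1];
            }
        }
    }
    return dp[sub_len];
}
```

This keeps the O(len * sub_len) running time of the two-dimensional version while using only O(sub_len) extra space.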