HDU1159 Common Subsequence

Common Subsequence

Time Limit: 2000/1000 MS (Java/Others) Memory Limit: 65536/32768 K (Java/Others)

Problem Description

A subsequence of a given sequence is the given sequence with some elements (possibly none) left out. Given a sequence X = <x1, x2, ..., xm>, another sequence Z = <z1, z2, ..., zk> is a subsequence of X if there exists a strictly increasing sequence <i1, i2, ..., ik> of indices of X such that for all j = 1, 2, ..., k, x_{i_j} = z_j. For example, Z = <a, b, f, c> is a subsequence of X = <a, b, c, f, b, c> with index sequence <1, 2, 4, 6>. Given two sequences X and Y the problem is to find the length of the maximum-length common subsequence of X and Y.
The program input is from a text file. Each data set in the file contains two strings representing the given sequences. The sequences are separated by any number of white spaces. The input data are correct. For each set of data the program prints on the standard output the length of the maximum-length common subsequence from the beginning of a separate line.

Sample Input

abcfbc abfcab
programming contest
abcd mnp

Sample Output

4
2
0

Source

Southeastern Europe 2003

Recommend

Ignatius

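Approach: the problem asks for the longest common subsequence of two strings.
We first introduce the technique using the longest common substring.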

Longest Common Substring

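The task here is to find the longest common substring of two strings, where the substring must be contiguous in both original strings. This is again a sequential decision problem and can be solved with dynamic programming. We use a two-dimensional matrix to record intermediate results. How is the matrix built? Take "bab" and "caba" as an example (by inspection, the longest common substring is "ba" or "ab"). A cell is 1 when its row character equals its column character, and 0 otherwise: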

   b  a  b
c  0  0  0
a  0  1  0
b  1  0  1
a  0  1  0

The longest diagonal run of 1s in the matrix gives the longest common substring.
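The naive method just described can be sketched in C++ as follows. This is a minimal sketch, assuming std::string inputs; the function name longest_diagonal_run is hypothetical. It builds the 0/1 matrix explicitly and scans each diagonal, identified by the offset column minus row, for its longest run of 1s.

```cpp
#include <string>
#include <vector>

// Naive longest common substring (hypothetical helper): build the 0/1
// match matrix (match[i][j] = 1 when s[i] == t[j]) and scan every
// diagonal for the longest run of consecutive 1s.
int longest_diagonal_run(const std::string& s, const std::string& t) {
    const int n = s.size(), m = t.size();
    std::vector<std::vector<int>> match(n, std::vector<int>(m, 0));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j)
            match[i][j] = (s[i] == t[j]) ? 1 : 0;

    int best = 0;
    // Each diagonal is identified by the offset d = j - i.
    for (int d = -(n - 1); d <= m - 1; ++d) {
        int run = 0;  // length of the current run of 1s on this diagonal
        for (int i = 0; i < n; ++i) {
            int j = i + d;
            if (j < 0 || j >= m) continue;  // cell lies outside the matrix
            run = match[i][j] ? run + 1 : 0;
            if (run > best) best = run;
        }
    }
    return best;
}
```

Storing the whole matrix and then searching it is exactly the overhead the next improvement removes.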

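However, searching a two-dimensional matrix for the longest diagonal of 1s is itself tedious and time-consuming. An improvement: whenever a cell would be filled with 1, set it to the value of its upper-left neighbor plus 1 instead.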

   b  a  b
c  0  0  0
a  0  1  0
b  1  0  2
a  0  2  0

The largest element in the matrix is then the length of the longest common substring.
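A minimal sketch of the improved rule, assuming std::string inputs (the function name longest_common_substring is hypothetical). Unlike the matrices above, it adds a leading row and column of zeros so that the upper-left neighbor always exists; cells whose characters differ simply stay 0.

```cpp
#include <string>
#include <vector>

// Longest common substring with the "upper-left + 1" rule (hypothetical
// helper). len[i][j] is the length of the common substring ending at
// s[i-1] and t[j-1]; the answer is the largest entry.
int longest_common_substring(const std::string& s, const std::string& t) {
    const int n = s.size(), m = t.size();
    // Extra row/column of zeros so len[i-1][j-1] always exists.
    std::vector<std::vector<int>> len(n + 1, std::vector<int>(m + 1, 0));
    int best = 0;
    for (int i = 1; i <= n; ++i)
        for (int j = 1; j <= m; ++j)
            if (s[i - 1] == t[j - 1]) {
                len[i][j] = len[i - 1][j - 1] + 1;  // extend the diagonal run
                if (len[i][j] > best) best = len[i][j];
            }
    return best;
}
```

For "caba" and "bab" this returns 2, the largest entry of the second matrix.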

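While building the matrix, once a row has been computed the row above it is no longer needed, so in a program the matrix can be replaced by a one-dimensional array.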
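The one-dimensional version can be sketched like this (a hedged sketch; longest_common_substring_1d is a hypothetical name). The inner loop runs from right to left so that cur[j - 1] still holds the previous row's value when cur[j] is computed; mismatching cells must now be reset to 0 explicitly, because the same array is reused for every row.

```cpp
#include <string>
#include <vector>

// Longest common substring keeping a single row (hypothetical helper).
// Iterating j from right to left means cur[j-1] still holds the
// previous row's value when cur[j] is updated.
int longest_common_substring_1d(const std::string& s, const std::string& t) {
    const int m = t.size();
    std::vector<int> cur(m + 1, 0);  // cur[0] stays 0 as the left border
    int best = 0;
    for (char c : s) {
        for (int j = m; j >= 1; --j) {
            if (c == t[j - 1]) {
                cur[j] = cur[j - 1] + 1;  // upper-left value + 1
                if (cur[j] > best) best = cur[j];
            } else {
                cur[j] = 0;  // no common substring ends at this cell
            }
        }
    }
    return best;
}
```

This brings the extra memory down from O(nm) to O(m) while computing the same maximum.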

The longest common subsequence can be handled the same way.

Longest Common Subsequence

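The longest common subsequence differs from the longest common substring in that it need not be contiguous in the original strings. For example, the longest common subsequence of ADE and ABCDE is ADE.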
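To make the distinction concrete, here is a small sketch of the two tests (is_subsequence and is_substring are hypothetical helper names). The subsequence test greedily matches the characters of Z against X in order, following the index-sequence definition in the problem statement; the substring test additionally requires contiguity, using std::string::find.

```cpp
#include <cstddef>
#include <string>

// Z is a subsequence of X if its characters appear in X in order, not
// necessarily next to each other. Greedy scan: walk through X and match
// the next still-needed character of Z whenever possible.
bool is_subsequence(const std::string& z, const std::string& x) {
    std::size_t k = 0;  // number of characters of z matched so far
    for (std::size_t i = 0; i < x.size() && k < z.size(); ++i)
        if (x[i] == z[k]) ++k;
    return k == z.size();
}

// A substring must additionally be contiguous in X.
bool is_substring(const std::string& z, const std::string& x) {
    return x.find(z) != std::string::npos;
}
```

With these helpers, is_subsequence("ADE", "ABCDE") is true while is_substring("ADE", "ABCDE") is false.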

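We approach this problem with dynamic programming. The first step is to find the state transition equation.

Notation: C1 is the rightmost character of S1, C2 is the rightmost character of S2, S1' is S1 with C1 removed, and S2' is S2 with C2 removed.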

LCS(S1, S2) is the longest of the following three:

(1) LCS(S1, S2')

(2) LCS(S1', S2)

(3) LCS(S1', S2') if C1 differs from C2; LCS(S1', S2') + C1 if C1 equals C2

Boundary condition: if S1 or S2 is empty, the result is the empty string.
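The recurrence translates almost directly into a memoized recursive function. This sketch computes lengths rather than strings; the class LcsMemo and its members are hypothetical names. Here lcs(i, j) stands for the LCS of the first i characters of S1 and the first j characters of S2, so S1' and S2' correspond to i - 1 and j - 1.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Top-down version of the recurrence (hypothetical class). lcs(i, j) is
// the LCS length of s[0..i) and t[0..j); the rightmost characters are
// C1 = s[i-1] and C2 = t[j-1]. Results are memoized (-1 = not computed).
class LcsMemo {
public:
    LcsMemo(const std::string& s, const std::string& t)
        : s_(s), t_(t), memo_(s.size() + 1, std::vector<int>(t.size() + 1, -1)) {}

    int solve() { return lcs(static_cast<int>(s_.size()), static_cast<int>(t_.size())); }

private:
    int lcs(int i, int j) {
        if (i == 0 || j == 0) return 0;  // boundary: one string is empty
        int& r = memo_[i][j];            // inner vectors never resize, so r stays valid
        if (r != -1) return r;
        int diag = lcs(i - 1, j - 1) + (s_[i - 1] == t_[j - 1] ? 1 : 0);  // term (3)
        r = std::max({lcs(i, j - 1),     // term (1): LCS(S1, S2')
                      lcs(i - 1, j),     // term (2): LCS(S1', S2)
                      diag});
        return r;
    }

    std::string s_, t_;
    std::vector<std::vector<int>> memo_;
};
```

Recursion depth can reach |S1| + |S2|, so for long inputs the bottom-up table is the safer choice.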

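As before, we build a matrix to store the solutions of the subproblems. Each number in the matrix is the length of the LCS of the prefixes ending at that row and that column. Following the state transition equation derived above, each cell is filled with the maximum of these three values:

(1) the number in the cell above

(2) the number in the cell to the left

(3) the number in the upper-left cell if C1 differs from C2, or that number plus 1 if C1 equals C2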
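Filling the table row by row with this rule can be sketched as below. As an assumption beyond the text, it keeps only two rows (prev and cur), since each cell reads only the cells above, to the left and to the upper-left; lcs_length is a hypothetical name.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Bottom-up fill with the three-way max (hypothetical helper). Only the
// previous row (prev) and the current row (cur) are kept.
int lcs_length(const std::string& s, const std::string& t) {
    const int m = t.size();
    std::vector<int> prev(m + 1, 0), cur(m + 1, 0);
    for (char c : s) {
        cur[0] = 0;  // column 0: empty prefix of t
        for (int j = 1; j <= m; ++j) {
            int diag = prev[j - 1] + (c == t[j - 1] ? 1 : 0);
            cur[j] = std::max({prev[j],     // (1) cell above
                               cur[j - 1],  // (2) cell to the left
                               diag});      // (3) upper-left, +1 on a match
        }
        std::swap(prev, cur);  // the finished row becomes the "above" row
    }
    return prev[m];
}
```

After the last swap, prev holds the final row, so prev[m] is the bottom-right entry of the full table.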

For example, with GBTA along the rows and GCTA along the columns:

      G  C  T  A
   0  0  0  0  0
G  0  1  1  1  1
B  0  1  1  1  1
T  0  1  1  2  2
A  0  1  1  2  3

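The last cell is the maximum of these three:

(1) the number above: 2

(2) the number to the left: 2

(3) the number in the upper-left cell plus 1: 2 + 1 = 3, because C1 == C2 here

So the final result is 3.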
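To reproduce the table above, the following sketch builds the full matrix, including the leading row and column of zeros, and prints it with row and column labels (lcs_table and print_lcs_table are hypothetical names; the printing layout is an assumption).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Full (n+1) x (m+1) LCS table for rows s and columns t (hypothetical helper).
std::vector<std::vector<int>> lcs_table(const std::string& s, const std::string& t) {
    const int n = s.size(), m = t.size();
    std::vector<std::vector<int>> dp(n + 1, std::vector<int>(m + 1, 0));
    for (int i = 1; i <= n; ++i)
        for (int j = 1; j <= m; ++j) {
            int k = dp[i - 1][j - 1] + (s[i - 1] == t[j - 1] ? 1 : 0);
            dp[i][j] = std::max({dp[i - 1][j], dp[i][j - 1], k});
        }
    return dp;
}

// Prints the table with t across the top and s down the side.
void print_lcs_table(const std::string& s, const std::string& t) {
    std::vector<std::vector<int>> dp = lcs_table(s, t);
    std::printf("    ");  // room for the row label and column 0
    for (char c : t) std::printf("%3c", c);
    std::printf("\n");
    for (std::size_t i = 0; i < dp.size(); ++i) {
        std::printf("%c", i == 0 ? ' ' : s[i - 1]);  // row label (blank for row 0)
        for (int v : dp[i]) std::printf("%3d", v);
        std::printf("\n");
    }
}
```

Calling print_lcs_table("GBTA", "GCTA") prints the example matrix with GBTA down the side and GCTA across the top.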

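While filling in the table, we also record which cell each number came from, so that at the end we can backtrack to recover the longest common subsequence itself. Sometimes more than one of the upper-left, left and upper cells attains the maximum; any of them may be chosen, but one fixed priority must be followed throughout. In my code the priority is upper-left > left > up.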
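A sketch of the backtracking step with the priority upper-left > left > up. Rather than storing a direction for every cell during the fill, as the text describes, this version (an assumption) re-derives the direction from the finished table while walking back from the bottom-right corner; lcs_string is a hypothetical name. On a character match the diagonal step is always valid, because the upper-left value plus 1 is then the maximum of the three candidates.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Recovers one LCS string (hypothetical helper) by walking back from
// dp[n][m], breaking ties with the priority upper-left > left > up.
std::string lcs_string(const std::string& s, const std::string& t) {
    const int n = s.size(), m = t.size();
    std::vector<std::vector<int>> dp(n + 1, std::vector<int>(m + 1, 0));
    for (int i = 1; i <= n; ++i)
        for (int j = 1; j <= m; ++j) {
            int k = dp[i - 1][j - 1] + (s[i - 1] == t[j - 1] ? 1 : 0);
            dp[i][j] = std::max({dp[i - 1][j], dp[i][j - 1], k});
        }

    std::string out;
    int i = n, j = m;
    while (i > 0 && j > 0) {
        if (s[i - 1] == t[j - 1]) {                 // upper-left + 1: keep the character
            out.push_back(s[i - 1]);
            --i; --j;
        } else if (dp[i - 1][j - 1] == dp[i][j]) {  // upper-left, no character taken
            --i; --j;
        } else if (dp[i][j - 1] == dp[i][j]) {      // left
            --j;
        } else {                                    // up
            --i;
        }
    }
    std::reverse(out.begin(), out.end());  // characters were collected back to front
    return out;
}
```

For the example above, lcs_string("GBTA", "GCTA") walks A, T, then a diagonal step past B/C, then G, giving "GTA".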

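[Figure: the backtracking process for recovering the LCS]

Solution code for HDU1159 (the problem only asks for the length, so the program fills the table and prints its bottom-right entry without backtracking):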

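#include<cstdio>
#include<cstring>
#include<algorithm>
using namespace std;

// num[i][j] = LCS length of the first i characters of a and the first j
// characters of b. Buffers assume each string has at most 1001 characters.
int num[1002][1002];

int main()
{
    char a[1002],b[1002];
    // %s skips any amount of whitespace, so the separators allowed by the
    // statement are handled; stop when a full pair can no longer be read.
    while(scanf("%1001s %1001s",a,b)==2)
    {
        int stra=strlen(a);
        int strb=strlen(b);
        memset(num,0,sizeof(num));   // row 0 and column 0 (empty prefixes) are 0
        for(int i=1;i<=stra;i++)
        {
            for(int j=1;j<=strb;j++)
            {
                int k=num[i-1][j-1];   // upper-left cell
                if(a[i-1]==b[j-1])
                    k++;               // +1 when the current characters match
                num[i][j]=max(max(num[i][j-1],num[i-1][j]),k);  // state transition: max of left, up, upper-left
            }
        }
        printf("%d\n",num[stra][strb]);
    }
    return 0;
}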