1、串（字符串）以及串的模式匹配算法

串（或字符串）是由0个或多个字符组成的有限序列。0个字符的串成为空串。串中任意个连续的字符组成的子序列称为该串的子串。

串的表示和实现

串有3种机内表示方法：定长顺序存储表示，堆分配存储表示，串的块链存储表示。

1、定长顺序存储表示：用一组地址连续的存储单元存储串值的字符序列。

//--------串的定长顺序存储表示--------
#define MAXSTRLEN 255     //用户可以在255以内定义最大串长
typedef unsigned char SString[MAXSTRLEN+1];    //C语言中以‘’结束

弊端：如果操作中出现串值序列的长度超过上界MAXSTRLEN 时，约定用截尾法处理。要克服这种弊病唯有不限定串长的最大长度，即动态分配串值的存储空间。

2、堆分配存储表示：仍以一组地址连续的存储单元存放串值字符序列，但它们的存储空间是在程序执行过程中动态分配而得。在C语言中使用malloc()和free()来管理堆空间。

//--------串的堆分配存储表示--------
typedef struct{
    char *ch;    //若是非空串，则按串长分配存储区，否则ch为NULL
    int length;      //串长度
}HString;

3、串的块链存储表示：类似于链表，只是每个节点是一块连续结构。最后一个节点不一定会被串值占满，此时通常补上'#'或者其它非串值字符（通常‘#’不属于串的字符集，是一个特殊的符号）。

//--------串的块链存储表示--------
#define CHUNKSIZE 80

typedef struct Chunk{
    char ch[CHUNKSIZE];
    struct Chunk *next;
}Chunk;

typedef struct {
    Chunk *head, *tail;    //指向头和尾，tail是方便串连接用的
    int curlen;    //串的当前长度
}Lstring;

串的模式匹配算法

　　子串的定位操作通常称为串的模式匹配，是各种串处理系统中最重要的操作之一。

下面的算法是一种很自然的思路：

 1 int index(HString S, HString T, int pos){
 2 　　//返回子串T在主串S中第pos个字符之后的位置。若不存在，则函数值为0。
 3 　　//其中，T非空，1≤pos≤StrLength(S)。
 4　　 if(S == null || T == null || pos < 0 || pos >= S.length)
 5    　　 return ERROR;
 6     i = pos;    j = 0;  
 7     while(i < S.length && j < T.length){
 8         if(S[i] == T[j]){
 9             i++;  j++;    //继续比较后续字符
10         }else{
11             i = i - j + 1;  j = 0;    //指针后退，重新开始匹配
12         }
13     }
14     if(j = T.length)    return i - T.length;
15     else  return 0;
16 }

字符串的模式匹配算法的改进是KMP算法。具体的可以参看相关的论文或者是《数据结构——C语言版》（清华大学出版）。

下面看看java的String中是如何实现字符串的模式识别的（String.class关联的源代码）：

 1 static int indexOf(char[] source, int sourceOffset, int sourceCount,
 2             char[] target, int targetOffset, int targetCount,
 3             int fromIndex) {
 4         if (fromIndex >= sourceCount) {
 5             return (targetCount == 0 ? sourceCount : -1);
 6         }
 7         if (fromIndex < 0) {
 8             fromIndex = 0;
 9         }
10         if (targetCount == 0) {
11             return fromIndex;
12         }
13 
14         char first = target[targetOffset];
15         int max = sourceOffset + (sourceCount - targetCount);
16 
17         for (int i = sourceOffset + fromIndex; i <= max; i++) {
18             /* Look for first character. */
19             if (source[i] != first) {
20                 while (++i <= max && source[i] != first);
21             }
22 
23             /* Found first character, now look at the rest of v2 */
24             if (i <= max) {
25                 int j = i + 1;
26                 int end = j + targetCount - 1;
27                 for (int k = targetOffset + 1; j < end && source[j]
28                         == target[k]; j++, k++);
29 
30                 if (j == end) {
31                     /* Found whole string. */
32                     return i - sourceOffset;
33                 }
34             }
35         }
36         return -1;
37     }