[LeetCode] Wildcard Matching: Solution Notes

6. Wildcard Matching

Problem

Implement wildcard pattern matching with support for '?' and '*'.

'?' Matches any single character.
'*' Matches any sequence of characters (including the empty sequence).

The matching should cover the entire input string (not partial).

The function prototype should be: bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "*") → true
isMatch("aa", "a*") → true
isMatch("ab", "?*") → true
isMatch("aab", "c*a*b") → false

Solution

DFS

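The hard part is handling '*'. It can stand for zero or more characters, so a recursive attempt may match at first and then fail further on, when the '*' should in fact have absorbed more characters and the match should have resumed later in the string.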

class Solution(object):
    # p is the pattern, s is the string to match
    def recursive(self, s, p, si, pi, cur):
        first = True
        n_cur = cur
        while si < len(s) and pi < len(p) and (s[si] == p[pi] or p[pi] == '?'):
            si += 1
            pi += 1
        if pi == len(p):
            return si == len(s)
        if p[pi] == '*':
            while pi < len(p) and p[pi] == '*':
                pi += 1
            if pi >= len(p):
                return True
            for i in range(si, len(s)):
                # s[i] can match p[pi], so try recursing again from here
                if p[pi] != s[i] and p[pi] != '?':
                    continue
                if first:
                    cur += 1
                    first = False
                # Several positions may line up without leading to a full match
                if self.recursive(s, p, i, pi, cur + 1):
                    return True
                if cur > n_cur + 1: # normally cur == n_cur + 1
                    return False
        return False
    def isMatch(self, s, p):
        """
        :type s: str
        :type p: str
        :rtype: bool
        """
        return self.recursive(s, p, 0, 0, 0)

This approach exceeds the time limit.

DP

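We define a two-dimensional array dp, with the string to match along the horizontal axis and the pattern along the vertical axis. dp[i][j] records whether the first i characters of the pattern match the first j characters of the string. For example: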

pattern = "a*bc"
str = "abbc"

Following that definition, we can fill in the whole two-dimensional array:

		a	b	b	c
	T	F	F	F	F
a	F	T	F	F	F
*	F	T	T	T	T
b	F	F	T	T	F
c	F	F	F	F	T
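
The table above can be reproduced with a short sketch. The function names `build_table` and `format_table` are made up for this illustration and are not part of the original solution; rows follow the pattern and columns follow the string, as in the table. Python 3 is assumed.

```python
def build_table(s, p):
    """Return table[i][j]: does p[:i] match s[:j]? (rows = pattern, cols = string)"""
    rows, cols = len(p) + 1, len(s) + 1
    table = [[False] * cols for _ in range(rows)]
    table[0][0] = True  # empty pattern matches empty string
    for i in range(1, rows):
        ch = p[i - 1]
        # Column 0 (empty string) stays True only while the pattern is all '*'
        table[i][0] = table[i - 1][0] and ch == '*'
        for j in range(1, cols):
            if ch == '*':
                # '*' matches nothing (cell above) or absorbs s[j-1] (cell to the left)
                table[i][j] = table[i - 1][j] or table[i][j - 1]
            else:
                # Ordinary character or '?': extend the upper-left cell
                table[i][j] = table[i - 1][j - 1] and (ch == '?' or ch == s[j - 1])
    return table


def format_table(s, p):
    """Render the table with T/F cells, one tab-separated row per pattern prefix."""
    lines = ["\t\t" + "\t".join(s)]
    for i, row in enumerate(build_table(s, p)):
        label = p[i - 1] if i > 0 else ""
        lines.append(label + "\t" + "\t".join("T" if cell else "F" for cell in row))
    return "\n".join(lines)


print(format_table("abbc", "a*bc"))
```

The bottom-right cell is the answer for the full pattern against the full string.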

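A pattern emerges: when the two current characters differ, the cell is always False; when they are equal (or the pattern character is '?'), the cell takes the value of its upper-left neighbour, that is, whether the two prefixes before them match. The case that needs care is '*', which may match zero characters or several. Take a* against ab in the table above: if either a* against a (the '*' absorbs one more character) or a against ab (the '*' matches nothing) is a match, the result is T as well.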

How do we derive this state transition equation?

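If p.charAt(i) == '*', the '*' can match zero characters, giving flag[i][j] = flag[i-1][j]; it can match one character, giving flag[i][j] = flag[i-1][j-1]; and so on. This yields the formula:

    flag[i][j] = flag[i-1][j] || flag[i-1][j-1] || … || flag[i-1][0]

The same expansion for the previous column gives flag[i][j-1] = flag[i-1][j-1] || … || flag[i-1][0], so substituting it into the formula above we get:

    flag[i][j] = flag[i-1][j] || flag[i][j-1]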
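
As a sanity check on this derivation, the following sketch compares the expanded OR over all earlier columns with the two-term form `dp[x - 1][y] or dp[x][y - 1]` that the program below uses for '*'. The helper names `star_row_expanded` and `star_row_collapsed` are invented for this illustration; `prev` stands for row i-1 of the table.

```python
from itertools import product


def star_row_expanded(prev):
    # flag[i][j] = flag[i-1][j] or flag[i-1][j-1] or ... or flag[i-1][0]
    return [any(prev[: j + 1]) for j in range(len(prev))]


def star_row_collapsed(prev):
    # flag[i][j] = flag[i-1][j] or flag[i][j-1]
    row = []
    for j, above in enumerate(prev):
        left = row[j - 1] if j > 0 else False
        row.append(above or left)
    return row


# Exhaustively compare both forms on every possible previous row of length 5
for prev in product([False, True], repeat=5):
    assert star_row_expanded(list(prev)) == star_row_collapsed(list(prev))
```

Both forms agree on every row, but the collapsed one costs O(1) per cell instead of O(j).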

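With that, the program is easy to write (in the code below the roles of i and j are swapped relative to the state transition equation, but the principle is the same).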

class Solution(object):
    # p is the pattern, s is the string to match
    def isMatch(self, s, p):
        """
        :type s: str
        :type p: str
        :rtype: bool
        """
        # Quick reject: every non-'*' character of p needs one character of s
        if len(p) - p.count('*') > len(s):
            return False
        newp = ""
        i = 0
        while i < len(p):
            newp += p[i]
            if p[i] == '*':
                while i + 1 < len(p) and p[i + 1] == '*':
                    i += 1
            i += 1
        sl, pl = len(s), len(newp)
        dp = [[False for x in range(pl + 1)] for y in range(sl + 1)]
        dp[0][0] = True
        if pl > 0 and p[0] == '*':
            dp[0][1] = True
        for x in range(1, sl + 1):
            for y in range(1, pl + 1):
                if newp[y - 1] != '*':
                    dp[x][y] = dp[x - 1][y - 1] and (s[x - 1] == newp[y - 1] or newp[y - 1] == '?')
                else:
                    dp[x][y] = dp[x - 1][y] or dp[x][y - 1]
        return dp[sl][pl]

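By the same principle, the table can be reduced to a one-dimensional array. Think of it as computing the two-dimensional table one row at a time and overwriting that row in place, with '*' updating the current row from its own earlier cells. Why does this work? From the formulas above, each cell depends only on the previous row and on cells further left in the current row, so a single row is enough; the outer loop runs over the pattern and the inner loop over the string to match. The time complexity is the same as the two-dimensional version, but the space shrinks to one row. Not every DP can be compressed like this; it depends on what each cell depends on. In short, the principle is unchanged, and the only trick is arranging for the data to coexist in one array.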

class Solution:
    # @return a boolean
    def isMatch(self, s, p):
        length = len(s)
        if len(p) - p.count('*') > length:
            return False
        dp = [True] + [False]*length
        for i in p:
            if i != '*':
                # Each cell depends on the old value to its left, so scan from right to left
                for n in reversed(range(length)):
                    dp[n+1] = dp[n] and (i == s[n] or i == '?')
            else:
                # '*': update the current row in place, from left to right
                for n in range(1, length+1):
                    dp[n] = dp[n-1] or dp[n]
            dp[0] = dp[0] and i == '*'
        return dp[-1]
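
To see that the one-dimensional array really walks through the rows of the two-dimensional table, here is a small tracing sketch. `rolling_rows` is a hypothetical helper written for this illustration: it uses the same update rules as the solution above but yields a snapshot of the array after each pattern character, and it leaves out the early length check.

```python
def rolling_rows(s, p):
    """Yield (pattern_char, row) after each pattern character is processed."""
    length = len(s)
    dp = [True] + [False] * length
    for ch in p:
        if ch != '*':
            # Right to left, so dp[n] still holds the previous row's value
            for n in reversed(range(length)):
                dp[n + 1] = dp[n] and (ch == s[n] or ch == '?')
        else:
            # Left to right, so dp[n-1] already holds the current row's value
            for n in range(1, length + 1):
                dp[n] = dp[n - 1] or dp[n]
        # The empty string matches only while the pattern so far is all '*'
        dp[0] = dp[0] and ch == '*'
        yield ch, list(dp)


for ch, row in rolling_rows("abbc", "a*bc"):
    print(ch, "".join("T" if v else "F" for v in row))
```

For pattern "a*bc" and string "abbc", each snapshot corresponds to one row of the table drawn earlier.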

Greedy algorithm

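Index	Description
si	moving index into the string to match
pi	moving index into the pattern
lastmatch	index in the string to match recorded at the last '*'
laststar	index in the pattern of the last '*'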

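If the current characters are equal, or the pattern character is '?', advance both indices.
If the current pattern character is '*', record lastmatch and laststar and advance only the pattern index; the string index stays put because the '*' may match zero characters.
If the current characters differ and the pattern character is not '*', but a '*' was seen earlier, the attempt made since that '*' has failed. The pattern index goes back to the position just after the recorded laststar and waits there while the '*' absorbs one more character of the string; si keeps advancing in this way, and the pattern index moves on only once the characters match again.
If none of the cases above applies, the result is False.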

[Figure: the author's hand-drawn sketch of the greedy matching process]

class Solution(object):
    # p is the pattern, s is the string to match
    def isMatch(self, s, p):
        si, pi = 0, 0
        lastmatch, laststar = -1, -1
        sl, pl = len(s), len(p)
        if pl - p.count('*') > sl:
            return False
        # Note the order of the conditions
        while si < sl:
            if pi < pl and (s[si] == p[pi] or p[pi] == '?'):
                pi += 1
                si += 1
            elif pi < pl and p[pi] == '*':
                lastmatch, laststar = si, pi  # record both positions; si does not advance because '*' may match zero characters
                pi += 1
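            # Reaching this branch means the current characters do not match
            elif laststar != -1:
                pi = laststar + 1  # pi goes back to just after the last '*' and waits there for si to match
                lastmatch += 1  # lastmatch must advance: the characters did not match, so returning to the same state would loop forever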
                si = lastmatch
            else:
                return False
        # p may end with trailing '*' characters
        while pi < pl and p[pi] == '*':
            pi += 1
        # On success the pattern index must equal the pattern length, meaning the whole pattern was consumed
        return pi == pl
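
To make the backtracking step concrete, here is a traced variant of the greedy loop. `is_match_traced` and its event log are additions for illustration only, not part of the solution above; the early length check is left out to keep the sketch short.

```python
def is_match_traced(s, p):
    """Greedy matcher that also returns a log of (event, si, pi) tuples."""
    si = pi = 0
    lastmatch = laststar = -1
    sl, pl = len(s), len(p)
    log = []
    while si < sl:
        if pi < pl and (s[si] == p[pi] or p[pi] == '?'):
            log.append(("match", si, pi))
            si += 1
            pi += 1
        elif pi < pl and p[pi] == '*':
            log.append(("star", si, pi))
            lastmatch, laststar = si, pi
            pi += 1
        elif laststar != -1:
            # Let the last '*' absorb one more character, then retry after it
            lastmatch += 1
            si, pi = lastmatch, laststar + 1
            log.append(("backtrack", si, pi))
        else:
            log.append(("fail", si, pi))
            return False, log
    # Trailing '*' characters can all match the empty sequence
    while pi < pl and p[pi] == '*':
        pi += 1
    return pi == pl, log


matched, events = is_match_traced("abbc", "a*bc")
for event in events:
    print(event)
print(matched)
```

With s = "abbc" and p = "a*bc", the first 'b' is initially matched against the pattern's 'b'; the following mismatch against 'c' triggers one backtrack, after which the '*' covers the first 'b' instead.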

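Tip: don't underestimate storing length values. If you use them frequently, save them in variables; here, doing so improved my result by about 10% in the beat-submissions ranking!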

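The same idea can also be written recursively. The version below caches the result of each (sindex, pindex) pair, which makes it a top-down form of the DP recurrence: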

class Solution(object):
    def isMatch(self, s, p):
        """
        :type s: str
        :type p: str
        :rtype: bool
        """
        seen = {}
        wild_single, wild_multi = "?", "*"
        # seen has the pattern - source tuple as key, and bool result as success
        source, pattern = s, p
        def is_match(sindex, pindex):
            key = (sindex, pindex)
            if key in seen:
                return seen[key]
            result = True
            # if there's no string, and pattern is not only * then fail
            if sindex >= len(source):
                for wildindex in range(pindex, len(pattern)):
                    if pattern[wildindex] != wild_multi:
                        result = False
                        break
            # there's a string, but no pattern
            elif pindex >= len(pattern):
                result = False
            # if next pattern is multi though, that's something
            elif pattern[pindex] == wild_multi:
                # for zero, simply check sindex, pindex + 1
                result = is_match(sindex, pindex + 1) # just for easier debug
                # if zero, than it's a match
                # otherwise we need to check multi
                # for that, if char is not a wild, then it has to match the source,
                result = result or is_match(sindex + 1, pindex)
            else:
                # either a regular char, or wild_single
                result = (( pattern[pindex] == wild_single or pattern[pindex] == source[sindex]) and 
                                    is_match(sindex + 1, pindex + 1))
            seen[key] = result
            return result
        if (len(p) - p.count(wild_multi) > len(s)):
            return False
        return is_match(0, 0)
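
To illustrate what the `seen` cache buys, the sketch below counts recursive calls with and without caching. `count_calls` is a simplified stand-in written for this comparison (Python 3, since it uses `nonlocal`), not the solution above, and the input is a hand-picked string with many '*' in the pattern and no possible match.

```python
def count_calls(s, p, use_cache):
    """Return (result, number of calls made by the recursion)."""
    seen = {}
    calls = 0

    def match(si, pi):
        nonlocal calls
        calls += 1
        key = (si, pi)
        if use_cache and key in seen:
            return seen[key]
        if pi == len(p):
            # Pattern exhausted: succeed only if the string is exhausted too
            result = si == len(s)
        elif p[pi] == '*':
            # '*' matches nothing, or absorbs s[si] and stays in place
            result = match(si, pi + 1) or (si < len(s) and match(si + 1, pi))
        else:
            # Ordinary character or '?': both indices advance together
            result = (si < len(s) and (p[pi] == '?' or p[pi] == s[si])
                      and match(si + 1, pi + 1))
        seen[key] = result
        return result

    return match(0, 0), calls


s, p = "a" * 12, "*a" * 6 + "b"
print(count_calls(s, p, use_cache=False))
print(count_calls(s, p, use_cache=True))
```

Without the cache the same (si, pi) states are explored again and again; with it, each state is computed once, so the work is bounded by roughly len(s) * len(p) states.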