Regex in MATLAB

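  • regexp: searches a string for matches, case sensitive.
  • regexpi: searches a string for matches, case insensitive.
  • regexprep: searches a string and replaces the matches.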

regexp

Definition

Searches a string for matches; case sensitive.

  •  startIndex = regexp(str,expression)

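Returns the starting index of each substring of str that matches the character pattern specified by the regular expression. If there are no matches, startIndex is an empty array.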

  • [startIndex,endIndex] = regexp(str,expression)

Returns both the starting and the ending indices of each match.
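As a small sketch of how the two index vectors fit together, the snippet below reuses the sample string from the examples later in this post and slices each match out of str. The loop variable k is only illustrative.

```matlab
% Sample string and pattern taken from the examples in this post.
str = 'bat cat can car coat court CUT ct CAT-scan';
expression = 'c[aeiou]+t';

% Ask for both start and end indices of every match.
[startIndex,endIndex] = regexp(str,expression);

% Each start/end pair delimits one matched substring of str.
for k = 1:numel(startIndex)
    disp(str(startIndex(k):endIndex(k)))
end
```

Here the loop should print cat and coat, the same text that the 'match' outkey would return directly.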

  • out = regexp(str,expression,outkey)

Returns the output specified by outkey. For example, if outkey is 'match', regexp returns the substrings that match the expression rather than their starting indices.

  • [out1,...,outN] = regexp(str,expression,outkey1,...,outkeyN)

Specifies several outkey keywords and returns one output per keyword, in the order given.

outkey

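  • 'start': starting indices of all matches;
  • 'end': ending indices of all matches;
  • 'tokenExtents': starting and ending indices of all tokens (the parts of each match captured by parentheses);
  • 'match': text of each matching substring;
  • 'tokens': text of each captured token;
  • 'names': name and text of each named token;
  • 'split': text of the nonmatching substrings of str, that is, the pieces that expression separates.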
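To make the difference between 'match', 'tokens' and 'tokenExtents' concrete, here is a minimal sketch. The input string and the variable names (m, tok, ext) are made up for illustration and do not come from the MATLAB documentation.

```matlab
% Hypothetical input: two key=value pairs.
str = 'width=640 height=480';
% Two tokens per match: the key (\w+) and the value (\d+).
expression = '(\w+)=(\d+)';

% Outputs come back in the same order as the outkeys.
[m,tok,ext] = regexp(str,expression,'match','tokens','tokenExtents');

% m is a cell array with one entry per match.
% tok{k} is a cell array holding the tokens of match k.
% ext{k} is a matrix with one [start end] row per token of match k.
disp(m{2})        % whole second match
disp(tok{2}{1})   % first token of the second match
disp(ext{2})      % index ranges of both tokens of the second match
```

With this input, tok{1} should hold 'width' and '640', and ext{1} should be [1 5; 7 9], since the key occupies characters 1 to 5 and the value characters 7 to 9.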

examples

Basic index matching

str = 'bat cat can car coat court CUT ct CAT-scan';
expression = 'c[aeiou]+t';
startIndex = regexp(str,expression)
startIndex = 1×2
     5    17

Matching several strings at once

str = {'Madrid, Spain','Romeo and Juliet','MATLAB is great'};
capExpr = '[A-Z]';
capStartIndex = regexp(str,capExpr);
celldisp(capStartIndex)
capStartIndex{1} =
     1     9
capStartIndex{2} =
     1    11
capStartIndex{3} =
     1     2     3     4     5     6

Returning the matched text ('match')

str = 'EXTRA! The regexp function helps you relax.';
expression = '\w*x\w*';
matchStr = regexp(str,expression,'match');
celldisp(matchStr)
matchStr{1} =
regexp
matchStr{2} =
relax

Non-matching text ('split')

str = 'She sells sea shells by the seashore.';
expression = '[Ss]h.';
[match,noMatch] = regexp(str,expression,'match','split')
match = 1×3 cell array
    {'She'}    {'she'}    {'sho'}
noMatch = 1×4 cell array
    {0×0 char}    {' sells sea '}    {'lls by the sea'}    {'re.'}
combinedStr = strjoin(noMatch,match)
combinedStr = 'She sells sea shells by the seashore.'

Capturing HTML tags

str = '<title>My Title</title><p>Here is some text.</p>';
expression = '<(\w+).*>.*</\1>';
[tokens,matches] = regexp(str,expression,'tokens','match');
tokens{1}{1} =
title
tokens{2}{1} =
p
matches{1} =
<title>My Title</title>
matches{2} =
<p>Here is some text.</p>

Enclosing \w+ in parentheses captures the name of the HTML tag in a token; \1 in the closing tag is a backreference to that token.

Assigning matches to named tokens ('names')

str = '01/11/2000  20-02-2020  03/30/2000  16-04-2020';
expression = ['(?<month>\d+)/(?<day>\d+)/(?<year>\d+)|'...
              '(?<day>\d+)-(?<month>\d+)-(?<year>\d+)'];
tokenNames = regexp(str,expression,'names');
for k = 1:length(tokenNames)
    disp(tokenNames(k))
end
    month: '01'
      day: '11'
     year: '2000'

    month: '02'
      day: '20'
     year: '2020'

    month: '03'
      day: '30'
     year: '2000'

    month: '04'
      day: '16'
     year: '2020'

(?<name>\d+) finds one or more numeric digits and assigns the result to the token indicated by name.

regexpi

Used in the same way as regexp, but case insensitive.
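A quick side-by-side sketch, reusing the sample string from the regexp examples above, shows what the case-insensitive search adds. The variable names sensitive and insensitive are only illustrative.

```matlab
% Same sample string and pattern as in the regexp examples.
str = 'bat cat can car coat court CUT ct CAT-scan';
expression = 'c[aeiou]+t';

% regexp only accepts lowercase c...t here.
sensitive = regexp(str,expression,'match');

% regexpi also accepts the uppercase words.
insensitive = regexpi(str,expression,'match');

disp(sensitive)
disp(insensitive)
```

The matched text is returned exactly as it appears in str, so the uppercase matches stay uppercase.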

regexprep

  • newStr = regexprep(str,expression,replace)

Replaces the text in str that matches expression with the text described by replace. The regexprep function returns the updated text in newStr.
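Before the token-based examples, a minimal sketch of plain replacement may help. The input string is made up for illustration; the 'once' option shown at the end is a standard regexprep option that limits the replacement to the first match.

```matlab
% Hypothetical input mixing letters and runs of digits.
str = 'a1b22c333';

% By default every match of the expression is replaced.
newStr = regexprep(str,'\d+','#');

% With 'once', only the first match is replaced.
firstOnly = regexprep(str,'\d+','#','once');

disp(newStr)
disp(firstOnly)
```

This should display a#b#c# and then a#b22c333.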

examples

Replacing with a captured token

str = 'I walk up, they walked up, we are walking up.';
expression = 'walk(\w*?) up';
replace = 'ascend$1';

newStr = regexprep(str,expression,replace)
newStr = 'I ascend, they ascended, we are ascending.'

Calling a function in the replacement (this seems to work only in MATLAB; it does not work in other tools such as Notepad++)

str = 'here are two sentences. neither is capitalized.';
expression = '(^|\.)\s*.';
replace = '${upper($0)}';

newStr = regexprep(str,expression,replace)
newStr = 'Here are two sentences. Neither is capitalized.'

The replace expression calls the upper function on the text of the current match ($0).

Original article: https://www.cnblogs.com/dingdangsunny/p/12337196.html