Regular Expressions: Pattern & Matcher

1 compile and pattern

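The Pattern class represents a compiled regular expression, in other words a matching pattern. Its constructor is private, so you cannot instantiate it directly; instead, create one through the static factory method Pattern.compile(String regex).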

Pattern p=Pattern.compile("\\w+");
p.pattern();// returns \w+

pattern() returns the regular expression as a string, which is simply the regex argument that was passed to Pattern.compile(String regex).
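One detail worth spelling out: inside a Java string literal a regex backslash has to be written twice, because the compiler consumes one level of escaping before the regex engine ever sees the text. The following is only a minimal sketch; the class name `PatternBasics` and the helper `wordPattern()` are made up for illustration.

```java
import java.util.regex.Pattern;

class PatternBasics {
    // Compiles the regex \w+ ; the Java literal needs a doubled backslash.
    static Pattern wordPattern() {
        return Pattern.compile("\\w+");
    }

    public static void main(String[] args) {
        Pattern p = wordPattern();
        // pattern() gives back the exact regex text passed to compile()
        System.out.println(p.pattern());          // prints \w+
        System.out.println(p.pattern().length()); // prints 3
    }
}
```

The printed length is 3 because the compiled regex consists of a backslash, `w`, and `+`; the second backslash in the source exists only at the Java literal level.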

2 Pattern.split(CharSequence input)

Pattern p=Pattern.compile("\\d+");
String[] str=p.split("My name is:123My phone is:456My age is:789");

Result: str[0]=My name is:  str[1]=My phone is:  str[2]=My age is:

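Pattern p = Pattern.compile("/");
// limit = 3: the pattern is applied at most twice, so the array has at most 3 elements
// and the last element holds the rest of the input unsplit.
// limit = 0: no cap on the number of splits, and trailing empty strings are dropped.
String[] result1 = p.split(
        "Kevin has seen《LEON》seveal /times,because /it is a good filmaaa.fdfdfd"
                +"/ 凯文已经看过《这个杀手不太冷》几次了，因为它是一部"
                +"好电影。/名词:凯文。", 3);
for (int i=0; i<result1.length; i++) {
    System.out.println(result1[i]);
}

Output: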

Kevin has seen《LEON》seveal
times,because
it is a good filmaaa.fdfdfd/ 凯文已经看过《这个杀手不太冷》几次了，因为它是一部好电影。/名词:凯文。
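To make the effect of the limit argument easier to see, here is a small sketch that splits one input with a zero, a positive, and a negative limit. The class name `SplitLimitDemo` and the sample input are assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLimitDemo {
    private static final Pattern COMMA = Pattern.compile(",");

    static String[] split(String input, int limit) {
        return COMMA.split(input, limit);
    }

    public static void main(String[] args) {
        String input = "a,b,c,,";
        // limit = 0: split as often as possible, drop trailing empty strings
        System.out.println(Arrays.toString(split(input, 0)));  // [a, b, c]
        // limit = 2: at most 2 elements, the last one holds the unsplit rest
        System.out.println(Arrays.toString(split(input, 2)));  // [a, b,c,,]
        // limit < 0: split as often as possible, keep trailing empty strings
        System.out.println(Arrays.toString(split(input, -1))); // [a, b, c, , ]
    }
}
```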

3 Pattern.matches(String regex,CharSequence input)

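A static convenience method for a quick match. It suits cases where the pattern is used only once and must match the entire input string.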

Pattern.matches("\\d+","2223");// returns true
Pattern.matches("\\d+","2223aa");// returns false: the whole string must match, and aa cannot be matched by \d+

4.Matcher.matches()/ Matcher.lookingAt()/ Matcher.find()

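1) matches() matches against the entire string and returns true only if the whole string matches. (all)

ps: Pattern.compile(regex).matcher(input).matches() is equivalent to Pattern.matches(regex, input)

2) lookingAt() matches a prefix: the match must begin at the start of the string, but it does not have to extend to the end. (first)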

Matcher m = Pattern.compile("\\d+").matcher("22bb23");
m.lookingAt();// returns true, because \d+ matches the leading 22

Matcher m2 = Pattern.compile("\\d+").matcher("aa2223");
m2.lookingAt();// returns false, because \d+ cannot match the leading aa

3) find() scans the string for the next match; the matched substring may be anywhere in the string. (any)

Matcher m3 = Pattern.compile("\\d+").matcher("22bb23");
m3.find();// returns true
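The difference between the three methods is easiest to see by running them side by side on the same inputs. The sketch below does that with a fresh matcher for each call; the class name `MatchModes` and the helper `modes()` are illustrative, not part of any API.

```java
import java.util.regex.Pattern;

class MatchModes {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Results of matches(), lookingAt() and find(), each on a fresh matcher.
    static boolean[] modes(String input) {
        return new boolean[] {
            DIGITS.matcher(input).matches(),   // whole string must match
            DIGITS.matcher(input).lookingAt(), // match must start at index 0
            DIGITS.matcher(input).find()       // match may start anywhere
        };
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2223", "22bb23", "aa2223", "abc"}) {
            boolean[] r = modes(s);
            System.out.println(s + " -> matches=" + r[0]
                    + ", lookingAt=" + r[1] + ", find=" + r[2]);
        }
    }
}
```

For these inputs, "2223" satisfies all three, "22bb23" fails only matches(), "aa2223" succeeds only with find(), and "abc" fails all three.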

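5.Matcher.start()/ Matcher.end()/ Matcher.group()

Once matches(), lookingAt(), or find() has succeeded, these three methods give more detail about the match.

start() returns the index of the first character of the matched substring.
end() returns the index just past the last character of the matched substring.
group() returns the matched substring.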

Pattern p=Pattern.compile("\\d+");
Matcher m1=p.matcher("aaa2223bb");
m1.find();// matches 2223
m1.start();// returns 3
m1.end();// returns 7, the index just after 2223
m1.group();// returns 2223

Matcher m2=p.matcher("2223bb");
m2.lookingAt();   // matches 2223
m2.start();   // returns 0; lookingAt() only matches from the beginning, so start() is always 0 after it succeeds
m2.end();   // returns 4
m2.group();   // returns 2223

Matcher m3=p.matcher("2223bb");
m3.matches();   // returns false: matches() needs the entire string to match, and bb does not
m3.start();   // java.lang.IllegalStateException: No match found
m3.end();   // java.lang.IllegalStateException: No match found
m3.group();   // java.lang.IllegalStateException: No match found
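Since start(), end(), and group() throw when the last match attempt failed, it is usually safer to check the boolean result first. Below is a minimal sketch of that habit; `SafeGroupAccess` and both helper methods are hypothetical names used only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SafeGroupAccess {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Describes the first run of digits as "text@[start,end)", or null if none.
    static String firstDigits(String input) {
        Matcher m = DIGITS.matcher(input);
        if (!m.find()) {
            return null; // no match: start()/end()/group() would throw here
        }
        return m.group() + "@[" + m.start() + "," + m.end() + ")";
    }

    // True if group() throws IllegalStateException after a failed matches().
    static boolean groupThrowsAfterFailedMatch(String input) {
        Matcher m = DIGITS.matcher(input);
        if (m.matches()) {
            return false; // the whole input matched, so group() is safe
        }
        try {
            m.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstDigits("aaa2223bb"));              // 2223@[3,7)
        System.out.println(firstDigits("abc"));                    // null
        System.out.println(groupThrowsAfterFailedMatch("2223bb")); // true
    }
}
```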

6.Matcher.group()

Example 1

String str = "Hello,World! in Java.";
Pattern pattern = Pattern.compile("W(or)(ld!)");
Matcher matcher = pattern.matcher(str);
while(matcher.find()){
    System.out.println("Group 0:"+matcher.group(0));// group 0: the entire match
    System.out.println("Group 1:"+matcher.group(1));// group 1: the text matched by (or)
    System.out.println("Group 2:"+matcher.group(2));// group 2: the text matched by (ld!); a group is a parenthesized subexpression
    System.out.println("Start 0:"+matcher.start(0)+" End 0:"+matcher.end(0));// indices of the entire match
    System.out.println("Start 1:"+matcher.start(1)+" End 1:"+matcher.end(1));// indices of group 1
    System.out.println("Start 2:"+matcher.start(2)+" End 2:"+matcher.end(2));// indices of group 2
    System.out.println(str.substring(matcher.start(0),matcher.end(1)));// substring from the start of the entire match to the end of group 1: Wor
}

The program prints:

Group 0:World!  
Group 1:or  
Group 2:ld!  
Start 0:6 End 0:12  
Start 1:7 End 1:9  
Start 2:9 End 2:12  
Wor

Example 2

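Pattern p=Pattern.compile("([a-z]+)(\\d+)");
Matcher m=p.matcher("aaa2223bb");
m.find();   // matches aaa2223
m.groupCount();   // returns 2, because the pattern has 2 groups
m.start(1);   // returns 0, the start index of the substring matched by group 1
m.start(2);   // returns 3
m.end(1);   // returns 3, the index just past the last character matched by group 1
m.end(2);   // returns 7
m.group(1);   // returns aaa, the substring matched by group 1
m.group(2);   // returns 2223, the substring matched by group 2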

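Groups are delimited by parentheses and numbered by their opening parenthesis, counting from left to right.

ps: group() with no argument is the same as group(0)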
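Groups are most useful inside a find() loop, where each iteration exposes the groups of the current match. The following sketch collects group 1 and group 2 for every match in the input; `GroupLoop` and `pairs()` are illustrative names, and the sample input is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupLoop {
    private static final Pattern WORD_NUM = Pattern.compile("([a-z]+)(\\d+)");

    // Collects "letters=digits" for every match found in the input.
    static List<String> pairs(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD_NUM.matcher(input);
        while (m.find()) {
            // group(1) is the ([a-z]+) part, group(2) the (\d+) part
            out.add(m.group(1) + "=" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pairs("aaa2223bb45cc")); // [aaa=2223, bb=45]
    }
}
```

The trailing "cc" is skipped because the pattern requires at least one digit after the letters.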

7.Matcher.find()

Pattern compile = Pattern.compile("[0-9]+");
Matcher matcher = compile.matcher("1q2w3eee44rrr");
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
    System.out.println(matcher.group());
    matcher.appendReplacement(sb, "--");
}
matcher.appendTail(sb);
System.out.println(sb);

Output:

1
2
3
44
--q--w--eee--rrr
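The appendReplacement/appendTail pair becomes more interesting when the replacement is computed from each match rather than fixed. Here is a small sketch that doubles every number in the input; `NumberDoubler` is a made-up class name, and Matcher.quoteReplacement is used so that the replacement text is always treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberDoubler {
    private static final Pattern NUM = Pattern.compile("[0-9]+");

    // Replaces each run of digits with twice its numeric value.
    static String doubleNumbers(String input) {
        Matcher m = NUM.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int n = Integer.parseInt(m.group());
            // appendReplacement copies the text before the match, then the replacement
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(n * 2)));
        }
        // appendTail copies whatever follows the last match
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("1q2w3eee44rrr")); // 2q4w6eee88rrr
    }
}
```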

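Using Pattern.split instead of String.split

String.split is a very common way to cut up a string. Its argument is a regular expression, and internally it compiles that expression on every call before delegating to Pattern.split (recent JDKs add a fast path for simple single-character delimiters, but any real regex is still recompiled each time). In simplified form: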

public String[] split(String regex, int limit) {
    return Pattern.compile(regex).split(this, limit);
}

public String[] split(String regex) {
    return split(regex, 0);
}

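So if you call String.split very frequently, recompiling the regular expression each time is expensive and can hurt performance noticeably. In that case it is better to precompile the Pattern yourself and call Pattern.split.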

String[] items=line.split(" ");

// replace with

static Pattern pattern=Pattern.compile(" ");

String[] items=pattern.split(line,0);

StringTokenizer is also worth a look; it splits strings too.
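As a rough sketch of the precompiled approach, the class below keeps the Pattern in a static final field so it is compiled once and reused for every line. The names `LineSplitter` and `fields()` and the choice of `\s+` as the delimiter are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LineSplitter {
    // Compiled once when the class is loaded, then shared by all calls.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Splits a line into fields separated by runs of whitespace.
    static String[] fields(String line) {
        // trim() first so leading whitespace does not produce an empty first field
        return WHITESPACE.split(line.trim());
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fields("  a  b\tc ")));   // [a, b, c]
        System.out.println(Arrays.toString(fields("one two three"))); // [one, two, three]
    }
}
```

A Pattern is immutable and safe to share between threads, which is why holding it in a static field works; the Matcher objects created from it are not.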

rules

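    /**
    Characters
    x       the character x
    \\      the backslash character
    \t      the tab character ('\u0009')
    \n      the newline (line feed) character ('\u000A')
    \r      the carriage-return character ('\u000D')
    \f      the form-feed character ('\u000C')
    \a      the alert (bell) character ('\u0007')
    \e      the escape character ('\u001B')
    \cx     the control character corresponding to x

    Character classes
    [abc]           a, b, or c (simple class)
    [^abc]          any character except a, b, or c (negation)
    [a-zA-Z]        a through z or A through Z, inclusive (range)
    [a-z&&[^bc]]    a through z, except b and c: [ad-z] (subtraction)
    [a-z&&[^m-p]]   a through z, except m through p: [a-lq-z]
    [a-z&&[def]]    d, e, or f (intersection)
    Notes:
    Square brackets match exactly one character: "t[aeio]n" matches only "tan", "ten", "tin", and "ton".
    To choose between alternatives longer than one character, use parentheses "()" with "|". For example, "t(a|e|i|o|oo)n" also matches "toon", which square brackets cannot express.

    Predefined character classes
    .       any character (may or may not match line terminators). Note: the dot stands for any single character. For example, "t.n" matches "tan", "ten", "tin", and "ton", and also "t#n", "tpn", and even "t n".
    \d      a digit: [0-9]
    \D      a non-digit: [^0-9]
    \s      a whitespace character: [ \t\n\x0B\f\r]
    \S      a non-whitespace character: [^\s]
    \w      a word character: [a-zA-Z_0-9]
    \W      a non-word character: [^\w]

    Quantifiers
    Symbol  Occurrences
    *       zero or more times
    +       one or more times
    ?       zero times or once
    {n}     exactly n times
    {n,m}   at least n but no more than m times
    */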
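A few of the rules above can be checked directly with Pattern.matches. The sketch below is only a spot check; `RuleChecks` and `full()` are hypothetical names, and every regex is written with doubled backslashes because it sits inside a Java string literal.

```java
import java.util.regex.Pattern;

class RuleChecks {
    // True if the regex matches the entire input.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(full("t[aeio]n", "ten"));        // true: brackets match one character
        System.out.println(full("t[aeio]n", "toon"));       // false: "oo" is two characters
        System.out.println(full("t(a|e|i|o|oo)n", "toon")); // true: alternation allows "oo"
        System.out.println(full("t.n", "t#n"));             // true: dot matches any character
        System.out.println(full("[a-z&&[^bc]]", "b"));      // false: b is subtracted
        System.out.println(full("[a-z&&[^bc]]", "d"));      // true
        System.out.println(full("\\d{2,3}", "123"));        // true: 2 to 3 digits
        System.out.println(full("\\d{2,3}", "1234"));       // false: 4 digits is too many
    }
}
```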

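    /**
     *
     * java.util.regex is the library package for matching strings against patterns specified as regular expressions.
     *
     * Its two main classes are Pattern and Matcher.
     * 
     * Pattern: a Pattern is the compiled representation of a regular expression.
     * 
     * Matcher: a Matcher is a stateful engine that checks a string against the pattern held by a Pattern object.
     * 
     * A Pattern instance holds the compiled form of a regular expression whose syntax is similar to Perl's; a Matcher instance then performs the matching on a string under the control of that Pattern.
     */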