用正则表达式匹配用rdf3x处理过后的TTL格式文档

1、比如下面这个用rdf3x处理过后的TTL文档片段:

注意缩进的是两个空格

<http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363853> <http://rdf.ebi.ac.uk/terms/chembl#hasBindingSite> <http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2622>.
<http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2659> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.ebi.ac.uk/terms/chembl#BindingSite>;
  <http://www.w3.org/2000/01/rdf-schema#label> "CHEMBL_BS_2659";
  <http://rdf.ebi.ac.uk/terms/chembl#chemblId> "CHEMBL_BS_2659";
  <http://rdf.ebi.ac.uk/terms/chembl#hasTarget> <http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363965>;
  <http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName> "30S ribosomal protein S1".
<http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363965> <http://rdf.ebi.ac.uk/terms/chembl#hasBindingSite> <http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2659> , <http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2623>.
<http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2623> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.ebi.ac.uk/terms/chembl#BindingSite>;
  <http://www.w3.org/2000/01/rdf-schema#label> "CHEMBL_BS_2623";
  <http://rdf.ebi.ac.uk/terms/chembl#chemblId> "CHEMBL_BS_2623";
  <http://rdf.ebi.ac.uk/terms/chembl#hasTarget> <http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363965>;
  <http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName> "16S/23S ribosomal RNA interface".
<http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2624> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdf.ebi.ac.uk/terms/chembl#BindingSite>;
  <http://www.w3.org/2000/01/rdf-schema#label> "CHEMBL_BS_2624";
  <http://rdf.ebi.ac.uk/terms/chembl#chemblId> "CHEMBL_BS_2624";
  <http://rdf.ebi.ac.uk/terms/chembl#hasTarget> <http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022>;
  <http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName> "23S ribosomal RNA".
<http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022> <http://rdf.ebi.ac.uk/terms/chembl#hasBindingSite> <http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2624>.

2、Java编写的正则表达式代码

代码里注释的部分和上面那行是输出三种所需的不同结果

package com.jena;

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class rdfReader3 {
    static String url="";
    
    public static void main(String[] args) {
        FileReader fr=null;
        BufferedReader br=null;
        try{
            fr=new FileReader("C:/Users/Don/workspace/Jena/src/com/jena/bindingsite");
            br=new BufferedReader(fr);
            String s=" ";
            StringBuffer str=new StringBuffer();
            while((s=br.readLine())!=null){
                Pattern p= Pattern.compile("<([^<>]*)>");    //匹配所有尖括号里的内容
//                Pattern p= Pattern.compile("^
*<([^<>]*)>");    //匹配每一个主语,开头匹配“除了空格所有字符”,后面匹配"<>里的所有内容,内容为非尖括号"
//                Pattern p= Pattern.compile("  <([^<>]*)>");        //匹配“两个空格开头”,后面匹配"<>里的所有内容,内容为非尖括号"
                Matcher m=p.matcher(s);
              
                while(m.find()){
                    System.out.println(m.group(1));
                }
            }
            
        }catch(Exception e){
            System.out.println(e.getMessage());
        }
        
        
    }
    
    

}

(1)匹配所有尖括号里的内容

运行结果

http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363853
http://rdf.ebi.ac.uk/terms/chembl#hasBindingSite
http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2622
http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2659
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://rdf.ebi.ac.uk/terms/chembl#BindingSite
http://www.w3.org/2000/01/rdf-schema#label
http://rdf.ebi.ac.uk/terms/chembl#chemblId
http://rdf.ebi.ac.uk/terms/chembl#hasTarget
http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363965
http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName
http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363965
http://rdf.ebi.ac.uk/terms/chembl#hasBindingSite
http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2659
http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2623
http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2623
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://rdf.ebi.ac.uk/terms/chembl#BindingSite
http://www.w3.org/2000/01/rdf-schema#label
http://rdf.ebi.ac.uk/terms/chembl#chemblId
http://rdf.ebi.ac.uk/terms/chembl#hasTarget
http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363965
http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName
http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2624
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://rdf.ebi.ac.uk/terms/chembl#BindingSite
http://www.w3.org/2000/01/rdf-schema#label
http://rdf.ebi.ac.uk/terms/chembl#chemblId
http://rdf.ebi.ac.uk/terms/chembl#hasTarget
http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022
http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName
http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022
http://rdf.ebi.ac.uk/terms/chembl#hasBindingSite
http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2624

(2)匹配每一个主语,即开头不是两个空格的那一行数据的第一对尖括号里的内容

运行结果

http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363853
http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2659
http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2363965
http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2623
http://rdf.ebi.ac.uk/resource/chembl/binding_site/CHEMBL_BS_2624
http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022

(3)匹配“两个空格开头”,后面匹配"<>里的所有内容,内容为非尖括号"

http://www.w3.org/2000/01/rdf-schema#label
http://rdf.ebi.ac.uk/terms/chembl#chemblId
http://rdf.ebi.ac.uk/terms/chembl#hasTarget
http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName
http://www.w3.org/2000/01/rdf-schema#label
http://rdf.ebi.ac.uk/terms/chembl#chemblId
http://rdf.ebi.ac.uk/terms/chembl#hasTarget
http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName
http://www.w3.org/2000/01/rdf-schema#label
http://rdf.ebi.ac.uk/terms/chembl#chemblId
http://rdf.ebi.ac.uk/terms/chembl#hasTarget
http://rdf.ebi.ac.uk/terms/chembl#bindingSiteName

 匹配前面两个空格开始的数据时,在前面直接输入两个空格即可

  Pattern p= Pattern.compile("  <([^<>]*)>"); 
原文地址:https://www.cnblogs.com/Donnnnnn/p/5718947.html