Regex

We need to add the prefix "prefix-" to the value of the <Id> element directly inside <MsgId> in XML messages, but there may be a newline between <MsgId> and <Id>.

Examples of the XML messages:

        <MsgId>
          <Id>XXX</Id>
          <CreDtTm>2018-03-22T09:05:24.334054Z</CreDtTm>
        </MsgId>
        <Els>
              <Id>BBB</Id>
        </Els>



        <MsgId><Id>XXX</Id>
          <CreDtTm>2018-03-22T09:05:24.334054Z</CreDtTm>
        </MsgId>
        <Els> <Id>BBB</Id></Els>
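The newline matters because in Python's `re`, `.` does not match `\n` unless `re.DOTALL` is passed, so `.*` alone cannot bridge the line break in the first layout. A small check, using a trimmed copy of the first sample (indentation shortened, variable names are only illustrative):

```python
import re

# Trimmed copy of the first layout: a line break between <MsgId> and <Id>.
layout_1 = (
    "<MsgId>\n"
    "  <Id>XXX</Id>\n"
    "  <CreDtTm>2018-03-22T09:05:24.334054Z</CreDtTm>\n"
    "</MsgId>\n"
    "<Els>\n"
    "  <Id>BBB</Id>\n"
    "</Els>\n"
)

# Without re.DOTALL, '.' stops at '\n', so this pattern cannot reach <Id>.
no_newline = re.search(r"<MsgId>.*<Id>", layout_1)

# An optional '\n' in the pattern lets the match continue onto the next line.
with_newline = re.search(r"<MsgId>.*\n?.*<Id>", layout_1)

print(no_newline)                  # None
print(repr(with_newline.group()))  # '<MsgId>\n  <Id>'
```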

1. Greedy match

 import re
 file_content = re.sub(r'(<MsgId>.*(\n)?.*<Id>)', r'\1' + PARALLEL_PREFIX, file_content)  # \1 re-inserts the matched text before the prefix

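Quantifiers are greedy by default: `.*` first consumes as much as it can and then backtracks only until `<Id>` matches. Because `.` does not match a newline, the match can span at most the one optional line break, but within that range it stops at the last `<Id>`, not the first. If the message is written on a single line, the regex above therefore matches "<MsgId>...</MsgId>...<Els> <Id>" instead of "<MsgId><Id>", and the prefix ends up on BBB instead of XXX.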
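A sketch of the greedy over-match, assuming the second sample is flattened onto a single line and `PARALLEL_PREFIX` holds "prefix-" as in the task description:

```python
import re

PARALLEL_PREFIX = "prefix-"  # assumed value, taken from the task description

# The second sample, flattened onto a single line.
one_line = ("<MsgId><Id>XXX</Id><CreDtTm>2018-03-22T09:05:24.334054Z</CreDtTm>"
            "</MsgId><Els> <Id>BBB</Id></Els>")

greedy = re.compile(r"(<MsgId>.*(\n)?.*<Id>)")

# Greedy .* backtracks only as far as the LAST <Id> on the line.
print(greedy.search(one_line).group(1))

# The prefix therefore lands on BBB instead of XXX.
result = greedy.sub(r"\1" + PARALLEL_PREFIX, one_line)
print(result)
```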

2. Lazy match

Appending "?" to a quantifier ("*?", "+?", "??") makes it lazy instead of greedy: it matches as few characters as possible and expands only when the rest of the pattern fails to match. So

 file_content = re.sub(r'(<MsgId>.*?(\n)??.*?<Id>)', r'\1' + PARALLEL_PREFIX, file_content)  # \1 re-inserts the matched text before the prefix

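the regex above stops at the first <Id> after <MsgId>, so it matches "<MsgId><Id>" (plus the line break and indentation when <Id> is on the next line). In general, a lazy quantifier matches as few characters as possible.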
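A minimal check of the lazy pattern against all three layouts discussed above (shortened samples; `PARALLEL_PREFIX` is assumed to be "prefix-"):

```python
import re

PARALLEL_PREFIX = "prefix-"  # assumed value, taken from the task description

lazy = re.compile(r"(<MsgId>.*?(\n)??.*?<Id>)")

samples = [
    # Line break between <MsgId> and <Id>.
    "<MsgId>\n  <Id>XXX</Id>\n</MsgId>\n<Els>\n  <Id>BBB</Id>\n</Els>",
    # <Id> on the same line as <MsgId>.
    "<MsgId><Id>XXX</Id>\n</MsgId>\n<Els> <Id>BBB</Id></Els>",
    # Whole message on one line.
    "<MsgId><Id>XXX</Id></MsgId><Els> <Id>BBB</Id></Els>",
]

# In every layout only XXX should receive the prefix.
results = [lazy.sub(r"\1" + PARALLEL_PREFIX, s) for s in samples]
for r in results:
    print(r)
```

In the first layout the lazy `(\n)??` first tries to skip the newline; when `.*?<Id>` then fails at the line break, it backtracks and consumes the `\n`, so the match still ends at the first `<Id>`.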

Original article: https://www.cnblogs.com/codingforum/p/8711823.html