Regex multiple occurrence search

Finding multiple occurrences, or recurring matches, in a string is a task that comes up very often. It becomes easy once you understand zero-width assertions and greedy versus lazy matching.

1. Zero-width assertions

You can place conditions on what must occur before or after a match by using lookbehind, lookahead, anchors, and word boundaries.

These conditions are not included in the match: they consume no characters, which is why they are called zero-width. Here I will cover only lookbehind and lookahead, which together are called lookaround.

Here is an easy example to help you understand lookaround:

 
Console.WriteLine (Regex.Match ("25 miles", @"\d+\s(?=miles)")); 

The output is "25 " (the number plus the space matched by \s). The regex searches for a number followed by whitespace that precedes "miles", but "miles" itself is not included in the match.
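A minimal runnable version of this example, assuming a .NET 6 or later console project with C# top-level statements. Printing the value in brackets makes the trailing space visible. The second and third patterns are my own variations, not part of the original example: one moves the whitespace into the lookahead, and one uses a negative lookahead on a made-up input.

```csharp
using System;
using System.Text.RegularExpressions;

// Positive lookahead: "miles" must follow, but it is not consumed,
// so it is not part of the match. The \s, however, is consumed.
Match withSpace = Regex.Match("25 miles", @"\d+\s(?=miles)");
Console.WriteLine($"[{withSpace.Value}]");   // [25 ]

// Variation: moving the whitespace into the lookahead keeps only the digits.
Match digitsOnly = Regex.Match("25 miles", @"\d+(?=\s*miles)");
Console.WriteLine($"[{digitsOnly.Value}]");  // [25]

// Variation: negative lookahead finds a number NOT followed by " miles".
// \b prevents backtracking to a partial number such as the "2" in "25".
Match notMiles = Regex.Match("25 miles, 40 km", @"\d+\b(?! miles)");
Console.WriteLine($"[{notMiles.Value}]");    // [40]
```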

(?=...)   positive lookahead
(?!...)   negative lookahead
(?<=...)  positive lookbehind
(?<!...)  negative lookbehind

Now take an input with several recurring matches:

var input = "abRcabTcabHcabcabJc";
var pattern = new Regex(@"ab.*?c");
var count = pattern.Matches(input).Count;

Without lookaround, count is 5, and every match includes the ab and c delimiters (abRc, abTc, abHc, abc, abJc).

var pattern2 = new Regex(@"(?<=ab).*?(?=c)");
var count2 = pattern2.Matches(input).Count;

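With lookaround, the matches no longer include ab and c: they are R, T, H, an empty string (from abc, where nothing sits between ab and c) and J. This makes lookaround very useful for finding or extracting the contents of recurring () or [] pairs in a string.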
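The two patterns can be run side by side with a sketch like this one (again assuming C# top-level statements on .NET 6+). It prints every match rather than just the count; each value from the lookaround pattern is quoted so that the empty match produced by "abc" is visible.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var input = "abRcabTcabHcabcabJc";

// Lazy match WITHOUT lookaround: "ab" and "c" are part of every match.
var pattern = new Regex(@"ab.*?c");
string[] plain = pattern.Matches(input).Select(m => m.Value).ToArray();

// Lazy match WITH lookbehind (?<=ab) and lookahead (?=c): only the text between.
var pattern2 = new Regex(@"(?<=ab).*?(?=c)");
string[] inner = pattern2.Matches(input).Select(m => m.Value).ToArray();

string plainText = string.Join(", ", plain);
// Quote each value so that an empty match shows up as ''.
string innerText = string.Join(", ", inner.Select(v => "'" + v + "'"));

Console.WriteLine($"{plain.Length}: {plainText}"); // 5: abRc, abTc, abHc, abc, abJc
Console.WriteLine($"{inner.Length}: {innerText}"); // 5: 'R', 'T', 'H', '', 'J'
```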

Be careful: .* and .*? behave very differently. This is the difference between greedy and lazy matching, and it is critical when a string contains several matches.
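This matters most when extracting recurring [] instances with lookaround. In this sketch (the input "[a][bc][d]" is made up, and C# top-level statements are assumed), the only difference between the two patterns is .* versus .*?:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var text = "[a][bc][d]";

// Greedy: .* first runs to the end of the string, then backtracks only until
// (?=\]) succeeds, which happens at the LAST "]". All pairs merge into one match.
string[] greedy = Regex.Matches(text, @"(?<=\[).*(?=\])").Select(m => m.Value).ToArray();

// Lazy: .*? grows one character at a time and stops at the FIRST "]",
// so each bracket pair yields its own match.
string[] lazy = Regex.Matches(text, @"(?<=\[).*?(?=\])").Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(" | ", greedy)); // a][bc][d
Console.WriteLine(string.Join(" | ", lazy));   // a | bc | d
```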

2. Greedy and Lazy matching

By default, quantifiers are greedy: they match as many characters as possible. Adding ? after a quantifier makes it lazy, so it matches as few repetitions as possible.

a.*b The longest string starting with a and ending with b
a.*?b The shortest string starting with a and ending with b
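Applied to a made-up input containing two a-to-b occurrences, the two patterns return a different number of matches, which is exactly what matters for multiple occurrence search. A small sketch, assuming C# top-level statements:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var s = "aXbaYb";

// Greedy: from the first "a", take the longest run that still ends in "b".
string[] greedy = Regex.Matches(s, "a.*b").Select(m => m.Value).ToArray();

// Lazy: from each "a", take the shortest run that ends in "b".
string[] lazy = Regex.Matches(s, "a.*?b").Select(m => m.Value).ToArray();

Console.WriteLine("greedy: " + string.Join(", ", greedy)); // greedy: aXbaYb
Console.WriteLine("lazy: " + string.Join(", ", lazy));     // lazy: aXb, aYb
```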

*?       Repeat any number of times, as few as possible
+?       Repeat one or more times, as few as possible
??       Repeat zero or one time, as few as possible
{n,m}?   Repeat at least n but no more than m times, as few as possible
{n,}?    Repeat at least n times, as few as possible
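The following sketch applies each lazy quantifier and its greedy counterpart to simple made-up inputs and prints the first (leftmost) match of each, assuming C# top-level statements on .NET 6+. The helper name First is my own, introduced only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: value of the first (leftmost) match, or "" if none.
static string First(string input, string pattern) => Regex.Match(input, pattern).Value;

// Each row: greedy pattern, its lazy counterpart, and the input to test them on.
(string Greedy, string Lazy, string Input)[] cases =
{
    ("x*",     "x*?",     "xxxxx"),
    ("x+",     "x+?",     "xxxxx"),
    ("ax?",    "ax??",    "axx"),
    ("x{2,4}", "x{2,4}?", "xxxxx"),
    ("x{2,}",  "x{2,}?",  "xxxxx"),
};

foreach (var (greedy, lazy, input) in cases)
{
    Console.WriteLine($"{greedy} -> \"{First(input, greedy)}\", {lazy} -> \"{First(input, lazy)}\"");
}
// x* -> "xxxxx", x*? -> ""          (zero repetitions already satisfy x*?)
// x+ -> "xxxxx", x+? -> "x"
// ax? -> "ax", ax?? -> "a"
// x{2,4} -> "xxxx", x{2,4}? -> "xx"
// x{2,} -> "xxxxx", x{2,}? -> "xx"
```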

This entry was posted on Thursday, July 12th, 2012 at 10:43 pm and is filed under General, Software Engineering.