: Indeed. I wrote the following test:
: 
: Pattern p = Pattern.compile("(.*)");
: Matcher m = p.matcher("xyz");
: Assert.assertEquals("", "Video", m.replaceAll("Video"));
: 
: The test fails. It gives "VideoVideo" as the actual result. I guess there is
: something about Matcher.replaceAll that I don't know. Off to read the
: javadocs then.

".*" matches the empty string (for that matter, any regex clause with the 
"*" modifier applied matches the empty string), and iterating over pattern 
matches (i.e. what happens if you call Matcher.find() or 
Matcher.replaceAll()) always resumes at the "first character not matched by 
[the previous] match" (i.e. let prev = m.end(); if m.find() succeeds, then 
prev <= m.start()).

So ".*" always matches twice on any given non-empty, single-line String x 
... once when it matches from 0 to x.length() (the end offset is 
exclusive), and once when it matches the empty string starting and ending 
at x.length().
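
A minimal sketch to make the two matches visible. The class name StarMatches 
and the helper spans() are made up for illustration; only Pattern, Matcher, 
find(), start(), end() and replaceAll() come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
public class StarMatches {

    // Collects every match that find() reports as "start-end"
    // (end is exclusive, as returned by Matcher.end()).
    static List<String> spans(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        // Two matches: the whole string, then the empty string at the end.
        System.out.println(spans("(.*)", "xyz"));
        // Each match is replaced, so the replacement shows up twice.
        System.out.println(
            Pattern.compile("(.*)").matcher("xyz").replaceAll("Video"));
    }
}
```

On "xyz" this should report the spans 0-3 and 3-3, which is why replaceAll() 
produces "VideoVideo".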

That's why using "^.*" doesn't have this problem ... "*" is greedy so it 
matches once at the start of the string, and since "^" can't match again 
at the end of the string there can't be any more matches.  Conversely: 
".*$" and ".*\z" will still have this problem, because any number of 
matches can have the same ending offset.
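
A quick way to compare the anchored variants, again only a sketch: 
AnchoredStar and replaceAllWith() are hypothetical names, and the patterns 
are compiled with default flags (no MULTILINE).

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
public class AnchoredStar {

    // Thin wrapper around Pattern.compile(...).matcher(...).replaceAll(...).
    static String replaceAllWith(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // "^" can only match at offset 0, so there is a single match.
        System.out.println(replaceAllWith("^.*", "xyz", "Video"));
        // "$" and "\z" are still satisfied by the empty match at the end,
        // so these behave like the unanchored ".*".
        System.out.println(replaceAllWith(".*$", "xyz", "Video"));
        System.out.println(replaceAllWith(".*\\z", "xyz", "Video"));
    }
}
```

With the input "xyz", the first call should give "Video" and the other two 
"VideoVideo".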


-Hoss
