Sorry, I get nerdy about this subject and can't help following up.

I said:

- pcre2 regex matching is generally faster than re2 matching. The point of re2 regexen is that matches won't go into catastrophic backtracking on pathological cases.

Should have mentioned that pcre2 is even better at subexpression capture, which is what the OP's question is all about.

sub vcl_init {
     new query_pattern = re.regex(".*(q=)(.*?)(\&|$).*");
}

OMG no. Like this please:

        new query_pattern = re.regex("\b(q=)(.*?)(?:\&|$)");

I have sent an example of a pcre regex with .* (two of them!) to a public mailing list, for which I will burn in hell.

To match a name-value pair in a cookie or query string, put \b ('word boundary') in front of the name. That way it will match either at the beginning of the Cookie value or following an ampersand (or any other non-word character).
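A quick way to see where \b does and doesn't match is to try the corrected pattern outside Varnish. This sketch uses Python's re module as a stand-in for PCRE2. It is not the same library, but both are backtracking engines and should agree on this pattern for plain ASCII input. The sample strings and the q_value helper are made up for illustration:

```python
import re

# Same pattern as the corrected VCL line. The raw string keeps "\b" as the
# regex word-boundary escape rather than a backspace character.
QUERY = re.compile(r"\b(q=)(.*?)(?:&|$)")

def q_value(s):
    """Hypothetical helper: return the value captured for q=, or None."""
    m = QUERY.search(s)
    return m.group(2) if m else None

print(q_value("q=varnish&lang=en"))      # at the start of the string
print(q_value("/search?lang=en&q=vcl"))  # after an ampersand
print(q_value("/search?faq=1"))          # 'q' inside a longer name: no match
```

The last case is the point of \b: "faq=" has a word character right before the 'q', so there is no word boundary there and the name isn't mistaken for q.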

And ?: tells pcre not to bother capturing the last expression in parentheses (they're just for grouping).
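Here's the OP's pattern next to the corrected one, again with Python's re standing in for PCRE2 and a made-up URL. Both capture the same value in this case, but thanks to (?:...) the corrected pattern has one capture group fewer to record:

```python
import re

# The OP's pattern and the corrected one, side by side.
ORIGINAL = re.compile(r".*(q=)(.*?)(\&|$).*")
CORRECTED = re.compile(r"\b(q=)(.*?)(?:\&|$)")

# (?:...) groups without capturing, so only (q=) and (.*?) are captured.
print(ORIGINAL.groups, CORRECTED.groups)

s = "/search?lang=en&q=vcl&page=2"  # hypothetical request URL
print(ORIGINAL.search(s).group(2))
print(CORRECTED.search(s).group(2))
```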

Avoid .* in pcre regexen if you possibly can. You can, almost always.

With .* at the beginning, the pcre matcher first consumes everything up to the end of the string, and then backtracks all the way back, looking for the first character that can match next. In this case that's 'q', and at every 'q' it finds while working backwards, it stops, tries to match the rest of the pattern, and backtracks again if that fails.
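To make the cost concrete, here's a toy model in Python. It is not PCRE2's actual algorithm: the two functions are hypothetical and just count how many positions get tried for the literal q= when a greedy leading .* gives characters back from the end, versus a plain left-to-right search. Real PCRE2 applies its own optimizations, so the counts only illustrate the shape of the work:

```python
def backtrack_from_end(s, lit):
    """Toy model of '.*' followed by lit: the greedy '.*' first swallows the
    whole string, then gives back one character at a time, trying lit at
    each position. Returns (match position or None, positions tried)."""
    tries = 0
    for pos in range(len(s), -1, -1):
        tries += 1
        if s.startswith(lit, pos):
            return pos, tries
    return None, tries


def search_forward(s, lit):
    """Toy model of an unanchored left-to-right search for lit, as with a
    pattern that starts with the literal instead of '.*'."""
    tries = 0
    for pos in range(len(s) + 1):
        tries += 1
        if s.startswith(lit, pos):
            return pos, tries
    return None, tries


s = "q=vcl&" + "x" * 1000  # hypothetical long string with q= at the front
print(backtrack_from_end(s, "q="))
print(search_forward(s, "q="))
```

With q= at the front of a long string, the backward walk tries every one of the 1007 positions before finding it, while the forward search hits it on the first try.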

Fortunately, pcre2 has an optimization that ignores a trailing .* once everything before it has matched, so that it doesn't busily match the dot against every character left in the string. So this time the .* does no harm, but it's superfluous, and it violates the golden rule of pcre: avoid .* if at all possible.
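One way to convince yourself that the trailing .* is superfluous, once more with Python's re as a stand-in for PCRE2 and made-up sample strings: the captures come out the same with or without it. The .* only stretches the overall match to the end of the string.

```python
import re

WITH_TAIL = re.compile(r"\b(q=)(.*?)(?:&|$).*")
WITHOUT_TAIL = re.compile(r"\b(q=)(.*?)(?:&|$)")

def captures(pattern, s):
    """Return the tuple of captured groups, or None if there is no match."""
    m = pattern.search(s)
    return m.groups() if m else None

samples = ["q=a&b=c", "/x?b=c&q=d", "/x?q=", "/x?faq=1"]
for s in samples:
    # The trailing .* never changes what the capture groups contain.
    print(s, captures(WITH_TAIL, s) == captures(WITHOUT_TAIL, s))
```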

Incidentally, this is an area where re2 does have an advantage over pcre2. The efficiency of pcre2 matching depends crucially on how you write the regex, because details like \b instead of .* give it hints for pruning the search. While re2 matching usually isn't as fast as pcre2 matching against well-written patterns, re2's performance is much less sensitive to how the pattern is written.


OK I can chill now,
Geoff
--
** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

Tel +49 40 2880 5731
Mob +49 176 636 90917
Fax +49 40 42949753

http://uplex.de


_______________________________________________
varnish-misc mailing list
[email protected]
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
