On Fri, 2010-03-12 at 16:27 +0200, Henrik K wrote:

> If you have enough words to require multiple REs, then sorting doesn't hurt.
> So the start boundaries for a single RE to catch on are minimized.
>
OK, so there are benefits if every alternate in a regex starts with the
same letter?
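To make Henrik's point concrete, here is a minimal Python sketch. It assumes a word list split into fixed-size chunks, with one alternation regex per chunk, and it models only one effect: the set of distinct first characters each regex can start on. The helper names (`chunked_regexes`, `start_chars`), the chunk size and the word list are all hypothetical. Whether a real engine actually benefits depends on its start-of-match optimisations. As I understand it, PCRE can precompute a table of possible starting bytes for a pattern, so a smaller start set lets it skip more of the subject text.

```python
import re

def chunked_regexes(words, chunk_size):
    """Hypothetical helper: split words into chunks, one alternation regex per chunk."""
    chunks = [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]
    return [(chunk, re.compile("|".join(map(re.escape, chunk)))) for chunk in chunks]

def start_chars(chunk):
    """Distinct first characters that any alternative in this chunk can start with."""
    return {w[0] for w in chunk}

# Illustrative word list (an assumption, not taken from any real rule set).
words = ["apple", "banana", "cherry", "avocado",
         "blueberry", "cranberry", "apricot", "blackberry"]

unsorted_sets = [start_chars(c) for c, _ in chunked_regexes(words, 4)]
sorted_sets = [start_chars(c) for c, _ in chunked_regexes(sorted(words), 4)]

print([len(s) for s in unsorted_sets])  # [3, 3]
print([len(s) for s in sorted_sets])    # [2, 2]
```

Sorting doesn't change what the regexes match as a group, only how the alternatives are distributed among them. With a realistic list and many chunks, each sorted chunk tends to cover a narrow run of starting letters, while unsorted chunks each cover most of the alphabet.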

Almost everything I know about the innards of regexes comes from
implementing them when I translated the code in Kernighan & Plauger's
"Software Tools in Pascal" into PL/9 (FYI, PL/9 is a derivative of PL/M
for the 6809, so I did this a long time ago). I remember that was a
quite workable regex engine, but it had no optimisations and wasn't
startlingly fast.  

I now think I need to know more about how modern regex engines work and
in particular about the optimisations used by PCRE. Can anybody
recommend documentation on this topic?


Martin

