On Fri, 2010-03-12 at 16:27 +0200, Henrik K wrote:

> If you have enough words to require multiple REs, then sorting doesn't hurt.
> So the start boundaries for a single RE to catch on are minimized.
OK, so there are benefits if every alternative in a regex starts with the
same letter?
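
To make sure I understand the suggestion, here is a rough sketch of what I
take it to mean, written in Python purely for illustration. The function
names and the word list are my own inventions, and I am only assuming that
factoring out a shared first letter helps the engine skip non-matching start
positions; I have not measured this, and it says nothing about what PCRE
actually does internally.

```python
import re
from itertools import groupby


def build_grouped_patterns(words):
    """Sort the words and build one compiled regex per initial letter.

    Hypothetical helper: each pattern has the form \\bX(?:tail1|tail2|...)\\b,
    so every alternative in a given regex starts with the same literal X.
    """
    patterns = {}
    unique = sorted({w for w in words if w})  # drop duplicates and empty strings
    for first, group in groupby(unique, key=lambda w: w[0]):
        tails = "|".join(re.escape(w[1:]) for w in group)
        patterns[first] = re.compile(r"\b" + re.escape(first) + "(?:" + tails + r")\b")
    return patterns


def find_words(patterns, text):
    """Run every per-letter regex over the text and collect the matches."""
    hits = []
    for pattern in patterns.values():
        hits.extend(m.group(0) for m in pattern.finditer(text))
    return sorted(hits)


words = ["viagra", "valium", "cheap", "cialis", "casino"]
patterns = build_grouped_patterns(words)
print(patterns["c"].pattern)
print(find_words(patterns, "cheap valium and a casino"))
```

With this layout a regex can only begin matching at a position holding its
one fixed first character, which is how I read "start boundaries ... are
minimized".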

Almost everything I know about the innards of regexes comes from
implementing them when I translated the code in Kernighan & Plauger's
"Software Tools in Pascal" into PL/9 (FYI PL/9 is a derivative of PL/M
for the 6809, so I did this a long time ago). I remember that was a
quite workable regex engine, but it had no optimisations and wasn't
startlingly fast.

I now think I need to know more about how modern regex engines work and
in particular about the optimisations used by PCRE. Can anybody
recommend documentation on this topic?

