On Fri, 2010-03-12 at 16:27 +0200, Henrik K wrote:
> If you have enough words to require multiple REs, then sorting doesn't hurt.
> So the start boundaries for a single RE to catch on are minimized.
>
OK, so there are benefits if every alternate in a regex starts with the same
letter?

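To check that I have understood the idea, here is a minimal sketch in Python (not PCRE or Perl). The word list and the helper names `chunk`, `first_letters` and `to_regex` are made up for illustration. It only shows that sorting before splitting reduces the number of distinct starting letters per RE; whether a particular engine actually exploits that is an assumption on my part.

```python
import re

def chunk(words, size):
    """Split a list of words into consecutive groups of at most `size`."""
    return [words[i:i + size] for i in range(0, len(words), size)]

def first_letters(group):
    """Return the set of characters that a match for this group can start with."""
    return {w[0] for w in group}

def to_regex(group):
    """Build one whole-word alternation from a group of literal words."""
    return re.compile(r"\b(?:" + "|".join(re.escape(w) for w in group) + r")\b")

# Hypothetical word list, split into REs of three words each.
words = ["viagra", "cialis", "valium", "casino", "vicodin", "credit"]

unsorted_groups = chunk(words, 3)        # each group can start with 'c' or 'v'
sorted_groups = chunk(sorted(words), 3)  # one group starts with 'c', the other with 'v'

regexes = [to_regex(g) for g in sorted_groups]

for g in sorted_groups:
    print(sorted(first_letters(g)), g)
```

With the sorted input each RE has a single possible starting letter, whereas each unsorted group has two.
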
Almost everything I know about the innards of regexes comes from implementing
them when I translated the code in Kernighan & Plauger's "Software Tools in
Pascal" into PL/9 (FYI, PL/9 is a derivative of PL/M for the 6809, so I did
this a long time ago). I remember it as a quite workable regex engine, but it
had no optimisations and wasn't startlingly fast.

I now think I need to know more about how modern regex engines work, and in
particular about the optimisations used by PCRE. Can anybody recommend
documentation on this topic?

Martin