On Mon, Mar 31, 2008 at 9:04 AM, Antony Scriven <[EMAIL PROTECTED]> wrote:
>
> On 31/03/2008, Ian Young <[EMAIL PROTECTED]> wrote:
>
> > [...]
> >
> > I did a couple of informal benchmarks when I was preparing for
> > a short talk I gave at my school. The slides are at
> > <http://vim-soc-regexp.googlecode.com/files/ThursExtra.odp>,
> > and near the end are some graphs with the data I collected.
> > Short version: the pathological cases mirror Russ Cox's
> > results pretty closely. The non-pathological cases tend to be
> > a bit slower with the new engine than the old, but we're
> > talking differences of a few nanoseconds in most cases. A few
> > cases had more substantial differences (on the order of
> > hundreds of nanoseconds), which is why I believe some work
> > should go into optimizing before we incorporate the new
> > engine into a release.
>
> Do you know why the new engine is slower than the old one in
> these cases? Are your benchmarks for matching only, or do
> they include the compilation phase? Also, doesn't the old
> engine use something like Boyer-Moore on string literals in
> the regexp to home in quickly on a potential match? Could
> you make use of that code too? --Antony
To some extent, a minor slowdown on many regexps is unavoidable. When the backtracking engine guesses right on one of its first tries, it executes very quickly. In contrast, the NFA implementation tries all paths of execution simultaneously until at least one match is found, so on regexps with a lot of branches the NFA engine will often do more work. The idea is that it's worth trading a few nanoseconds here and there to avoid the really massive delays on cases the backtracking engine handles poorly (such as those illustrated in Russ' article and my slides).

The graphs in the slideshow show only execution time. Compilation with the new engine is slightly slower across the board, but nothing in compilation topped 20ns, so it seemed less important (although our compilation phase is certainly not as streamlined as it could be).

I forgot to link to the second file, which is a more complete set of data for the curious. You can find it at <http://vim-soc-regexp.googlecode.com/files/times_basic.ods>. It includes graphs of both compilation and execution times, plus all the raw data: time dumps for each run, and line numbers that let you match each bar on the graph to the regexp that produced it. You can see in this file which of the regexps I tested were noticeably slower in the new implementation: generally those with lots of branches where the greedy backtracking approach guesses right the first time, such as /a*a*a*a*a*b/ against 'aaaaaaaaab'.

As for some sort of optimization for string literals, it's certainly a possibility, but we haven't looked into it (we've been focusing on getting it running correctly before we bother optimizing). I don't know whether the old code has such a thing; I haven't looked at it carefully enough.

Ian

You received this message from the "vim_dev" maillist. For more information, visit http://www.vim.org/maillist.php
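[The "tries all paths simultaneously" behaviour described above can be sketched in a few lines of Python. This is not the actual Vim or GSoC engine code; the token representation and the function name are invented for the sketch, which supports only patterns built from literal characters and starred literals, enough to express /a*a*a*a*a*b/. Note the simulation visits each input character once, tracking a set of pattern positions, so its time is linear in the text length regardless of how many branches the pattern has.]

```python
def nfa_match(tokens, text):
    """Set-of-states NFA simulation for patterns made of 'c' and 'c*' tokens.

    tokens: list of (char, is_star) pairs, e.g. ('a', True) means 'a*'.
    """
    n = len(tokens)

    def closure(states):
        # Epsilon-closure: a starred token may be skipped without
        # consuming any input, so position i implies position i + 1.
        out = set(states)
        changed = True
        while changed:
            changed = False
            for i in list(out):
                if i < n and tokens[i][1] and i + 1 not in out:
                    out.add(i + 1)
                    changed = True
        return out

    states = closure({0})
    for ch in text:
        nxt = set()
        for i in states:
            if i < n and tokens[i][0] == ch:
                # A starred token can be repeated (stay at i); a plain
                # literal is consumed once (advance to i + 1).
                nxt.add(i if tokens[i][1] else i + 1)
        states = closure(nxt)
        if not states:
            return False      # every path died: no match possible
    return n in states        # matched iff some path reached the end

# /a*a*a*a*a*b/: five starred 'a' tokens plus a literal 'b'
pattern = [('a', True)] * 5 + [('b', False)]
print(nfa_match(pattern, 'aaaaaaaaab'))  # True
print(nfa_match(pattern, 'aaaaaaaaaa'))  # False
```

[On 'aaaaaaaaab' a backtracking engine must try exponentially many ways of splitting the a's among the five stars before the trailing 'b' fails or succeeds; the state-set simulation above just carries {0..5} forward one character at a time.]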

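[On the Boyer-Moore point in the quoted question: the general prefilter idea is to run the full regex engine only where a literal substring that the pattern requires actually occurs, since a fast literal scan can reject most of the input cheaply. A minimal sketch, with invented function names, in which Python's built-in substring search stands in for a Boyer-Moore scan:]

```python
import re

def prefiltered_findall(pattern, required_literal, lines):
    """Run the regex only on lines containing a literal the pattern
    requires; all other lines are skipped by the cheap literal test."""
    rx = re.compile(pattern)
    hits = []
    for line in lines:
        if required_literal in line:   # fast literal scan as prefilter
            hits.extend(rx.findall(line))
    return hits

lines = ['no match here', 'foo123bar', 'also nothing', 'foo999bar']
print(prefiltered_findall(r'foo\d+bar', 'foo', lines))
# ['foo123bar', 'foo999bar']
```

[Extracting a suitable required literal from an arbitrary regexp automatically is the harder part, which is presumably what the old engine's code would help with if it exists.]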