On Sat, Sep 24, 2022 at 2:31 PM Rob Landley <r...@landley.net> wrote:
> On 9/15/22 20:42, Rob Landley wrote:
> > On 9/15/22 16:32, enh wrote:
> >> ah, but on another box with 2021-10-22 it's broken. so it looks like
> >> "real" xxd had the same bug and fixed it recently?
> >
> > Easy enough to fix, I was just confused by "the one in debian fails
> > the same way"...
> >
> >> > A simple optimization would be to concat all patterns into one
> >> > huge regex pattern with '|'.
> >>
> >> Which you could do in your pattern creation just as easily as grep
> >> could do so internally? Except when I tried it:
> >>
> >>   $ time toybox grep -Ef <(tr '\n' '|' < test_large_10000 | head -c -1) \
> >>     test_large_10000
> >>
> >> The result was even _slower_ on glibc (and gave me "out of memory"
> >> on musl).
> >>
> >> fwiw, BSD grep which we used to use before had
> >> https://android.googlesource.com/platform/system/core/+/kitkat-release/toolbox/grep/fastgrep.c#32
> >> which (although i haven't tested) i always assumed was for this.
> >
> > Yeah, doing more ourselves is the obvious way to go...
>
> I punted on most of the utf8 stuff in there. Let me know if I should
> revisit that. (Right now I'm treating ANY string with a char > 127 in
> it as "hand off to the regex engine" instead of using the fast path -F
> logic. In theory I only need to do that for -i, but see recent email
> about reordering width zero unicode code points... if the regex engine
> does it, it's not my fault when it's wrong. :)

both users i've seen have wanted this for "filenames in a build", so
they should be fine. (and, yes, "i18n is someone else's problem" is
usually the right choice down at this level.)

> >> I could also sort the patterns by first character and loop through
> >> calling the appropriate regexes based on character at string
> >> position... which is harder than you think because there could be a
> >> | later in the pattern, so that optimization only applies to a
> >> _subset_ of patterns. (First char being . or * or ( kinda throws
> >> that off too, and that optimization would also have to be aware of
> >> case insensitivity...)
> >
> > I was trying to avoid writing another regex implementation, but a
> > fast subset one that does parallel cache-local searching for simple
> > patterns and hands off to libc for anything complicated isn't that
> > hard. And I can even cheat with a bitmask for starting byte:
>
> What I did was simpler: filter the incoming TT.e list (after it's been
> split at \n and TT.f entries collated into it, which the code was
> already doing) into fixed and regex categories (i.e. autodetect when I
> can apply -F logic, with basic support for non-unicode case
> insensitivity, backslash escapes, and "." not in the first position),
> then bucket sort the resulting -F list into an array of 256 linked
> lists by starting character (punting for now: starting with "."
> autodetects as regex rather than fixed, though the actual -F flag
> overrides this), and then use TT.fixed[c] being NULL to mean nothing
> can match here. Further optimization: sorting each per-character list
> by length in the setup phase means the actual search logic can stop at
> the first hit, because it's the longest one.
>
> No bitmask needed: https://github.com/landley/toybox/commit/a7e49c3c7860
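
(for anyone reading this in the archive, that fast path is shaped
roughly like the following. it's just a sketch reconstructed from the
description above, not the actual commit --- struct fpat, add_fixed()
and match_fixed() are invented names:)

  // sketch: bucket fixed patterns into 256 lists by first byte, each
  // list kept sorted longest-first so the first hit is the longest
  // match at that position. not the toybox source, names made up.
  #include <stdlib.h>
  #include <string.h>

  struct fpat {
    struct fpat *next;
    char *str;
    int len;
  };

  static struct fpat *fixed[256];

  // insert, keeping each per-byte list sorted by descending length
  void add_fixed(char *str)
  {
    struct fpat *f = malloc(sizeof(*f)), **fp;

    f->str = str;
    f->len = strlen(str);
    fp = fixed + (unsigned char)*str;
    while (*fp && (*fp)->len > f->len) fp = &(*fp)->next;
    f->next = *fp;
    *fp = f;
  }

  // length of longest fixed pattern matching at s, or 0. when fixed[c]
  // is NULL nothing can match here: one pointer compare rejects it.
  int match_fixed(char *s)
  {
    struct fpat *f;

    for (f = fixed[(unsigned char)*s]; f; f = f->next)
      if (!strncmp(s, f->str, f->len)) return f->len;
    return 0;
  }

(searching a line is then one pass calling match_fixed() at each
offset, which stays cache-local instead of rescanning the whole line
once per pattern.)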
> (Then it falls back to running the same old regex logic for any
> non-fixed patterns, where regexec searches the whole string from
> start to finish for each pattern. If there aren't any, the loop
> naturally skips itself traversing zero entries; if there are a small
> number, the performance hit shouldn't be too bad.)
>
> It's still slower than the gnu/dammit grep (0.9 seconds vs 0.3
> seconds on my laptop), but hopefully good enough.

heh, yeah, that sounds like plenty :-) i don't know of anyone who's
actually timing this --- the complaint was just about the pathological
behavior!

> Try now?

(update in progress. given that yochiang and i are in horribly
incompatible timezones, we might be a while getting back to you...)

> Rob
>
> P.S. I added a speed test to grep.test (5 second timeout, should
> still pass on the original raspberry pi model A from 2012, not that I
> have one), using base64 on a long seq to get the many unique test
> lines instead of xxd /dev/urandom, because urandom is slow at
> producing nontrivial amounts of data. Trying to feed seq into sha3sum
> or similar was equally slow: running the test should not be an order
> of magnitude faster than generating the test data.

certainly fine on my rpi400...

  PASS: grep speed

specifically it takes:

  real    0m0.609s
  user    0m0.581s
  sys     0m0.177s

that seems like a good enough margin of error for slower rpis :-)
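
(and for completeness, the "same old regex logic" fallback mentioned
above is the familiar shape: regcomp() each leftover pattern up front,
then loop. again a sketch assuming POSIX <regex.h>, with invented
names, not the actual toybox code:)

  #include <regex.h>

  struct rpat {
    struct rpat *next;
    regex_t rx;    // compiled earlier with regcomp()
  };

  // 1 if any leftover regex pattern matches anywhere in line. with
  // zero non-fixed patterns this loop body never runs at all.
  int match_regex(struct rpat *list, char *line)
  {
    struct rpat *r;

    for (r = list; r; r = r->next)
      if (!regexec(&r->rx, line, 0, 0, 0)) return 1;
    return 0;
  }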