On Thu, Jan 24 2019 17:03:57 -0700, Theo de Raadt wrote: > Scott Cheloha <scottchel...@gmail.com> wrote: > > > > On Jan 24, 2019, at 06:19, Lauri Tirkkonen <loth...@iki.fi> wrote: > > > > > > [...] > > > > > > I haven't done any actual measurements though, so it's possible my > > > reading is wrong. > > > > Is there a "grepbench" or canonical corpus we can use to get a 95% > > CI on this and future changes? People talk about grep(1) perf. like > > CPU time and memory use, there ought to be something better than > > "let's search the Linux kernel source for a string that isn't there." > > A study of call patterns with ltrace might help
Ok, so let's do some measurements. First, I compiled four copies of grep: /usr/bin/grep # no modification; uses fgetln /usr/bin/grep.small # as above, with -DSMALL /usr/bin/grep.getline # uses getline through my diff /usr/bin/grep.small.getline # as above, with -DSMALL First let's look at a somewhat pathological case: a ~2GB file where each line is longer than the previous line, by one byte. cd /tmp python3 -c 'for i in range(65536): print("a"*i)' > foo ltrace -u libc -f fgetln.out grep b foo ltrace -u libc -f getline.out grep.getline b foo ltrace -u libc -f fgetln.small.out grep.small b foo ltrace -u libc -f getline.small.out grep.small.getline b foo side note: my shell reported the following times for the above commands. these are not conclusive but it is very interesting that if we disable the mmap path (which supposedly is an optimisation), the process finishes quite a bit faster, with a lot less user time spent than the non-SMALL version, regardless of whether we use fgetln or getline... ltrace -u libc -f fgetln.out grep b foo 12.24s user 5.26s system 62% cpu 27.797 total ltrace -u libc -f getline.out grep.getline b foo 13.45s user 3.67s system 64% cpu 26.361 total ltrace -u libc -f fgetln.small.out grep.small b foo 3.62s user 5.00s system 42% cpu 20.418 total ltrace -u libc -f getline.small.out grep.small.getline b foo 3.79s user 3.25s system 35% cpu 19.726 total Since both /usr/bin/grep and /usr/bin/grep.getline are able to use mmap() on this file, I expect no difference in the number of allocations they make. ltrace agrees: rush /tmp % kdump -f getline.out|egrep -c '"(malloc|calloc|realloc|recalloc)' 9 rush /tmp % kdump -f fgetln.out|egrep -c '"(malloc|calloc|realloc|recalloc)' 9 So how about the -DSMALL versions? These should be more allocations. rush /tmp % kdump -f fgetln.small.out|egrep -c '"(malloc|calloc|realloc|recalloc)' 12 rush /tmp % kdump -f getline.small.out|egrep -c '"(malloc|calloc|realloc|recalloc)' 12 Well, that's actually quite surprising - far fewer allocations than I expected. Maybe I'm measuring the wrong thing somehow (I'm not really familiar with ltrace). Let's take a look at what ktrace has to say about these as a sanity check. ktrace -tc -f kt.fgetln.out grep b foo ktrace -tc -f kt.getline.out grep.getline b foo ktrace -tc -f kt.fgetln.small.out grep.small b foo ktrace -tc -f kt.getline.small.out grep.small.getline b foo rush /tmp % kdump -f kt.fgetln.out|grep -c 'CALL *mmap.*MAP_ANON' 27886 rush /tmp % kdump -f kt.getline.out|grep -c 'CALL *mmap.*MAP_ANON' 52 rush /tmp % kdump -f kt.fgetln.small.out|grep -c 'CALL *mmap.*MAP_ANON' 27883 rush /tmp % kdump -f kt.getline.small.out|grep -c 'CALL *mmap.*MAP_ANON' 51 So, yes, the fgetln case for this file is indeed hitting MAP_ANON mmaps way more often. But now I'm also a bit confused - why does the non-SMALL case also call MAP_ANON mmap so many times? It's supposed to just memory-map the file. 41564 grep CALL mmap(0,0x80008000,0x1<PROT_READ>,0x2<MAP_PRIVATE>,0,0) 41564 grep RET mmap -1 errno 22 Invalid argument wait - fd 0? It turns out that mmfile.c passes mmf->fd to mmap, but does not actually initialize it. That's my fault, sorry - I apparently broke it in the previous commit to grep (although I don't know why I wouldn't get a maybe-uninitialized warning from this; perhaps it's because of the grep_malloc wrapper?). Let's fix it: diff --git a/usr.bin/grep/mmfile.c b/usr.bin/grep/mmfile.c index fa5a0ea7752..d866dd3977b 100644 --- a/usr.bin/grep/mmfile.c +++ b/usr.bin/grep/mmfile.c @@ -49,6 +49,7 @@ mmopen(int fd, struct stat *st) if (st->st_size > SIZE_MAX) /* too big to mmap */ goto ouch; mmf->len = (size_t)st->st_size; + mmf->fd = fd; mmf->base = mmap(NULL, mmf->len, PROT_READ, MAP_PRIVATE, mmf->fd, (off_t)0); if (mmf->base == MAP_FAILED) goto ouch; and then rerun the ktrace tests. The naive 'time' test shows a clear improvement now that we are actually memory mapping the file again: ktrace -tc -f kt.fgetln.out grep b foo 11.43s user 1.58s system 100% cpu 13.003 total ktrace -tc -f kt.getline.out grep.getline b foo 12.40s user 1.67s system 99% cpu 14.207 total ktrace -tc -f kt.fgetln.small.out grep.small b foo 4.05s user 5.13s system 45% cpu 20.196 total ktrace -tc -f kt.getline.small.out grep.small.getline b foo 3.34s user 2.84s system 33% cpu 18.387 total rush /tmp 1 % kdump -f kt.fgetln.out|grep -c 'CALL *mmap.*MAP_ANON' 39 rush /tmp % kdump -f kt.getline.out|grep -c 'CALL *mmap.*MAP_ANON' 39 rush /tmp % kdump -f kt.fgetln.small.out|grep -c 'CALL *mmap.*MAP_ANON' 27884 rush /tmp % kdump -f kt.getline.small.out|grep -c 'CALL *mmap.*MAP_ANON' 48 These numbers match my expectations - fgetln() does way more allocations, since this file is quite pathological for its allocation strategy. I was going to do another test on a large source tree (like Linux) with grep -r, but it seems I don't know how to use ltrace properly, so I don't think the results would be accurate (ktrace'ing for mmap seemed like an ok idea in this case, because the line length is always increasing - I assumed malloc would always have to ask the kernel for more memory. But that is clearly not true for actual source files). In summary: sorry for the bug I introduced - please apply the diff above to fix memory mapping files in grep. As for the fgetln->getline conversion, I think this result should show that fgetln is not particularly smart about how it's calling mmap -- getline seems better. -- Lauri Tirkkonen | lotheac @ IRCnet