On Thu, Jan 24 2019 17:03:57 -0700, Theo de Raadt wrote:
> Scott Cheloha <scottchel...@gmail.com> wrote:
> 
> > > On Jan 24, 2019, at 06:19, Lauri Tirkkonen <loth...@iki.fi> wrote:
> > > 
> > > [...]
> > >  
> > > I haven't done any actual measurements though, so it's possible my
> > > reading is wrong.
> > 
> > Is there a "grepbench" or canonical corpus we can use to get a 95%
> > CI on this and future changes?  People talk about grep(1) perf. like
> > CPU time and memory use, there ought to be something better than
> > "let's search the Linux kernel source for a string that isn't there."
> 
> A study of call patterns with ltrace might help

Ok, so let's do some measurements. First, I compiled four copies of
grep:
        /usr/bin/grep                   # no modification; uses fgetln
        /usr/bin/grep.small             # as above, with -DSMALL
        /usr/bin/grep.getline           # uses getline through my diff
        /usr/bin/grep.small.getline     # as above, with -DSMALL

First let's look at a somewhat pathological case: a ~2GB file where each
line is longer than the previous line, by one byte.

        cd /tmp
        python3 -c 'for i in range(65536): print("a"*i)' > foo
        ltrace -u libc -f fgetln.out grep b foo
        ltrace -u libc -f getline.out grep.getline b foo
        ltrace -u libc -f fgetln.small.out grep.small b foo
        ltrace -u libc -f getline.small.out grep.small.getline b foo

side note: my shell reported the following times for the above commands.
these are not conclusive but it is very interesting that if we disable
the mmap path (which supposedly is an optimisation), the process
finishes quite a bit faster, with a lot less user time spent than the
non-SMALL version, regardless of whether we use fgetln or getline...

ltrace -u libc -f fgetln.out grep b foo  12.24s user 5.26s system 62% cpu 
27.797 total
ltrace -u libc -f getline.out grep.getline b foo  13.45s user 3.67s system 64% 
cpu 26.361 total
ltrace -u libc -f fgetln.small.out grep.small b foo  3.62s user 5.00s system 
42% cpu 20.418 total
ltrace -u libc -f getline.small.out grep.small.getline b foo  3.79s user 3.25s 
system 35% cpu 19.726 total

Since both /usr/bin/grep and /usr/bin/grep.getline are able to use
mmap() on this file, I expect no difference in the number of allocations
they make. ltrace agrees:

        rush /tmp % kdump -f getline.out|egrep -c 
'"(malloc|calloc|realloc|recalloc)'
        9
        rush /tmp % kdump -f fgetln.out|egrep -c 
'"(malloc|calloc|realloc|recalloc)' 
        9

So how about the -DSMALL versions? These should be more allocations.

        rush /tmp % kdump -f fgetln.small.out|egrep -c 
'"(malloc|calloc|realloc|recalloc)'
        12
        rush /tmp % kdump -f getline.small.out|egrep -c 
'"(malloc|calloc|realloc|recalloc)'
        12

Well, that's actually quite surprising - far fewer allocations than I
expected. Maybe I'm measuring the wrong thing somehow (I'm not really
familiar with ltrace). Let's take a look at what ktrace has to say about
these as a sanity check.

        ktrace -tc -f kt.fgetln.out grep b foo
        ktrace -tc -f kt.getline.out grep.getline b foo
        ktrace -tc -f kt.fgetln.small.out grep.small b foo
        ktrace -tc -f kt.getline.small.out grep.small.getline b foo

        rush /tmp % kdump -f kt.fgetln.out|grep -c 'CALL *mmap.*MAP_ANON'
        27886
        rush /tmp % kdump -f kt.getline.out|grep -c 'CALL *mmap.*MAP_ANON'
        52
        rush /tmp % kdump -f kt.fgetln.small.out|grep -c 'CALL *mmap.*MAP_ANON'
        27883
        rush /tmp % kdump -f kt.getline.small.out|grep -c 'CALL *mmap.*MAP_ANON'
        51

So, yes, the fgetln case for this file is indeed hitting MAP_ANON mmaps
way more often. But now I'm also a bit confused - why does the non-SMALL
case also call MAP_ANON mmap so many times? It's supposed to just
memory-map the file.

         41564 grep     CALL  
mmap(0,0x80008000,0x1<PROT_READ>,0x2<MAP_PRIVATE>,0,0)
         41564 grep     RET   mmap -1 errno 22 Invalid argument

wait - fd 0? It turns out that mmfile.c passes mmf->fd to mmap, but does
not actually initialize it. That's my fault, sorry - I apparently broke
it in the previous commit to grep (although I don't know why I wouldn't
get a maybe-uninitialized warning from this; perhaps it's because of the
grep_malloc wrapper?). Let's fix it:

diff --git a/usr.bin/grep/mmfile.c b/usr.bin/grep/mmfile.c
index fa5a0ea7752..d866dd3977b 100644
--- a/usr.bin/grep/mmfile.c
+++ b/usr.bin/grep/mmfile.c
@@ -49,6 +49,7 @@ mmopen(int fd, struct stat *st)
        if (st->st_size > SIZE_MAX) /* too big to mmap */
                goto ouch;
        mmf->len = (size_t)st->st_size;
+       mmf->fd = fd;
        mmf->base = mmap(NULL, mmf->len, PROT_READ, MAP_PRIVATE, mmf->fd, 
(off_t)0);
        if (mmf->base == MAP_FAILED)
                goto ouch;

and then rerun the ktrace tests. The naive 'time' test shows a clear
improvement now that we are actually memory mapping the file again:

ktrace -tc -f kt.fgetln.out grep b foo  11.43s user 1.58s system 100% cpu 
13.003 total
ktrace -tc -f kt.getline.out grep.getline b foo  12.40s user 1.67s system 99% 
cpu 14.207 total
ktrace -tc -f kt.fgetln.small.out grep.small b foo  4.05s user 5.13s system 45% 
cpu 20.196 total 
ktrace -tc -f kt.getline.small.out grep.small.getline b foo  3.34s user 2.84s 
system 33% cpu 18.387 total

        rush /tmp 1 % kdump -f kt.fgetln.out|grep -c 'CALL *mmap.*MAP_ANON'
        39
        rush /tmp % kdump -f kt.getline.out|grep -c 'CALL *mmap.*MAP_ANON'
        39
        rush /tmp % kdump -f kt.fgetln.small.out|grep -c 'CALL *mmap.*MAP_ANON'
        27884
        rush /tmp % kdump -f kt.getline.small.out|grep -c 'CALL *mmap.*MAP_ANON'
        48

These numbers match my expectations - fgetln() does way more
allocations, since this file is quite pathological for its allocation
strategy.

I was going to do another test on a large source tree (like Linux) with
grep -r, but it seems I don't know how to use ltrace properly, so I
don't think the results would be accurate (ktrace'ing for mmap seemed
like an ok idea in this case, because the line length is always
increasing - I assumed malloc would always have to ask the kernel for
more memory. But that is clearly not true for actual source files).

In summary: sorry for the bug I introduced - please apply the diff above
to fix memory mapping files in grep. As for the fgetln->getline
conversion, I think this result should show that fgetln is not
particularly smart about how it's calling mmap -- getline seems better.

-- 
Lauri Tirkkonen | lotheac @ IRCnet

Reply via email to