On Tue, 20 Jul 2010 13:34:12 -0500
Dave Goodell <[email protected]> wrote:

> Hi Edwin,

Hi,

Thanks for the quick reply.

> 
> On Jul 20, 2010, at 10:32 AM CDT, Török Edwin wrote:
> 
> > While it is possible to teach valgrind about ClamAV's pools (see
> > attached mpool.patch for example), I think valgrind will be more
> > effective if I just disable ClamAV's pools (and let it use malloc).
> 
> This sounds like a practical solution to me.  There is a
> RUNNING_ON_VALGRIND client request so that you can do this at runtime
> instead of having to reconfigure/recompile.

I could use that and some env var maybe to switch allocators when
debugging. It is easier than rebuilding/switching build tree.
I don't want to do that always because:
 - valgrind would not be testing the same code that is used in releases
 - sometimes bugs depend on memory layout/alignment, and they only occur
   with mempool, not with libc allocator. It might be a bug in the pool
   allocator itself.

> 
> > The problem is that ClamAV's allocator doesn't have redzones, and
> > sometimes it allocates more memory than strictly requested.
> > All this makes it hard for valgrind to detect off-by-one errors,
> > even if I teach it about the pools.
> > 
> > Here is a rough idea of how ClamAV's pools look like:
> > mmap chunk_0 = | pool header | allocatable space |
> > mmap chunk_1 .. chunk_N = | block header | allocatable space |
> > 
> > Each allocation block looks like this:
> > | padding1| information about allocation | allocated data |
> > padding2 |
> > 
> > Freed data is reusable, so freed blocks look like:
> > | ptr next free block | undefined data |
> > 
> > Also reallocs will reuse the allocation block
> > (when shrinking).
> > 
> > What I did was mark the entire allocation block with
> > VALGRIND_MEMPOOL_ALLOC, however this has several shortcomings.
> > 1. Code can corrupt the allocation information (located just before
> > the allocated data), I could fix this by marking them as NOACCESS
> > (ideal would be readonly but valgrind doesn't have that notion).
> 
> I do wish that memcheck had a readonly status, but that doesn't seem
> to be what you want here.  It should be erroneous for the client code
> to read or write the bookkeeping information.  NOACCESS is exactly
> what you want, even if it does make the allocator code a bit more
> complicated.

My problem with NOACCESS is that I don't know if it is NOACCESS because
that memory zone wasn't mapped/allocated, or because it is the
bookkeeping info. With readonly at least I'd know the address was valid
at some point, for some purpose. But you are right, not ideal either,
it might be some totally unrelated memory region that happens to be
readonly.

> 
> > 2. If I fix 1), then I get warnings in the allocator itself (when it
> > wants to reuse freed data, or when it wants to realloc), I can fix
> > by marking the memory zones I want to access as UNDEFINED, however
> > this leads to problem 3
> 
> This is a little tedious, but not really that bad.  Depending on what
> you want to do, marking it DEFINED might make more sense.
> 
> > 3. If I get passed an invalid pointer (one that the pool allocator
> > didn't allocate for example), then if I mark my allocation
> > information block as UNDEFINED I actually mark random memory and
> > valgrind can't help me
> 
> Can't you tell if the given pointer is valid based on the address
> value?
>  Don't you track the valid address ranges anyway in order to
> do cleanup at the end?

I could check that it is within the address ranges allocated by the
pool. I can't tell if it is the start of an allocation, or some value
in the middle, or before an allocation.

I might be able to detect that the bookkeeping information is corrupt
in some cases, but not always.

Hmm, if I set the bookeeping info (and padding) as NOACCESS, then I
should be able to check that the memory range where I think the
bookkeeping information is, is marked NOACCESS, and that the pointer
itself is DEF/UNDEF.
Is there a way to do that quitely (without spamming valgrind's log)?
VALGRIND_CHECK_MEM_IS_ADDRESSABLE docs say it spams the logs,
maybe I can use VALGRIND_GET_VBITS? Will have to do some experiments.

Not strictly related to this discussion, but could I mark all
anonymous mmaps (not used by the pool related) with MALLOCLIKE_BLOCK,
and munmaps with FREELIKE_BLOCK to detect anon-map leaks?
I think valgrind doesn't track mmap leaks on its own.

> 
> Otherwise, how about "magic number" in the allocation header that you
> can assert() is correct after you mark it DEFINED?  Do you actually
> need the code to keep operating after the client code passes the
> allocator an invalid pointer?

I have a #define for that already :)
But there were bugs that didn't show up when I define it (because it
increases size of bookkeeping info, which changes all the
paddings/alignments). 
I actually hide the bookkeeping info in the padding sometimes, in best
case scenario the overhead is only 2 bytes.

> 
> > 4. Realloc doesn't know the size of original allocation, it only
> > knows size of original allocation + padding2. So again imprecision
> > and lack of redzone at end of allocation. Realloc sometimes needs
> > to memcpy() so that would need me to temporarely mark the data as
> > accessible
> 
> Again, tedious but not particularly difficult.  Just mark
> (allocation+padding2) as DEFINED before reuse.  You don't need to
> restore the NOACCESS status on the original region of padding2 during
> a realloc, right?  Then the new region of padding2 (post-resize) can
> be marked NOACCESS.

I need to temporarely restore during memcpy(), but that shouldn't be
too hard, assuming I know the pointer is valid.

> 
> > All this is complicated and error prone (I might call valgrind with
> > the wrong information).
> > 
> > I think a simple client request addition would solve this:
> > VALGRIND_MEMPOOL_ALLOCATIONSIZE(pool, addr) -> size of allocation if
> > addr was declared with VALGRIND_MEMPOOL_ALLOC(pool, addr, size), or
> > 0 if addr belongs to a different pool, or no pool at all (it can be
> > noaccess/defined/undefined doesn't matter)
> > 
> > Then I could implement this in ClamAV like so:
> > - mark the paddings and allocator's allocation information as
> > NOACCESS after I fill it, and mark as allocated just exactly what
> >   my mpool_malloc() call requested
> > - when I free/realloc I query valgrind whether the pointer is an
> >   allocation base for the specified pool. If it is, I temporarely
> > mark the private allocation information readable and defined, I
> >   read/modify it, then mark it as NOACCESS again. For realloc I
> > would know the exact allocation size and I would memcpy() only up
> > to that amount
> > 
> > Actually I could implement this already by abusing the meaning of
> > MPOOL_CREATE. 
> > I could pretend each allocation is a pool in itself, mark only the
> > real allocation with MEMPOOL_ALLOC(), and use MEMPOOL_EXISTS() to
> > check whether the pointer really belongs to my pool, and is the
> > start of an allocation. However this might create millions of pool,
> > which I guess is not good for valgrind's performance.
> 
> I don't know what the performance implications of this strategy would
> be.  Perhaps someone else on the list can comment.  However, it's not
> clear that this approach requires any less work than the more
> conventional approach without the new client request.
> 
> -Dave
> 


------------------------------------------------------------------------------
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
_______________________________________________
Valgrind-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/valgrind-users

Reply via email to