On Monday 03 December 2007 20:29, Dave Nomura wrote:
> So you tracked these uninitialized values down to the strxxx
> functions defined in ld.so and Valgrind normally intercepts these calls
> because Memcheck can't handle the sort of code that is generated for
> these routines?
Correct.
> Is it possible to teach Memcheck to deal with these optimizations?
>
> Steve Munroe, the author of those optimized strxxx functions, tells me
> that the kinds of optimizations done for these routines are going to
> start appearing in other library routines, and possibly in generated
> object code so the problem is going to become more pervasive.
You're in the land of difficult tradeoffs. A lot of effort has
already been applied here.
All these optimised, vectorised (effectively) string ops rely on two
techniques:
(1) using properties of carry-chain propagation in addition/subtraction
so as to find out whether any byte in a word is zero, and if so
which one
(2) reading (traditional C-style zero-terminated) strings using
aligned word reads, rather than byte reads
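As an illustration of technique (1), here is a minimal sketch of the widely known borrow-propagation test for a zero byte in a 32-bit word. It is not taken from the ld.so routines under discussion; the function names are hypothetical and a 32-bit word is assumed for brevity.

```c
#include <stdint.h>

/* Nonzero iff some byte of w is 0x00.
   Subtracting 1 from every byte sets a byte's top bit if that byte was
   zero (or >= 0x81, or was hit by a borrow from below); masking with ~w
   discards the >= 0x81 case. */
static uint32_t has_zero_byte(uint32_t w)
{
    return (w - 0x01010101u) & ~w & 0x80808080u;
}

/* Index of the least significant zero byte (0 = lowest), or -1 if none.
   Only the lowest flag bit is trustworthy: a borrow out of a zero byte
   can raise a false flag in the bytes above it. */
static int lowest_zero_byte(uint32_t w)
{
    uint32_t flags = has_zero_byte(w);
    for (int i = 0; i < 4; i++) {
        if (flags & (0x80u << (8 * i)))
            return i;
    }
    return -1;
}
```

For example, lowest_zero_byte(0x41004344) yields 2, while has_zero_byte(0x41424344) yields 0. Note that the answer for one byte depends on borrows arriving from the bytes below it, which is the carry-chain dependence referred to above.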
(1) fools Memcheck's normal handling of definedness tracking for
adds/subtracts, causing it to believe the result of the add/subtract
is completely undefined, when it isn't really. In fact Memcheck
can and sometimes does generate a more exact interpretation, which
does handle this case correctly.
The problem is deciding when to apply it. The standard analysis
costs about 3 insns in the generated code, and the exact analysis
more than 10 insns (+ more registers). Applying the expensive case
throughout would cause significant slowdowns to the 99.99% of code
fragments for which the standard handling is perfectly adequate.
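To make the cost difference concrete, here is a rough sketch of the two kinds of definedness rule for a 32-bit add. It is modelled on the general scheme described for Memcheck-style bit-precise tracking, not copied from Valgrind's source; the function names are hypothetical, and the convention assumed is that a 1 in a shadow ("V") word marks an undefined bit.

```c
#include <stdint.h>

/* Cheap rule: an undefined input bit makes that result bit and every
   more significant result bit undefined, because a carry might reach
   them. Three ALU operations in this sketch: or, negate, or. */
static uint32_t vbits_add_cheap(uint32_t va, uint32_t vb)
{
    uint32_t v = va | vb;
    return v | (0u - v);
}

/* Exact rule: compute the smallest and largest values each operand
   could take given its undefined bits, add both extremes, and mark as
   undefined only the result bits that differ between the two sums
   (plus the positions that were undefined on input). */
static uint32_t vbits_add_exact(uint32_t a, uint32_t va,
                                uint32_t b, uint32_t vb)
{
    uint32_t a_min = a & ~va, a_max = a | va;
    uint32_t b_min = b & ~vb, b_max = b | vb;
    return (va | vb) | ((a_min + b_min) ^ (a_max + b_max));
}
```

With a = 0x41424300 whose low byte is undefined (va = 0xFF) and a fully defined b = 0x00010000, the cheap rule reports 0xFFFFFFFF (everything undefined), whereas the exact rule reports 0x000000FF. The exact rule needs the operand values as well as their shadow words, which is where the extra instructions and registers go.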
(2) causes Memcheck to report invalid address errors for the partial
word loads covering the zero terminating bytes at the end of
strings. You can stop it complaining about this by giving
--partial-loads-ok=yes, but that could cause genuine errors to
be missed. Said flag is not enabled by default.
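For reference, a minimal sketch of technique (2): scanning a string one 32-bit word at a time. To keep the sketch itself well defined, it assumes the caller passes a buffer whose capacity is a multiple of 4 bytes and which contains a terminator, so no load leaves the buffer. The optimised library routines make no such assumption, which is exactly why their final word load can cover bytes past the end of the allocated block. The helper name and the cap parameter are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length of the string in buf, read a word at a time.
   Assumption: buf holds cap bytes, cap is a multiple of 4, and a '\0'
   occurs somewhere in buf. Returns cap if no terminator is found. */
static size_t wordwise_strlen(const char *buf, size_t cap)
{
    for (size_t i = 0; i + 4 <= cap; i += 4) {
        uint32_t w;
        memcpy(&w, buf + i, 4);   /* one word-sized load */
        /* Zero-byte test; true iff at least one of the 4 bytes is 0. */
        if ((w - 0x01010101u) & ~w & 0x80808080u) {
            /* Locate the terminator bytewise, independent of endianness. */
            for (size_t j = 0; j < 4; j++) {
                if (buf[i + j] == '\0')
                    return i + j;
            }
        }
    }
    return cap;
}
```

With char buf[8] = "hello", the call wordwise_strlen(buf, 8) returns 5; the second word load covers the terminator and the two padding bytes after it. In a real routine those trailing bytes may lie outside the block, and that load is what Memcheck flags.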
I realise that (2) is "perfectly safe" in that the word-sized loads
are naturally aligned and so cannot possibly cause any page faults
that would not otherwise occur. Nevertheless, any way you slice it,
ISO C/C++ says that reading memory outside of allocated blocks
counts as undefined behaviour (IIUC), and that's precisely what
Memcheck aims to report.
We have never claimed that Memcheck is suitable for code compiled at
-O2 and above. -O is the max recommended level. I would advocate the
following:
* do not allow gcc to inline stringops at -O, only at -O2 and above
* do not strip all symbol names off ld.so
In short there's a conflict between optimising the hell out of stringops
and having enough visibility for reliable debugging. Given the above
constraints I don't see how you can have your cake and eat it.
Note that none of the above is PPC specific -- it also applies to
x86/amd64. I'm not sure why these problems appear more acute on ppc
-- it may be some interaction between the carry chain propagation
games and the fact that ppc is big-endian.
J
_______________________________________________
Valgrind-developers mailing list
https://lists.sourceforge.net/lists/listinfo/valgrind-developers