> What I did do is binary search on:
> ----- tsvd3.cpp
>   // Eliminate spurious valgrind uninitialized errors
> #if 1
>   for( int iii=38; iii<lwork; ++iii ) work[iii]=123.456;
> #endif
> -----
> I see no complaints when starting the loop at iii=1,2,4,8,16,32;
> then errors at 64,48,40; no complaint at 36; errors at 38;
> no complaint at 37.  Hmmm...

First, whenever you are faced with murky malloc behavior, enlist help.
The glibc library provides some debugging aids which are quite inexpensive.
They are so cheap that I use them all the time, for all processes.
Put this in $HOME/.bash_profile, or feed it directly to your shell, etc.:
   # http://udrepper.livejournal.com/11429.html
   export MALLOC_PERTURB_=$(($RANDOM % 255 + 1))
   echo 1>&2 MALLOC_PERTURB_=$MALLOC_PERTURB_  " # $HOME/.bash_profile"
This will cause all bytes in newly malloc()ed areas to be set to the
complement of that random byte (and bytes released by free() to the byte
itself).  [Or, specify a constant, written in decimal, such as
   export MALLOC_PERTURB_=10
which makes new allocations read as 0xF5 bytes.  When running under valgrind,
the low-level interception of malloc() and the careful watching by memcheck
will supersede MALLOC_PERTURB_.]

Continuing after the binary search, I tried:
----- tsvd3.cpp
  // Eliminate spurious valgrind uninitialized errors
#if 1
  for( int iii=38; iii<lwork; ++iii ) work[iii]=123.456;
  for( int iii= 1; iii<=  36; ++iii ) work[iii]=123.456;
#endif
-----
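which leaves only work[37] uninit among the elements that matter (work[0] is
also untouched, but the quoted run starting at iii=1 drew no complaints).
Running this under valgrind generates complaints from memcheck; the first is: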
-----
lwork_q= 108
lwork= 108
==23901== Conditional jump or move depends on uninitialised value(s)
==23901==    at 0x5498486: dnrm2_ (/usr/src/debug/lapack-3.4.2/BLAS/SRC/dnrm2.f:94)
==23901==    by 0x4E27E27: dlarfg_ (in /usr/lib64/atlas/liblapack.so.3.0)
==23901==    by 0x4DACD89: dgelq2_ (in /usr/lib64/atlas/liblapack.so.3.0)
==23901==    by 0x4DAD457: dgelqf_ (in /usr/lib64/atlas/liblapack.so.3.0)
==23901==    by 0x4DBA96B: dgesdd_ (in /usr/lib64/atlas/liblapack.so.3.0)
==23901==    by 0x4018F7: main (/bigdata/home/jreiser/valgrind-fortran/tsvd3.cpp:62)
==23901==  Uninitialised value was created by a heap allocation
==23901==    at 0x4A07C84: operator new[](unsigned long) (/builddir/build/BUILD/valgrind-3.8.1/coregrind/m_replacemalloc/vg_replace_malloc.c:363)
==23901==    by 0x40180F: main (/bigdata/home/jreiser/valgrind-fortran/tsvd3.cpp:52)
-----
Now we know that exactly one uninit 8-byte 'double', work[37], triggers
the complaints.  This aligned 8-byte region is small enough that we can
take advantage of the debugging hardware (the debug registers) in x86 chips.

So now I run directly under gdb (without valgrind), put a breakpoint just after
the code which leaves work[37] uninit, and plant a hardware 'read' watchpoint 
on &work[37]:
(gdb) b tsvd3.cpp:58
(gdb) run
Breakpoint 2, main () at tsvd3.cpp:62
(gdb) p &work[37]
$1 = (double *) 0x606178
(gdb) rwatch *(double *)0x606178
Hardware read watchpoint 3: *(double *)0x606178
(gdb) continue


Lo and behold, work[37] is fetched and used.  That is, there is a real error:
Hardware read watchpoint 3: *(double *)0x606178

Value = -1.6882786079646144e+260
0x000000000040154a in scal_generic<int, double, double> (n=0x3,
    alpha=@0x7ffff7ca6be0: 0, y=0x606170, incY=0x1) at gemv2.cpp:17
17              y[iY] *= alpha;
(gdb) x/12i $pc-0x18
   0x401532 <scal_generic<int, double, double>(int, double const&, double*, int)+54>:   mov    -0x8(%rbp),%eax
   0x401535 <scal_generic<int, double, double>(int, double const&, double*, int)+57>:   cltq
   0x401537 <scal_generic<int, double, double>(int, double const&, double*, int)+59>:   lea    0x0(,%rax,8),%rcx
   0x40153f <scal_generic<int, double, double>(int, double const&, double*, int)+67>:   mov    -0x28(%rbp),%rax
   0x401543 <scal_generic<int, double, double>(int, double const&, double*, int)+71>:   add    %rcx,%rax
   0x401546 <scal_generic<int, double, double>(int, double const&, double*, int)+74>:   movsd  (%rax),%xmm1   ### the fetch of uninit
=> 0x40154a <scal_generic<int, double, double>(int, double const&, double*, int)+78>:   mov    -0x20(%rbp),%rax
   0x40154e <scal_generic<int, double, double>(int, double const&, double*, int)+82>:   movsd  (%rax),%xmm0
   0x401552 <scal_generic<int, double, double>(int, double const&, double*, int)+86>:   mulsd  %xmm1,%xmm0   ### the use of uninit
   0x401556 <scal_generic<int, double, double>(int, double const&, double*, int)+90>:   movsd  %xmm0,(%rdx)
   0x40155a <scal_generic<int, double, double>(int, double const&, double*, int)+94>:   addl   $0x1,-0x4(%rbp)
   0x40155e <scal_generic<int, double, double>(int, double const&, double*, int)+98>:   mov    -0x18(%rbp),%eax

(gdb) p $rax
$2 = 0x606178  ### yes, it is &work[37]
(gdb) x/2xw $rax  ### and those bytes are uninit
0x606178:       0xf5f5f5f5      0xf5f5f5f5    ### The pattern for uninit set by MALLOC_PERTURB_
(gdb) bt
#0  0x000000000040154a in scal_generic<int, double, double> (n=0x3,
    alpha=@0x7ffff7ca6be0: 0, y=0x606170, incY=0x1) at gemv2.cpp:17
#1  0x00000000004011f8 in gemv_generic<int, double, double, double, double, double> (order=RowMajor, transA=Trans, conjX=NoTrans, m=0x8, n=0x3,
    alpha=@0x7ffff7ca6bc8: 1, A=0x7fffffffde48, ldA=0x4, x=0x7fffffffde40,
    incX=0x4, beta=@0x7ffff7ca6be0: 0, y=0x606170, incY=0x1) at gemv2.cpp:108
#2  0x0000000000400e05 in gemv_generic<int, double, double, double, double, double> (order=ColMajor, transA=Trans, conjX=NoTrans, m=0x3, n=0x8,
    alpha=@0x7ffff7ca6bc8: 1, A=0x7fffffffde48, ldA=0x4, x=0x7fffffffde40,
    incX=0x4, beta=@0x7ffff7ca6be0: 0, y=0x606170, incY=0x1) at gemv2.cpp:58
#3  0x0000000000400d4e in gemv<int, double, double, double, double, double> (
    order=ColMajor, trans=NoTrans, m=0x3, n=0x8, alpha=@0x7ffff7ca6bc8: 1,
    A=0x7fffffffde48, ldA=0x4, x=0x7fffffffde40, incX=0x4,
    beta=@0x7ffff7ca6be0: 0, y=0x606170, incY=0x1) at gemv2.cpp:156
#4  0x0000000000400cc8 in dgemv_ (TRANS=0x7ffff7ca6be8 "No transpose",
    M=0x7fffffffd938, N=0x7fffffffd93c, ALPHA=0x7ffff7ca6bc8,
    _A=0x7fffffffde48, LDA=0x7fffffffdf58, X=0x7fffffffde40,
    INCX=0x7fffffffdf58, BETA=0x7ffff7ca6be0, Y=0x606170, INCY=0x7ffff7ca6bdc)
    at gemv2.cpp:204
#5  0x00007ffff797f3fb in dlarf_ () from /usr/lib64/atlas/liblapack.so.3
#6  0x00007ffff7906e1f in dgelq2_ () from /usr/lib64/atlas/liblapack.so.3
#7  0x00007ffff7907458 in dgelqf_ () from /usr/lib64/atlas/liblapack.so.3
#8  0x00007ffff791496c in dgesdd_ () from /usr/lib64/atlas/liblapack.so.3
#9  0x00000000004018f8 in main () at tsvd3.cpp:62

So there is the [a] real error.  Apologize to memcheck, and fix your bug.


_______________________________________________
Valgrind-users mailing list
https://lists.sourceforge.net/lists/listinfo/valgrind-users
