Hi Brock,

On Monday 26 October 2009 03:23:42 pm Brock Palen wrote:
> Is there a large overhead for --enable-debug --enable-memchecker?
>
> reading:
> http://www.open-mpi.org/faq/?category=debugging
>
> It sounds like there is and there isn't, what should I expect if we
> build all of our mpi libraries with those options, when we run normally:
>
> mpirun ./myexe
>
> vs using a library that was not built with those options?

This may be too verbose an answer ;-)

--enable-debug adds quite a bit of overhead, due to various internal runtime checks being introduced into the code path (e.g. for every opal-object: checking a magic-id to see whether this really is a proper object, checking the reference-counter, and keeping the src-file and line-number of the constructor). How "bad" --enable-debug is really depends on your communication pattern and the setup, e.g. shared memory communication latency suffers most.

To make use of memchecker and get the most out of valgrind, you don't actually need --enable-debug, depending on your setup:
 - For debugging user apps (checking whether buffers given to MPI are initialized, whether data returned by MPI may be accessed, etc.), the user app of course should be compiled with debugging on ("-g").
 - To get valgrind output for OMPI-internal data structures, including the source location of undefined memory, you'd want to compile OMPI with --enable-debug (or at least with -g and without optimization) and furthermore define OMPI_WANT_MEMCHECKER_MPI_OBJECTS in ompi/include/ompi/memchecker to check the initialization of OMPI's MPI_Comm/datatypes and others. This however is mostly for OMPI developers.

Regarding overhead:
 - When running an application with libmpi compiled with memchecker, but _not_ running under valgrind, latency increases by 3-6% (over IB-DDR using IMB), while bandwidth is hardly influenced.
 - When doing the OMPI-internal MPI-object checking, it _does_ become very costly due to the many client requests issued using valgrind's API (but as noted, this is for OMPI developers anyway).

Please see http://www.open-mpi.org/papers/parco-2007/ for more information. With the NPB benchmark, we did not find any performance implications from the added instrumentation when not run under valgrind.

Now, when running the application under valgrind, the expected slow-down of valgrind's memcheck comes into effect...
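
To give a feeling for why the per-object checks of --enable-debug cost something on every call, here is a minimal sketch in C. It is not OMPI's actual code: all names (demo_object_t, DEMO_MAGIC_ID, DEMO_CONSTRUCT, demo_retain, demo_release, demo_usage) and the magic value are hypothetical, made up for illustration. It only mimics the three things mentioned above: a magic-id, a reference-counter check, and recording the src-file and line-number of the constructor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, NOT Open MPI's implementation. */
#ifndef DEMO_ENABLE_DEBUG
#define DEMO_ENABLE_DEBUG 1          /* stands in for a debug build */
#endif
#define DEMO_MAGIC_ID 0x5AFEC0DEUL   /* arbitrary illustrative value */

typedef struct demo_object {
#if DEMO_ENABLE_DEBUG
    unsigned long magic_id;   /* set by the constructor, cleared on destroy */
    const char   *ctor_file;  /* where the object was constructed */
    int           ctor_line;
#endif
    int ref_count;
} demo_object_t;

void demo_construct(demo_object_t *obj, const char *file, int line)
{
#if DEMO_ENABLE_DEBUG
    obj->magic_id  = DEMO_MAGIC_ID;
    obj->ctor_file = file;
    obj->ctor_line = line;
#else
    (void)file;
    (void)line;
#endif
    obj->ref_count = 1;
}

/* Records the call site, as a debug build would. */
#define DEMO_CONSTRUCT(obj) demo_construct((obj), __FILE__, __LINE__)

#if DEMO_ENABLE_DEBUG
/* The extra work done on every retain/release in a debug build. */
static int demo_is_valid(const demo_object_t *obj)
{
    return obj != NULL && obj->magic_id == DEMO_MAGIC_ID && obj->ref_count > 0;
}
#endif

/* Returns the new reference count, or -1 if a debug check fails. */
int demo_retain(demo_object_t *obj)
{
#if DEMO_ENABLE_DEBUG
    if (!demo_is_valid(obj)) return -1;
#endif
    return ++obj->ref_count;
}

/* Returns the new reference count, or -1 if a debug check fails. */
int demo_release(demo_object_t *obj)
{
#if DEMO_ENABLE_DEBUG
    if (!demo_is_valid(obj)) return -1;
#endif
    if (--obj->ref_count == 0) {
#if DEMO_ENABLE_DEBUG
        obj->magic_id = 0;  /* poison, so later use is caught */
#endif
    }
    return obj->ref_count;
}

/* Small usage example; returns 0 when all checks behave as described. */
int demo_usage(void)
{
    demo_object_t obj;
    DEMO_CONSTRUCT(&obj);
    assert(demo_retain(&obj) == 2);
    assert(demo_release(&obj) == 1);
    assert(demo_release(&obj) == 0);
#if DEMO_ENABLE_DEBUG
    assert(demo_retain(&obj) == -1);  /* use after destroy is detected */
#endif
    return 0;
}
```

With DEMO_ENABLE_DEBUG set to 0 the checks and the extra fields disappear, which is roughly the difference one would expect between a debug and a non-debug build of a library that does this for every internal object.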

So, the most flexible way is to provide two versions and let users decide per modulefile with a verbose proc ModulesHelp...

With best regards,
Rainer
-- 
------------------------------------------------------------------------
Rainer Keller, PhD                  Tel: +1 (865) 241-6293
Oak Ridge National Lab              Fax: +1 (865) 241-4811
PO Box 2008 MS 6164                 Email: kel...@ornl.gov
Oak Ridge, TN 37831-2008            AIM/Skype: rusraink