This is not an MPI problem. You will likely find StackOverflow to be a more effective way to get support on C++ issues.
Jeff

On Wed, Apr 3, 2019 at 8:47 AM Zhen Wang <tod...@gmail.com> wrote:

> Joseph,
>
> Thanks for your response. I'm no expert on Linux so please bear with me.
> If I understand correctly, using malloc instead of resize should allow me
> to handle an out-of-memory error properly, but I still see abnormal
> termination (code is attached).
>
> I have more questions.
>
> 1. If the problem is overcommit (meaning not related to MPI, I suppose),
> why don't I see it if only MPI rank 0 calls resize? MPI rank 0 still
> overcommits and bad_alloc is caught.
>
> 2. In resize, if the returned pointer is null, should it throw some kind
> of error so the caller could catch and handle that? I don't know the
> implementation but simply exiting doesn't seem like a good idea.
>
> Thanks.
>
> Best regards,
> Zhen
>
>
> On Wed, Apr 3, 2019 at 10:02 AM Joseph Schuchart <schuch...@hlrs.de>
> wrote:
>
>> Zhen,
>>
>> The "problem" you're running into is memory overcommit [1]. The system
>> will happily hand you a pointer to memory upon calling malloc without
>> actually allocating the pages (that's the first step in
>> std::vector::resize) and then terminate your application as soon as it
>> tries to actually allocate them if the system runs out of memory. This
>> happens in std::vector::resize too, which sets each entry in the vector
>> to its initial value. There is no way you can catch that. You might
>> want to try to disable overcommit in the kernel and see if
>> std::vector::resize throws an exception because malloc fails.
>>
>> HTH,
>> Joseph
>>
>> [1] https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
>>
>> On 4/3/19 3:26 PM, Zhen Wang wrote:
>> > Hi,
>> >
>> > I have difficulty catching std::bad_alloc in an MPI environment. The
>> > code is attached. I'm using gcc 6.3 on SUSE Linux Enterprise Server 11
>> > (x86_64). OpenMPI is built from source. The commands are as follows:
>> >
>> > *Build*
>> > g++ -I<openmpi-4.0.0-opt/include> -L<openmpi-4.0.0-opt/lib> -lmpi memory.cpp
>> >
>> > *Run*
>> > <openmpi-4.0.0-opt/bin/mpiexec> -n 2 a.out
>> >
>> > *Output*
>> > 0
>> > 0
>> > 1
>> > 1
>> > --------------------------------------------------------------------------
>> > Primary job terminated normally, but 1 process returned
>> > a non-zero exit code. Per user-direction, the job has been aborted.
>> > --------------------------------------------------------------------------
>> > --------------------------------------------------------------------------
>> > mpiexec noticed that process rank 0 with PID 0 on node cdcebus114qa05
>> > exited on signal 9 (Killed).
>> > --------------------------------------------------------------------------
>> >
>> >
>> > If I uncomment the line //if (rank == 0), i.e., only rank 0 allocates
>> > memory, I'm able to catch bad_alloc as I expected. It seems that I am
>> > misunderstanding something. Could you please help? Thanks a lot.
>> >
>> >
>> >
>> > Best regards,
>> > Zhen

--
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/
_______________________________________________ users mailing list users@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/users
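
For readers of the archive, the distinction discussed in the quoted thread can be sketched in a few lines of C++. This is only an illustration, not the attached memory.cpp: the helper name try_resize is hypothetical. It shows the one case where std::vector::resize failure is catchable, namely when the allocator refuses the request up front. Under memory overcommit, a request that fits in the address space but not in physical memory may be granted, and the process can later be killed with signal 9 while the pages are being touched; no catch block runs in that case.

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>
#include <vector>

// Hypothetical helper for illustration (not taken from the attached code).
// Returns true if the resize succeeded, false if the request was refused
// in a way the program can observe.
bool try_resize(std::vector<char>& buf, std::size_t n) {
    try {
        // resize() first allocates storage, then value-initializes every
        // element, which touches the newly allocated pages.
        buf.resize(n);
        return true;
    } catch (const std::bad_alloc&) {
        // Reached only when the allocation call itself fails, for example
        // when the request cannot fit in the virtual address space.
        return false;
    } catch (const std::length_error&) {
        // Reached when n exceeds buf.max_size().
        return false;
    }
    // Note: if the kernel overcommits and later runs out of memory while
    // the elements are being initialized, the process may receive SIGKILL.
    // That cannot be caught, so neither handler above is executed.
}
```

Calling try_resize(buf, buf.max_size()) on an empty vector is expected to return false on typical 64-bit systems, because no address space can hold the request. A request of a few hundred gigabytes on a machine with less memory may instead return true or end in the "Killed" message shown in the quoted output, depending on the kernel's overcommit setting.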