This is not an MPI problem. You will likely find Stack Overflow to be a
more effective place to get support on C++ issues.

Jeff

On Wed, Apr 3, 2019 at 8:47 AM Zhen Wang <tod...@gmail.com> wrote:

> Joseph,
>
> Thanks for your response. I'm no expert on Linux, so please bear with me.
> If I understand correctly, using malloc instead of resize should allow me
> to handle out-of-memory errors properly, but I still see abnormal
> termination (code is attached).
>
> I have more questions.
>
> 1. If the problem is overcommit (meaning it is not related to MPI, I
> suppose), why don't I see it if only MPI rank 0 calls resize? Rank 0 still
> overcommits, and bad_alloc is caught.
>
> 2. In resize, if the returned pointer is null, shouldn't it throw some kind
> of error so the caller can catch and handle it? I don't know the
> implementation, but simply exiting doesn't seem like a good idea.
>
> Thanks.
>
> Best regards,
> Zhen
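On question 2: in C++ the usual convention is that a refused allocation is reported as an exception, not as a null pointer. `std::vector` allocates through `std::allocator`, which calls `operator new`, and `operator new` throws `std::bad_alloc` when it cannot obtain memory. Plain `std::malloc` instead returns a null pointer that the caller has to check. A minimal sketch of that check, assuming you want malloc failures to surface the same way as vector's; `checked_malloc` is a hypothetical name, not a library function:

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical helper (not part of any library): wraps std::malloc so that a
// refused allocation surfaces as std::bad_alloc, the same way operator new
// (and therefore std::vector) reports failure, instead of a silent nullptr.
void* checked_malloc(std::size_t bytes) {
    void* p = std::malloc(bytes);
    if (p == nullptr && bytes != 0) {
        // malloc itself refused the request; this is the only failure that
        // can be observed here. With overcommit enabled, a "successful" call
        // can still lead to the process being killed later, when the pages
        // are first written.
        throw std::bad_alloc();
    }
    return p;  // caller releases it with std::free
}
```

This only catches a request that the allocator refuses outright. Under overcommit, `malloc` can return a non-null pointer for more memory than is actually available, and the failure then shows up later as the process being killed (signal 9 in the output further down), which no `catch` block ever sees.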
>
>
> On Wed, Apr 3, 2019 at 10:02 AM Joseph Schuchart <schuch...@hlrs.de>
> wrote:
>
>> Zhen,
>>
>> The "problem" you're running into is memory overcommit [1]. The system
>> will happily hand you a pointer to memory upon calling malloc without
>> actually backing it with physical pages (that's the first step in
>> std::vector::resize), and will then terminate your application as soon
>> as it touches those pages if the system has run out of memory. That
>> touching happens in std::vector::resize too, which sets each entry in the
>> vector to its initial value. There is no way you can catch that. You
>> might want to try disabling overcommit in the kernel and see whether
>> std::vector::resize then throws an exception because malloc fails.
>>
>> HTH,
>> Joseph
>>
>> [1] https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
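Whether `malloc` can fail up front depends on the overcommit policy described in [1], exposed as `vm.overcommit_memory`: 0 is heuristic overcommit (the default), 1 always overcommits, and 2 disables overcommit in favour of strict accounting. A small sketch for checking which policy a node uses, assuming a Linux procfs at `/proc/sys/vm/overcommit_memory`; both function names are hypothetical:

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: maps a vm.overcommit_memory value to the policy
// names used in the kernel's overcommit-accounting documentation.
std::string describe_overcommit_mode(int mode) {
    switch (mode) {
        case 0: return "heuristic overcommit (default)";
        case 1: return "always overcommit";
        case 2: return "strict accounting (no overcommit)";
        default: return "unknown";
    }
}

// Hypothetical helper: reads the current policy from procfs (Linux only).
// Returns -1 if the file cannot be opened or parsed.
int read_overcommit_mode(const std::string& path = "/proc/sys/vm/overcommit_memory") {
    std::ifstream in(path);
    int mode = -1;
    if (!(in >> mode)) {
        return -1;
    }
    return mode;
}
```

Switching a node to mode 2 (for example with `sysctl vm.overcommit_memory=2`) typically needs root, so on a shared cluster it is a question for the administrators. In mode 2 an oversized request should make `malloc` return null, and hence `std::vector::resize` throw `std::bad_alloc`, rather than the process being killed later.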
>>
>> On 4/3/19 3:26 PM, Zhen Wang wrote:
>> > Hi,
>> >
>> > I have difficulty catching std::bad_alloc in an MPI environment. The
>> > code is attached. I'm using gcc 6.3 on SUSE Linux Enterprise Server 11
>> > (x86_64). OpenMPI is built from source. The commands are as follows:
>> >
>> > *Build*
>> > g++ -I<openmpi-4.0.0-opt/include> -L<openmpi-4.0.0-opt/lib> -lmpi memory.cpp
>> >
>> > *Run*
>> > <openmpi-4.0.0-opt/bin/mpiexec> -n 2 a.out
>> >
>> > *Output*
>> > 0
>> > 0
>> > 1
>> > 1
>> > --------------------------------------------------------------------------
>> > Primary job  terminated normally, but 1 process returned
>> > a non-zero exit code. Per user-direction, the job has been aborted.
>> > --------------------------------------------------------------------------
>> > --------------------------------------------------------------------------
>> > mpiexec noticed that process rank 0 with PID 0 on node cdcebus114qa05
>> > exited on signal 9 (Killed).
>> > --------------------------------------------------------------------------
>> >
>> >
>> > If I uncomment the line //if (rank == 0), i.e., only rank 0 allocates
>> > memory, I'm able to catch bad_alloc as I expected. It seems that I am
>> > misunderstanding something. Could you please help? Thanks a lot.
>> >
>> >
>> >
>> > Best regards,
>> > Zhen
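The pattern being attempted here can be written as a small wrapper around `resize`. This is a sketch, assuming the goal is simply to learn whether the request was refused; `try_resize` is a hypothetical helper. It can only report failures that reach `operator new` and come back as `std::bad_alloc`; if overcommit lets the allocation succeed and the kernel later kills the process while `resize` writes the elements, the handler never runs, which is consistent with the signal 9 in the output above.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical helper: tries to resize `v` to `n` elements and reports
// whether the allocation request was refused.
// It only observes refusals that operator new reports as std::bad_alloc;
// std::length_error (n > v.max_size()) still propagates to the caller.
// If overcommit lets the allocation succeed and the kernel later kills the
// process while resize() writes the elements, this function never returns.
template <typename T>
bool try_resize(std::vector<T>& v, std::size_t n) {
    try {
        v.resize(n);
        return true;
    } catch (const std::bad_alloc&) {
        // The request was refused; for element types such as double,
        // v keeps its previous contents.
        return false;
    }
}
```

For example, on a typical 64-bit Linux system `try_resize(v, v.max_size())` on an empty `std::vector<double>` returns false, because no address space can satisfy that request even with overcommit, whereas a size that merely exceeds the node's free RAM may return true or end with the process being killed.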
>> >
>> > _______________________________________________
>> > users mailing list
>> > users@lists.open-mpi.org
>> > https://lists.open-mpi.org/mailman/listinfo/users
>> >
>



-- 
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/
