> On Jun 19, 2019, at 5:05 PM, Joshua Ladd <[email protected]> wrote:
>
> Hi, Noam
>
> Can you try your original command line with the following addition:
>
> mpirun --mca pml ucx --mca btl ^vader,tcp,openib --mca osc ucx
>
> I think we're seeing some conflict between UCX PML and UCT OSC.
I did this, although in the meantime I also did a clean recompile (to add some
debugging statements) and switched from running on 1 node (36 cores) to 2
nodes. The problem is now slightly different, but similar. The memory no longer
grows until the node runs out. Instead, one node (the head node?) is using
55 GB, while the other is using only 23 GB. The latter value (23 GB) is
consistent with the per-process usage reported by ps or top (36 procs *
640 MB/proc). When I kill the job, the node that was using 55 GB drops to 34 GB
(with nothing running), and the other drops to about 1 GB.
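A minimal sketch for putting a number on the gap between what a node reports as used and what ps/top attribute to processes. It assumes Linux nodes with procps `ps`, approximates node "used" memory as MemTotal minus MemAvailable from /proc/meminfo, and the function names are hypothetical helpers, not part of Open MPI or UCX.

```python
import subprocess


def parse_meminfo_kb(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[key.strip()] = int(parts[0])
    return fields


def unaccounted_kb(meminfo_text, rss_kb):
    """Node 'used' memory minus the summed RSS of all processes, in kB.

    'Used' is approximated (an assumption) as MemTotal - MemAvailable.
    Summing RSS counts shared pages once per process that maps them,
    so this difference is a rough indicator, not an exact accounting.
    """
    info = parse_meminfo_kb(meminfo_text)
    used = info["MemTotal"] - info["MemAvailable"]
    return used - sum(rss_kb)


def measure_node():
    """Read the live values on a Linux node (requires procps `ps`)."""
    with open("/proc/meminfo") as f:
        meminfo_text = f.read()
    # `ps -eo rss=` prints one RSS value (in KiB) per process, with no header.
    out = subprocess.run(["ps", "-eo", "rss="], capture_output=True,
                         text=True, check=True).stdout
    rss = [int(tok) for tok in out.split()]
    return unaccounted_kb(meminfo_text, rss)
```

Calling `measure_node()` on each node while the job runs, and again after killing it, would show whether the extra ~32 GB on the first node (55 GB used vs. presumably ~23 GB of process RSS) is memory that no running process accounts for. Because shared pages can be counted more than once in the RSS sum, treat the result as approximate.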
Noam
____________
||
|U.S. NAVAL|
|_RESEARCH_|
LABORATORY
Noam Bernstein, Ph.D.
Center for Materials Physics and Technology
U.S. Naval Research Laboratory
T +1 202 404 8628 F +1 202 404 7546
https://www.nrl.navy.mil
_______________________________________________
users mailing list
[email protected]
https://lists.open-mpi.org/mailman/listinfo/users