Hi Jeff,
My apologies for the delay in replying; I was flying back from the UK
to the States, but now I'm here and can provide a more timely
response.
The video you provided was most helpful -- thank you!
I confirm that the hwloc message you sent (and your posts to the hwloc-users
list) indicate that hwloc is getting confused by a buggy BIOS, but it's only
dealing with the L3 cache, and that shouldn't affect the binding that OMPI is
doing.
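As an aside for readers following the hwloc angle: a quick way to see the topology hwloc detects (and whether the BIOS-reported L3 looks sane) is hwloc's `lstopo` tool. This is a generic sketch, not a command from the thread:

```shell
# Print the detected topology as text; a buggy BIOS often shows up as
# missing or oddly-sized L3 caches in this output (lstopo ships with hwloc)
lstopo --of console

# Same, but omit I/O devices to keep the cache/core hierarchy readable
lstopo --no-io --of console
```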
Hi again,
I generated a video that demonstrates the problem; for brevity I did
not run a full process, but I'm providing the timing below. If you'd
like me to record a full process, just let me know -- but as I said in
my previous email, 32 procs drop to 1 after about a minute and the
computation
Hi Ralph, Chris,
You guys are both correct:
(1) The output that I passed along /is/ indicative of only 32 processors
running (provided htop reports things correctly). The job I
submitted is the exact same process called 48 times (well, np
times), so all procs should take about the
Ah, that sheds some light. There is indeed a significant change between earlier
releases and 1.8.1 and above that might explain what he is seeing.
Specifically, we no longer hammer the cpu while in MPI_Finalize. So if 16 of
the procs are finishing early (which the output would suggest),
I think maybe I'm misunderstanding something. This shows that all 48 procs ran
and terminated normally.
From your earlier concerns, I would have expected only to find 32 of them
running. Was that not the case in this run?
On Aug 21, 2014, at 4:57 PM, Andrej Prsa wrote:
>
Whoops, jumped the gun there before the process finished. I'm attaching
the new stderr output.
> Hmmm...that's even weirder. It thinks it is going to start 48 procs,
> and the binding pattern even looks right.
>
> Hate to keep bothering you, but could you ensure this is a debug
> build (i.e., was configured with --enable-debug), and then set -mca
> odls_base_verbose 5 --leave-session-attached on the cmd line?
No bother at all -- would love to help. I recompiled 1.8.2rc4 with
debug and issued:
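(The command itself is truncated in the archive. For illustration only, an invocation matching the quoted instructions might look like the sketch below; the program name, process count, and install prefix are placeholders, while the flags are the ones requested above.)

```shell
# Rebuild Open MPI with debugging enabled (prefix is a placeholder)
./configure --enable-debug --prefix=$HOME/ompi-1.8.2rc4-dbg
make -j8 install

# Launch with verbose launch-system (ODLS) output, keeping daemons attached
mpirun -np 48 -mca odls_base_verbose 5 --leave-session-attached ./my_app
```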
> How odd - can you run it with --display-devel-map and send that
> along? It will give us a detailed statement of where it thinks
> everything should run.
Sure thing -- please find it attached.
Cheers,
Andrej
[Attachment: test.std.bz2 (application/bzip)]
How odd - can you run it with --display-devel-map and send that along? It will
give us a detailed statement of where it thinks everything should run.
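For readers unfamiliar with the flag, a minimal sketch of such a run (the program name and `-np` value are placeholders):

```shell
# Ask mpirun to print its detailed process map before launching
mpirun -np 48 --display-devel-map ./my_app
```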
On Aug 21, 2014, at 2:49 PM, Andrej Prsa wrote:
> Hi Ralph,
>
> Thanks for your reply!
>
>> One thing you might want to
Hi Ralph,
Thanks for your reply!
> One thing you might want to try: add this to your mpirun cmd line:
>
> --display-allocation
>
> This will tell you how many slots we think we've been given on your
> cluster.
I tried that using 1.8.2rc4; this is what I get:
==
One thing you might want to try: add this to your mpirun cmd line:
--display-allocation
This will tell you how many slots we think we've been given on your cluster.
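A minimal sketch of what that looks like on the command line (program name and process count are placeholders):

```shell
# Print the node/slot allocation mpirun believes it was given
mpirun -np 48 --display-allocation ./my_app
```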
On Aug 21, 2014, at 12:50 PM, Ralph Castain wrote:
> Starting early in the 1.7 series, we began to bind procs
Starting early in the 1.7 series, we began to bind procs by default to cores
when -np <= 2, and to sockets if np > 2. Is it possible this is what you are
seeing?
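If the default binding policy is the culprit, it can be made visible or overridden; a hedged sketch using standard 1.7/1.8-series options (program name is a placeholder):

```shell
# Show exactly where each rank gets bound
mpirun -np 48 --report-bindings ./my_app

# Disable binding entirely to test whether the policy causes the 32-core cap
mpirun -np 48 --bind-to none ./my_app
```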
On Aug 21, 2014, at 12:45 PM, Andrej Prsa wrote:
> Dear devels,
>
> I have been trying out 1.8.2rcs recently
Dear devels,
I have been trying out 1.8.2rcs recently and found a show-stopping
problem on our cluster. Running any job with any number of processors
larger than 32 will always employ only 32 cores per node (our nodes
have 48 cores). We are seeing identical behavior with 1.8.2rc4,
1.8.2rc2, and