Hi Ralph,

I checked -cpus-per-proc in openmpi-1.7.4a1r29646.
It works as I wanted: it adjusts the number of processes on each node
by dividing its slots by the number of threads.

I think my problem is solved for now by using -cpus-per-proc.
Thank you very much.

Regarding the oversubscription problem, I confirmed that NPROCS was really 8
by printing it out:

echo mpirun -machinefile pbs_hosts -np $NPROCS -report-bindings -bind-to core Myprog
mpirun -machinefile pbs_hosts -np $NPROCS -report-bindings -bind-to core

mpirun -machinefile pbs_hosts -np 8 -report-bindings -bind-to core Myprog
All nodes which are allocated for this job are already filled.

By the way, how did you verify the problem?
It looks to me like you ran the job directly from the command line:

[rhc@bend001 svn-trunk]$ mpirun -n 3 --bind-to core --cpus-per-proc 4 --report-bindings -hostfile hosts hostname

In my environment, such a direct run without a Torque script also works fine.
Anyway, as I already told you, my problem itself is solved, so I think
checking this is low priority.


> FWIW: I verified that this works fine under a slurm allocation of 2
> nodes, each with 12 slots. I filled the node without getting an
> "oversubscribed" error message:
> [rhc@bend001 svn-trunk]$ mpirun -n 3 --bind-to core --cpus-per-proc 4 --report-bindings -hostfile hosts hostname
> [bend001:24318] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]]:
> [bend001:24318] MCW rank 1 bound to socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]], socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]]:
> [bend001:24318] MCW rank 2 bound to socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]]:
> bend001
> bend001
> bend001
> where
> [rhc@bend001 svn-trunk]$ cat hosts
> bend001 slots=12
> The only way I get the "out of resources" error is if I ask for more
> processes than I have slots, i.e., I give it the hosts file as shown
> but ask for 13 or more processes.
> BTW: note one important issue with cpus-per-proc, as shown above.
> Because I specified 4 cpus/proc and my sockets each have 6 cpus, one of
> my procs wound up being split across the two sockets (2 cores on each).
> That's about the worst situation you can have.
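Ralph's observation can be checked with simple arithmetic. Below is a minimal sketch, assuming (as in the binding report above) that ranks are packed onto consecutive cores starting at core 0 and that every socket has the same number of cores; `rank_spans_sockets` is a hypothetical helper, not part of Open MPI.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not Open MPI code). Assumes rank r is bound to
 * the consecutive cores r*cpus_per_proc .. r*cpus_per_proc + cpus_per_proc - 1,
 * and that every socket has cores_per_socket cores. Returns true when
 * the rank's first and last cores fall on different sockets. */
static bool rank_spans_sockets(int rank, int cpus_per_proc,
                               int cores_per_socket)
{
    assert(rank >= 0 && cpus_per_proc > 0 && cores_per_socket > 0);
    int first_core = rank * cpus_per_proc;
    int last_core = first_core + cpus_per_proc - 1;
    return first_core / cores_per_socket != last_core / cores_per_socket;
}
```

With 6 cores per socket and 4 cpus per proc, rank 1 gets cores 4-7 and spans both sockets, matching the report above. Under the same assumptions, a cpus-per-proc value that divides the per-socket core count (1, 2, 3 or 6 here) never splits a rank.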
> So a word of caution: it is up to the user to ensure that the mapping
> is "good". We just do what you asked us to do.
> On Nov 13, 2013, at 8:30 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Guess I don't see why modifying the allocation is required - we have
> mapping options that should support such things. If you specify the
> total number of procs you want, and cpus-per-proc=4, it should do the
> same thing, I would think. You'd get 2 procs on the 8-slot nodes, 8 on
> the 32-slot nodes, and up to 6 on the 64-slot nodes (since you
> specified np=16). So I guess I don't understand the issue.
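The per-node counts Ralph gives follow from integer division of each node's slots by cpus-per-proc. A minimal sketch, assuming every rank needs exactly `cpus_per_proc` slots; `procs_per_node` and `total_capacity` are hypothetical names, not Open MPI functions.

```c
#include <assert.h>

/* Hypothetical helper: ranks that fit on a node with `slots` slots when
 * each rank needs `cpus_per_proc` slots (e.g. one per OpenMP thread).
 * Integer division discards leftover slots that cannot host a rank. */
static int procs_per_node(int slots, int cpus_per_proc)
{
    assert(slots >= 0 && cpus_per_proc > 0);
    return slots / cpus_per_proc;
}

/* Total ranks that fit on an allocation of n nodes. */
static int total_capacity(const int *slots, int n, int cpus_per_proc)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += procs_per_node(slots[i], cpus_per_proc);
    return total;
}
```

For the allocation `nodes=1:ppn=32+4:ppn=8` used in this thread, slots {32, 8, 8, 8, 8} with 4 cpus per proc give 8 + 2 + 2 + 2 + 2 = 16 ranks, which matches `-np 16`.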
> Regardless, if NPROCS=8 (and you verified that by printing it out, not
> just assuming wc -l got that value), then it shouldn't think it is
> oversubscribed. I'll take a look under a slurm allocation, as that is
> all I can access.
> On Nov 13, 2013, at 7:23 PM, tmish...@jcity.maeda.co.jp wrote:
> Our cluster consists of three types of nodes. They have 8, 32
> and 64 slots respectively. Since the performance of each core is
> almost the same, mixed use of these nodes is possible.
> Furthermore, in this case, for a hybrid Open MPI + OpenMP application,
> the hostfile has to be modified as follows:
> #PBS -l nodes=1:ppn=32+4:ppn=8
> export OMP_NUM_THREADS=4
> modify $PBS_NODEFILE pbs_hosts # 64 lines are condensed to 16 lines
> mpirun -hostfile pbs_hosts -np 16 -cpus-per-proc 4 -x OMP_NUM_THREADS Myprog
> That's why I want to do that.
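The site-specific `modify` step is not shown in the thread. Here is a minimal sketch of one way it could work, assuming each host's lines are adjacent in $PBS_NODEFILE (as in the node08 example) and that a line is kept only when a full group of OMP_NUM_THREADS slots is available on that host. `condense_nodefile` is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch of the "modify" step. A PBS nodefile lists each
 * host once per allocated slot; this assumes a host's lines are adjacent.
 * One output line is kept for every complete group of `threads` lines of
 * the same host, so each remaining line stands for one MPI rank running
 * `threads` OpenMP threads. Returns the number of lines written to out. */
static int condense_nodefile(const char **in, int n, int threads,
                             const char **out)
{
    assert(threads > 0);
    int written = 0;
    int run = 0;                    /* lines seen so far for the current host */
    for (int i = 0; i < n; i++) {
        if (i > 0 && strcmp(in[i], in[i - 1]) != 0)
            run = 0;                /* a new host starts a new count */
        run++;
        if (run % threads == 0)     /* a full group of `threads` slots */
            out[written++] = in[i];
    }
    return written;
}
```

With one 32-slot host followed by four 8-slot hosts and 4 threads, the 64 input lines become 8 + 4 * 2 = 16 lines, the ratio described above.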
> Of course, I know that if I stop mixing node types, -npernode is better
> for this purpose.
> (The script I showed you first is just a simplified one to clarify the
> problem.)
> tmishima
> Why do it the hard way? I'll look at the FAQ, because that definitely
> isn't a recommended thing to do - better to use -host to specify the
> subset, or just specify the desired mapping using all the various
> mappers we provide.
> On Nov 13, 2013, at 6:39 PM, tmish...@jcity.maeda.co.jp wrote:
> Sorry for cross-post.
> The nodefile is very simple; it consists of 8 lines:
> node08
> node08
> node08
> node08
> node08
> node08
> node08
> node08
> Therefore, NPROCS=8
> My aim is to modify the allocation, as you pointed out. According to
> the Open MPI FAQ, using a proper subset of the hosts allocated to the
> Torque / PBS Pro job should be allowed.
> tmishima
> Please - can you answer my question on script2? What is the value of
> Why would you want to do it this way? Are you planning to modify the
> allocation?? That generally is a bad idea as it can confuse the system
> On Nov 13, 2013, at 5:55 PM, tmish...@jcity.maeda.co.jp wrote:
> Since what I really want is to run script2 correctly, please let us
> concentrate on script2.
> I'm not an expert on the internals of Open MPI. All I can do is observe
> from the outside. I suspect these lines are strange, especially the
> last one.
> [node08.cluster:26952] mca:rmaps:rr: mapping job [56581,1]
> [node08.cluster:26952] [[56581,0],0] Starting with 1 nodes in list
> [node08.cluster:26952] [[56581,0],0] Filtering thru apps
> [node08.cluster:26952] [[56581,0],0] Retained 1 nodes in list
> [node08.cluster:26952] [[56581,0],0] Removing node node08 slots 0 inuse 0
> These lines come from this part of orte_rmaps_base_get_target_nodes
> in rmaps_base_support_fns.c:
>     } else if (node->slots <= node->slots_inuse &&
>         /* remove the node as fully used */
> orte_rmaps_base_framework.framework_output,
>                              "%s Removing node %s slots %d inuse %d",
>                              ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
>                              node->name, node->slots, node->slots_inuse));
>         opal_list_remove_item(allocated_nodes, item);
>         OBJ_RELEASE(item);  /* "un-retain" it */
> I wonder why node->slots and node->slots_inuse are 0, as I read from
> the line above, "Removing node node08 slots 0 inuse 0".
> Or, I'm not sure, but should
> "else if (node->slots <= node->slots_inuse &&" be
> "else if (node->slots < node->slots_inuse &&"?
> tmishima
> On Nov 13, 2013, at 4:43 PM, tmish...@jcity.maeda.co.jp wrote:
> Yes, the node08 has 8 slots but the process I run is also 8.
> #PBS -l nodes=node08:ppn=8
> Therefore, I think it should allow this allocation. Is that right?
> Correct
> My question is why script1 works and script2 does not. They are
> almost the same.
> #PBS -l nodes=node08:ppn=8
> export OMP_NUM_THREADS=1
> cp $PBS_NODEFILE pbs_hosts
> NPROCS=`wc -l < pbs_hosts`
> mpirun -report-bindings -bind-to core Myprog
> mpirun -machinefile pbs_hosts -np ${NPROCS} -report-bindings -bind-to core Myprog
> This version is not only reading the PBS allocation, but also invoking
> the hostfile filter on top of it. Different code path. I'll take a
> look - it should still match up assuming NPROCS=8. Any possibility
> that it is a different number? I don't recall, but aren't there some
> extra lines in the nodefile - e.g., comments?
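One way to double-check NPROCS against Ralph's question about extra lines is to count only the lines that name a host. A minimal sketch, assuming comment lines start with '#' (an assumption; the node08 nodefile shown in this thread contains only hostnames). `count_host_lines` is hypothetical. Note that `wc -l` counts newline characters, so it includes blank and comment lines and misses a final line that lacks a trailing newline.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical check: count nodefile lines that name a host, skipping
 * blank lines and lines whose first non-blank character is '#'.
 * A final line without a trailing newline is still counted. */
static int count_host_lines(const char *text)
{
    assert(text != NULL);
    int count = 0;
    const char *p = text;
    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);
        size_t i = 0;
        while (i < len && isspace((unsigned char)p[i]))
            i++;                    /* skip leading whitespace */
        if (i < len && p[i] != '#')
            count++;                /* non-blank, non-comment line */
        p += len;
        if (*p == '\n')
            p++;                    /* step past the newline */
    }
    return count;
}
```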
> tmishima
> I guess here's my confusion. If you are using only one node, and that
> node has 8 allocated slots, then we will not allow you to run more than
> 8 processes on that node unless you specifically provide the
> --oversubscribe flag. This is because you are operating in a managed
> environment (in this case, under Torque), and so we treat the
> allocation as "mandatory" by default.
> I suspect that is the issue here, in which case the system is behaving
> as it should.
> Is the above accurate?
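The rule Ralph describes can be written as a one-line predicate. A simplified sketch with hypothetical names, not the actual Open MPI mapping code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the policy described above: in a managed
 * allocation (e.g. Torque) the slot count is mandatory, so a job may
 * not place more procs than there are allocated slots unless the user
 * explicitly asked for --oversubscribe. */
static bool mapping_allowed(int np, int allocated_slots, bool oversubscribe)
{
    assert(np >= 0 && allocated_slots >= 0);
    return np <= allocated_slots || oversubscribe;
}
```

If the allocation is read correctly (8 slots), `-np 8` passes. If the node's slots are recorded as 0, as in the "Removing node node08 slots 0 inuse 0" log line, even `-np 8` fails unless --oversubscribe is given.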
> On Nov 13, 2013, at 4:11 PM, Ralph Castain <r...@open-mpi.org> wrote:
> It has nothing to do with LAMA as you aren't using that mapper.
> How many nodes are in this allocation?
> On Nov 13, 2013, at 4:06 PM, tmish...@jcity.maeda.co.jp wrote:
> Hi Ralph, here is some additional information.
> Here is the main part of the output after adding "-mca rmaps_base_verbose 50".
> [node08.cluster:26952] [[56581,0],0] plm:base:setup_vm
> [node08.cluster:26952] [[56581,0],0] plm:base:setup_vm creating map
> [node08.cluster:26952] [[56581,0],0] plm:base:setup_vm only HNP in allocation
> [node08.cluster:26952] mca:rmaps: mapping job [56581,1]
> [node08.cluster:26952] mca:rmaps: creating new map for job [56581,1]
> [node08.cluster:26952] mca:rmaps:ppr: job [56581,1] not using ppr mapper
> [node08.cluster:26952] [[56581,0],0] rmaps:seq mapping job [56581,1]
> [node08.cluster:26952] mca:rmaps:seq: job [56581,1] not using seq mapper
> [node08.cluster:26952] mca:rmaps:resilient: cannot perform initial map of job [56581,1] - no fault groups
> [node08.cluster:26952] mca:rmaps:mindist: job [56581,1] not using mindist mapper
> [node08.cluster:26952] mca:rmaps:rr: mapping job [56581,1]
> [node08.cluster:26952] [[56581,0],0] Starting with 1 nodes in list
> [node08.cluster:26952] [[56581,0],0] Filtering thru apps
> [node08.cluster:26952] [[56581,0],0] Retained 1 nodes in list
> [node08.cluster:26952] [[56581,0],0] Removing node node08 slots 0 inuse 0
> From this result, I guessed it was related to oversubscription.
> So I added "-oversubscribe" and reran, and then it worked well, as
> shown below:
> [node08.cluster:27019] [[56774,0],0] Starting with 1 nodes in list
> [node08.cluster:27019] [[56774,0],0] Filtering thru apps
> [node08.cluster:27019] [[56774,0],0] Retained 1 nodes in list
> [node08.cluster:27019] AVAILABLE NODES FOR MAPPING:
> [node08.cluster:27019]     node: node08 daemon: 0
> [node08.cluster:27019] [[56774,0],0] Starting bookmark at node node08
> [node08.cluster:27019] [[56774,0],0] Starting at node node08
> [node08.cluster:27019] mca:rmaps:rr: mapping by slot for job [56774,1] slots 1 num_procs 8
> [node08.cluster:27019] mca:rmaps:rr:slot working node node08
> [node08.cluster:27019] mca:rmaps:rr:slot node node08 is full - skipping
> [node08.cluster:27019] mca:rmaps:rr:slot job [56774,1] is oversubscribed - performing second pass
> [node08.cluster:27019] mca:rmaps:rr:slot working node node08
> [node08.cluster:27019] mca:rmaps:rr:slot adding up to 8 procs to node node08
> [node08.cluster:27019] mca:rmaps:base: computing vpids by slot for job [56774,1]
> [node08.cluster:27019] mca:rmaps:base: assigning rank 0 to node node08
> [node08.cluster:27019] mca:rmaps:base: assigning rank 1 to node node08
> [node08.cluster:27019] mca:rmaps:base: assigning rank 2 to node node08
> [node08.cluster:27019] mca:rmaps:base: assigning rank 3 to node node08
> [node08.cluster:27019] mca:rmaps:base: assigning rank 4 to node node08
> [node08.cluster:27019] mca:rmaps:base: assigning rank 5 to node node08
> [node08.cluster:27019] mca:rmaps:base: assigning rank 6 to node node08
> [node08.cluster:27019] mca:rmaps:base: assigning rank 7 to node node08
> I think something is wrong with the treatment of oversubscription,
> which might be related to "#3893: LAMA mapper has problems".
> Hmmm...looks like we aren't getting your allocation. Can you rerun and
> add -mca ras_base_verbose 50?
> On Nov 12, 2013, at 11:30 PM, tmish...@jcity.maeda.co.jp wrote:
> Hi Ralph,
> Here is the output of "-mca plm_base_verbose 5".
> [node08.cluster:23573] mca:base:select:(  plm) Querying component [rsh]
> [node08.cluster:23573] [[INVALID],INVALID] plm:rsh_lookup on agent /usr/bin/rsh path NULL
> [node08.cluster:23573] mca:base:select:(  plm) Query of component [rsh] set priority to 10
> [node08.cluster:23573] mca:base:select:(  plm) Querying component [slurm]
> [node08.cluster:23573] mca:base:select:(  plm) Skipping component [slurm]. Query failed to return a module
> [node08.cluster:23573] mca:base:select:(  plm) Querying component [tm]
> [node08.cluster:23573] mca:base:select:(  plm) Query of component [tm] set priority to 75
> [node08.cluster:23573] mca:base:select:(  plm) Selected component [tm]
> [node08.cluster:23573] plm:base:set_hnp_name: initial bias 23573 nodename hash 85176670
> [node08.cluster:23573] plm:base:set_hnp_name: final jobfam 59480
> [node08.cluster:23573] [[59480,0],0] plm:base:receive start comm
> [node08.cluster:23573] [[59480,0],0] plm:base:setup_job
> [node08.cluster:23573] [[59480,0],0] plm:base:setup_vm
> [node08.cluster:23573] [[59480,0],0] plm:base:setup_vm creating map
> [node08.cluster:23573] [[59480,0],0] plm:base:setup_vm only in allocation
> All nodes which are allocated for this job are already filled.
> Here, openmpi's configuration is as follows:
> ./configure \
> --prefix=/home/mishima/opt/mpi/openmpi-1.7.4a1-pgi13.10 \
> --with-tm \
> --with-verbs \
> --disable-ipv6 \
> --disable-vt \
> --enable-debug \
> CC=pgcc CFLAGS="-tp k8-64e" \
> CXX=pgCC CXXFLAGS="-tp k8-64e" \
> F77=pgfortran FFLAGS="-tp k8-64e" \
> FC=pgfortran FCFLAGS="-tp k8-64e"
> Hi Ralph,
> Okay, I can help you. Please give me some time to report the output.
> Tetsuya Mishima
> I can try, but I have no way of testing Torque any more - so all I can
> do is a code review. If you can build with --enable-debug and add
> -mca plm_base_verbose 5 to your cmd line, I'd appreciate seeing the
> output.
> On Nov 12, 2013, at 9:58 PM, tmish...@jcity.maeda.co.jp wrote:
> Hi Ralph,
> Thank you for your quick response.
> I'd like to report one more regression in the Torque support of
> openmpi-1.7.4a1r29646, which might be related to "#3893: LAMA mapper
> has problems", which I reported a few days ago.
> The script below does not work with openmpi-1.7.4a1r29646,
> although it worked with openmpi-1.7.3 as I told you before.
> #!/bin/sh
> #PBS -l nodes=node08:ppn=8
> export OMP_NUM_THREADS=1
> cp $PBS_NODEFILE pbs_hosts
> NPROCS=`wc -l < pbs_hosts`
> mpirun -machinefile pbs_hosts -np ${NPROCS} -report-bindings -bind-to core Myprog
> If I drop "-machinefile pbs_hosts -np ${NPROCS}", then it works fine.
> Since this happens without requesting LAMA, I guess the problem is not
> in LAMA itself. Anyway, please look into this issue as well.
> Regards,
> Tetsuya Mishima
> Done - thanks!
> On Nov 12, 2013, at 7:35 PM, tmish...@jcity.maeda.co.jp wrote:
> Dear openmpi developers,
> I got a segmentation fault in a trial use of openmpi-1.7.4a1r29646
> built by PGI 13.10, as shown below:
> [mishima@manage testbed-openmpi-1.7.3]$ mpirun -np 4 -cpus-per-proc 2 -report-bindings mPre
> [manage.cluster:23082] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [././././B/B][./././././.]
> [manage.cluster:23082] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././././.][B/B/./././.]
> [manage.cluster:23082] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B/./././.][./././././.]
> [manage.cluster:23082] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [././B/B/./.][./././././.]
> [manage:23082] *** Process received signal ***
> [manage:23082] Signal: Segmentation fault (11)
> [manage:23082] Signal code: Address not mapped (1)
> [manage:23082] Failing at address: 0x34
> [manage:23082] *** End of error message ***
> Segmentation fault (core dumped)
> [mishima@manage testbed-openmpi-1.7.3]$ gdb mpirun core.23082
> GNU gdb (GDB) CentOS (7.0.1-42.el5.centos.1)
> Copyright (C) 2009 Free Software Foundation, Inc.
> ...
> Core was generated by `mpirun -np 4 -cpus-per-proc 2 -report-bindings mPre'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00002b5f861c9c4f in recv_connect (mod=0x5f861ca20b00007f, sd=32767, hdr=0x1ca20b00007fff25) at ./oob_tcp.c:631
> 631             peer = OBJ_NEW(mca_oob_tcp_peer_t);
> (gdb) where
> #0  0x00002b5f861c9c4f in recv_connect (mod=0x5f861ca20b00007f, sd=32767, hdr=0x1ca20b00007fff25) at ./oob_tcp.c:631
> #1  0x00002b5f861ca20b in recv_handler (sd=1778385023, flags=32767, cbdata=0x8eb06a00007fff25) at ./oob_tcp.c:760
> #2  0x00002b5f848eb06a in event_process_active_single_queue (base=0x5f848eb27000007f, activeq=0x848eb27000007fff) at ./event.c:1366
> #3  0x00002b5f848eb270 in event_process_active (base=0x5f848eb84900007f) at ./event.c:1435
> #4  0x00002b5f848eb849 in opal_libevent2021_event_base_loop (base=0x4077a000007f, flags=32767) at ./event.c:1645
> #5  0x00000000004077a0 in orterun (argc=7, argv=0x7fff25bbd4a8) at ./orterun.c:1030
> #6  0x00000000004067fb in main (argc=7, argv=0x7fff25bbd4a8) at ./main.c:13
> (gdb) quit
> Line 627 in orte/mca/oob/tcp/oob_tcp.c is apparently unnecessary and
> causes the segfault.
> 624      /* lookup the corresponding process */
> 625      peer = mca_oob_tcp_peer_lookup(mod, &hdr->origin);
> 626      if (NULL == peer) {
> 627          ui64 = (uint64_t*)(&peer->name);
> 628          opal_output_verbose(OOB_TCP_DEBUG_CONNECT, orte_oob_base_framework.framework_output,
> 629                              "%s mca_oob_tcp_recv_connect: connection from new peer",
> 630                              ORTE_NAME_PRINT
> 631          peer = OBJ_NEW(mca_oob_tcp_peer_t);
> 632          peer->mod = mod;
> 633          peer->name = hdr->origin;
> 634          peer->state = MCA_OOB_TCP_ACCEPTING;
> 635          ui64 = (uint64_t*)(&peer->name);
> 636          if (OPAL_SUCCESS != opal_hash_table_set_value_uint64(&mod->peers, (*ui64), peer)) {
> 637              OBJ_RELEASE(peer);
> 638              return;
> 639          }
> Please fix this mistake in the next release.
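The general rule behind this report is that nothing may be read through `peer` in the `NULL == peer` branch until a new object has been allocated. A minimal sketch of that lookup-or-create pattern, using hypothetical stand-in types and a fixed-size table rather than the real `mca_oob_tcp_peer_t` and `opal_hash_table` API:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the real Open MPI OOB types. */
typedef struct {
    uint64_t name;   /* process name packed into 64 bits */
    int state;
} peer_t;

#define MAX_PEERS 16
static peer_t *peers[MAX_PEERS];
static int npeers;

static peer_t *peer_lookup(uint64_t name)
{
    for (int i = 0; i < npeers; i++)
        if (peers[i]->name == name)
            return peers[i];
    return NULL;
}

static peer_t *peer_lookup_or_create(uint64_t name)
{
    peer_t *peer = peer_lookup(name);
    if (peer != NULL)
        return peer;
    /* peer is NULL here: do not read peer->name or any other field. */
    assert(npeers <= MAX_PEERS);
    if (npeers == MAX_PEERS)
        return NULL;                 /* table full */
    peer = calloc(1, sizeof *peer);
    if (peer == NULL)
        return NULL;
    peer->name = name;               /* fields are filled only after allocation */
    peers[npeers++] = peer;
    return peer;
}
```

A second call with the same name returns the existing entry instead of allocating a new one.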
> Regards,
> Tetsuya Mishima
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users