It looks like the rankfile "*" syntax was broke between version r22761 and r23214. So, it looks like a regression to me. Ethan is looking into trying to narrow this down more.

--td

Ralph Castain wrote:
I would have to look at the code, but I suspect it doesn't handle "*". Could be 
upgraded to do so, but that would depend on the relevant developer to do so :-)


On Jun 9, 2010, at 10:16 AM, Grzegorz Maj wrote:

Thanks a lot, it works fine for me.
But going back to my problems - is it some bug in open-mpi or I should
use "slot=*" option in some other way?

2010/6/9 Ralph Castain <r...@open-mpi.org>:
I would recommend using the sequential mapper instead:

mpirun -mca rmaps seq

You can then just list your hosts in your hostfile, and we will put the ranks 
sequentially on those hosts. So you get something like this

host01  <= rank0
host01  <= rank1
host02  <= rank2
host03  <= rank3
host01  <= rank4

Ralph

On Jun 9, 2010, at 4:39 AM, Grzegorz Maj wrote:

In my previous mail I said that slot=0-3 would be a solution.
Unfortunately it gives me exactly the same segfault as in case with
*:*

2010/6/9 Grzegorz Maj <ma...@wp.pl>:
Hi,
I'd like mpirun to run tasks with specific ranks on specific hosts,
but I don't want to provide any particular sockets/slots/cores.
The following example uses just one host, but generally I'll use more.
In my hostfile I just have:

root@host01 slots=4

I was playing with my rankfile to achieve what I've mentioned, but I
always get some problems.

1) With rankfile like:
rank 0=host01 slot=*
rank 1=host01 slot=*
rank 2=host01 slot=*
rank 3=host01 slot=*

I get:

--------------------------------------------------------------------------
We were unable to successfully process/set the requested processor
affinity settings:

Specified slot list: *
Error: Error

This could mean that a non-existent processor was specified, or
that the specification had improper syntax.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to start the specified application as it encountered an error:

Error name: Error
Node: host01

when attempting to start process rank 0.
--------------------------------------------------------------------------
[host01:13715] Rank 0: PAFFINITY cannot get physical processor id for
logical processor 4


I think it tries to find processor #4, bug there are only 0-3

2) With rankfile like:
rank 0=host01 slot=*:*
rank 1=host01 slot=*:*
rank 2=host01 slot=*:*
rank 3=host01 slot=*:*

Everything looks well, i.e. my programs are spread across 4 processors.
But when running MPI program as follows:

MPI::Init(argc, argv);
fprintf(stderr, "after init %d\n", MPI::Is_initialized());
nprocs_mpi = MPI::COMM_WORLD.Get_size();
fprintf(stderr, "won't get here\n");

I get:

after init 1
[host01:14348] *** Process received signal ***
[host01:14348] Signal: Segmentation fault (11)
[host01:14348] Signal code: Address not mapped (1)
[host01:14348] Failing at address: 0x8
[host01:14348] [ 0] [0xffffe410]
[host01:14348] [ 1] p(_ZNK3MPI4Comm8Get_sizeEv+0x19) [0x8051299]
[host01:14348] [ 2] p(main+0x86) [0x804ee4e]
[host01:14348] [ 3] /lib/libc.so.6(__libc_start_main+0xe5) [0x4180b5c5]
[host01:14348] [ 4] p(__gxx_personality_v0+0x125) [0x804ecc1]
[host01:14348] *** End of error message ***

I'm using OPEN MPI v. 1.4.2 (downloaded yesterday).
In my rankfile I really want to write something like slot=*. I know
slot=0-3 would be a solution, but when generating rankfile I may not
be sure how many processors are there available on a particular host.

Any help would be appreciated.

Regards,
Grzegorz Maj

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.650.633.7054
Oracle * - Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com <mailto:terry.don...@oracle.com>

Reply via email to