Hmmm... I -think- this will work, but I cannot guarantee it:

1. launch one process (it can just be a spinner) with an mpirun command line 
that includes the following option:

mpirun -report-uri file

where "file" is some filename that mpirun can create and insert its contact 
info into. The path can be relative or absolute. This process must remain 
alive throughout your application - it doesn't matter what it does. Its 
purpose is solely to keep mpirun alive.

2. set OMPI_MCA_dpm_orte_server=FILE:file in your environment, where "file" is 
the filename given above. This tells your processes how to find mpirun, which 
acts as a meeting place to handle the connect/accept operations.

Now run your processes, and have them connect/accept to each other.
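
For example, the whole sequence might look like this (the filename 
/tmp/mpirun.uri and the program names spinner/worker are just placeholders):

    # 1. keep mpirun alive as the rendezvous point
    mpirun -report-uri /tmp/mpirun.uri ./spinner &

    # 2. point the singleton processes at it
    export OMPI_MCA_dpm_orte_server=FILE:/tmp/mpirun.uri

    # now start your processes however your environment requires - no mpirun
    ./worker &
    ./worker &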

The reason I cannot guarantee this will work is that these processes will all 
have the same rank and name since they all start as singletons. Hence, 
connect/accept is likely to fail.

But it -might- work, so you might want to give it a try.
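
If it helps, here is an untested sketch in C of the rendezvous itself. The 
service name "my-rendezvous" and the server/client split via argv are made up 
for illustration - the first process publishes a port through the server set 
up above, and the others look it up:

    #include <mpi.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        char port[MPI_MAX_PORT_NAME];
        MPI_Comm inter;

        MPI_Init(&argc, &argv);
        if (argc > 1 && strcmp(argv[1], "server") == 0) {
            /* first process: open a port, publish it, wait for a peer */
            MPI_Open_port(MPI_INFO_NULL, port);
            MPI_Publish_name("my-rendezvous", MPI_INFO_NULL, port);
            MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
            MPI_Unpublish_name("my-rendezvous", MPI_INFO_NULL, port);
        } else {
            /* later process: find the published port and connect to it */
            MPI_Lookup_name("my-rendezvous", MPI_INFO_NULL, port);
            MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        }
        /* ... use the intercommunicator "inter" ... */
        MPI_Comm_disconnect(&inter);
        MPI_Finalize();
        return 0;
    }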

On Apr 23, 2010, at 8:10 AM, Grzegorz Maj wrote:

> To be more precise: by 'server process' I mean some process that I
> could run once on my system to help in creating those groups.
> My typical scenario is:
> 1. run N separate processes, each without mpirun
> 2. connect them into an MPI group
> 3. do some job
> 4. exit all N processes
> 5. goto 1
> 
> 2010/4/23 Grzegorz Maj <ma...@wp.pl>:
>> Thank you Ralph for your explanation.
>> And, apart from that descriptor issue, is there any other way to
>> solve my problem, i.e. to run a number of processes separately,
>> without mpirun, and then collect them into an MPI intracomm group?
>> If, for example, I needed to run some 'server process' (even using
>> mpirun) for this task, that's OK. Any ideas?
>> 
>> Thanks,
>> Grzegorz Maj
>> 
>> 
>> 2010/4/18 Ralph Castain <r...@open-mpi.org>:
>>> Okay, but here is the problem. If you don't use mpirun, and are not 
>>> operating in an environment we support for "direct" launch (i.e., starting 
>>> processes outside of mpirun), then every one of those processes thinks it 
>>> is a singleton - yes?
>>> 
>>> What you may not realize is that each singleton immediately fork/exec's an 
>>> orted daemon that is configured to behave just like mpirun. This is 
>>> required in order to support MPI-2 operations such as MPI_Comm_spawn, 
>>> MPI_Comm_connect/accept, etc.
>>> 
>>> So if you launch 64 processes that think they are singletons, then you have 
>>> 64 copies of orted running as well. This eats up a lot of file descriptors, 
>>> which is probably why you are hitting this 65 process limit - your system 
>>> is probably running out of file descriptors. You might check your system 
>>> limits and see if you can get them revised upward.
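>>> 
>>> A quick way to check on a typical Linux shell (4096 is just an example):
>>> 
>>>   ulimit -n          # show the current per-process descriptor limit
>>>   ulimit -n 4096     # raise it for this shell, if the hard limit allows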
>>> 
>>> 
>>> On Apr 17, 2010, at 4:24 PM, Grzegorz Maj wrote:
>>> 
>>>> Yes, I know. The problem is that I have to launch my processes through a
>>>> special mechanism provided by the environment in which I'm working,
>>>> and unfortunately I can't use mpirun.
>>>> 
>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>:
>>>>> Guess I don't understand why you can't use mpirun - all it does is start 
>>>>> things, provide a means to forward I/O, etc. It mainly sits there quietly 
>>>>> without using any CPU unless required to support the job.
>>>>> 
>>>>> Sounds like it would solve your problem. Otherwise, I know of no way to 
>>>>> get all these processes into comm_world.
>>>>> 
>>>>> 
>>>>> On Apr 17, 2010, at 2:27 PM, Grzegorz Maj wrote:
>>>>> 
>>>>>> Hi,
>>>>>> I'd like to dynamically create a group of processes communicating via
>>>>>> MPI. Those processes need to be run without mpirun and create an
>>>>>> intracommunicator after startup. Any ideas how to do this
>>>>>> efficiently?
>>>>>> I came up with a solution in which the processes connect one by
>>>>>> one using MPI_Comm_connect, but unfortunately all the processes that
>>>>>> are already in the group need to call MPI_Comm_accept. This means that
>>>>>> when the n-th process wants to connect I need to collect all the n-1
>>>>>> processes on the MPI_Comm_accept call. After I run about 40 processes
>>>>>> every subsequent call takes more and more time, which I'd like to
>>>>>> avoid.
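>>>>>> In rough C, each join step looks something like this (how the port
>>>>>> string reaches the new process is out of band, and error handling
>>>>>> is omitted):
>>>>>> 
>>>>>>   MPI_Comm inter, next;       /* intra starts as MPI_COMM_SELF */
>>>>>>   if (joining) {
>>>>>>       MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
>>>>>>       MPI_Intercomm_merge(inter, 1, &next);  /* new rank goes last */
>>>>>>   } else {
>>>>>>       /* every process already in the group must take part */
>>>>>>       MPI_Comm_accept(port, MPI_INFO_NULL, 0, intra, &inter);
>>>>>>       MPI_Intercomm_merge(inter, 0, &next);
>>>>>>   }
>>>>>>   intra = next;   /* the grown intracomm for the next round */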
>>>>>> Another problem with this solution is that when I try to connect the
>>>>>> 66th process, the root of the existing group segfaults on MPI_Comm_accept.
>>>>>> Maybe it's my bug, but it's weird as everything works fine for at most
>>>>>> 65 processes. Is there any limitation I don't know about?
>>>>>> My last question is about MPI_COMM_WORLD. When I run my processes
>>>>>> without mpirun their MPI_COMM_WORLD is the same as MPI_COMM_SELF. Is
>>>>>> there any way to change MPI_COMM_WORLD and set it to the
>>>>>> intracommunicator that I've created?
>>>>>> 
>>>>>> Thanks,
>>>>>> Grzegorz Maj