Now I have a new question.
When I run my server and a large number of clients on the same machine,
everything works fine.

But when I try to run the clients on several machines, the most
frequent scenario is:
* the server is started on machine A
* X (e.g. 1, 4, 10, ...) clients are started on machine B and connect
successfully
* the first client started on machine C connects successfully to the
server, but then the whole grid hangs in MPI_Intercomm_merge (all the
processes from the intercommunicator reach that call; see the sketch below).
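
To make the setup concrete, each client does roughly the following before
the hang (a simplified sketch, not the exact client.c: here the port name
is just taken from the command line, while in my real setup it comes
through the ompi-server rendezvous, and error handling is omitted):

/* Simplified client-side sketch (hypothetical names; the real client.c
 * also exchanges data and disconnects afterwards). */
#include <string.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm intercomm, intracomm;

    MPI_Init(&argc, &argv);

    /* Assumed here: the port name of the existing group is passed as
     * argv[1]; in my setup it actually comes via ompi-server. */
    strncpy(port, argv[1], MPI_MAX_PORT_NAME - 1);
    port[MPI_MAX_PORT_NAME - 1] = '\0';

    /* Connect to the already running group (client.c:43 in the
     * backtrace below)... */
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);

    /* ...and merge the intercommunicator into one intracommunicator
     * (client.c:47). This is the call that hangs. */
    MPI_Intercomm_merge(intercomm, 0, &intracomm);

    MPI_Finalize();
    return 0;
}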

As I said, that is the most frequent scenario. Sometimes I can connect
the clients from several machines without problems; sometimes it hangs
(always in MPI_Intercomm_merge) already while connecting the clients
from machine B.
The interesting thing is that if, before MPI_Intercomm_merge, I send a
dummy message over the intercommunicator from rank 0 of one group to
rank 0 of the other group, the merge does not hang (see the sketch below).
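
For illustration, the workaround looks roughly like this (a sketch only;
the tag, the direction of the dummy message and the is_new_client flag
are just one way of expressing it):

/* Sketch of the dummy-message workaround; is_new_client distinguishes
 * the connecting process from the existing group, and the tag value is
 * arbitrary. */
#include <mpi.h>

static void merge_with_workaround(MPI_Comm intercomm, int is_new_client,
                                  MPI_Comm *intracomm)
{
    int rank, dummy = 0;
    MPI_Comm_rank(intercomm, &rank);   /* rank within the local group */

    if (rank == 0) {
        if (is_new_client) {
            /* rank 0 of one group sends a dummy message over the
             * intercommunicator to rank 0 of the other group... */
            MPI_Send(&dummy, 1, MPI_INT, 0, 123, intercomm);
        } else {
            /* ...which receives it. */
            MPI_Recv(&dummy, 1, MPI_INT, 0, 123, intercomm,
                     MPI_STATUS_IGNORE);
        }
    }

    /* With this exchange in place the merge no longer hangs. */
    MPI_Intercomm_merge(intercomm, 0, intracomm);
}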

I've tried both versions, with and without the first patch (ompi-server
as orted), but it doesn't change the behavior.

I've attached gdb to my server; this is the backtrace:
#0  0xffffe410 in __kernel_vsyscall ()
#1  0x00637afc in sched_yield () from /lib/libc.so.6
#2  0xf7e8ce31 in opal_progress () at ../../opal/runtime/opal_progress.c:220
#3  0xf7f60ad4 in opal_condition_wait (c=0xf7fd7dc0, m=0xf7fd7e00) at
../../opal/threads/condition.h:99
#4  0xf7f60dee in ompi_request_default_wait_all (count=2,
requests=0xff8d7754, statuses=0x0) at
../../ompi/request/req_wait.c:262
#5  0xf7d3e221 in mca_coll_inter_allgatherv_inter (sbuf=0xff8d7824,
scount=1, sdtype=0x8049200, rbuf=0xff8d77e0, rcounts=0x9783df8,
disps=0x9755520, rdtype=0x8049200, comm=0x978c2a8, module=0x9794b08)
    at ../../../../../ompi/mca/coll/inter/coll_inter_allgatherv.c:127
#6  0xf7f4c615 in ompi_comm_determine_first (intercomm=0x978c2a8,
high=0) at ../../ompi/communicator/comm.c:1199
#7  0xf7f8d1d9 in PMPI_Intercomm_merge (intercomm=0x978c2a8, high=0,
newcomm=0xff8d78c0) at pintercomm_merge.c:84
#8  0x0804893c in main (argc=Cannot access memory at address 0xf
) at server.c:50

And this is the backtrace from one of the clients:
#0  0xffffe410 in __kernel_vsyscall ()
#1  0x0064993b in poll () from /lib/libc.so.6
#2  0xf7de027f in poll_dispatch (base=0x8643fb8, arg=0x86442d8,
tv=0xff82299c) at ../../../opal/event/poll.c:168
#3  0xf7dde4b2 in opal_event_base_loop (base=0x8643fb8, flags=2) at
../../../opal/event/event.c:807
#4  0xf7dde34f in opal_event_loop (flags=2) at ../../../opal/event/event.c:730
#5  0xf7dcfc77 in opal_progress () at ../../opal/runtime/opal_progress.c:189
#6  0xf7ea80b8 in opal_condition_wait (c=0xf7f25160, m=0xf7f251a0) at
../../opal/threads/condition.h:99
#7  0xf7ea7ff3 in ompi_request_wait_completion (req=0x8686680) at
../../ompi/request/request.h:375
#8  0xf7ea7ef1 in ompi_request_default_wait (req_ptr=0xff822ae8,
status=0x0) at ../../ompi/request/req_wait.c:37
#9  0xf7c663a6 in ompi_coll_tuned_bcast_intra_generic
(buffer=0xff822d20, original_count=1, datatype=0x868bd00, root=0,
comm=0x86aa7f8, module=0x868b700, count_by_segment=1, tree=0x868b3d8)
    at ../../../../../ompi/mca/coll/tuned/coll_tuned_bcast.c:237
#10 0xf7c668ea in ompi_coll_tuned_bcast_intra_binomial
(buffer=0xff822d20, count=1, datatype=0x868bd00, root=0,
comm=0x86aa7f8, module=0x868b700, segsize=0)
    at ../../../../../ompi/mca/coll/tuned/coll_tuned_bcast.c:368
#11 0xf7c5af12 in ompi_coll_tuned_bcast_intra_dec_fixed
(buff=0xff822d20, count=1, datatype=0x868bd00, root=0, comm=0x86aa7f8,
module=0x868b700)
    at ../../../../../ompi/mca/coll/tuned/coll_tuned_decision_fixed.c:256
#12 0xf7c73269 in mca_coll_sync_bcast (buff=0xff822d20, count=1,
datatype=0x868bd00, root=0, comm=0x86aa7f8, module=0x86aaa28) at
../../../../../ompi/mca/coll/sync/coll_sync_bcast.c:44
#13 0xf7c80381 in mca_coll_inter_allgatherv_inter (sbuf=0xff822d64,
scount=0, sdtype=0x8049400, rbuf=0xff822d20, rcounts=0x868a188,
disps=0x868abb8, rdtype=0x8049400, comm=0x86aa300,
    module=0x86aae18) at
../../../../../ompi/mca/coll/inter/coll_inter_allgatherv.c:134
#14 0xf7e9398f in ompi_comm_determine_first (intercomm=0x86aa300,
high=0) at ../../ompi/communicator/comm.c:1199
#15 0xf7ed7833 in PMPI_Intercomm_merge (intercomm=0x86aa300, high=0,
newcomm=0xff8241d0) at pintercomm_merge.c:84
#16 0x08048afd in main (argc=943274038, argv=0x33393133) at client.c:47



What do you think may cause the problem?


2010/7/26 Ralph Castain <r...@open-mpi.org>:
> No problem at all - glad it works!
>
> On Jul 26, 2010, at 7:58 AM, Grzegorz Maj wrote:
>
>> Hi,
>> I'm very sorry, but the problem was on my side. My installation
>> process was not always picking up the newest Open MPI sources, so in
>> this case it had not installed the version with the latest patch. Now I
>> think everything works fine - I could run over 130 processes with no
>> problems.
>> I'm sorry again that I've wasted your time. And thank you for the patch.
>>
>> 2010/7/21 Ralph Castain <r...@open-mpi.org>:
>>> We're having some problem replicating this once my patches are applied. Can 
>>> you send us your configure cmd? Just the output from "head config.log" will 
>>> do for now.
>>>
>>> Thanks!
>>>
>>> On Jul 20, 2010, at 9:09 AM, Grzegorz Maj wrote:
>>>
>>>> My start script looks almost exactly the same as the one published by
>>>> Edgar, i.e. the processes are started one by one with no delay.
>>>>
>>>> 2010/7/20 Ralph Castain <r...@open-mpi.org>:
>>>>> Grzegorz: something occurred to me. When you start all these processes, 
>>>>> how are you staggering their wireup? Are they flooding us, or are you 
>>>>> time-shifting them a little?
>>>>>
>>>>>
>>>>> On Jul 19, 2010, at 10:32 AM, Edgar Gabriel wrote:
>>>>>
>>>>>> Hm, so I am not sure how to approach this. First of all, the test case
>>>>>> works for me. I used up to 80 clients, and for both optimized and
>>>>>> non-optimized compilation. I ran the tests with trunk (not with 1.4
>>>>>> series, but the communicator code is identical in both cases). Clearly,
>>>>>> the patch from Ralph is necessary to make it work.
>>>>>>
>>>>>> Additionally, I went through the communicator creation code for dynamic
>>>>>> communicators trying to find spots that could create problems. The only
>>>>>> place that I found the number 64 appear is the fortran-to-c mapping
>>>>>> arrays (e.g. for communicators), where the initial size of the table is
>>>>>> 64. I looked twice over the pointer-array code to see whether we could
>>>>>> have a problem there (since it is a key piece of the cid allocation code
>>>>>> for communicators), but I am fairly confident that it is correct.
>>>>>>
>>>>>> Note that we have other (non-dynamic) tests, where comm_set is called
>>>>>> 100,000 times, and the code per se does not seem to have a problem due
>>>>>> to being called too often. So I am not sure what else to look at.
>>>>>>
>>>>>> Edgar
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 7/13/2010 8:42 PM, Ralph Castain wrote:
>>>>>>> As far as I can tell, it appears the problem is somewhere in our 
>>>>>>> communicator setup. The people knowledgeable on that area are going to 
>>>>>>> look into it later this week.
>>>>>>>
>>>>>>> I'm creating a ticket to track the problem and will copy you on it.
>>>>>>>
>>>>>>>
>>>>>>> On Jul 13, 2010, at 6:57 AM, Ralph Castain wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On Jul 13, 2010, at 3:36 AM, Grzegorz Maj wrote:
>>>>>>>>
>>>>>>>>> Bad news..
>>>>>>>>> I've tried the latest patch with and without the prior one, but it
>>>>>>>>> hasn't changed anything. I've also tried using the old code but with
>>>>>>>>> the OMPI_DPM_BASE_MAXJOBIDS constant changed to 80, but it also didn't
>>>>>>>>> help.
>>>>>>>>> While looking through the sources of openmpi-1.4.2 I couldn't find any
>>>>>>>>> call of the function ompi_dpm_base_mark_dyncomm.
>>>>>>>>
>>>>>>>> It isn't directly called - it shows in ompi_comm_set as 
>>>>>>>> ompi_dpm.mark_dyncomm. You were definitely overrunning that array, but 
>>>>>>>> I guess something else is also being hit. Have to look further...
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2010/7/12 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>>> Just so you don't have to wait for 1.4.3 release, here is the patch 
>>>>>>>>>> (doesn't include the prior patch).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Jul 12, 2010, at 12:13 PM, Grzegorz Maj wrote:
>>>>>>>>>>
>>>>>>>>>>> 2010/7/12 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>>>>> Dug around a bit and found the problem!!
>>>>>>>>>>>>
>>>>>>>>>>>> I have no idea who or why this was done, but somebody set a limit 
>>>>>>>>>>>> of 64 separate jobids in the dynamic init called by ompi_comm_set, 
>>>>>>>>>>>> which builds the intercommunicator. Unfortunately, they hard-wired 
>>>>>>>>>>>> the array size, but never checked that size before adding to it.
>>>>>>>>>>>>
>>>>>>>>>>>> So after 64 calls to connect_accept, you are overwriting other 
>>>>>>>>>>>> areas of the code. As you found, hitting 66 causes it to segfault.
>>>>>>>>>>>>
>>>>>>>>>>>> I'll fix this on the developer's trunk (I'll also add that 
>>>>>>>>>>>> original patch to it). Rather than my searching this thread in 
>>>>>>>>>>>> detail, can you remind me what version you are using so I can 
>>>>>>>>>>>> patch it too?
>>>>>>>>>>>
>>>>>>>>>>> I'm using 1.4.2
>>>>>>>>>>> Thanks a lot, and I'm looking forward to the patch.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your patience with this!
>>>>>>>>>>>> Ralph
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Jul 12, 2010, at 7:20 AM, Grzegorz Maj wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> 1024 is not the problem: changing it to 2048 hasn't change 
>>>>>>>>>>>>> anything.
>>>>>>>>>>>>> Following your advice I've run my process using gdb. 
>>>>>>>>>>>>> Unfortunately I
>>>>>>>>>>>>> didn't get anything more than:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Program received signal SIGSEGV, Segmentation fault.
>>>>>>>>>>>>> [Switching to Thread 0xf7e4c6c0 (LWP 20246)]
>>>>>>>>>>>>> 0xf7f39905 in ompi_comm_set () from 
>>>>>>>>>>>>> /home/gmaj/openmpi/lib/libmpi.so.0
>>>>>>>>>>>>>
>>>>>>>>>>>>> (gdb) bt
>>>>>>>>>>>>> #0  0xf7f39905 in ompi_comm_set () from 
>>>>>>>>>>>>> /home/gmaj/openmpi/lib/libmpi.so.0
>>>>>>>>>>>>> #1  0xf7e3ba95 in connect_accept () from
>>>>>>>>>>>>> /home/gmaj/openmpi/lib/openmpi/mca_dpm_orte.so
>>>>>>>>>>>>> #2  0xf7f62013 in PMPI_Comm_connect () from 
>>>>>>>>>>>>> /home/gmaj/openmpi/lib/libmpi.so.0
>>>>>>>>>>>>> #3  0x080489ed in main (argc=825832753, argv=0x34393638) at 
>>>>>>>>>>>>> client.c:43
>>>>>>>>>>>>>
>>>>>>>>>>>>> What's more: when I added a breakpoint on ompi_comm_set in the 66th
>>>>>>>>>>>>> process and stepped through a couple of instructions, one of the other
>>>>>>>>>>>>> processes crashed (as usual, on ompi_comm_set) earlier than the 66th
>>>>>>>>>>>>> did.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Finally I decided to recompile openmpi using the -g flag for gcc.
>>>>>>>>>>>>> In this case the 66-process issue was gone! I was running my
>>>>>>>>>>>>> applications
>>>>>>>>>>>>> exactly the same way as previously (even without recompilation) 
>>>>>>>>>>>>> and
>>>>>>>>>>>>> I've run successfully over 130 processes.
>>>>>>>>>>>>> When switching back to the openmpi compilation without -g it 
>>>>>>>>>>>>> again segfaults.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any ideas? I'm really confused.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2010/7/7 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>>>>>>> I would guess the #files limit of 1024. However, if it behaves 
>>>>>>>>>>>>>> the same way when spread across multiple machines, I would 
>>>>>>>>>>>>>> suspect it is somewhere in your program itself. Given that the 
>>>>>>>>>>>>>> segfault is in your process, can you use gdb to look at the core 
>>>>>>>>>>>>>> file and see where and why it fails?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Jul 7, 2010, at 10:17 AM, Grzegorz Maj wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2010/7/7 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Jul 6, 2010, at 8:48 AM, Grzegorz Maj wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Ralph,
>>>>>>>>>>>>>>>>> sorry for the late response, but I couldn't find free time to 
>>>>>>>>>>>>>>>>> play
>>>>>>>>>>>>>>>>> with this. Finally I've applied the patch you prepared. I've 
>>>>>>>>>>>>>>>>> launched
>>>>>>>>>>>>>>>>> my processes in the way you've described and I think it's 
>>>>>>>>>>>>>>>>> working as
>>>>>>>>>>>>>>>>> you expected. None of my processes runs the orted daemon and 
>>>>>>>>>>>>>>>>> they can
>>>>>>>>>>>>>>>>> perform MPI operations. Unfortunately I'm still hitting the 65
>>>>>>>>>>>>>>>>> processes issue :(
>>>>>>>>>>>>>>>>> Maybe I'm doing something wrong.
>>>>>>>>>>>>>>>>> I attach my source code. If anybody could have a look at
>>>>>>>>>>>>>>>>> this, I would
>>>>>>>>>>>>>>>>> be grateful.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> When I run that code with clients_count <= 65 everything 
>>>>>>>>>>>>>>>>> works fine:
>>>>>>>>>>>>>>>>> all the processes create a common grid, exchange some 
>>>>>>>>>>>>>>>>> information and
>>>>>>>>>>>>>>>>> disconnect.
>>>>>>>>>>>>>>>>> When I set clients_count > 65 the 66th process crashes on
>>>>>>>>>>>>>>>>> MPI_Comm_connect (segmentation fault).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I didn't have time to check the code, but my guess is that you 
>>>>>>>>>>>>>>>> are still hitting some kind of file descriptor or other limit. 
>>>>>>>>>>>>>>>> Check to see what your limits are - usually "ulimit" will tell 
>>>>>>>>>>>>>>>> you.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My limitations are:
>>>>>>>>>>>>>>> time(seconds)        unlimited
>>>>>>>>>>>>>>> file(blocks)         unlimited
>>>>>>>>>>>>>>> data(kb)             unlimited
>>>>>>>>>>>>>>> stack(kb)            10240
>>>>>>>>>>>>>>> coredump(blocks)     0
>>>>>>>>>>>>>>> memory(kb)           unlimited
>>>>>>>>>>>>>>> locked memory(kb)    64
>>>>>>>>>>>>>>> process              200704
>>>>>>>>>>>>>>> nofiles              1024
>>>>>>>>>>>>>>> vmemory(kb)          unlimited
>>>>>>>>>>>>>>> locks                unlimited
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Which one do you think could be responsible for that?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I was trying to run all the 66 processes on one machine or 
>>>>>>>>>>>>>>> spread them
>>>>>>>>>>>>>>> across several machines and it always crashes the same way on 
>>>>>>>>>>>>>>> the 66th
>>>>>>>>>>>>>>> process.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Another thing I would like to know is whether it's normal that
>>>>>>>>>>>>>>>>> any of my
>>>>>>>>>>>>>>>>> processes calling MPI_Comm_connect or MPI_Comm_accept
>>>>>>>>>>>>>>>>> while the
>>>>>>>>>>>>>>>>> other side is not ready eats up a full CPU.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes - the waiting process is polling in a tight loop waiting 
>>>>>>>>>>>>>>>> for the connection to be made.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Any help would be appreciated,
>>>>>>>>>>>>>>>>> Grzegorz Maj
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2010/4/24 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>>>>>>>>>>> Actually, OMPI is distributed with a daemon that does pretty 
>>>>>>>>>>>>>>>>>> much what you
>>>>>>>>>>>>>>>>>> want. Checkout "man ompi-server". I originally wrote that 
>>>>>>>>>>>>>>>>>> code to support
>>>>>>>>>>>>>>>>>> cross-application MPI publish/subscribe operations, but we 
>>>>>>>>>>>>>>>>>> can utilize it
>>>>>>>>>>>>>>>>>> here too. Have to blame me for not making it more publicly 
>>>>>>>>>>>>>>>>>> known.
>>>>>>>>>>>>>>>>>> The attached patch upgrades ompi-server and modifies the 
>>>>>>>>>>>>>>>>>> singleton startup
>>>>>>>>>>>>>>>>>> to provide your desired support. This solution works in the 
>>>>>>>>>>>>>>>>>> following
>>>>>>>>>>>>>>>>>> manner:
>>>>>>>>>>>>>>>>>> 1. launch "ompi-server -report-uri <filename>". This starts 
>>>>>>>>>>>>>>>>>> a persistent
>>>>>>>>>>>>>>>>>> daemon called "ompi-server" that acts as a rendezvous point 
>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>> independently started applications.  The problem with 
>>>>>>>>>>>>>>>>>> starting different
>>>>>>>>>>>>>>>>>> applications and wanting them to MPI connect/accept lies in 
>>>>>>>>>>>>>>>>>> the need to have
>>>>>>>>>>>>>>>>>> the applications find each other. If they can't discover 
>>>>>>>>>>>>>>>>>> contact info for
>>>>>>>>>>>>>>>>>> the other app, then they can't wire up their interconnects. 
>>>>>>>>>>>>>>>>>> The
>>>>>>>>>>>>>>>>>> "ompi-server" tool provides that rendezvous point. I don't 
>>>>>>>>>>>>>>>>>> like that
>>>>>>>>>>>>>>>>>> comm_accept segfaulted - should have just error'd out.
>>>>>>>>>>>>>>>>>> 2. set OMPI_MCA_orte_server=file:<filename> in the
>>>>>>>>>>>>>>>>>> environment where you
>>>>>>>>>>>>>>>>>> will start your processes. This will allow your singleton 
>>>>>>>>>>>>>>>>>> processes to find
>>>>>>>>>>>>>>>>>> the ompi-server. I automatically also set the envar to 
>>>>>>>>>>>>>>>>>> connect the MPI
>>>>>>>>>>>>>>>>>> publish/subscribe system for you.
>>>>>>>>>>>>>>>>>> 3. run your processes. As they think they are singletons, 
>>>>>>>>>>>>>>>>>> they will detect
>>>>>>>>>>>>>>>>>> the presence of the above envar and automatically connect 
>>>>>>>>>>>>>>>>>> themselves to the
>>>>>>>>>>>>>>>>>> "ompi-server" daemon. This provides each process with the 
>>>>>>>>>>>>>>>>>> ability to perform
>>>>>>>>>>>>>>>>>> any MPI-2 operation.
>>>>>>>>>>>>>>>>>> I tested this on my machines and it worked, so hopefully it 
>>>>>>>>>>>>>>>>>> will meet your
>>>>>>>>>>>>>>>>>> needs. You only need to run one "ompi-server" period, so 
>>>>>>>>>>>>>>>>>> long as you locate
>>>>>>>>>>>>>>>>>> it where all of the processes can find the contact file and 
>>>>>>>>>>>>>>>>>> can open a TCP
>>>>>>>>>>>>>>>>>> socket to the daemon. There is a way to knit multiple 
>>>>>>>>>>>>>>>>>> ompi-servers into a
>>>>>>>>>>>>>>>>>> broader network (e.g., to connect processes that cannot 
>>>>>>>>>>>>>>>>>> directly access a
>>>>>>>>>>>>>>>>>> server due to network segmentation), but it's a tad tricky - 
>>>>>>>>>>>>>>>>>> let me know if
>>>>>>>>>>>>>>>>>> you require it and I'll try to help.
>>>>>>>>>>>>>>>>>> If you have trouble wiring them all into a single 
>>>>>>>>>>>>>>>>>> communicator, you might
>>>>>>>>>>>>>>>>>> ask separately about that and see if one of our MPI experts 
>>>>>>>>>>>>>>>>>> can provide
>>>>>>>>>>>>>>>>>> advice (I'm just the RTE grunt).
>>>>>>>>>>>>>>>>>> HTH - let me know how this works for you and I'll 
>>>>>>>>>>>>>>>>>> incorporate it into future
>>>>>>>>>>>>>>>>>> OMPI releases.
>>>>>>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Apr 24, 2010, at 1:49 AM, Krzysztof Zarzycki wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Ralph,
>>>>>>>>>>>>>>>>>> I'm Krzysztof and I'm working with Grzegorz Maj on this small
>>>>>>>>>>>>>>>>>> project/experiment of ours.
>>>>>>>>>>>>>>>>>> We definitely would like to give your patch a try. But could 
>>>>>>>>>>>>>>>>>> you please
>>>>>>>>>>>>>>>>>> explain your solution a little more?
>>>>>>>>>>>>>>>>>> You still would like to start one mpirun per mpi grid, and 
>>>>>>>>>>>>>>>>>> then have
>>>>>>>>>>>>>>>>>> processes started by us to join the MPI comm?
>>>>>>>>>>>>>>>>>> It is a good solution of course.
>>>>>>>>>>>>>>>>>> But it would be especially preferable to have one daemon 
>>>>>>>>>>>>>>>>>> running
>>>>>>>>>>>>>>>>>> persistently on our "entry" machine that can handle several 
>>>>>>>>>>>>>>>>>> mpi grid starts.
>>>>>>>>>>>>>>>>>> Can your patch help us this way too?
>>>>>>>>>>>>>>>>>> Thanks for your help!
>>>>>>>>>>>>>>>>>> Krzysztof
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 24 April 2010 03:51, Ralph Castain <r...@open-mpi.org> 
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In thinking about this, my proposed solution won't entirely 
>>>>>>>>>>>>>>>>>>> fix the
>>>>>>>>>>>>>>>>>>> problem - you'll still wind up with all those daemons. I 
>>>>>>>>>>>>>>>>>>> believe I can
>>>>>>>>>>>>>>>>>>> resolve that one as well, but it would require a patch.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Would you like me to send you something you could try? 
>>>>>>>>>>>>>>>>>>> Might take a couple
>>>>>>>>>>>>>>>>>>> of iterations to get it right...
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Apr 23, 2010, at 12:12 PM, Ralph Castain wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hmmm....I -think- this will work, but I cannot guarantee 
>>>>>>>>>>>>>>>>>>>> it:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 1. launch one process (can just be a spinner) using mpirun 
>>>>>>>>>>>>>>>>>>>> that includes
>>>>>>>>>>>>>>>>>>>> the following option:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> mpirun -report-uri file
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> where file is some filename that mpirun can create and 
>>>>>>>>>>>>>>>>>>>> insert its
>>>>>>>>>>>>>>>>>>>> contact info into it. This can be a relative or absolute 
>>>>>>>>>>>>>>>>>>>> path. This process
>>>>>>>>>>>>>>>>>>>> must remain alive throughout your application - doesn't 
>>>>>>>>>>>>>>>>>>>> matter what it does.
>>>>>>>>>>>>>>>>>>>> Its purpose is solely to keep mpirun alive.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2. set OMPI_MCA_dpm_orte_server=FILE:file in your 
>>>>>>>>>>>>>>>>>>>> environment, where
>>>>>>>>>>>>>>>>>>>> "file" is the filename given above. This will tell your 
>>>>>>>>>>>>>>>>>>>> processes how to
>>>>>>>>>>>>>>>>>>>> find mpirun, which is acting as a meeting place to handle 
>>>>>>>>>>>>>>>>>>>> the connect/accept
>>>>>>>>>>>>>>>>>>>> operations
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Now run your processes, and have them connect/accept to 
>>>>>>>>>>>>>>>>>>>> each other.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The reason I cannot guarantee this will work is that these 
>>>>>>>>>>>>>>>>>>>> processes
>>>>>>>>>>>>>>>>>>>> will all have the same rank && name since they all start 
>>>>>>>>>>>>>>>>>>>> as singletons.
>>>>>>>>>>>>>>>>>>>> Hence, connect/accept is likely to fail.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> But it -might- work, so you might want to give it a try.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Apr 23, 2010, at 8:10 AM, Grzegorz Maj wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> To be more precise: by 'server process' I mean some 
>>>>>>>>>>>>>>>>>>>>> process that I
>>>>>>>>>>>>>>>>>>>>> could run once on my system and it could help in creating 
>>>>>>>>>>>>>>>>>>>>> those
>>>>>>>>>>>>>>>>>>>>> groups.
>>>>>>>>>>>>>>>>>>>>> My typical scenario is:
>>>>>>>>>>>>>>>>>>>>> 1. run N separate processes, each without mpirun
>>>>>>>>>>>>>>>>>>>>> 2. connect them into MPI group
>>>>>>>>>>>>>>>>>>>>> 3. do some job
>>>>>>>>>>>>>>>>>>>>> 4. exit all N processes
>>>>>>>>>>>>>>>>>>>>> 5. goto 1
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 2010/4/23 Grzegorz Maj <ma...@wp.pl>:
>>>>>>>>>>>>>>>>>>>>>> Thank you Ralph for your explanation.
>>>>>>>>>>>>>>>>>>>>>> And, apart from that descriptors' issue, is there any 
>>>>>>>>>>>>>>>>>>>>>> other way to
>>>>>>>>>>>>>>>>>>>>>> solve my problem, i.e. to run separately a number of 
>>>>>>>>>>>>>>>>>>>>>> processes,
>>>>>>>>>>>>>>>>>>>>>> without mpirun and then to collect them into an MPI 
>>>>>>>>>>>>>>>>>>>>>> intracomm group?
>>>>>>>>>>>>>>>>>>>>>> If I for example would need to run some 'server process' 
>>>>>>>>>>>>>>>>>>>>>> (even using
>>>>>>>>>>>>>>>>>>>>>> mpirun) for this task, that's OK. Any ideas?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>> Grzegorz Maj
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>>>>>>>>>>>>>>>> Okay, but here is the problem. If you don't use mpirun, 
>>>>>>>>>>>>>>>>>>>>>>> and are not
>>>>>>>>>>>>>>>>>>>>>>> operating in an environment we support for "direct" 
>>>>>>>>>>>>>>>>>>>>>>> launch (i.e., starting
>>>>>>>>>>>>>>>>>>>>>>> processes outside of mpirun), then every one of those 
>>>>>>>>>>>>>>>>>>>>>>> processes thinks it is
>>>>>>>>>>>>>>>>>>>>>>> a singleton - yes?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> What you may not realize is that each singleton 
>>>>>>>>>>>>>>>>>>>>>>> immediately
>>>>>>>>>>>>>>>>>>>>>>> fork/exec's an orted daemon that is configured to 
>>>>>>>>>>>>>>>>>>>>>>> behave just like mpirun.
>>>>>>>>>>>>>>>>>>>>>>> This is required in order to support MPI-2 operations 
>>>>>>>>>>>>>>>>>>>>>>> such as
>>>>>>>>>>>>>>>>>>>>>>> MPI_Comm_spawn, MPI_Comm_connect/accept, etc.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> So if you launch 64 processes that think they are 
>>>>>>>>>>>>>>>>>>>>>>> singletons, then
>>>>>>>>>>>>>>>>>>>>>>> you have 64 copies of orted running as well. This eats 
>>>>>>>>>>>>>>>>>>>>>>> up a lot of file
>>>>>>>>>>>>>>>>>>>>>>> descriptors, which is probably why you are hitting this 
>>>>>>>>>>>>>>>>>>>>>>> 65 process limit -
>>>>>>>>>>>>>>>>>>>>>>> your system is probably running out of file 
>>>>>>>>>>>>>>>>>>>>>>> descriptors. You might check your
>>>>>>>>>>>>>>>>>>>>>>> system limits and see if you can get them revised 
>>>>>>>>>>>>>>>>>>>>>>> upward.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Apr 17, 2010, at 4:24 PM, Grzegorz Maj wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Yes, I know. The problem is that I need to use some 
>>>>>>>>>>>>>>>>>>>>>>>> special way for
>>>>>>>>>>>>>>>>>>>>>>>> running my processes provided by the environment in 
>>>>>>>>>>>>>>>>>>>>>>>> which I'm
>>>>>>>>>>>>>>>>>>>>>>>> working
>>>>>>>>>>>>>>>>>>>>>>>> and unfortunately I can't use mpirun.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>>>>>>>>>>>>>>>>>> Guess I don't understand why you can't use mpirun - 
>>>>>>>>>>>>>>>>>>>>>>>>> all it does is
>>>>>>>>>>>>>>>>>>>>>>>>> start things, provide a means to forward io, etc. It 
>>>>>>>>>>>>>>>>>>>>>>>>> mainly sits there
>>>>>>>>>>>>>>>>>>>>>>>>> quietly without using any cpu unless required to 
>>>>>>>>>>>>>>>>>>>>>>>>> support the job.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Sounds like it would solve your problem. Otherwise, I 
>>>>>>>>>>>>>>>>>>>>>>>>> know of no
>>>>>>>>>>>>>>>>>>>>>>>>> way to get all these processes into comm_world.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Apr 17, 2010, at 2:27 PM, Grzegorz Maj wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>> I'd like to dynamically create a group of processes 
>>>>>>>>>>>>>>>>>>>>>>>>>> communicating
>>>>>>>>>>>>>>>>>>>>>>>>>> via
>>>>>>>>>>>>>>>>>>>>>>>>>> MPI. Those processes need to be run without mpirun 
>>>>>>>>>>>>>>>>>>>>>>>>>> and create
>>>>>>>>>>>>>>>>>>>>>>>>>> intracommunicator after the startup. Any ideas how 
>>>>>>>>>>>>>>>>>>>>>>>>>> to do this
>>>>>>>>>>>>>>>>>>>>>>>>>> efficiently?
>>>>>>>>>>>>>>>>>>>>>>>>>> I came up with a solution in which the processes are 
>>>>>>>>>>>>>>>>>>>>>>>>>> connecting
>>>>>>>>>>>>>>>>>>>>>>>>>> one by
>>>>>>>>>>>>>>>>>>>>>>>>>> one using MPI_Comm_connect, but unfortunately all 
>>>>>>>>>>>>>>>>>>>>>>>>>> the processes
>>>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>> are already in the group need to call 
>>>>>>>>>>>>>>>>>>>>>>>>>> MPI_Comm_accept. This means
>>>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>> when the n-th process wants to connect I need to 
>>>>>>>>>>>>>>>>>>>>>>>>>> collect all the
>>>>>>>>>>>>>>>>>>>>>>>>>> n-1
>>>>>>>>>>>>>>>>>>>>>>>>>> processes on the MPI_Comm_accept call. After I run 
>>>>>>>>>>>>>>>>>>>>>>>>>> about 40
>>>>>>>>>>>>>>>>>>>>>>>>>> processes
>>>>>>>>>>>>>>>>>>>>>>>>>> every subsequent call takes more and more time, 
>>>>>>>>>>>>>>>>>>>>>>>>>> which I'd like to
>>>>>>>>>>>>>>>>>>>>>>>>>> avoid.
>>>>>>>>>>>>>>>>>>>>>>>>>> Another problem in this solution is that when I try 
>>>>>>>>>>>>>>>>>>>>>>>>>> to connect
>>>>>>>>>>>>>>>>>>>>>>>>>> 66-th
>>>>>>>>>>>>>>>>>>>>>>>>>> process the root of the existing group segfaults on
>>>>>>>>>>>>>>>>>>>>>>>>>> MPI_Comm_accept.
>>>>>>>>>>>>>>>>>>>>>>>>>> Maybe it's my bug, but it's weird as everything 
>>>>>>>>>>>>>>>>>>>>>>>>>> works fine for at
>>>>>>>>>>>>>>>>>>>>>>>>>> most
>>>>>>>>>>>>>>>>>>>>>>>>>> 65 processes. Is there any limitation I don't know 
>>>>>>>>>>>>>>>>>>>>>>>>>> about?
>>>>>>>>>>>>>>>>>>>>>>>>>> My last question is about MPI_COMM_WORLD. When I run 
>>>>>>>>>>>>>>>>>>>>>>>>>> my processes
>>>>>>>>>>>>>>>>>>>>>>>>>> without mpirun their MPI_COMM_WORLD is the same as 
>>>>>>>>>>>>>>>>>>>>>>>>>> MPI_COMM_SELF.
>>>>>>>>>>>>>>>>>>>>>>>>>> Is
>>>>>>>>>>>>>>>>>>>>>>>>>> there any way to change MPI_COMM_WORLD and set it to 
>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>> intracommunicator that I've created?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>> Grzegorz Maj