2010/7/12 Ralph Castain <r...@open-mpi.org>:
> Dug around a bit and found the problem!!
>
> I have no idea who did this or why, but somebody set a limit of 64 separate jobids in the dynamic init called by ompi_comm_set, which builds the intercommunicator. Unfortunately, they hard-wired the array size but never check it before adding to it.
>
> So after 64 calls to connect_accept, you are overwriting other areas of the code. As you found, hitting 66 causes it to segfault.
>
> I'll fix this on the developer's trunk (I'll also add that original patch to it). Rather than my searching this thread in detail, can you remind me what version you are using so I can patch it too?
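A minimal illustrative sketch of the failure mode described above - a hard-wired table that is appended to on every connect/accept with no bounds check. The names are invented and this is not the actual ompi_comm_set code; it only shows the pattern. Corrupting whatever happens to sit next to such an array also explains why rebuilding with -g (which changes the memory layout) can move or hide the crash, as reported elsewhere in this thread.

/* Illustrative sketch only, not Open MPI source: a fixed-size jobid table
 * that is appended to on every connect/accept without a bounds check. */
#include <stdint.h>
#include <stdio.h>

#define MAX_DYNAMIC_JOBIDS 64              /* hypothetical hard-wired limit */

static uint32_t jobid_table[MAX_DYNAMIC_JOBIDS];
static int      jobid_count = 0;

/* Buggy pattern: entry 65 writes past the array, silently corrupting
 * neighbouring data; the crash surfaces a call or two later. */
static void record_jobid_buggy(uint32_t jobid)
{
    jobid_table[jobid_count++] = jobid;
}

/* Fixed pattern: check the size before adding (or grow the table). */
static int record_jobid_fixed(uint32_t jobid)
{
    if (jobid_count >= MAX_DYNAMIC_JOBIDS) {
        fprintf(stderr, "jobid table full (%d entries), refusing to add more\n",
                jobid_count);
        return -1;                         /* report instead of overwriting memory */
    }
    jobid_table[jobid_count++] = jobid;
    return 0;
}

int main(void)
{
    /* 70 simulated connect/accept calls: the fixed version refuses entry 65
     * instead of scribbling over other data structures. */
    for (uint32_t i = 0; i < 70; i++) {
        if (record_jobid_fixed(i) != 0)
            break;
    }
    (void)record_jobid_buggy;              /* referenced only to show the broken pattern */
    return 0;
}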
I'm using 1.4.2. Thanks a lot, and I'm looking forward to the patch.

> Thanks for your patience with this!
> Ralph
>
> On Jul 12, 2010, at 7:20 AM, Grzegorz Maj wrote:
>
>> 1024 is not the problem: changing it to 2048 hasn't changed anything.
>> Following your advice I've run my process under gdb. Unfortunately I didn't get anything more than:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0xf7e4c6c0 (LWP 20246)]
>> 0xf7f39905 in ompi_comm_set () from /home/gmaj/openmpi/lib/libmpi.so.0
>>
>> (gdb) bt
>> #0  0xf7f39905 in ompi_comm_set () from /home/gmaj/openmpi/lib/libmpi.so.0
>> #1  0xf7e3ba95 in connect_accept () from /home/gmaj/openmpi/lib/openmpi/mca_dpm_orte.so
>> #2  0xf7f62013 in PMPI_Comm_connect () from /home/gmaj/openmpi/lib/libmpi.so.0
>> #3  0x080489ed in main (argc=825832753, argv=0x34393638) at client.c:43
>>
>> What's more: when I added a breakpoint on ompi_comm_set in the 66th process and stepped through a couple of instructions, one of the other processes crashed (as usual in ompi_comm_set) before the 66th did.
>>
>> Finally I decided to recompile Open MPI with the -g flag for gcc. In this case the 66-process issue is gone! I ran my applications exactly the same way as before (without even recompiling them) and successfully ran over 130 processes.
>> When I switch back to the Open MPI build without -g, it segfaults again.
>>
>> Any ideas? I'm really confused.
>>
>> 2010/7/7 Ralph Castain <r...@open-mpi.org>:
>>> I would guess the #files limit of 1024. However, if it behaves the same way when spread across multiple machines, I would suspect it is somewhere in your program itself. Given that the segfault is in your process, can you use gdb to look at the core file and see where and why it fails?
>>>
>>> On Jul 7, 2010, at 10:17 AM, Grzegorz Maj wrote:
>>>
>>>> 2010/7/7 Ralph Castain <r...@open-mpi.org>:
>>>>>
>>>>> On Jul 6, 2010, at 8:48 AM, Grzegorz Maj wrote:
>>>>>
>>>>>> Hi Ralph,
>>>>>> sorry for the late response, but I couldn't find free time to play with this. I've finally applied the patch you prepared. I've launched my processes in the way you described and I think it's working as you expected. None of my processes runs the orted daemon and they can perform MPI operations. Unfortunately I'm still hitting the 65-process issue :(
>>>>>> Maybe I'm doing something wrong.
>>>>>> I attach my source code. If anybody could have a look at it, I would be grateful.
>>>>>>
>>>>>> When I run that code with clients_count <= 65 everything works fine: all the processes create a common grid, exchange some information and disconnect.
>>>>>> When I set clients_count > 65, the 66th process crashes with a segmentation fault in MPI_Comm_connect.
>>>>>
>>>>> I didn't have time to check the code, but my guess is that you are still hitting some kind of file descriptor or other limit. Check to see what your limits are - usually "ulimit" will tell you.
>>>>
>>>> My limits are:
>>>>   time(seconds)        unlimited
>>>>   file(blocks)         unlimited
>>>>   data(kb)             unlimited
>>>>   stack(kb)            10240
>>>>   coredump(blocks)     0
>>>>   memory(kb)           unlimited
>>>>   locked memory(kb)    64
>>>>   process              200704
>>>>   nofiles              1024
>>>>   vmemory(kb)          unlimited
>>>>   locks                unlimited
>>>>
>>>> Which one do you think could be responsible for that?
>>>>
>>>> I tried running all 66 processes on one machine and also spreading them across several machines, and it always crashes the same way on the 66th process.
>>>>
>>>>>
>>>>>> Another thing I would like to know is whether it's normal that a process calling MPI_Comm_connect or MPI_Comm_accept eats up a full CPU while the other side is not yet ready.
>>>>>
>>>>> Yes - the waiting process is polling in a tight loop waiting for the connection to be made.
>>>>>
>>>>>> Any help would be appreciated,
>>>>>> Grzegorz Maj
>>>>>>
>>>>>> 2010/4/24 Ralph Castain <r...@open-mpi.org>:
>>>>>>> Actually, OMPI is distributed with a daemon that does pretty much what you want. Check out "man ompi-server". I originally wrote that code to support cross-application MPI publish/subscribe operations, but we can use it here too. Blame me for not making it more publicly known.
>>>>>>> The attached patch upgrades ompi-server and modifies the singleton startup to provide the support you want. This solution works in the following manner:
>>>>>>> 1. Launch "ompi-server -report-uri <filename>". This starts a persistent daemon called "ompi-server" that acts as a rendezvous point for independently started applications. The problem with starting different applications and wanting them to do MPI connect/accept lies in the need to have the applications find each other: if they can't discover contact info for the other app, they can't wire up their interconnects. The "ompi-server" tool provides that rendezvous point. I don't like that comm_accept segfaulted - it should have just errored out.
>>>>>>> 2. Set OMPI_MCA_orte_server=file:<filename> in the environment where you will start your processes. This will allow your singleton processes to find the ompi-server. I also automatically set the envar that connects the MPI publish/subscribe system for you.
>>>>>>> 3. Run your processes. As they think they are singletons, they will detect the presence of the above envar and automatically connect themselves to the "ompi-server" daemon. This gives each process the ability to perform any MPI-2 operation.
>>>>>>> I tested this on my machines and it worked, so hopefully it will meet your needs. You only need to run one "ompi-server", period, as long as you locate it where all of the processes can find the contact file and can open a TCP socket to the daemon. There is a way to knit multiple ompi-servers into a broader network (e.g., to connect processes that cannot directly access a server due to network segmentation), but it's a tad tricky - let me know if you require it and I'll try to help.
>>>>>>> If you have trouble wiring them all into a single communicator, you might ask about that separately and see if one of our MPI experts can provide advice (I'm just the RTE grunt).
>>>>>>> HTH - let me know how this works for you and I'll incorporate it into future OMPI releases.
>>>>>>> Ralph
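To make the three steps above concrete, here is a minimal sketch of the MPI-2 publish/lookup plus connect/accept pattern that independently started singletons can use once ompi-server is running and OMPI_MCA_orte_server=file:<filename> is set for every process. This is not the client.c/server.c attached earlier in the thread; the service name "my-grid" and the role selection via argv are purely illustrative.

/* Minimal sketch, assuming ompi-server is running and
 * OMPI_MCA_orte_server=file:<filename> is set in the environment of every
 * process, so that name publish/lookup works across independently started
 * singletons. Error handling is omitted for brevity. */
#include <mpi.h>
#include <string.h>

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm inter;

    MPI_Init(&argc, &argv);              /* started without mpirun: a singleton */

    if (argc > 1 && strcmp(argv[1], "server") == 0) {
        /* Rendezvous side: open a port and register it under a service name.
         * The cross-job name lookup is what ompi-server makes possible. */
        MPI_Open_port(MPI_INFO_NULL, port);
        MPI_Publish_name("my-grid", MPI_INFO_NULL, port);
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        /* A real server would loop here to accept more than one client. */
        MPI_Unpublish_name("my-grid", MPI_INFO_NULL, port);
        MPI_Close_port(port);
    } else {
        /* Client side: look the port up by name and connect to it.
         * A robust client would retry until the name has been published. */
        MPI_Lookup_name("my-grid", MPI_INFO_NULL, port);
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    }

    /* "inter" is an intercommunicator between the two sides; it could now
     * be merged with MPI_Intercomm_merge or used directly. */
    MPI_Comm_disconnect(&inter);
    MPI_Finalize();
    return 0;
}

One copy is started with the "server" argument first; every other process looks the port up by name and connects, and the resulting intercommunicator can then be merged with MPI_Intercomm_merge or used as is.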
>>>>>>> On Apr 24, 2010, at 1:49 AM, Krzysztof Zarzycki wrote:
>>>>>>>
>>>>>>> Hi Ralph,
>>>>>>> I'm Krzysztof and I'm working with Grzegorz Maj on this small project/experiment of ours.
>>>>>>> We definitely would like to give your patch a try. But could you please explain your solution a little more?
>>>>>>> You would still like us to start one mpirun per MPI grid, and then have the processes we start join that MPI communicator?
>>>>>>> That is a good solution, of course.
>>>>>>> But it would be especially preferable to have one daemon running persistently on our "entry" machine that can handle several MPI grid starts.
>>>>>>> Can your patch help us this way too?
>>>>>>> Thanks for your help!
>>>>>>> Krzysztof
>>>>>>>
>>>>>>> On 24 April 2010 03:51, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>>
>>>>>>>> In thinking about this, my proposed solution won't entirely fix the problem - you'll still wind up with all those daemons. I believe I can resolve that one as well, but it would require a patch.
>>>>>>>>
>>>>>>>> Would you like me to send you something you could try? It might take a couple of iterations to get it right...
>>>>>>>>
>>>>>>>> On Apr 23, 2010, at 12:12 PM, Ralph Castain wrote:
>>>>>>>>
>>>>>>>>> Hmmm....I -think- this will work, but I cannot guarantee it:
>>>>>>>>>
>>>>>>>>> 1. Launch one process (it can just be a spinner) using mpirun with the following option:
>>>>>>>>>
>>>>>>>>> mpirun -report-uri file
>>>>>>>>>
>>>>>>>>> where "file" is some filename that mpirun can create and insert its contact info into. This can be a relative or absolute path. This process must remain alive throughout your application - it doesn't matter what it does. Its sole purpose is to keep mpirun alive.
>>>>>>>>>
>>>>>>>>> 2. Set OMPI_MCA_dpm_orte_server=FILE:file in your environment, where "file" is the filename given above. This will tell your processes how to find mpirun, which acts as a meeting place to handle the connect/accept operations.
>>>>>>>>>
>>>>>>>>> Now run your processes, and have them connect/accept to each other.
>>>>>>>>>
>>>>>>>>> The reason I cannot guarantee this will work is that these processes will all have the same rank && name since they all start as singletons. Hence, connect/accept is likely to fail.
>>>>>>>>>
>>>>>>>>> But it -might- work, so you might want to give it a try.
>>>>>>>>>
>>>>>>>>> On Apr 23, 2010, at 8:10 AM, Grzegorz Maj wrote:
>>>>>>>>>
>>>>>>>>>> To be more precise: by 'server process' I mean some process that I could run once on my system and that could help in creating those groups.
>>>>>>>>>> My typical scenario is:
>>>>>>>>>> 1. run N separate processes, each without mpirun
>>>>>>>>>> 2. connect them into an MPI group
>>>>>>>>>> 3. do some job
>>>>>>>>>> 4. exit all N processes
>>>>>>>>>> 5. goto 1
>>>>>>>>>>
>>>>>>>>>> 2010/4/23 Grzegorz Maj <ma...@wp.pl>:
>>>>>>>>>>> Thank you Ralph for your explanation.
>>>>>>>>>>> And, apart from that descriptor issue, is there any other way to solve my problem, i.e. to run a number of processes separately, without mpirun, and then collect them into an MPI intracomm group?
>>>>>>>>>>> If for example I would need to run some 'server process' (even using mpirun) for this task, that's OK. Any ideas?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Grzegorz Maj
>>>>>>>>>>>
>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>>>>> Okay, but here is the problem. If you don't use mpirun, and are not operating in an environment we support for "direct" launch (i.e., starting processes outside of mpirun), then every one of those processes thinks it is a singleton - yes?
>>>>>>>>>>>>
>>>>>>>>>>>> What you may not realize is that each singleton immediately fork/execs an orted daemon that is configured to behave just like mpirun. This is required in order to support MPI-2 operations such as MPI_Comm_spawn, MPI_Comm_connect/accept, etc.
>>>>>>>>>>>>
>>>>>>>>>>>> So if you launch 64 processes that think they are singletons, then you have 64 copies of orted running as well. This eats up a lot of file descriptors, which is probably why you are hitting this 65-process limit - your system is probably running out of file descriptors. You might check your system limits and see if you can get them revised upward.
>>>>>>>>>>>>
>>>>>>>>>>>> On Apr 17, 2010, at 4:24 PM, Grzegorz Maj wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, I know. The problem is that I need to use a special way of running my processes provided by the environment in which I'm working, and unfortunately I can't use mpirun.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>>>>>>> Guess I don't understand why you can't use mpirun - all it does is start things, provide a means to forward io, etc. It mainly sits there quietly without using any cpu unless required to support the job.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sounds like it would solve your problem. Otherwise, I know of no way to get all these processes into comm_world.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Apr 17, 2010, at 2:27 PM, Grzegorz Maj wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> I'd like to dynamically create a group of processes communicating via MPI. Those processes need to be run without mpirun and create an intracommunicator after startup. Any ideas how to do this efficiently?
>>>>>>>>>>>>>>> I came up with a solution in which the processes connect one by one using MPI_Comm_connect, but unfortunately all the processes that are already in the group need to call MPI_Comm_accept. This means that when the n-th process wants to connect I need to collect all n-1 processes on the MPI_Comm_accept call. After I run about 40 processes, every subsequent call takes more and more time, which I'd like to avoid.
>>>>>>>>>>>>>>> Another problem with this solution is that when I try to connect the 66th process, the root of the existing group segfaults on MPI_Comm_accept. Maybe it's my bug, but it's weird as everything works fine for at most 65 processes.
>>>>>>>>>>>>>>> Is there any limitation I don't know about?
>>>>>>>>>>>>>>> My last question is about MPI_COMM_WORLD. When I run my processes without mpirun, their MPI_COMM_WORLD is the same as MPI_COMM_SELF. Is there any way to change MPI_COMM_WORLD and set it to the intracommunicator that I've created?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Grzegorz Maj
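To make the incremental scheme from the original question concrete, here is a minimal sketch under the following assumptions: the processes are started independently (no mpirun), a rendezvous such as ompi-server or the mpirun URI file is in place so that connect/accept works between singletons, the port file name and the command-line convention are invented, and error handling is omitted. It is not the attached client.c/server.c, but it follows the same connect/accept/merge pattern and shows why every join involves all current members.

/* Sketch only: each independently started process joins a growing
 * intracommunicator one at a time via connect/accept plus intercomm merge. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm group;                       /* the growing intracommunicator */
    int size;

    if (argc < 2) return 1;
    int nclients = atoi(argv[1]);         /* total number of processes that will join */
    int is_root  = (argc > 2);            /* invented convention: root gets an extra argument */

    MPI_Init(&argc, &argv);

    if (is_root) {
        /* The first process opens a port and makes it known out of band (a file here). */
        MPI_Open_port(MPI_INFO_NULL, port);
        FILE *f = fopen("port.txt", "w");
        fprintf(f, "%s\n", port);
        fclose(f);
        group = MPI_COMM_SELF;
    } else {
        /* A newcomer reads the port, connects to the existing group and merges into it. */
        MPI_Comm inter;
        FILE *f = fopen("port.txt", "r");
        fgets(port, sizeof(port), f);
        fclose(f);
        port[strcspn(port, "\n")] = '\0';
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        MPI_Intercomm_merge(inter, 1, &group);
        MPI_Comm_disconnect(&inter);
    }

    /* Every process already in the group helps accept each remaining newcomer.
     * This is the step that involves all current members on every join and
     * therefore gets slower as the group grows. */
    MPI_Comm_size(group, &size);
    while (size < nclients) {
        MPI_Comm inter, merged;
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, group, &inter);
        MPI_Intercomm_merge(inter, 0, &merged);
        MPI_Comm_disconnect(&inter);
        if (group != MPI_COMM_SELF) MPI_Comm_free(&group);
        group = merged;
        MPI_Comm_size(group, &size);
    }

    /* "group" now plays the role MPI_COMM_WORLD would normally play. */
    if (group != MPI_COMM_SELF) MPI_Comm_free(&group);
    MPI_Finalize();
    return 0;
}

On the MPI_COMM_WORLD question: with singleton initialization MPI_COMM_WORLD contains only the local process, and the MPI standard provides no way to reassign it, so the usual practice is to pass the merged intracommunicator ("group" above) wherever the code would otherwise use MPI_COMM_WORLD.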