Bad news: I've tried the latest patch both with and without the prior one, but it hasn't changed anything. I've also tried the old code with the OMPI_DPM_BASE_MAXJOBIDS constant changed to 80, but that didn't help either. While looking through the openmpi-1.4.2 sources I couldn't find any call to the function ompi_dpm_base_mark_dyncomm.
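Just to check that I understand the failure mode you describe below, here is a minimal sketch of the pattern as I read it - this is only an illustration, not the actual Open MPI source, and the names are made up:

#define MAX_JOBIDS 64                  /* hard-wired limit */

static int jobids[MAX_JOBIDS];
static int num_jobids = 0;

/* called on every connect/accept; with no bounds check, the 65th call
 * writes past the end of the array and corrupts whatever memory
 * happens to follow it */
static void add_jobid(int jobid)
{
    /* missing: if (num_jobids >= MAX_JOBIDS) return an error */
    jobids[num_jobids++] = jobid;
}

If that's the right picture, then changing OMPI_DPM_BASE_MAXJOBIDS to 80 should at least have moved the crash past the 80th process, so perhaps that constant isn't the one sizing the array that actually overflows.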
2010/7/12 Ralph Castain <r...@open-mpi.org>:
> Just so you don't have to wait for the 1.4.3 release, here is the patch (it doesn't include the prior patch).
>
> On Jul 12, 2010, at 12:13 PM, Grzegorz Maj wrote:
>
>> 2010/7/12 Ralph Castain <r...@open-mpi.org>:
>>> Dug around a bit and found the problem!!
>>>
>>> I have no idea who did this or why, but somebody set a limit of 64 separate jobids in the dynamic init called by ompi_comm_set, which builds the intercommunicator. Unfortunately, they hard-wired the array size but never check that size before adding to it.
>>>
>>> So after 64 calls to connect_accept, you are overwriting other areas of the code. As you found, hitting 66 causes it to segfault.
>>>
>>> I'll fix this on the developer's trunk (I'll also add that original patch to it). Rather than my searching this thread in detail, can you remind me what version you are using so I can patch it too?
>>
>> I'm using 1.4.2.
>> Thanks a lot; I'm looking forward to the patch.
>>
>>> Thanks for your patience with this!
>>> Ralph
>>>
>>> On Jul 12, 2010, at 7:20 AM, Grzegorz Maj wrote:
>>>
>>>> 1024 is not the problem: changing it to 2048 hasn't changed anything.
>>>> Following your advice I've run my process under gdb. Unfortunately I didn't get anything more than:
>>>>
>>>> Program received signal SIGSEGV, Segmentation fault.
>>>> [Switching to Thread 0xf7e4c6c0 (LWP 20246)]
>>>> 0xf7f39905 in ompi_comm_set () from /home/gmaj/openmpi/lib/libmpi.so.0
>>>>
>>>> (gdb) bt
>>>> #0  0xf7f39905 in ompi_comm_set () from /home/gmaj/openmpi/lib/libmpi.so.0
>>>> #1  0xf7e3ba95 in connect_accept () from /home/gmaj/openmpi/lib/openmpi/mca_dpm_orte.so
>>>> #2  0xf7f62013 in PMPI_Comm_connect () from /home/gmaj/openmpi/lib/libmpi.so.0
>>>> #3  0x080489ed in main (argc=825832753, argv=0x34393638) at client.c:43
>>>>
>>>> What's more: when I added a breakpoint on ompi_comm_set in the 66th process and stepped through a couple of instructions, one of the other processes crashed (as usual on ompi_comm_set) before the 66th did.
>>>>
>>>> Finally I decided to recompile Open MPI using the -g flag for gcc. In this case the 66-process issue is gone! I was running my applications exactly the same way as before (even without recompiling them) and I successfully ran over 130 processes.
>>>> When I switch back to the Open MPI build without -g, it segfaults again.
>>>>
>>>> Any ideas? I'm really confused.
>>>>
>>>> 2010/7/7 Ralph Castain <r...@open-mpi.org>:
>>>>> I would guess the #files limit of 1024. However, if it behaves the same way when spread across multiple machines, I would suspect it is somewhere in your program itself. Given that the segfault is in your process, can you use gdb to look at the core file and see where and why it fails?
>>>>>
>>>>> On Jul 7, 2010, at 10:17 AM, Grzegorz Maj wrote:
>>>>>
>>>>>> 2010/7/7 Ralph Castain <r...@open-mpi.org>:
>>>>>>>
>>>>>>> On Jul 6, 2010, at 8:48 AM, Grzegorz Maj wrote:
>>>>>>>
>>>>>>>> Hi Ralph,
>>>>>>>> sorry for the late response, but I couldn't find free time to play with this. I've finally applied the patch you prepared. I've launched my processes in the way you've described and I think it's working as you expected. None of my processes runs the orted daemon and they can perform MPI operations. Unfortunately I'm still hitting the 65-process issue :(
>>>>>>>> Maybe I'm doing something wrong.
>>>>>>>> I attach my source code. If anybody could have a look at it, I would be grateful.
>>>>>>>>
>>>>>>>> When I run that code with clients_count <= 65 everything works fine: all the processes create a common grid, exchange some information and disconnect.
>>>>>>>> When I set clients_count > 65, the 66th process crashes on MPI_Comm_connect (segmentation fault).
>>>>>>>
>>>>>>> I didn't have time to check the code, but my guess is that you are still hitting some kind of file descriptor or other limit. Check to see what your limits are - usually "ulimit" will tell you.
>>>>>>
>>>>>> My limits are:
>>>>>> time(seconds)        unlimited
>>>>>> file(blocks)         unlimited
>>>>>> data(kb)             unlimited
>>>>>> stack(kb)            10240
>>>>>> coredump(blocks)     0
>>>>>> memory(kb)           unlimited
>>>>>> locked memory(kb)    64
>>>>>> process              200704
>>>>>> nofiles              1024
>>>>>> vmemory(kb)          unlimited
>>>>>> locks                unlimited
>>>>>>
>>>>>> Which one do you think could be responsible for that?
>>>>>>
>>>>>> I tried running all 66 processes on one machine and also spreading them across several machines, and it always crashes the same way on the 66th process.
>>>>>>
>>>>>>>> Another thing I would like to know is whether it's normal that when any of my processes calls MPI_Comm_connect or MPI_Comm_accept and the other side is not yet ready, it eats up a full CPU.
>>>>>>>
>>>>>>> Yes - the waiting process is polling in a tight loop waiting for the connection to be made.
>>>>>>>
>>>>>>>> Any help would be appreciated,
>>>>>>>> Grzegorz Maj
>>>>>>>>
>>>>>>>> 2010/4/24 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>> Actually, OMPI is distributed with a daemon that does pretty much what you want. Check out "man ompi-server". I originally wrote that code to support cross-application MPI publish/subscribe operations, but we can utilize it here too. Have to blame me for not making it more publicly known.
>>>>>>>>> The attached patch upgrades ompi-server and modifies the singleton startup to provide your desired support. This solution works in the following manner:
>>>>>>>>>
>>>>>>>>> 1. Launch "ompi-server -report-uri <filename>". This starts a persistent daemon called "ompi-server" that acts as a rendezvous point for independently started applications. The problem with starting different applications and wanting them to MPI connect/accept lies in the need to have the applications find each other. If they can't discover contact info for the other app, then they can't wire up their interconnects. The "ompi-server" tool provides that rendezvous point. I don't like that comm_accept segfaulted - it should have just error'd out.
>>>>>>>>>
>>>>>>>>> 2. Set OMPI_MCA_orte_server=file:<filename> in the environment where you will start your processes. This will allow your singleton processes to find the ompi-server. I also automatically set the envar to connect the MPI publish/subscribe system for you.
>>>>>>>>>
>>>>>>>>> 3. Run your processes. As they think they are singletons, they will detect the presence of the above envar and automatically connect themselves to the "ompi-server" daemon.
>>>>>>>>> This provides each process with the ability to perform any MPI-2 operation.
>>>>>>>>>
>>>>>>>>> I tested this on my machines and it worked, so hopefully it will meet your needs. You only need to run one "ompi-server", period, so long as you locate it where all of the processes can find the contact file and can open a TCP socket to the daemon. There is a way to knit multiple ompi-servers into a broader network (e.g., to connect processes that cannot directly access a server due to network segmentation), but it's a tad tricky - let me know if you require it and I'll try to help.
>>>>>>>>> If you have trouble wiring them all into a single communicator, you might ask separately about that and see if one of our MPI experts can provide advice (I'm just the RTE grunt).
>>>>>>>>> HTH - let me know how this works for you and I'll incorporate it into future OMPI releases.
>>>>>>>>> Ralph
>>>>>>>>>
>>>>>>>>> On Apr 24, 2010, at 1:49 AM, Krzysztof Zarzycki wrote:
>>>>>>>>>
>>>>>>>>> Hi Ralph,
>>>>>>>>> I'm Krzysztof and I'm working with Grzegorz Maj on this small project/experiment of ours.
>>>>>>>>> We definitely would like to give your patch a try. But could you please explain your solution a little more?
>>>>>>>>> You would still like us to start one mpirun per MPI grid, and then have the processes started by us join the MPI comm? That is a good solution of course.
>>>>>>>>> But it would be especially preferable to have one daemon running persistently on our "entry" machine that can handle several MPI grid starts. Can your patch help us this way too?
>>>>>>>>> Thanks for your help!
>>>>>>>>> Krzysztof
>>>>>>>>>
>>>>>>>>> On 24 April 2010 03:51, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>>>>
>>>>>>>>>> In thinking about this, my proposed solution won't entirely fix the problem - you'll still wind up with all those daemons. I believe I can resolve that one as well, but it would require a patch.
>>>>>>>>>>
>>>>>>>>>> Would you like me to send you something you could try? Might take a couple of iterations to get it right...
>>>>>>>>>>
>>>>>>>>>> On Apr 23, 2010, at 12:12 PM, Ralph Castain wrote:
>>>>>>>>>>
>>>>>>>>>>> Hmmm....I -think- this will work, but I cannot guarantee it:
>>>>>>>>>>>
>>>>>>>>>>> 1. Launch one process (it can just be a spinner) using mpirun with the following option:
>>>>>>>>>>>
>>>>>>>>>>>   mpirun -report-uri file
>>>>>>>>>>>
>>>>>>>>>>> where "file" is some filename that mpirun can create and insert its contact info into. This can be a relative or absolute path. This process must remain alive throughout your application - it doesn't matter what it does. Its purpose is solely to keep mpirun alive.
>>>>>>>>>>>
>>>>>>>>>>> 2. Set OMPI_MCA_dpm_orte_server=FILE:file in your environment, where "file" is the filename given above. This will tell your processes how to find mpirun, which is acting as a meeting place to handle the connect/accept operations.
>>>>>>>>>>>
>>>>>>>>>>> Now run your processes, and have them connect/accept to each other.
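A side note on the two rendezvous recipes quoted above, mainly for anyone else reading the archives: once the server envar points the singletons at a common ompi-server (or mpirun), my understanding is that the standard MPI-2 name-service calls are what actually ties two independently started processes together. A minimal sketch of what I mean - the service name "grid" and the function names are made up, and I have not verified this exact flow against 1.4.2:

#include <mpi.h>

/* one side opens a port, publishes it under a well-known name,
 * and accepts a connection on it */
void accept_side(MPI_Comm *intercomm)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Open_port(MPI_INFO_NULL, port);
    MPI_Publish_name("grid", MPI_INFO_NULL, port);
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, intercomm);
    MPI_Unpublish_name("grid", MPI_INFO_NULL, port);
    MPI_Close_port(port);
}

/* the other side looks the port up through the rendezvous daemon
 * and connects to it */
void connect_side(MPI_Comm *intercomm)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Lookup_name("grid", MPI_INFO_NULL, port);
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, intercomm);
}

Either the publish/lookup pair above or passing the port name around out of band should work for the initial contact.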
>>>>>>>>>>> The reason I cannot guarantee this will work is that these processes will all have the same rank && name since they all start as singletons. Hence, connect/accept is likely to fail.
>>>>>>>>>>>
>>>>>>>>>>> But it -might- work, so you might want to give it a try.
>>>>>>>>>>>
>>>>>>>>>>> On Apr 23, 2010, at 8:10 AM, Grzegorz Maj wrote:
>>>>>>>>>>>
>>>>>>>>>>>> To be more precise: by 'server process' I mean some process that I could run once on my system and that could help in creating those groups.
>>>>>>>>>>>> My typical scenario is:
>>>>>>>>>>>> 1. run N separate processes, each without mpirun
>>>>>>>>>>>> 2. connect them into an MPI group
>>>>>>>>>>>> 3. do some job
>>>>>>>>>>>> 4. exit all N processes
>>>>>>>>>>>> 5. goto 1
>>>>>>>>>>>>
>>>>>>>>>>>> 2010/4/23 Grzegorz Maj <ma...@wp.pl>:
>>>>>>>>>>>>> Thank you Ralph for your explanation.
>>>>>>>>>>>>> And, apart from that descriptors issue, is there any other way to solve my problem, i.e. to run a number of processes separately, without mpirun, and then collect them into an MPI intracomm group? If I for example needed to run some 'server process' (even using mpirun) for this task, that's OK. Any ideas?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Grzegorz Maj
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>>>>>>> Okay, but here is the problem. If you don't use mpirun, and are not operating in an environment we support for "direct" launch (i.e., starting processes outside of mpirun), then every one of those processes thinks it is a singleton - yes?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What you may not realize is that each singleton immediately fork/exec's an orted daemon that is configured to behave just like mpirun. This is required in order to support MPI-2 operations such as MPI_Comm_spawn, MPI_Comm_connect/accept, etc.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So if you launch 64 processes that think they are singletons, then you have 64 copies of orted running as well. This eats up a lot of file descriptors, which is probably why you are hitting this 65-process limit - your system is probably running out of file descriptors. You might check your system limits and see if you can get them revised upward.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Apr 17, 2010, at 4:24 PM, Grzegorz Maj wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, I know. The problem is that I need to use a special way of running my processes provided by the environment in which I'm working, and unfortunately I can't use mpirun.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>>>>>>>>> Guess I don't understand why you can't use mpirun - all it does is start things, provide a means to forward io, etc. It mainly sits there quietly without using any cpu unless required to support the job.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sounds like it would solve your problem.
>>>>>>>>>>>>>>>> Otherwise, I know of no way to get all these processes into comm_world.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Apr 17, 2010, at 2:27 PM, Grzegorz Maj wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>> I'd like to dynamically create a group of processes communicating via MPI. Those processes need to be run without mpirun and create an intracommunicator after startup. Any ideas how to do this efficiently?
>>>>>>>>>>>>>>>>> I came up with a solution in which the processes connect one by one using MPI_Comm_connect, but unfortunately all the processes that are already in the group need to call MPI_Comm_accept. This means that when the n-th process wants to connect I need to collect all the n-1 processes on the MPI_Comm_accept call. After I run about 40 processes, every subsequent call takes more and more time, which I'd like to avoid.
>>>>>>>>>>>>>>>>> Another problem with this solution is that when I try to connect the 66th process, the root of the existing group segfaults on MPI_Comm_accept. Maybe it's my bug, but it's weird as everything works fine for at most 65 processes. Is there any limitation I don't know about?
>>>>>>>>>>>>>>>>> My last question is about MPI_COMM_WORLD. When I run my processes without mpirun, their MPI_COMM_WORLD is the same as MPI_COMM_SELF. Is there any way to change MPI_COMM_WORLD and set it to the intracommunicator that I've created?
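For anyone skimming the thread: the scheme described in the message quoted just above is roughly the following - this is a simplified sketch, not the actual client.c attached earlier in the thread, and the helper names are made up. Every process already in the group takes part in the accept, and both sides then merge the resulting intercommunicator into a larger intracommunicator:

#include <mpi.h>

/* existing members: collectively accept the next joiner on 'port'
 * and return the enlarged intracommunicator */
MPI_Comm accept_next(char *port, MPI_Comm group)
{
    MPI_Comm inter, merged;
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, group, &inter);
    MPI_Intercomm_merge(inter, 0, &merged);   /* existing group ranks come first */
    MPI_Comm_free(&inter);
    return merged;
}

/* the new process: connect as a singleton and merge into the group */
MPI_Comm join_group(char *port)
{
    MPI_Comm inter, merged;
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    MPI_Intercomm_merge(inter, 1, &merged);   /* joiner is ordered after them */
    MPI_Comm_free(&inter);
    return merged;
}

The merged communicator returned by accept_next replaces the previous one, so the next accept is a collective over a group that is one process larger - which is also why each join gets slower as the group grows. As far as I know MPI_COMM_WORLD itself cannot be reassigned; the usual approach is simply to use the merged intracommunicator everywhere in its place.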
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Grzegorz Maj