Sorry, I don't understand. How can I try the Fortran from MacPorts?

2009/5/6 Luis Vitorio Cargnini <lvcargn...@gmail.com>
> This problem is occurring because the Fortran compiler wasn't built with debug symbols:
>
> warning: Could not find object file
> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_udiv_w_sdiv_s.o" - no
> debug information available for
> "../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".
>
> It is the same problem for anyone using LLVM in Xcode: there are no debug
> symbols, so you cannot make a debug build. Try creating a release build and
> see whether it compiles at all, and try the Fortran from MacPorts; it works
> smoothly.
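For reference, the MacPorts route Luis suggests would look roughly like the sketch below. The gcc44 port name, the gfortran-mp-4.4 binary name, and the /opt/local prefix are assumptions about the MacPorts packaging of that era; check the current port name with port search first.

  port search gfortran                     # find the port that currently provides gfortran
  sudo port install gcc44                  # assumed to install gfortran-mp-4.4 under /opt/local/bin
  ls /opt/local/bin/gfortran*              # confirm the actual binary name
  # then point Open MPI's configure at it, for example:
  #   ./configure --prefix=/usr/local FC=/opt/local/bin/gfortran-mp-4.4 F77=/opt/local/bin/gfortran-mp-4.4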
> Le 09-05-05 à 17:33, Jeff Squyres a écrit :
>
>> I agree; that is a bummer. :-(
>>
>> Warner -- do you have any advice here, perchance?
>>
>> On May 4, 2009, at 7:26 PM, Vicente Puig wrote:
>>
>>> But it doesn't work well.
>>>
>>> For example, I am trying to debug a program, "floyd" in this case, and
>>> when I set a breakpoint I get:
>>>
>>> No line 26 in file "../../../gcc-4.2-20060805/libgfortran/fmain.c".
>>>
>>> I am getting disappointed and frustrated that I cannot work well with
>>> Open MPI on my Mac. There should be a way to make it run in Xcode, uff...
>>>
>>> 2009/5/4 Jeff Squyres <jsquy...@cisco.com>
>>>
>>> I get those as well. I believe that they are (annoying but) harmless --
>>> an artifact of how the freeware gcc/gfortran that I use was built.
>>>
>>> On May 4, 2009, at 1:47 PM, Vicente Puig wrote:
>>>
>>> Maybe I should have opened a new thread, but do you have any idea why I
>>> get the following when I use gdb to debug an Open MPI program?
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_umoddi3_s.o" - no debug
>>> information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_udiv_w_sdiv_s.o" - no
>>> debug information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/_udivmoddi4_s.o" - no
>>> debug information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/libgcc2.c".
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-dw2_s.o" - no debug
>>> information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/unwind-dw2.c".
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-dw2-fde-darwin_s.o"
>>> - no debug information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/unwind-dw2-fde-darwin.c".
>>>
>>> warning: Could not find object file
>>> "/Users/admin/build/i386-apple-darwin9.0.0/libgcc/unwind-c_s.o" - no debug
>>> information available for
>>> "../../../gcc-4.3-20071026/libgcc/../gcc/unwind-c.c".
>>> .......
>>>
>>> There is no 'admin' user on my machine, so I don't know why this happens.
>>> It works fine with a C program.
>>>
>>> Any idea?
>>>
>>> Thanks.
>>>
>>> Vincent
>>>
>>> 2009/5/4 Vicente Puig <vpui...@gmail.com>
>>>
>>> I can run Open MPI perfectly well from the command line, but I wanted a
>>> graphical interface for debugging because I was having problems.
>>>
>>> Thanks anyway.
>>>
>>> Vincent
>>>
>>> 2009/5/4 Warner Yuen <wy...@apple.com>
>>>
>>> Admittedly, I don't use Xcode to build Open MPI either.
>>>
>>> You can just compile Open MPI from the command line and install
>>> everything in /usr/local/. Make sure that gfortran is in your path and
>>> you should just be able to do a './configure --prefix=/usr/local'.
>>>
>>> After the installation, just make sure that your path is set correctly
>>> when you go to use the newly installed Open MPI. If you don't set your
>>> path, it will always default to the version of Open MPI that ships with
>>> Leopard.
>>>
>>> Warner Yuen
>>> Scientific Computing
>>> Consulting Engineer
>>> Apple, Inc.
>>> email: wy...@apple.com
>>> Tel: 408.718.2859
>>>
>>> On May 4, 2009, at 9:13 AM, users-requ...@open-mpi.org wrote:
>>>
>>> Message: 1
>>> Date: Mon, 4 May 2009 18:13:45 +0200
>>> From: Vicente Puig <vpui...@gmail.com>
>>> Subject: Re: [OMPI users] How do I compile OpenMPI in Xcode 3.1
>>>
>>> If I can't make it work with Xcode, which one could I use? Which one do
>>> you use to compile and debug Open MPI?
>>>
>>> Thanks
>>>
>>> Vincent
>>>
>>> 2009/5/4 Jeff Squyres <jsquy...@cisco.com>
>>>
>>> Open MPI comes pre-installed in Leopard; as Warner noted, since Leopard
>>> doesn't ship with a Fortran compiler, the Open MPI that Apple ships has
>>> non-functional mpif77 and mpif90 wrapper compilers.
>>>
>>> So the Open MPI that you installed manually will use your Fortran
>>> compilers, and therefore will have functional mpif77 and mpif90 wrapper
>>> compilers. Hence, you probably need to be sure to use the "right" wrapper
>>> compilers. It looks like you specified the full path to ExecPath, so I'm
>>> not sure why Xcode wouldn't work with that (like I mentioned, I
>>> unfortunately don't use Xcode myself, so I don't know why that wouldn't
>>> work).
>>>
>>> On May 4, 2009, at 11:53 AM, Vicente wrote:
>>>
>>> Yes, I already have the gfortran compiler in /usr/local/bin, the same
>>> path as my mpif90 compiler. But I've seen that when I use the mpif90 in
>>> /usr/bin or in /Developer/usr/bin it says:
>>>
>>> "Unfortunately, this installation of Open MPI was not compiled with
>>> Fortran 90 support. As such, the mpif90 compiler is non-functional."
>>>
>>> That must be the problem; I will have to change the path to use the
>>> gfortran I have installed. How could I do it? (Sorry, I am a beginner.)
>>>
>>> Thanks.
>>>
>>> El 04/05/2009, a las 17:38, Warner Yuen escribió:
>>>
>>> Have you installed a Fortran compiler? Mac OS X's developer tools do not
>>> come with a Fortran compiler, so you'll need to install one if you
>>> haven't already done so. I routinely use the Intel IFORT compilers with
>>> success. However, I hear many good things about the gfortran compilers
>>> on Mac OS X, and you can't beat the price of gfortran!
>>>
>>> Warner Yuen
>>> Scientific Computing
>>> Consulting Engineer
>>> Apple, Inc.
>>> email: wy...@apple.com
>>> Tel: 408.718.2859
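Warner's and Jeff's advice boils down to building Open MPI against your own gfortran and making sure the new installation is found ahead of the one that ships with Leopard. A minimal sketch, assuming gfortran is already in /usr/local/bin and an Open MPI source tarball has been unpacked (the version number and the floyd.f90 file name are only examples):

  export PATH=/usr/local/bin:$PATH      # find gfortran, and later the new wrappers, first
  cd openmpi-1.3.2                      # example version; use whatever tarball you downloaded
  ./configure --prefix=/usr/local
  make all
  sudo make install
  which mpif90                          # should now report /usr/local/bin/mpif90, not /usr/bin/mpif90
  mpif90 -g -o floyd floyd.f90          # -g keeps the debug symbols gdb needs

Putting the export line in ~/.profile makes the choice persistent, which addresses Warner's point about everything otherwise defaulting to the Leopard install.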
>>> On May 4, 2009, at 7:28 AM, users-requ...@open-mpi.org wrote:
>>>
>>> Message: 1
>>> Date: Mon, 4 May 2009 16:12:44 +0200
>>> From: Vicente <vpui...@gmail.com>
>>> Subject: [OMPI users] How do I compile OpenMPI in Xcode 3.1
>>>
>>> Hi, I've seen the FAQ "How do I use Open MPI wrapper compilers in Xcode",
>>> but it only covers MPICC. I am using MPIF90, so I did the same thing,
>>> changing MPICC to MPIF90 (and also the path), but it did not work:
>>>
>>> Building target "fortran" of project "fortran" with configuration "Debug"
>>>
>>> Checking Dependencies
>>> Invalid value 'MPIF90' for GCC_VERSION
>>>
>>> The file "MPIF90.cpcompspec" looks like this:
>>>
>>> /**
>>>    Xcode Compiler Specification for MPIF90
>>> */
>>> {  Type = Compiler;
>>>    Identifier = com.apple.compilers.mpif90;
>>>    BasedOn = com.apple.compilers.gcc.4_0;
>>>    Name = "MPIF90";
>>>    Version = "Default";
>>>    Description = "MPI GNU C/C++ Compiler 4.0";
>>>    ExecPath = "/usr/local/bin/mpif90";   // This gets converted to the g++ variant automatically
>>>    PrecompStyle = pch;
>>> }
>>>
>>> and it is located in "/Developer/Library/Xcode/Plug-ins".
>>>
>>> When I run mpif90 -v in a terminal it works fine:
>>>
>>> Using built-in specs.
>>> Target: i386-apple-darwin8.10.1
>>> Configured with: /tmp/gfortran-20090321/ibin/../gcc/configure
>>>   --prefix=/usr/local/gfortran --enable-languages=c,fortran
>>>   --with-gmp=/tmp/gfortran-20090321/gfortran_libs --enable-bootstrap
>>> Thread model: posix
>>> gcc version 4.4.0 20090321 (experimental) [trunk revision 144983] (GCC)
>>>
>>> Any idea?
>>>
>>> Thanks.
>>>
>>> Vincent
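Independent of the Xcode plug-in problem, it is worth confirming which mpif90 each path resolves to and what it would actually run. A small sketch using the wrapper's --showme option, with the paths quoted in this thread:

  type -a mpif90                    # lists every mpif90 on the current PATH, in search order
  /usr/local/bin/mpif90 --showme    # the manually installed wrapper: should print the underlying gfortran command line
  /usr/bin/mpif90 --showme          # the Apple-shipped wrapper: expected to report that Fortran 90 support is missing

If /usr/bin comes first in the PATH, the non-functional Apple wrapper wins, which is the situation Jeff describes earlier in this thread.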
>>> Message: 2
>>> Date: Mon, 4 May 2009 08:28:26 -0600
>>> From: Ralph Castain <r...@open-mpi.org>
>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>>>
>>> Unfortunately, I didn't write any of that code - I was just fixing the
>>> mapper so it would properly map the procs. From what I can tell, the
>>> proper things are happening there.
>>>
>>> I'll have to dig into the code that specifically deals with parsing the
>>> results to bind the processes. I'm afraid that will take a while longer;
>>> it's pretty dark in that hole.
>>>
>>> On Mon, May 4, 2009 at 8:04 AM, Geoffroy Pignot <geopig...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> So, there are no more crashes with my "crazy" mpirun command. But the
>>> paffinity feature seems to be broken: I am not able to pin my processes.
>>>
>>> Here is a simple test with a program that uses your PLPA library:
>>>
>>> r011n006% cat hostf
>>> r011n006 slots=4
>>>
>>> r011n006% cat rankf
>>> rank 0=r011n006 slot=0    ----> bind to CPU 0, correct?
>>>
>>> r011n006% /tmp/HALMPI/openmpi-1.4a/bin/mpirun --hostfile hostf --rankfile rankf --wdir /tmp -n 1 a.out
>>> PLPA Number of processors online: 4
>>> PLPA Number of processor sockets: 2
>>> PLPA Socket 0 (ID 0): 2 cores
>>> PLPA Socket 1 (ID 3): 2 cores
>>>
>>> Ctrl+Z
>>> r011n006% bg
>>>
>>> r011n006% ps axo stat,user,psr,pid,pcpu,comm | grep gpignot
>>> R+   gpignot   3   9271  97.8  a.out
>>>
>>> In fact, whatever slot number I put in my rankfile, a.out always runs on
>>> CPU 3, while I expected it on CPU 0 according to my cpuinfo (see below).
>>> The result is the same if I try the other syntax (rank 0=r011n006
>>> slot=0:0, i.e. bind to socket 0, core 0, correct?).
>>>
>>> Thanks in advance
>>>
>>> Geoffroy
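Besides the psr column from ps, the kernel's own record of which CPUs the process is allowed to run on can be checked. A small sketch, assuming the standard Linux taskset utility is available and using the PID reported above:

  taskset -cp 9271                          # prints the list of CPUs the process may run on
  grep Cpus_allowed /proc/9271/status       # the same information as a mask, straight from the kernel

If the mask still covers all four CPUs, the binding requested in the rankfile never took effect, which would match the behaviour Geoffroy describes.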
>>> PS: I run on RHEL 5.
>>>
>>> r011n006% uname -a
>>> Linux r011n006 2.6.18-92.1.1NOMAP32.el5 #1 SMP Sat Mar 15 01:46:39 CDT 2008 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> My configure line is:
>>> ./configure --prefix=/tmp/openmpi-1.4a --libdir='${exec_prefix}/lib64' --disable-dlopen --disable-mpi-cxx --enable-heterogeneous
>>>
>>> r011n006% cat /proc/cpuinfo
>>> (all four CPUs are Intel(R) Xeon(R) CPU 5150 @ 2.66GHz, family 6, model 15,
>>> stepping 6, 4096 KB cache, 2 cores per socket; only the topology fields differ)
>>>
>>>   processor   physical id   core id
>>>       0            0           0
>>>       1            3           0
>>>       2            0           1
>>>       3            3           1
>>>
>>> Message: 2
>>> Date: Mon, 4 May 2009 04:45:57 -0600
>>> From: Ralph Castain <r...@open-mpi.org>
>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>>>
>>> My apologies - I wasn't clear enough. You need a tarball from r21111 or
>>> greater, such as:
>>>
>>> http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r21142.tar.gz
>>>
>>> HTH
>>> Ralph
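Fetching and rebuilding from the nightly tarball Ralph points to, while reusing Geoffroy's own configure options, might look like the sketch below (the unpacked directory name is assumed to match the tarball name; the install prefix is the one from Geoffroy's configure line):

  wget http://www.open-mpi.org/nightly/trunk/openmpi-1.4a1r21142.tar.gz
  tar xzf openmpi-1.4a1r21142.tar.gz
  cd openmpi-1.4a1r21142
  ./configure --prefix=/tmp/openmpi-1.4a --libdir='${exec_prefix}/lib64' \
              --disable-dlopen --disable-mpi-cxx --enable-heterogeneous
  make -j4 all && make install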
>>> On May 4, 2009, at 2:14 AM, Geoffroy Pignot wrote:
>>>
>>> Hi,
>>>
>>> I got the openmpi-1.4a1r21095.tar.gz tarball, but unfortunately my
>>> command doesn't work:
>>>
>>> cat rankf:
>>> rank 0=node1 slot=*
>>> rank 1=node2 slot=*
>>>
>>> cat hostf:
>>> node1 slots=2
>>> node2 slots=2
>>>
>>> mpirun --rankfile rankf --hostfile hostf --host node1 -n 1 hostname : --host node2 -n 1 hostname
>>>
>>> Error, invalid rank (1) in the rankfile (rankf)
>>> --------------------------------------------------------------------------
>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 403
>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 86
>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 86
>>> [r011n006:28986] [[45541,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 1016
>>>
>>> Ralph, could you tell me whether my command syntax is correct? If not,
>>> what is the expected one?
>>>
>>> Regards
>>>
>>> Geoffroy
>>>
>>> 2009/4/30 Geoffroy Pignot <geopig...@gmail.com>
>>>
>>> Immediately, Sir !!! :)
>>>
>>> Thanks again Ralph
>>>
>>> Geoffroy
>>>
>>> Message: 2
>>> Date: Thu, 30 Apr 2009 06:45:39 -0600
>>> From: Ralph Castain <r...@open-mpi.org>
>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>>>
>>> I believe this is fixed now in our development trunk - you can download
>>> any tarball starting from last night and give it a try, if you like.
>>> Any feedback would be appreciated.
>>>
>>> Ralph
>>>
>>> On Apr 14, 2009, at 7:57 AM, Ralph Castain wrote:
>>>
>>> Ah now, I didn't say it -worked-, did I? :-)
>>>
>>> Clearly a bug exists in the program. I'll try to take a look at it (if
>>> Lenny doesn't get to it first), but it won't be until later in the week.
>>>
>>> On Apr 14, 2009, at 7:18 AM, Geoffroy Pignot wrote:
>>>
>>> I agree with you Ralph, and that's what I expect from Open MPI, but my
>>> second example shows that it's not working:
>>>
>>> cat hostfile.0
>>> r011n002 slots=4
>>> r011n003 slots=4
>>>
>>> cat rankfile.0
>>> rank 0=r011n002 slot=0
>>> rank 1=r011n003 slot=1
>>>
>>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1 hostname   ### CRASHED
>>>
>>> Error, invalid rank (1) in the rankfile (rankfile.0)
>>> --------------------------------------------------------------------------
>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 404
>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 87
>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 77
>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 985
>>> --------------------------------------------------------------------------
>>> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
>>> launch so we are aborting.
>>>
>>> There may be more information reported by the environment (see above).
>>>
>>> This may be because the daemon was unable to find all the needed shared
>>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>>> location of the shared libraries on the remote nodes and this will
>>> automatically be forwarded to the remote nodes.
>>> --------------------------------------------------------------------------
>>> orterun noticed that the job aborted, but has no info as to the process
>>> that caused that situation.
>>> --------------------------------------------------------------------------
>>> orterun: clean termination accomplished
>>>
>>> Message: 4
>>> Date: Tue, 14 Apr 2009 06:55:58 -0600
>>> From: Ralph Castain <r...@lanl.gov>
>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>>>
>>> The rankfile cuts across the entire job - it isn't applied on an
>>> app_context basis. So the ranks in your rankfile must correspond to the
>>> eventual rank of each process in the cmd line.
>>>
>>> Unfortunately, that means you have to count ranks. In your case, you only
>>> have four, so that makes life easier. Your rankfile would look something
>>> like this:
>>>
>>> rank 0=r001n001 slot=0
>>> rank 1=r001n002 slot=1
>>> rank 2=r001n001 slot=1
>>> rank 3=r001n002 slot=2
>>>
>>> HTH
>>> Ralph
>>>
>>> On Apr 14, 2009, at 12:19 AM, Geoffroy Pignot wrote:
>>>
>>> Hi,
>>>
>>> I agree that my examples are not very clear. What I want to do is to
>>> launch a multi-executable application (masters and slaves) and benefit
>>> from processor affinity.
>>>
>>> Could you show me how to convert this command to use the -rf option
>>> (whatever the affinity is)?
>>>
>>> mpirun -n 1 -host r001n001 master.x options1 : -n 1 -host r001n002 master.x options2 : -n 1 -host r001n001 slave.x options3 : -n 1 -host r001n002 slave.x options4
>>>
>>> Thanks for your help
>>>
>>> Geoffroy
>>>
>>> Message: 2
>>> Date: Sun, 12 Apr 2009 18:26:35 +0300
>>> From: Lenny Verkhovsky <lenny.verkhov...@gmail.com>
>>> Subject: Re: [OMPI users] 1.3.1 -rf rankfile behaviour ??
>>>
>>> Hi,
>>>
>>> The first "crash" is OK, since your rankfile has ranks 0 and 1 defined
>>> while n=1, which means only rank 0 is present and can be allocated.
>>>
>>> NP must be >= the largest rank in the rankfile.
>>>
>>> What exactly are you trying to do?
>>>
>>> I tried to recreate your segv, but all I got was:
>>>
>>> ~/work/svn/ompi/trunk/build_x86-64/install/bin/mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -rf rankfile.1 -n 1 hostname
>>> [witch19:30798] mca: base: component_find: paffinity "mca_paffinity_linux"
>>> uses an MCA interface that is not recognized (component MCA v1.0.0 !=
>>> supported MCA v2.0.0) -- ignored
>>> --------------------------------------------------------------------------
>>> It looks like opal_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during opal_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>>   opal_carto_base_select failed
>>>   --> Returned value -13 instead of OPAL_SUCCESS
>>> --------------------------------------------------------------------------
>>> [witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ../../orte/runtime/orte_init.c at line 78
>>> [witch19:30798] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ../../orte/orted/orted_main.c at line 344
>>> --------------------------------------------------------------------------
>>> A daemon (pid 11629) died unexpectedly with status 243 while attempting
>>> to launch so we are aborting.
>>>
>>> There may be more information reported by the environment (see above).
>>>
>>> This may be because the daemon was unable to find all the needed shared
>>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>>> location of the shared libraries on the remote nodes and this will
>>> automatically be forwarded to the remote nodes.
>>> --------------------------------------------------------------------------
>>> mpirun noticed that the job aborted, but has no info as to the process
>>> that caused that situation.
>>> --------------------------------------------------------------------------
>>> mpirun: clean termination accomplished
>>>
>>> Lenny.
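Putting Ralph's answer together with Geoffroy's original command line: with a single rankfile covering the whole job and a total process count that covers every listed rank, the invocation would look roughly like the sketch below (whether the binding then actually takes effect is a separate question raised elsewhere in this thread):

  cat rankfile
  rank 0=r001n001 slot=0    # app context 0: master.x options1
  rank 1=r001n002 slot=1    # app context 1: master.x options2
  rank 2=r001n001 slot=1    # app context 2: slave.x options3
  rank 3=r001n002 slot=2    # app context 3: slave.x options4

  mpirun -rf rankfile \
      -n 1 -host r001n001 master.x options1 : \
      -n 1 -host r001n002 master.x options2 : \
      -n 1 -host r001n001 slave.x options3 : \
      -n 1 -host r001n002 slave.x options4

The four -n 1 app contexts add up to np=4, which also satisfies Lenny's rule that np must cover every rank listed in the rankfile.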
>>> On 4/10/09, Geoffroy Pignot <geopig...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I am currently testing the process affinity capabilities of Open MPI, and
>>> I would like to know whether the rankfile behaviour I describe below is
>>> normal or not.
>>>
>>> cat hostfile.0
>>> r011n002 slots=4
>>> r011n003 slots=4
>>>
>>> cat rankfile.0
>>> rank 0=r011n002 slot=0
>>> rank 1=r011n003 slot=1
>>>
>>> ##################################################################################
>>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 2 hostname   ### OK
>>> r011n002
>>> r011n003
>>>
>>> ##################################################################################
>>> but
>>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -n 1 hostname   ### CRASHED
>>>
>>> --------------------------------------------------------------------------
>>> Error, invalid rank (1) in the rankfile (rankfile.0)
>>> --------------------------------------------------------------------------
>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 404
>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 87
>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 77
>>> [r011n002:25129] [[63976,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 985
>>> --------------------------------------------------------------------------
>>> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
>>> launch so we are aborting.
>>>
>>> There may be more information reported by the environment (see above).
>>>
>>> This may be because the daemon was unable to find all the needed shared
>>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>>> location of the shared libraries on the remote nodes and this will
>>> automatically be forwarded to the remote nodes.
>>> --------------------------------------------------------------------------
>>> orterun noticed that the job aborted, but has no info as to the process
>>> that caused that situation.
>>> --------------------------------------------------------------------------
>>> orterun: clean termination accomplished
>>>
>>> It seems that the rankfile option is not propagated to the second command
>>> line; there is no global understanding of the ranking inside an mpirun
>>> command.
>>>
>>> ##################################################################################
>>> Assuming that, I tried to provide a rankfile to each command line:
>>>
>>> cat rankfile.0
>>> rank 0=r011n002 slot=0
>>>
>>> cat rankfile.1
>>> rank 0=r011n003 slot=1
>>>
>>> mpirun --hostfile hostfile.0 -rf rankfile.0 -n 1 hostname : -rf rankfile.1 -n 1 hostname   ### CRASHED
>>>
>>> [r011n002:28778] *** Process received signal ***
>>> [r011n002:28778] Signal: Segmentation fault (11)
>>> [r011n002:28778] Signal code: Address not mapped (1)
>>> [r011n002:28778] Failing at address: 0x34
>>> [r011n002:28778] [ 0] [0xffffe600]
>>> [r011n002:28778] [ 1] /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x55d) [0x5557decd]
>>> [r011n002:28778] [ 2] /tmp/HALMPI/openmpi-1.3.1/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x117) [0x555842a7]
>>> [r011n002:28778] [ 3] /tmp/HALMPI/openmpi-1.3.1/lib/openmpi/mca_plm_rsh.so [0x556098c0]
>>> [r011n002:28778] [ 4] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x804aa27]
>>> [r011n002:28778] [ 5] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x804a022]
>>> [r011n002:28778] [ 6] /lib/libc.so.6(__libc_start_main+0xdc) [0x9f1dec]
>>> [r011n002:28778] [ 7] /tmp/HALMPI/openmpi-1.3.1/bin/orterun [0x8049f71]
>>> [r011n002:28778] *** End of error message ***
>>> Segmentation fault (core dumped)
>>>
>>> I hope that I've found a bug, because it would be very important for me
>>> to have this kind of capability: launch a multi-executable mpirun command
>>> line and be able to bind my executables and sockets together.
>>>
>>> Thanks in advance for your help
>>>
>>> Geoffroy
>>> >>> >>> >>> -- >>> Jeff Squyres >>> Cisco Systems >>> >>> >>> >>> _______________________________________________ >>> users mailing list >>> us...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>> >>> -------------- next part -------------- >>> HTML attachment scrubbed and removed >>> >>> ------------------------------ >>> >>> _______________________________________________ >>> users mailing list >>> us...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>> >>> End of users Digest, Vol 1221, Issue 12 >>> *************************************** >>> >>> _______________________________________________ >>> users mailing list >>> us...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>> >>> >>> _______________________________________________ >>> users mailing list >>> us...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>> >>> >>> -- >>> Jeff Squyres >>> Cisco Systems >>> >>> _______________________________________________ >>> users mailing list >>> us...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>> >>> _______________________________________________ >>> users mailing list >>> us...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>> >> >> >> -- >> Jeff Squyres >> Cisco Systems >> >> _______________________________________________ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users >> > > > _______________________________________________ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >