Hi Benson,

thanks for your suggestions. I compiled the program with mpijavac on the cluster (see the attached build.sh for the arguments I used) and then ran it with the command in the attached run.sh. I still get the same error as before.
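For context, and so nobody has to dig through the attachments, the two scripts boil down to commands of roughly this shape (a sketch only; the source layout and paths are placeholders, the exact arguments are in build.sh and run.sh):

$ # compile the entry point with Open MPI's wrapper, which should put mpi.jar on the classpath
$ mpijavac -d classes src/main/java/org/matsim/parallel/RunMinimalMPIExample.java
$ # launch a single rank; the application jar goes on the java classpath
$ mpirun -np 1 java -cp classes:matsim-p-1.0-SNAPSHOT.jar org.matsim.parallel.RunMinimalMPIExample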
I have also tried to run the ping-pong example. That one produces a different segmentation fault, which is included in the attached log file. I also noticed that when I compile the code with mpijavac but provide my own classpath arguments (our app has dependencies), the "mpi.jar" and "shmem.jar" files are no longer found automatically; I have to add them to my classpath argument myself. If that is the case, is there still any difference compared to using the regular "javac" compiler? Also, after looking at "mpijavac --showme", I noticed that neither the cluster installation nor my local installation of ompi has a shmem.jar file at the location that "mpijavac --showme" reports. Since we are getting segmentation faults, could this be a missing dependency?

Thanks for looking into my issue.

All the best,
Janek

------ Original message ------
From: "Benson Muite" <benson_mu...@emailplus.org>
To: "users@lists.open-mpi.org" <users@lists.open-mpi.org>
Cc: "Laudan, Janek" <lau...@tu-berlin.de>
Sent: 18.03.2022 06:48:54
Subject: Re: [OMPI users] [EXTERNAL] Java Segmentation Fault

>Hi Janek,
>
>If you compile your program and produce a class file, does it run using
>mpirun -np 1 java matsim-p
>
>Try to compile OpenMPI from source as indicated at
>https://www-lb.open-mpi.org/faq/?category=java
>
>Java tends to require more memory, so if using a batch system be sure to
>request enough.
>
>Possibly also interesting to try might be:
>https://github.com/mboysan/ping-pong-mpi-tcp
>
>Benson
>
>On 3/17/22 7:03 PM, Laudan, Janek via users wrote:
>> Hi Howard,
>>
>> thanks for your reply. I am using version 4.1.2 and I didn't compile
>> with the mpijavac wrapper. I was hoping that I could maintain some form
>> of our maven build infrastructure and then deploy the resulting jar. The
>> project setup is here:
>> https://github.com/Janekdererste/matsim-p/blob/master/pom.xml
>>
>> All the best,
>> Janek
>>
>> ------ Original message ------
>> From: "Pritchard Jr., Howard" <howa...@lanl.gov>
>> To: "Laudan, Janek" <lau...@tu-berlin.de>; "Open MPI Users" <users@lists.open-mpi.org>
>> Sent: 17.03.2022 16:59:04
>> Subject: Re: [EXTERNAL] [OMPI users] Java Segmentation Fault
>>
>>> Hi Janek,
>>>
>>> A few questions.
>>>
>>> First, which version of Open MPI are you using?
>>>
>>> Did you compile your code with the Open MPI mpijavac wrapper?
>>>
>>> Howard
>>>
>>> From: users <users-boun...@lists.open-mpi.org> on behalf of "Laudan, Janek via users" <users@lists.open-mpi.org>
>>> Reply-To: "Laudan, Janek" <lau...@tu-berlin.de>, Open MPI Users <users@lists.open-mpi.org>
>>> Date: Thursday, March 17, 2022 at 9:52 AM
>>> To: "users@lists.open-mpi.org" <users@lists.open-mpi.org>
>>> Cc: "Laudan, Janek" <lau...@tu-berlin.de>
>>> Subject: [EXTERNAL] [OMPI users] Java Segmentation Fault
>>>
>>> Hi,
>>>
>>> I am trying to extend an existing Java project to run with
>>> Open MPI. I have managed to successfully set up Open MPI and my
>>> project on my local machine to conduct some test runs.
>>>
>>> However, when I tried to set up things on our cluster I ran into some problems.
>>> I was able to run some trivial examples such as "HelloWorld"
>>> and "Ring", which I found in the ompi GitHub repo. Unfortunately,
>>> when I try to run our app wrapped between MPI.Init(args) and
>>> MPI.Finalize() I get the following segmentation fault:
>>>
>>> $ mpirun -np 1 java -cp matsim-p-1.0-SNAPSHOT.jar org.matsim.parallel.RunMinimalMPIExample
>>> Java-Version: 11.0.2
>>> before getTestScenario
>>> before load config
>>> WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
>>> [cluster-i:1272 :0:1274] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xc)
>>> ==== backtrace (tid: 1274) ====
>>> =================================
>>> #
>>> # A fatal error has been detected by the Java Runtime Environment:
>>> #
>>> # SIGSEGV (0xb) at pc=0x000014a85752fdf4, pid=1272, tid=1274
>>> #
>>> # JRE version: Java(TM) SE Runtime Environment (11.0.2+9) (build 11.0.2+9-LTS)
>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (11.0.2+9-LTS, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
>>> # Problematic frame:
>>> # J 612 c2 java.lang.StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder; java.base@11.0.2 (8 bytes) @ 0x000014a85752fdf4 [0x000014a85752fdc0+0x0000000000000034]
>>> #
>>> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>>> #
>>> # An error report file with more information is saved as:
>>> # /net/ils/laudan/mpi-test/matsim-p/hs_err_pid1272.log
>>> Compiled method (c2) 1052 612 4 java.lang.StringBuilder::append (8 bytes)
>>>  total in heap  [0x000014a85752fc10,0x000014a8575306a8] = 2712
>>>  relocation     [0x000014a85752fd88,0x000014a85752fdb8] = 48
>>>  main code      [0x000014a85752fdc0,0x000014a857530360] = 1440
>>>  stub code      [0x000014a857530360,0x000014a857530378] = 24
>>>  metadata       [0x000014a857530378,0x000014a8575303c0] = 72
>>>  scopes data    [0x000014a8575303c0,0x000014a857530578] = 440
>>>  scopes pcs     [0x000014a857530578,0x000014a857530658] = 224
>>>  dependencies   [0x000014a857530658,0x000014a857530660] = 8
>>>  handler table  [0x000014a857530660,0x000014a857530678] = 24
>>>  nul chk table  [0x000014a857530678,0x000014a8575306a8] = 48
>>> Compiled method (c1) 1053 263 3 java.lang.StringBuilder::<init> (7 bytes)
>>>  total in heap  [0x000014a850102790,0x000014a850102b30] = 928
>>>  relocation     [0x000014a850102908,0x000014a850102940] = 56
>>>  main code      [0x000014a850102940,0x000014a850102a20] = 224
>>>  stub code      [0x000014a850102a20,0x000014a850102ac8] = 168
>>>  metadata       [0x000014a850102ac8,0x000014a850102ad0] = 8
>>>  scopes data    [0x000014a850102ad0,0x000014a850102ae8] = 24
>>>  scopes pcs     [0x000014a850102ae8,0x000014a850102b28] = 64
>>>  dependencies   [0x000014a850102b28,0x000014a850102b30] = 8
>>> Could not load hsdis-amd64.so; library not loadable; PrintAssembly is disabled
>>> #
>>> # If you would like to submit a bug report, please visit:
>>> # http://bugreport.java.com/bugreport/crash.jsp
>>> #
>>> [cluster-i:01272] *** Process received signal ***
>>> [cluster-i:01272] Signal: Aborted (6)
>>> [cluster-i:01272] Signal code: (-6)
>>> [cluster-i:01272] [ 0] /usr/lib64/libpthread.so.0(+0xf630)[0x14a86e477630]
>>> [cluster-i:01272] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x14a86dcbb387]
>>> [cluster-i:01272] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x14a86dcbca78]
>>> [cluster-i:01272] [ 3]
>>> /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xc00be9)[0x14a86d3f8be9]
>>> [cluster-i:01272] [ 4] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xe29619)[0x14a86d621619]
>>> [cluster-i:01272] [ 5] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xe29e9b)[0x14a86d621e9b]
>>> [cluster-i:01272] [ 6] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xe29ece)[0x14a86d621ece]
>>> [cluster-i:01272] [ 7] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(JVM_handle_linux_signal+0x1c0)[0x14a86d403a00]
>>> [cluster-i:01272] [ 8] /afs/math.tu-berlin.de/software/java/jdk-11.0.2/lib/server/libjvm.so(+0xbff5e8)[0x14a86d3f75e8]
>>> [cluster-i:01272] [ 9] /usr/lib64/libpthread.so.0(+0xf630)[0x14a86e477630]
>>> [cluster-i:01272] [10] [0x14a85752fdf4]
>>> [cluster-i:01272] *** End of error message ***
>>> --------------------------------------------------------------------------
>>> Primary job terminated normally, but 1 process returned
>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> mpirun noticed that process rank 0 with PID 0 on node cluster-i exited
>>> on signal 6 (Aborted).
>>> --------------------------------------------------------------------------
>>>
>>> I am running ompi 4.1.2 with Java 11. The project which I am trying to
>>> set up is here: https://github.com/Janekdererste/matsim-p
>>>
>>> I hope somebody can advise on what to try next. Thanks and all the best
>>>
>>> Janek
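PS, for anyone skimming the thread: the wrapping mentioned above is essentially the pattern below. This is a heavily simplified sketch, not the actual code; the real entry point is RunMinimalMPIExample in the linked repo, and the rank printout here is only for illustration.

import mpi.MPI;
import mpi.MPIException;

public class RunMinimalMPIExample {

    public static void main(String[] args) throws MPIException {
        // Start the MPI runtime before any application code runs.
        MPI.Init(args);

        // Rank of this process in the global communicator,
        // printed here just to show the Java bindings are reachable.
        int rank = MPI.COMM_WORLD.getRank();
        System.out.println("Running on rank " + rank);

        // ... load the config and run the existing application here ...

        // Shut down the MPI runtime cleanly.
        MPI.Finalize();
    }
}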
Attachments: build.sh, run.sh, ping-pong-mpi-tcp.log