So I ran valgrind on my code and it came up with a few thousand memory errors, but none of them had anything to do with the code I wrote. It gave a few errors for the LDAP authentication stuff at the beginning, but most of the errors came from orte*. The only part that made reference to my code was in the main file on line 13, where I include mpi.h. It seems suspect to me to have so many "errors" in well-used and well-tested code. Also, the stack traces that I previously posted showed errors in places in my code that have been stable and unchanged for about a year.
It seems like maybe this is some kind of error with the system configuration or something like that. It just seems too odd for these memory faults to just appear like that.

Sam Adams
General Dynamics Information Technology
Phone: 210.536.5945

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres
Sent: Monday, August 13, 2007 4:13 PM
To: Open MPI Users
Subject: Re: [OMPI users] segmentation faults

It *looks* like a run-of-the-mill memory-badness kind of error, but it's impossible to say without more information. Are you able to run this through valgrind or some other memory-checking debugger? It looks like the single process case may be the simplest to check...?

On Aug 13, 2007, at 5:03 PM, Adams, Samuel D Contr AFRL/HEDR wrote:

> I tried to run a code that I have running for a while now this morning,
> but for some reason it is causing segmentation faults. I can't really
> think of anything that I have done recently that would be causing these
> errors. Does anyone have any idea?
>
> I get this running it on more than one processor......
> [sam@prodnode1 all]$ mpirun -np 2 --prefix
> /usr/local/profiles/gcc-openmpi/ /home/sam/code/fdtd/fdtd_0.3/fdtd -t
> /home/sam/code/fdtd/fdtd_0.3/test_files/tissue.txt -r
> /home/sam/code/fdtd/fdtd_0.3/test_files/tester_x002y002z004.raw -v -f
> 3000 --pw 90,0,1,0 -l test_log.out -a 1
> [prodnode1:04400] *** Process received signal ***
> [prodnode1:04400] Signal: Segmentation fault (11)
> [prodnode1:04400] Signal code: Invalid permissions (2)
> [prodnode1:04400] Failing at address: 0x2aaaab000048
> [prodnode1:04399] *** Process received signal ***
> [prodnode1:04399] Signal: Segmentation fault (11)
> [prodnode1:04399] Signal code: Invalid permissions (2)
> [prodnode1:04399] Failing at address: 0x2aaaab0a0a48
> [prodnode1:04400] [ 0] /lib64/libpthread.so.0 [0x3aa840dd40]
> [prodnode1:04400] [ 1] /usr/local/profiles/gcc-openmpi/lib/libopen-pal.so.0(_int_malloc+0x2a5) [0x2aaaaafda345]
> [prodnode1:04400] [ 2] /usr/local/profiles/gcc-openmpi/lib/libopen-pal.so.0(calloc+0xaa) [0x2aaaaafdbd8a]
> [prodnode1:04400] [ 3] /home/sam/code/fdtd/fdtd_0.3/fdtd(parseTissues+0x23) [0x40c9d3]
> [prodnode1:04400] [ 4] /home/sam/code/fdtd/fdtd_0.3/fdtd(parseArgs+0x489) [0x404b09]
> [prodnode1:04400] [ 5] /home/sam/code/fdtd/fdtd_0.3/fdtd(main+0x41) [0x404eb1]
> [prodnode1:04400] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3aa781d8a4]
> [prodnode1:04400] [ 7] /home/sam/code/fdtd/fdtd_0.3/fdtd [0x4034b9]
> [prodnode1:04400] *** End of error message ***
> [prodnode1:04399] [ 0] /lib64/libpthread.so.0 [0x3aa840dd40]
> [prodnode1:04399] [ 1] /usr/local/profiles/gcc-openmpi/lib/libopen-pal.so.0(_int_malloc+0x2a5) [0x2aaaaafda345]
> [prodnode1:04399] [ 2] /usr/local/profiles/gcc-openmpi/lib/libopen-pal.so.0(calloc+0xaa) [0x2aaaaafdbd8a]
> [prodnode1:04399] [ 3] /home/sam/code/fdtd/fdtd_0.3/fdtd(parseTissues+0x23) [0x40c9d3]
> [prodnode1:04399] [ 4] /home/sam/code/fdtd/fdtd_0.3/fdtd(parseArgs+0x489) [0x404b09]
> [prodnode1:04399] [ 5] /home/sam/code/fdtd/fdtd_0.3/fdtd(main+0x41) [0x404eb1]
> [prodnode1:04399] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3aa781d8a4]
> [prodnode1:04399] [ 7] /home/sam/code/fdtd/fdtd_0.3/fdtd [0x4034b9]
> [prodnode1:04399] *** End of error message ***
> mpirun noticed that job rank 0 with PID 4399 on node prodnode1.brooks.af.mil exited on signal 11 (Segmentation fault).
> 1 additional process aborted (not shown)
>
> --Or I get this if I run it on just one processor.
> [sam@prodnode1 all]$ ./script2.sh
> [prodnode1:04405] *** Process received signal ***
> [prodnode1:04405] Signal: Segmentation fault (11)
> [prodnode1:04405] Signal code: Address not mapped (1)
> [prodnode1:04405] Failing at address: 0x18
> [prodnode1:04405] [ 0] /lib64/libpthread.so.0 [0x3aa840dd40]
> [prodnode1:04405] [ 1] /home/sam/code/fdtd/fdtd_0.3/fdtd(calcMass+0xac) [0x40443c]
> [prodnode1:04405] [ 2] /home/sam/code/fdtd/fdtd_0.3/fdtd(parseArgs+0x5a1) [0x404c21]
> [prodnode1:04405] [ 3] /home/sam/code/fdtd/fdtd_0.3/fdtd(main+0x41) [0x404eb1]
> [prodnode1:04405] [ 4] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3aa781d8a4]
> [prodnode1:04405] [ 5] /home/sam/code/fdtd/fdtd_0.3/fdtd [0x4034b9]
> [prodnode1:04405] *** End of error message ***
> mpirun noticed that job rank 0 with PID 4405 on node prodnode1.brooks.af.mil exited on signal 11 (Segmentation fault).
> [sam@prodnode1 all]$
>
>
> Sam Adams
> General Dynamics Information Technology
> Phone: 210.536.5945
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
Cisco Systems

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users