Many thanks for the replies.

The mismatch in the OpenMPI version is my fault: while writing the request
for help I looked at the name of the directory where OpenMPI was built
(I did not build it myself) and did not notice that the name of the
directory did not reflect the version actually compiled.
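
For anyone who wants to confirm which Open MPI libraries a running
process has really loaded, here is a minimal sketch that reads the same
information pmap shows. It assumes a Linux /proc filesystem; the function
name mapped_mpi_libraries is hypothetical and not part of Open MPI.

```python
import os
import sys


def mapped_mpi_libraries(maps_text):
    """Return the sorted, distinct Open MPI library paths found in the
    text of a /proc/<pid>/maps file."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        name = os.path.basename(path)
        # Matches libmpi*, libopen-pal*, libopen-rte* and similar
        if name.startswith(("libmpi", "libopen-")):
            found.add(path)
    return sorted(found)


if __name__ == "__main__":
    # Only touch /proc when a numeric PID is given on the command line.
    if len(sys.argv) == 2 and sys.argv[1].isdigit():
        with open("/proc/%s/maps" % sys.argv[1]) as handle:
            for path in mapped_mpi_libraries(handle.read()):
                print(path)
```

Run it with the PID of the program while it is stopped under gdb; a path
under an unexpected installation prefix would point to mixed versions.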

I had already checked the ulimits defined for the account where the
SIGSEGV happens and they seem OK.
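
As a cross-check of "ulimit -a", a small sketch like the following could
be run under each account to print the three limits Gus mentioned. It
uses only Python's standard resource module; the labels and the helper
names fmt and current_limits are my own choice.

```python
import resource

# The three limits mentioned in the thread. Note that getrlimit reports
# stack and locked-memory sizes in bytes, while "ulimit -a" shows kbytes.
LIMITS = {
    "stack size": resource.RLIMIT_STACK,
    "open files": resource.RLIMIT_NOFILE,
    "max locked memory": resource.RLIMIT_MEMLOCK,
}


def fmt(value):
    """Render a limit value the way ulimit does for the unlimited case."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {label: (soft, hard)} for the calling process, as strings."""
    return {
        label: tuple(fmt(v) for v in resource.getrlimit(res))
        for label, res in LIMITS.items()
    }


if __name__ == "__main__":
    for label, (soft, hard) in sorted(current_limits().items()):
        print("%-18s soft=%s hard=%s" % (label, soft, hard))
```

Comparing the output from the failing account with that from the new
account shows at a glance whether any of these limits differ.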

Moreover I have a further result: I created a brand new account with
default privileges and tried to run the program under that one, and it
works!

I'm still trying to spot the differences between the two
unprivileged accounts.
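
One way to narrow this down, sketched here under the assumption that the
difference shows up in the environment (it may well be elsewhere, e.g. in
files under the home directory), is to save the output of "env" from each
account to a file and compare the two. The function names and the file
arguments are hypothetical.

```python
import sys


def parse_env(dump):
    """Turn the text printed by `env` into a dict of NAME -> value.
    Lines without '=' are skipped, so multi-line values are not handled."""
    variables = {}
    for line in dump.splitlines():
        name, sep, value = line.partition("=")
        if sep and name:
            variables[name] = value
    return variables


def env_differences(dump_a, dump_b):
    """Return {NAME: (value_in_a, value_in_b)} for every variable whose
    value differs; None marks a variable missing on that side."""
    a, b = parse_env(dump_a), parse_env(dump_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: script.py env_of_failing_account.txt env_of_new_account.txt
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        for name, (old, new) in env_differences(fa.read(), fb.read()).items():
            print("%s: %r -> %r" % (name, old, new))
```

Variables such as PATH and LD_LIBRARY_PATH are the first ones worth
looking at in the output, since they decide which OpenMPI gets picked up.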

Cheers,
                           l.

On Wed, Dec 10, 2014 at 6:12 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> Hi Luca
>
> Another possibility that comes to mind,
> besides mixed versions mentioned by Gilles,
> is the OS limits.
> Limits may vary according to the user and user privileges.
>
> Large programs tend to require big stacksize (even unlimited),
> and typically segfault when the stack is not large enough.
> Max number of open files is yet another hurdle.
> And if you're using InfiniBand, the max locked memory size should be
> unlimited.
> Check /etc/security/limits.conf and "ulimit -a".
>
> I hope this helps,
> Gus Correa
>
> On 12/10/2014 08:28 AM, Gilles Gouaillardet wrote:
>>
>> Luca,
>>
>> your email mentions openmpi 1.6.5
>> but gdb output points to openmpi 1.8.1.
>>
>> could the root cause be a mix of versions that does not occur with root
>> account ?
>>
>> which openmpi version are you expecting ?
>>
>> you can run
>> pmap <pid>
>> when your binary is running and/or under gdb to confirm the openmpi
>> library that is really used
>>
>> Cheers,
>>
>> Gilles
>>
>> On Wed, Dec 10, 2014 at 7:21 PM, Luca Fini <lf...@arcetri.astro.it> wrote:
>>
>>     I've a problem running a well tested MPI based application.
>>
>>     The program has been used for years with no problems. Suddenly the
>>     executable which was run many times with no problems crashed with
>>     SIGSEGV. The very same executable if run with root privileges works
>>     OK. The same happens with other executables and across various
>>     recompilation attempts.
>>
>>     We could not find any relevant difference in the O.S. compared with a
>>     few days ago, when the program also worked under an unprivileged user ID.
>>     Actually, at about the same time we changed the GID of the user
>>     experiencing the fault, but we think this is not relevant because the
>>     same SIGSEGV happens to another user who was not modified. Moreover
>>     we cannot see how that change can affect the running executable (we
>>     checked all file permissions in the directory tree where the program
>>     is used).
>>
>>     Running the program under GDB we get the trace reported below. The
>>     segfault happens at the very beginning during MPI initialization.
>>
>>     We can use the program with sudo, but I'd like to find out what
>>     happened to go back to "normal" usage.
>>
>>     I'd appreciate any hint on the issue.
>>
>>     Many thanks,
>>
>>                                 Luca Fini
>>
>>     ==============================
>>     Here follow a few environment details:
>>
>>     Program started with: mpirun -debug -debugger gdb  -np 1
>>
>> /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/M51b2_OT_2POINT_RH_v1_mod/PREP_PGD
>>
>>     OPEN-MPI 1.6.5
>>
>>     Linux 2.6.32-431.29.2.2.6.32-431.29.2.el6.x86_64
>>
>>     Intel fortran Compiler: 2011.7.256
>>
>>     =========================
>>     Here follows the stack trace:
>>
>>     Starting program:
>>
>> /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/M51b2_OT_2POINT_RH_v1_mod/PREP_PGD
>>
>> /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/M51b2_OT_2POINT_RH_v1_mod/PREP_PGD
>>     [Thread debugging using libthread_db enabled]
>>
>>     Program received signal SIGSEGV, Segmentation fault.
>>     0x00002aaaaaf652c7 in mca_base_component_find (directory=0x0,
>>     type=0x3b914a7fb5 "rte", static_components=0x3b916cb040,
>>     requested_component_names=0x0, include_mode=128, found_components=0x1,
>>     open_dso_components=16)
>>          at mca_base_component_find.c:162
>>     162        OBJ_CONSTRUCT(found_components, opal_list_t);
>>     Missing separate debuginfos, use: debuginfo-install
>>     glibc-2.12-1.149.el6.x86_64 libgcc-4.4.7-11.el6.x86_64
>>     libgfortran-4.4.7-11.el6.x86_64 libtool-ltdl-2.2.6-15.5.el6.x86_64
>>     openmpi-1.8.1-1.el6.x86_64
>>     (gdb) where
>>     #0  0x00002aaaaaf652c7 in mca_base_component_find (directory=0x0,
>>     type=0x3b914a7fb5 "rte", static_components=0x3b916cb040,
>>     requested_component_names=0x0, include_mode=128, found_components=0x1,
>>     open_dso_components=16)
>>          at mca_base_component_find.c:162
>>     #1  0x0000003b90c4870a in mca_base_framework_components_register ()
>>     from /usr/lib64/openmpi/lib/libopen-pal.so.6
>>     #2  0x0000003b90c48c06 in mca_base_framework_register () from
>>     /usr/lib64/openmpi/lib/libopen-pal.so.6
>>     #3  0x0000003b90c48def in mca_base_framework_open () from
>>     /usr/lib64/openmpi/lib/libopen-pal.so.6
>>     #4  0x0000003b914407e7 in ompi_mpi_init () from
>>     /usr/lib64/openmpi/lib/libmpi.so.1
>>     #5  0x0000003b91463200 in PMPI_Init () from
>>     /usr/lib64/openmpi/lib/libmpi.so.1
>>     #6  0x00002aaaaacd9295 in mpi_init_f (ierr=0x7fffffffd268) at
>>     pinit_f.c:75
>>     #7  0x00000000005bb159 in MODE_MNH_WORLD::init_nmnh_comm_world
>>     (kinfo_ll=Cannot access memory at address 0x0
>>     ) at
>>
>> /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/MASTER/spll_mode_mnh_world.f90:45
>>     #8  0x00000000005939d3 in MODE_IO_LL::initio_ll () at
>>
>> /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/MASTER/spll_mode_io_ll.f90:107
>>     #9  0x000000000049d02f in prep_pgd () at
>>
>> /home/lascaux/MNH-V5-1-2/src/dir_obj-LXifortI4-MNH-V5-1-2-OMPI12X-O2/MASTER/spll_prep_pgd.f90:130
>>     #10 0x000000000049cf8c in main ()
>>
>>     --
>>     Luca Fini.  INAF - Oss. Astrofisico di Arcetri
>>     L.go E.Fermi, 5. 50125 Firenze. Italy
>>     Tel: +39 055 2752 307     Fax: +39 055 2752 292
>>     Skype: l.fini
>>     Web: http://www.arcetri.inaf.it/~lfini
>>     _______________________________________________
>>     users mailing list
>>     us...@open-mpi.org
>>     Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>     Link to this post:
>>     http://www.open-mpi.org/community/lists/users/2014/12/25945.php
>>
>>
>>
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2014/12/25946.php
>>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/12/25950.php



-- 
Luca Fini.  INAF - Oss. Astrofisico di Arcetri
L.go E.Fermi, 5. 50125 Firenze. Italy
Tel: +39 055 2752 307     Fax: +39 055 2752 292
Skype: l.fini
Web: http://www.arcetri.inaf.it/~lfini
