Thank you everybody for your answers.
Regarding the .machines file, we already have a script to generate it, and it is generated correctly.
We will check the linking again and test another version of the fftw3 library. I will keep you informed if the problem is solved.
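As a first check of the linking, we will probably just inspect what the parallel binary was linked against, roughly along these lines (the path is the one from our installation below; lapw0_mpi is the MPI executable in the WIEN2k directory):

   ldd /eos3/p1229/remir/INSTALLATION_WIEN/14.1/lapw0_mpi | grep -i -E 'mpi|fftw'

to see which MPI and fftw libraries lapw0_mpi actually resolves to at run time.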

Best regards,
Rémi Arras

On 22/10/2014 14:22, Peter Blaha wrote:
Usually the "crucial" point for lapw0 is the fftw3 library.

I noticed you are using fftw-3.3.4, which I have never tested. Since fftw already broke compatibility between fftw2 and fftw3, maybe they have changed something again ...

Besides that, I assume you have installed fftw using the same ifort and MPI versions ...
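If in doubt, rebuild fftw with the Intel wrappers. A minimal sketch, using your install prefix from below (adjust as needed):

   ./configure --prefix=/users/p1229/remir/INSTALLATION_WIEN/fftw-3.3.4-Intel_MPI \
               --enable-mpi CC=icc F77=ifort MPICC=mpiicc
   make
   make install

An fftw built with a different compiler or MPI and then linked into an ifort / Intel MPI lapw0_mpi is a typical source of such segfaults.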



On 10/22/2014 01:29 PM, Rémi Arras wrote:
Dear Prof. Blaha, dear WIEN2k users,

We tried to install the latest version of WIEN2k (14.1) on a supercomputer
and we are facing some trouble with the MPI parallel version.

1) lapw0 runs correctly in sequential mode, but crashes systematically
when the parallel option is activated (independently of the number of
cores we use):

lapw0 -p    (16:08:13) starting parallel lapw0 at lun. sept. 29 16:08:13 CEST 2014
-------- .machine0 : 4 processors
Child id 1 SIGSEGV
Child id 2 SIGSEGV
Child id 3 SIGSEGV
Child id 0 SIGSEGV
** lapw0 crashed!
0.029u 0.036s 0:50.91 0.0% 0+0k 5248+104io 17pf+0w
error: command /eos3/p1229/remir/INSTALLATION_WIEN/14.1/lapw0para -up -c lapw0.def failed
stop error

w2k_dispatch_signal(): received: Segmentation fault
w2k_dispatch_signal(): received: Segmentation fault
Child with myid of 1 has an error
'Unknown' - SIGSEGV
Child id 1 SIGSEGV
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 1
** lapw0 crashed!
cat: No match.
0.027u 0.034s 1:33.13 0.0% 0+0k 5200+96io 16pf+0w
error: command /eos3/p1229/remir/INSTALLATION_WIEN/14.1/lapw0para -up -c lapw0.def failed


2) lapw2 also crashes sometimes when MPI parallelization is used.
Sequential or k-point parallel runs are OK and, contrary to lapw0, the error
does not occur in all cases (we did not notice any problem when testing
the MPI benchmark with lapw1):

w2k_dispatch_signal(): received: Segmentation fault
application called MPI_Abort(MPI_COMM_WORLD, 768) - process 0

Our system is a Bullx DLC cluster (Linux Red Hat + Intel Ivy Bridge), and
we use the Intel compiler (+ MKL) intel/14.0.2.144 and intelmpi/4.1.3.049.
The batch scheduler is SLURM.

Here are the settings and the options we used for the installation:

OPTIONS:
current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -Dmkl_scalapack -traceback -xAVX
current:FFTW_OPT:-DFFTW3 -I/users/p1229/remir/INSTALLATION_WIEN/fftw-3.3.4-Intel_MPI/include
current:FFTW_LIBS:-lfftw3_mpi -lfftw3 -L/users/p1229/remir/INSTALLATION_WIEN/fftw-3.3.4-Intel_MPI/lib
current:LDFLAGS:$(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) -pthread
current:DPARALLEL:'-DParallel'
current:R_LIBS:-lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread
current:RP_LIBS:-mkl=cluster -lfftw3_mpi -lfftw3 -L/users/p1229/remir/INSTALLATION_WIEN/fftw-3.3.4-Intel_MPI/lib
current:MPIRUN:mpirun -np _NP_ _EXEC_
current:MKL_TARGET_ARCH:intel64

PARALLEL_OPTIONS:
setenv TASKSET "no"
setenv USE_REMOTE 1
setenv MPI_REMOTE 1
setenv WIEN_GRANULARITY 1
setenv WIEN_MPIRUN "mpirun -np _NP_ _EXEC_"
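(For the 4-processor run above, and assuming the parallel scripts simply substitute _NP_ with the processor count from .machine0 and _EXEC_ with the lapw0_mpi call, WIEN_MPIRUN should expand to something like

   mpirun -np 4 /eos3/p1229/remir/INSTALLATION_WIEN/14.1/lapw0_mpi lapw0.def

i.e. a plain mpirun without a machinefile, leaving the process placement to Intel MPI / SLURM.)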

Any suggestions that could help us solve this problem would be
greatly appreciated.

Best regards,
Rémi Arras


_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at: http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html



