Re: [Wien] Problem when running MPI-parallel version of LAPW0

2014-10-23 Thread Rémi Arras

Thank you everybody for your answers.
Concerning the .machines file, we already have a script and the file is generated correctly.
We will check the library links again and test another version of the
fftw3 library. I will keep you informed once the problem is solved.
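(For reference, one quick way to verify which fftw/MKL/MPI libraries the parallel
binaries actually pick up at run time, assuming they are dynamically linked and
$WIENROOT points to the 14.1 installation, is something like:

   ldd $WIENROOT/lapw0_mpi | grep -iE 'fftw|mkl|mpi'
)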


Best regards,
Rémi Arras



Re: [Wien] Problem when running MPI-parallel version of LAPW0

2014-10-22 Thread Peter Blaha

Usually the "crucial" point for lapw0  is the fftw3-library.

I noticed you have fftw-3.3.4, which I never tested. Since fftw is 
incompatible between fftw2 and 3, maybe they have done something again ...


Besides that, I assume you have installed fftw using the same ifor and 
mpi versions ...
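(For reference, a minimal sketch of building fftw-3.3.4 against the same Intel
toolchain; the module names and install prefix are taken from the original post,
and an environment-modules setup is assumed:

   module load intel/14.0.2.144 intelmpi/4.1.3.049
   ./configure --enable-mpi CC=mpiicc F77=mpiifort MPICC=mpiicc \
               --prefix=/users/p1229/remir/INSTALLATION_WIEN/fftw-3.3.4-Intel_MPI
   make && make install

FFTW_OPT and FFTW_LIBS in siteconfig should then point to this prefix, as in the
listing in the original post.)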







--

  P.Blaha
--
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: bl...@theochem.tuwien.ac.at    WIEN2k: http://www.wien2k.at
WWW:   http://www.imc.tuwien.ac.at/staff/tc_group_e.php
--


Re: [Wien] Problem when running MPI-parallel version of LAPW0

2014-10-22 Thread Laurence Marks
It is often hard to know exactly what the issues are with MPI. Most often they are
due to an incorrect combination of scalapack/blacs libraries in the linking options.

The first thing to check is your linking options, against
https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/.
What you have does not look exactly right to me, but I have not used your
release.
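(For illustration only: for an Intel compiler / Intel MPI / LP64 / Intel threading
combination, the advisor typically produces a cluster link line along the lines of

   -L$(MKLROOT)/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 \
       -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_lp64 \
       -liomp5 -lpthread -lm

but the exact set depends on the MKL release, so check it against the advisor
rather than taking this verbatim.)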

If that does not work, look in case.dayfile, the log file.

If there is still nothing, it is sometimes useful to comment out the line

  CALL W2kinit

in lapw0.F, recompile, and then just run "x lapw0 -p". You will sometimes get
more information this way, although it is not as safe: without it, MPI tasks
can in some cases hang forever.
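(Roughly, the debugging cycle would look like the following; the siteconfig menu
wording is from memory and the case path is just a placeholder:

   # comment out "CALL W2kinit" in $WIENROOT/SRC_lapw0/lapw0.F, for debugging only
   cd $WIENROOT && ./siteconfig_lapw    # Compile/Recompile programs -> lapw0
   cd /path/to/testcase
   x lapw0 -p                           # rerun only the parallel lapw0 step

Remember to restore the original lapw0.F and recompile afterwards.)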




-- 
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu
Corrosion in 4D: MURI4D.numis.northwestern.edu
Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what nobody
else has thought"
Albert Szent-Gyorgi


Re: [Wien] Problem when running MPI-parallel version of LAPW0

2014-10-22 Thread Michael Sluydts
Perhaps an important note: the python script below is for a Torque/PBS queuing
system (it is based on $PBS_NODEFILE).
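(Since the original poster runs SLURM rather than Torque, an untested sketch of
the same idea based on the SLURM node list might look like the following,
assuming --ntasks-per-node is set in the job script:

   ppn=$SLURM_NTASKS_PER_NODE
   nodes=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
   first=$(echo "$nodes" | head -1)
   echo "lapw0: $first:$ppn"  >  .machines
   for n in $nodes; do
       echo "1:$n:$ppn"       >> .machines
   done
   echo "granularity:1"       >> .machines
   echo "extrafine:1"         >> .machines

This gives lapw0 the cores of the first node and one MPI group per node for
lapw1/lapw2; the k-point redistribution logic of the script below is omitted.)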




Re: [Wien] Problem when running MPI-parallel version of LAPW0

2014-10-22 Thread Michael Sluydts

Hello Rémi,

While I am not sure this is the (only) problem, in our setup we also pass the
machines file to mpirun:

setenv WIEN_MPIRUN "mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_"

which I generate, for a one-k-point-per-node setup, with the following python
script:


/wienhybrid
#!/usr/bin/env python
#Machines file generator for WIEN2k
#May 13th 2013
#
#Michael Sluydts
#Center for Molecular Modeling
#Ghent University
from collections import Counter
import subprocess, os

nodefile = subprocess.Popen('echo $PBS_NODEFILE',stdout=subprocess.PIPE,shell=True)
nodefile = nodefile.communicate()[0].strip()
nodefile = open(nodefile,'r')

machines = nodefile.readlines()
nodefile.close()

node = ''
corecount = Counter()

#gather cores per node
for core in machines:
    node = core.split('.')[0]
    corecount[node] += 1

#if there are more nodes than k-points we must redistribute the remaining cores

#count the irreducible k-points
IBZ = int(subprocess.Popen('wc -l < ' + os.getcwd().split('/')[-1] + '.klist',
                           stdout=subprocess.PIPE,shell=True).communicate()[0]) - 2

corerank = corecount.most_common()

alloc = Counter()
total = Counter()
nodemap = []
#pick out the largest nodes and redivide the remaining ones by adding
#the largest leftover node to the k-point with the least allocated cores
for node,cores in corerank:
    if len(alloc) < IBZ:
        alloc[node] += cores
        total[node] += cores
    else:
        lowcore = total.most_common()[-1][0]
        total[lowcore] += cores
        nodemap.append((node,lowcore))

#give lapw0 all cores of the first node
machinesfile = 'lapw0: ' + corecount.keys()[0] + ':' + str(corecount[corecount.keys()[0]]) + '\n'

#for node in corecount.keys():
#    machinesfile += node + ':' + str(corecount[node]) + ' '
#machinesfile += '\n'

#machinesfile = ''
for node in alloc.keys():
    #allocate main node
    machinesfile += '1:' + node + ':' + str(alloc[node])
    #machinesfile += '1:' + node
    #for i in range(1,alloc[node]):
    #    machinesfile += ' ' + node
    #distribute leftover nodes
    extra = [x for x,y in nodemap if y == node]
    for ext in extra:
        #machinesfile += ' ' + ext + ':' + str(corecount[ext])
        for i in range(1,corecount[ext]):
            machinesfile += ' ' + ext
    machinesfile += '\n'

#If your nodes do not all have the same specifications you may have to change
#the weights (the "1:") and the granularity below; if you use a residue machine
#you should remove extrafine and add the residue configuration
machinesfile += 'granularity:1\nextrafine:1\n'

#if you have memory issues or a limited bandwidth between nodes try
#uncommenting the following line (you can always try it and see if it speeds
#things up)
#machinesfile += 'lapw2 vector split:2\n'

machines = file('.machines','w')
machines.write(machinesfile)
machines.close()
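(Usage sketch: save the script, make it executable, and run it inside the Torque
job from the case directory before starting the SCF cycle; the path below is only
a placeholder:

   python /path/to/wienhybrid    # writes ./.machines for this job's node allocation
   run_lapw -p
)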





[Wien] Problem when running MPI-parallel version of LAPW0

2014-10-22 Thread Rémi Arras

Dear Pr. Blaha, Dear Wien2k users,

We tried to install the latest version of WIEN2k (14.1) on a supercomputer
and we are facing some trouble with the MPI-parallel version.


1) lapw0 runs correctly in sequential mode, but crashes systematically
when the parallel option is activated (independently of the number of
cores we use):


>   lapw0 -p    (16:08:13) starting parallel lapw0 at lun. sept. 29 16:08:13 CEST 2014
 .machine0 : 4 processors
 Child id   1 SIGSEGV
 Child id   2 SIGSEGV
 Child id   3 SIGSEGV
 Child id   0 SIGSEGV
**  lapw0 crashed!
0.029u 0.036s 0:50.91 0.0%  0+0k 5248+104io 17pf+0w
error: command   /eos3/p1229/remir/INSTALLATION_WIEN/14.1/lapw0para -up -c lapw0.def   failed

>   stop error

w2k_dispatch_signal(): received: Segmentation fault
w2k_dispatch_signal(): received: Segmentation fault
 Child with myid of 1 has an error
'Unknown' - SIGSEGV
 Child id   1 SIGSEGV
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 1
**  lapw0 crashed!
cat: No match.
0.027u 0.034s 1:33.13 0.0%  0+0k 5200+96io 16pf+0w
error: command   /eos3/p1229/remir/INSTALLATION_WIEN/14.1/lapw0para -up -c lapw0.def   failed



2) lapw2 also crashes sometimes when MPI parallelization is used.
Sequential or k-point-parallel runs are OK, and, contrary to lapw0, the error
does not occur in all cases (we did not notice any problem when testing
the MPI benchmark with lapw1):

w2k_dispatch_signal(): received: Segmentation fault
application called MPI_Abort(MPI_COMM_WORLD, 768) - process 0


Our system is a Bullx DLC cluster (Linux Red Hat + Intel Ivy Bridge) and
we use the compiler (+MKL) intel/14.0.2.144 and intelmpi/4.1.3.049.

The batch scheduler is SLURM.

Here are the settings and options we used for the installation:

OPTIONS:
current:FOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -traceback
current:FPOPT:-FR -mp1 -w -prec_div -pc80 -pad -ip -DINTEL_VML -Dmkl_scalapack -traceback -xAVX
current:FFTW_OPT:-DFFTW3 -I/users/p1229/remir/INSTALLATION_WIEN/fftw-3.3.4-Intel_MPI/include
current:FFTW_LIBS:-lfftw3_mpi -lfftw3 -L/users/p1229/remir/INSTALLATION_WIEN/fftw-3.3.4-Intel_MPI/lib
current:LDFLAGS:$(FOPT) -L$(MKLROOT)/lib/$(MKL_TARGET_ARCH) -pthread
current:DPARALLEL:'-DParallel'
current:R_LIBS:-lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread
current:RP_LIBS:-mkl=cluster -lfftw3_mpi -lfftw3 -L/users/p1229/remir/INSTALLATION_WIEN/fftw-3.3.4-Intel_MPI/lib
current:MPIRUN:mpirun -np _NP_ _EXEC_
current:MKL_TARGET_ARCH:intel64

PARALLEL_OPTIONS:
setenv TASKSET "no"
setenv USE_REMOTE 1
setenv MPI_REMOTE 1
setenv WIEN_GRANULARITY 1
setenv WIEN_MPIRUN "mpirun -np _NP_ _EXEC_"

Any suggestions that could help us solve this problem would be
greatly appreciated.


Best regards,
Rémi Arras
___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html