A guess: you are linking against the wrong version of BLACS. You need
-lmkl_blacs_intelmpi_XX
where XX is the interface suffix for your system (lp64 or ilp64). I have
seen the wrong BLACS library give exactly this error.
Use http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/
to generate the correct link line. For reference, with openmpi it is
_openmpi_ instead of _intelmpi_, and similarly for sgi.
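As an illustration only (the exact list depends on your MKL version, so
generate it with the link-line advisor), a ScaLAPACK/BLACS group for Intel
MPI with the lp64 interface typically looks something like

  -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 \
  -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread

In Wien2K this goes into the parallel library line (RP_LIBS set via
siteconfig_lapw, if I recall correctly); the serial libraries stay as they are.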
2012/1/22 Paul Fons paul-fons at aist.go.jp:
Hi,
I have Wien2K running on a cluster of linux boxes, each with 32 cores and
connected by 10Gb Ethernet. I compiled Wien2K with the 3.174 version of the
Intel compiler (I learned the hard way that bugs in the newer versions of
the Intel compiler lead to crashes in Wien2K). I have also installed Intel's
MPI. First, single-process Wien2K, let's say for the TiC case, works fine.
It also works fine when I use a .machines file like
granularity:1
localhost:1
localhost:1
   (24 times).
This file leads to parallel execution without error. I can vary the number
of processes by changing the number of localhost:1 lines in the file and
everything still works fine. When I try to use mpi with a single process,
it works as well:
1:localhost:1
starting parallel lapw1 at Mon Jan 23 06:49:16 JST 2012
- starting parallel LAPW1 jobs at Mon Jan 23 06:49:16 JST 2012
running LAPW1 in parallel mode (using .machines)
1 number_of_parallel_jobs
[1] 22417
LAPW1 END
[1] + Done ( cd $PWD; $t $exe ${def}_$loop.def; rm
-f .lock_$lockfile[$p] ) .time1_$loop
localhost(111) 179.004u 4.635s 0:32.73 561.0% 0+0k 0+26392io 0pf+0w
Summary of lapw1para:
localhost k=111 user=179.004 wallclock=32.73
179.167u 4.791s 0:35.61 516.5% 0+0k 0+26624io 0pf+0w
Changing the machine file to use more than one mpi process (the same form
of error occurs for more than 2), e.g.
1:localhost:2
leads to a run-time error in the MPI subsystem.
starting parallel lapw1 at Mon Jan 23 06:51:04 JST 2012
- starting parallel LAPW1 jobs at Mon Jan 23 06:51:04 JST 2012
running LAPW1 in parallel mode (using .machines)
1 number_of_parallel_jobs
[1] 22673
Fatal error in MPI_Comm_size: Invalid communicator, error stack:
MPI_Comm_size(123): MPI_Comm_size(comm=0x5b, size=0x7ed20c) failed
MPI_Comm_size(76).: Invalid communicator
Fatal error in MPI_Comm_size: Invalid communicator, error stack:
MPI_Comm_size(123): MPI_Comm_size(comm=0x5b, size=0x7ed20c) failed
MPI_Comm_size(76).: Invalid communicator
[1] + Done ( cd $PWD; $t $ttt; rm -f
.lock_$lockfile[$p] ) .time1_$loop
localhost localhost(111) APPLICATION TERMINATED WITH THE EXIT STRING:
Hangup (signal 1)
0.037u 0.036s 0:00.06 100.0% 0+0k 0+0io 0pf+0w
TiC.scf1_1: No such file or directory.
Summary of lapw1para:
localhost k=0 user=111 wallclock=0
0.105u 0.168s 0:03.21 8.0% 0+0k 0+216io 0pf+0w
I have properly sourced the appropriate runtime environment for the Intel
tools. For example, compiling (with mpiifort) and running the f90 mpi test
program from Intel produces:
mpirun -np 32 /home/paulfons/mpitest/testf90
Hello world: rank  0  of  32  running on asccmp177
Hello world: rank  1  of  32  running on           (32 times)
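(For reference, this test is essentially the standard MPI hello world; a
minimal sketch along these lines, not necessarily Intel's exact source, is

  program testf90
    use mpi
    implicit none
    integer :: rank, nprocs, ierr
    ! start MPI, report this process' rank and the communicator size
    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
    print *, 'Hello world: rank', rank, 'of', nprocs
    call MPI_Finalize(ierr)
  end program testf90

compiled with mpiifort and launched with mpirun as above.)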
Does anyone have any suggestions as to what to try next? I am not sure how
to debug things from here. I have about 512 nodes that I can use for larger
calculations, but they can only be accessed via mpi (the ssh setup works
fine as well, by the way). It would be great to figure out what is wrong.
Thanks.
Dr. Paul Fons
Functional Nano-phase-change Research Team
Team Leader
Nanodevice Innovation Research Center (NIRC)
National Institute of Advanced Industrial Science and Technology (AIST)
METI
AIST Central 4, Higashi 1-1-1
Tsukuba, Ibaraki JAPAN 305-8568
tel. +81-298-61-5636
fax. +81-298-61-2939
email: paul-fons at aist.go.jp
--
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu 1-847-491-3996
Research is to see what everybody else has seen, and to think what
nobody else has thought
Albert Szent-Gyorgi