It crashed with the message "Host key verification failed."
It seems that your cluster does not allow ssh to an allocated node
(ask your sys admin).
In $WIENROOT/WIEN2k_parallel_options there are variables like
USE_REMOTE. If set to zero, ssh is not used and you can run in
parallel, but only on one shared memory node.
In order to use multiple nodes, you need to be able to do passwordless
ssh to the allocated nodes (or any other command substituting ssh).
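
A quick way to test this from inside the job is sketched below. It is only a sketch, not part of WIEN2k: the function names (machines_hosts, check_ssh, check_machines) are made up for illustration, and it assumes that the k-point lines in .machines have the form "weight:host". BatchMode and ConnectTimeout are standard OpenSSH options. With BatchMode=yes, ssh fails immediately instead of prompting, which is the same situation the parallel scripts are in.

```shell
#!/bin/sh
# Sketch: list the hosts named in a WIEN2k .machines file and test whether
# non-interactive ssh to each of them works. Function names are hypothetical.

# Print the unique host names from the k-point lines ("weight:host...").
# Assumption: other lines (e.g. "granularity:1", "lapw0:...") are skipped.
machines_hosts() {
    awk -F: '/^[0-9]+:/ { split($2, h, " "); print h[1] }' "$1" | sort -u
}

# Return 0 if ssh to host $1 succeeds without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password or for
# confirmation of an unknown host key.
check_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Report the ssh status for every host in the given .machines file.
check_machines() {
    for host in $(machines_hosts "$1"); do
        if check_ssh "$host"; then
            echo "$host: ssh ok"
        else
            echo "$host: ssh FAILED"
        fi
    done
}
```

After sourcing these functions in the job script, "check_machines .machines" prints one line per host. If every host is the node the job itself runs on, the USE_REMOTE route above may already be enough; if ssh fails for a host, that is something to take to the sys admin.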
Here is the content of the file
/lustre/ukt/milias/scratch/Wien2k_23.2_job.main.N1.n4.jid3009460/LvO2onQg/.machines:
1:lxbk1177
1:lxbk1177
1:lxbk1177
1:lxbk1177
1:lxbk1177
1:lxbk1177
1:lxbk1177
1:lxbk1177
The job is running on lxbk1177 with 8 CPUs allocated,
and this is from the log:
running x dstart :
starting parallel dstart at Tue 20 Jun 2023 05:16:21 PM CEST
-------- .machine0 : processors
running dstart in single mode
STOP DSTART ENDS
10.249u 0.322s 0:11.19 94.3% 0+0k 158496+101160io 437pf+0w
running 'run_lapw -p -ec 0.0001 -NI'
STOP LAPW0 END
Host key verification failed.
[1] + Done ( ( $remote $machine[$p] "cd
$PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def
;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >& .stdout1_$loop;
if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw .stdout1_$loop >
.temp1_$loop; grep \% .temp1_$loop >> .time1_$loop; grep -v \%
.temp1_$loop | perl -e "print stderr <STDIN>" )
Host key verification failed.
[1] + Done ( ( $remote $machine[$p] "cd
$PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def
;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >&
.stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw
.stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop;
grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
Host key verification failed.
[1] + Done ( ( $remote $machine[$p] "cd
$PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def
;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >&
.stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw
.stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop;
grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
Host key verification failed.
[1] + Done ( ( $remote $machine[$p] "cd
$PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def
;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >&
.stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw
.stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop;
grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
Host key verification failed.
[1] + Done ( ( $remote $machine[$p] "cd
$PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def
;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >&
.stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw
.stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop;
grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
Host key verification failed.
[1] + Done ( ( $remote $machine[$p] "cd
$PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def
;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >&
.stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw
.stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop;
grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
Host key verification failed.
[1] + Done ( ( $remote $machine[$p] "cd
$PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def
;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >&
.stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw
.stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop;
grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
Host key verification failed.
[1] Done ( ( $remote $machine[$p] "cd
$PWD;$set_OMP_NUM_THREADS;$t $taskset0 $exe ${def}_$loop.def
;fixerror_lapw ${def}_$loop"; rm -f .lock_$lockfile[$p] ) >&
.stdout1_$loop; if ( -f .stdout1_$loop ) bashtime2csh.pl_lapw
.stdout1_$loop > .temp1_$loop; grep \% .temp1_$loop >> .time1_$loop;
grep -v \% .temp1_$loop | perl -e "print stderr <STDIN>" )
LvO2onQg.scf1_1: No such file or directory.
grep: *scf1*: No such file or directory
STOP FERMI - Error
cp: cannot stat '.in.tmp': No such file or directory
grep: *scf1*: No such file or directory
> stop error
file ":parallel"
starting parallel lapw1 at Tue 20 Jun 2023 05:17:08 PM CEST
lxbk1177(4) lxbk1177(3) lxbk1177(3) lxbk1177(3)
lxbk1177(3) lxbk1177(3) lxbk1177(3) lxbk1177(3)
Summary of lapw1para:
lxbk1177 k=25 user=0 wallclock=0
<- done at Tue 20 Jun 2023 05:17:14 PM CEST
-----------------------------------------------------------------
-> starting Fermi on lxbk1177 at Tue 20 Jun 2023 05:17:15 PM CEST
** LAPW2 crashed at Tue 20 Jun 2023 05:17:16 PM CEST
** check ERROR FILES!
-----------------------------------------------------------------
_______________________________________________
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
--
-----------------------------------------------------------------------
Peter Blaha, Inst. f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-158801165300
Email:peter.bl...@tuwien.ac.at
WWW:http://www.imc.tuwien.ac.at WIEN2k:http://www.wien2k.at
-------------------------------------------------------------------------