I have an update and some questions on hybrid calculations on a 96-atom 
cluster.  I am running my initial tests on two 24-core machines connected by 
InfiniBand.  I have included 4 k-points using a 2x2x2 MP grid.  My .machines 
file is shown below.

lapw0:localhost:12
1:localhost:12
1:localhost:12
1:draco-ib:12
1:draco-ib:12
granularity:1
extrafine:1
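
As a quick sanity check on the file above, the number of k-point parallel jobs and the cores they use can be counted from the lines that begin with "1:". This is only a sketch: it writes a copy of the listing to a temporary file (a stand-in for the real .machines) and assumes every job line carries the weight 1, as it does here.

```shell
#!/bin/sh
# Sketch: count k-point parallel jobs in a .machines-style file.
# Assumption: the temporary file stands in for the real .machines, and
# every k-point job line starts with the weight "1:" as in the listing.
machines=$(mktemp)
cat > "$machines" <<'EOF'
lapw0:localhost:12
1:localhost:12
1:localhost:12
1:draco-ib:12
1:draco-ib:12
granularity:1
extrafine:1
EOF

# Each line of the form weight:host:ncores defines one parallel job.
njobs=$(grep -c '^1:' "$machines")
# Sum the core counts (third colon-separated field) over the job lines.
ncores=$(awk -F: '/^1:/ { n += $3 } END { print n }' "$machines")

echo "parallel jobs: $njobs"
echo "cores used by the k-point jobs: $ncores"
rm -f "$machines"
```

For this file the script reports 4 jobs on 48 cores, which agrees with the "4 number_of_parallel_jobs" line in the dayfile further down.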


I have done a conventional PBE calculation on the same cluster using the above 
.machines file, and it finished without errors in a few hours.  I then 
initialized an HF calculation using init_hf_lapw and specified the same 2x2x2 
grid.  I specified 770 bands in my case.inhf as I have 1526 electrons.  The 
initialization ran without errors.  I then invoked the SCF loop using 
"run_lapw -hf -p" with the same .machines file.  lapw0 and the initial part of 
the SCF loop appear to have run without errors, but the calculation stopped on 
the second iteration of the SCF loop.  In particular, the second loop failed 
because the file "/home/matstud/WIENSCRATCH/aCGT.vectorhf_old" (opened with 
STATUS: old) was missing.  Before I continue, I should add that the WIENSCRATCH 
environment variable is correctly set and that the directories exist on both 
machines.  The regular parallel PBE run also ran without errors, and I assume 
it used the same scratch directories.  The file in question, 
"aCGT.vectorhf_old", does not exist in either of the WIENSCRATCH directories, 
nor does it exist in the working directory of the calculation.  The two nodes 
are gemini (localhost) and draco-ib (the second node, connected via 
InfiniBand).  The contents of the scratch directories on both nodes are listed 
below, together with the files in the project directory whose names contain 
"vector".  The current calculation involves only four k-points.
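
For reference, the band count quoted above follows from simple arithmetic. The sketch below assumes two electrons per band (a non-spin-polarized run), which is an assumption made for illustration and not something read from case.inhf; the variable names are illustrative too.

```shell
#!/bin/sh
# Sketch of the band-count arithmetic.
# Assumption: 2 electrons per band, i.e. a non-spin-polarized calculation.
nelec=1526   # number of electrons quoted above
nband=770    # number of bands specified in case.inhf

nocc=$((nelec / 2))        # occupied bands
nempty=$((nband - nocc))   # unoccupied bands included on top of those

echo "occupied bands: $nocc"
echo "empty bands:    $nempty"
```

Under that assumption, 770 bands cover the 763 occupied bands plus 7 empty ones.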



The actual run output went as follows:

run_lapw -hf -p -in1new 2
 LAPW0 END
 LAPW0 END
 LAPW1 END
mv: cannot stat `aCGT.vector': No such file or directory
 LAPW1 END
 LAPW1 END
 LAPW1 END
 LAPW1 END
mv: cannot stat `aCGT.vectorhf_old': No such file or directory
 LAPW2 END
mv: cannot stat `aCGT.vector': No such file or directory
LAPW2 - FERMI; weighs written
 LAPW2 END
 LAPW2 END
 LAPW2 END
 LAPW2 END
 SUMPARA END
 CORE  END
OPEN FAILED
[the line above was printed 48 times in total]
error with vector files

>   stop error





Vector files in the different directories:

On gemini (localhost) working directory

ls -l aCGT*vector*
-rw-rw-r-- 1 matstud matstud 0 Mar  8 19:18 aCGT.vectorhf


On gemini (localhost) WIENSCRATCH

mats...@gemini.a04.aist.go.jp:/usr/local/share/wien2k/Fons/aCGT>ls -l $HOME/WIENSCRATCH
total 3592696
-rw-rw-r-- 1 matstud matstud 2943235040 Mar  9 13:31 aCGT.vector
-rw-rw-r-- 1 matstud matstud  367792558 Mar  9 14:11 aCGT.vector_1
-rw-rw-r-- 1 matstud matstud  367877082 Mar  9 14:11 aCGT.vector_2


On draco-ib (remote host) WIENSCRATCH directory

mats...@gemini.a04.aist.go.jp:/usr/local/share/wien2k/Fons/aCGT>ssh draco-ib ls -l WIENSCRATCH
total 718780
-rw-r--r-- 1 matstud matstud 367646498 Mar  9 14:04 aCGT.vector_3
-rw-r--r-- 1 matstud matstud 368373958 Mar  9 14:04 aCGT.vector_4
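
The same check can be scripted so that a directory is inspected for each vector file in one pass, distinguishing a missing file from a zero-length one. This is a minimal sketch: the function name check_vectors is hypothetical (it is not part of WIEN2k), and the list of suffixes is taken only from the file names that appear in the messages above.

```shell
#!/bin/sh
# Sketch: report whether each vector file is missing, empty, or present.
# check_vectors is a hypothetical helper, not a WIEN2k script.
check_vectors() {
    dir=$1         # directory to inspect
    case_name=$2   # case name, e.g. aCGT
    for suffix in vector vectorhf vectorhf_old; do
        f="$dir/$case_name.$suffix"
        if [ ! -e "$f" ]; then
            echo "$suffix: missing"
        elif [ ! -s "$f" ]; then
            echo "$suffix: empty"
        else
            echo "$suffix: present"
        fi
    done
}

# Demo on a throwaway directory that mimics the working directory above,
# where only a zero-length aCGT.vectorhf exists.
demo=$(mktemp -d)
: > "$demo/aCGT.vectorhf"
result=$(check_vectors "$demo" aCGT)
echo "$result"
rm -rf "$demo"
```

On the real machines the function could be pointed at "$HOME/WIENSCRATCH" and at the working directory on gemini, with the same loop run on draco-ib over ssh.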




DAYFILE

cat aCGT.dayfile 

Calculating aCGT in /usr/local/share/wien2k/Fons/aCGT
on gemini.a04.aist.go.jp with PID 45216
using WIEN2k_14.2 (Release 15/10/2014) in /home/matstud/Wien2K


    start       (Mon Mar  9 10:27:54 JST 2015) with lapw0 (40/99 to go)

    cycle 1     (Mon Mar  9 10:27:55 JST 2015)  (40/99 to go)

>   lapw0 -grr -p       (10:27:55) starting parallel lapw0 at Mon Mar  9 10:27:55 JST 2015
-------- .machine0 : 12 processors
755.913u 3.546s 1:06.22 1146.8% 0+0k 184+796936io 0pf+0w
>   lapw0 -p    (10:29:01) starting parallel lapw0 at Mon Mar  9 10:29:01 JST 2015
-------- .machine0 : 12 processors
622.223u 2.856s 0:54.57 1145.4% 0+0k 48+203264io 0pf+0w
>   lapw1    -c         (10:29:56) 20873.217u 161.505s 3:01:39.31 192.9%        0+0k 14448+5913840io 0pf+0w
>   lapw1  -p   -c      (13:31:36) starting parallel lapw1 at Mon Mar  9 13:31:36 JST 2015
->  starting parallel LAPW1 jobs at Mon Mar  9 13:31:36 JST 2015
running LAPW1 in parallel mode (using .machines)
4 number_of_parallel_jobs
     localhost localhost localhost localhost localhost localhost localhost localhost localhost localhost localhost localhost(1) 27361.552u 625.141s 39:53.54 1169.2%    0+0k 8+882304io 0pf+0w
     localhost localhost localhost localhost localhost localhost localhost localhost localhost localhost localhost localhost(1) 27183.185u 653.051s 39:50.16 1164.6%    0+0k 0+719488io 0pf+0w
     draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib(1) 0.020u 0.024s 33:06.78 0.0% 0+0k 0+0io 0pf+0w
     draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib draco-ib(1) 0.023u 0.029s 33:15.23 0.0% 0+0k 0+0io 0pf+0w
   Summary of lapw1para:
   localhost     k=0     user=0  wallclock=0
   draco-ib      k=0     user=0  wallclock=0
54550.790u 1281.369s 39:55.96 2330.2%   0+0k 72+1603312io 0pf+0w
>   lapw2   -c  (14:11:32) 979.641u 57.954s 9:12.71 187.7%      0+0k 1128+253800io 0pf+0w
>   lapw2 -p   -c       (14:20:45) running LAPW2 in parallel mode
      localhost 261.005u 5.646s 0:25.12 1061.4% 0+0k 64+253704io 0pf+0w
      localhost 228.920u 5.488s 0:22.16 1057.7% 0+0k 16+199776io 0pf+0w
      draco-ib 0.033u 0.031s 0:21.96 0.2% 0+0k 8+0io 0pf+0w
      draco-ib 0.032u 0.033s 0:21.80 0.2% 0+0k 8+0io 0pf+0w
   Summary of lapw2para:
   localhost     user=489.925    wallclock=47.28
   draco-ib      user=0.065      wallclock=43.76
505.406u 13.549s 0:57.49 902.6% 0+0k 594800+654112io 0pf+0w
>   lcore       (14:21:43) 4.164u 0.365s 0:06.55 69.0%  0+0k 8+69416io 0pf+0w
>   hf       -p -c      (14:21:50) running HF in parallel mode
      localhost ERROR IN OPENING UNIT: 11 FILENAME: /home/matstud/WIENSCRATCH/aCGT.vectorhf_old STATUS: old FORM:unformatted [message printed 12 times] 0.241u 0.956s 0:00.72 165.2% 0+0k 16+8io 0pf+0w
      localhost ERROR IN OPENING UNIT: 11 FILENAME: /home/matstud/WIENSCRATCH/aCGT.vectorhf_old STATUS: old FORM:unformatted [message printed 12 times] 0.240u 0.982s 0:00.73 167.1% 0+0k 16+8io 0pf+0w
      draco-ib ERROR IN OPENING UNIT: 11 FILENAME: /home/matstud/WIENSCRATCH/aCGT.vectorhf_old STATUS: old FORM:unformatted [message printed 12 times] 0.031u 0.018s 0:00.93 4.3% 0+0k 8+8io 0pf+0w
      draco-ib ERROR IN OPENING UNIT: 11 FILENAME: /home/matstud/WIENSCRATCH/aCGT.vectorhf_old STATUS: old FORM:unformatted [message printed 12 times] 0.022u 0.025s 0:00.86 4.6% 0+0k 8+8io 0pf+0w
   Summary of hfpara:
   localhost     user=0  wallclock=0
   draco-ib      user=0  wallclock=0
**  HF crashed!
0.755u 2.429s 0:07.75 40.9%     0+0k 96+1352io 0pf+0w
error: command   /home/matstud/Wien2K/hfcpara -c hf.def   failed

>   stop error