Re: [Wien] [SPAM?] Re: k point parallel calculations

2015-02-24 Thread Gavin Abo

In addition:

Did you set up and run the calculation from scratch for each WIEN2k 
version in its own directory?  It is usually not a good idea to mix 
initialization and runs from different WIEN2k versions in a single 
directory.  In WIEN2k 12/13, I believe the exchange and correlation 
potential was specified by a number in case.in0.  In version 14, 
words (characters) are used instead of a number.
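
A quick way to see which style a given case.in0 uses is to look at the 
second field of its first line.  The sketch below assumes that the 
exchange-correlation switch is the second whitespace-separated token on 
the first line, which is how I remember the file being laid out; please 
verify that against your own files.  The function name and the sample 
lines are made up for illustration.

```python
def xc_switch_style(first_line):
    """Classify the XC switch on the first line of case.in0.

    Assumption: the switch is the second whitespace-separated token.
    Returns "number", "keyword", or "missing".
    """
    tokens = first_line.split()
    if len(tokens) < 2:
        return "missing"
    return "number" if tokens[1].isdigit() else "keyword"


# Hypothetical sample lines, not copied from a real calculation.
print(xc_switch_style("TOT   13"))      # number
print(xc_switch_style("TOT   XC_PBE"))  # keyword
```

If a directory initialized with one version reports the other style than 
the version you are running expects, that would be consistent with a 
mixed-version setup.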


Good, it looks like you have checked the case.dayfile and *.error 
files.  However, this seems to be one of those cases where they don't 
provide anything very useful.  The other thing to check is the terminal 
output.  Since it is failing in lapw1, you would want to run just that 
step (x lapw1 -p) in a terminal and see what it prints.  If you are not 
allowed to run "x lapw1 -p" directly in a terminal and are required to 
use a queue system like qsub, the output is usually written to a 
user-named file (or sometimes two files, an output file and an error 
file) instead of the terminal [ 
http://stackoverflow.com/questions/9096959/how-to-specify-error-log-file-and-output-file-in-qsub 
].  So, if you haven't already done so, you should check the standard 
output and error file(s).
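
If it is easier to keep the terminal output in a file than to scroll 
back for it, a small wrapper can do that.  This is only a minimal 
sketch: run_and_log and the log file name in the comment are 
hypothetical, and it assumes the "x" script is on your PATH in the 
shell where you start Python.

```python
import subprocess


def run_and_log(command, log_path):
    """Run a command, write its stdout and stderr to log_path,
    and return the exit code."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        universal_newlines=True,   # decode the output as text
    )
    with open(log_path, "w") as handle:
        handle.write(result.stdout)
    return result.returncode


# Example call from inside the case directory (hypothetical log name):
#   run_and_log(["x", "lapw1", "-p"], "lapw1_terminal.log")
```

A nonzero return value together with the contents of the log file is 
usually the first thing worth posting to the list.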


On 2/24/2015 11:39 AM, Laurence Marks wrote:


I am not certain, but it looks like the mixer error for 12/13 is due 
to a format error in your case.in0. This may be incorrect; please look 
at what is at line 168 of your mixer.F.


In most cases where I have seen errors such as this, it is because 
something has gone wrong earlier. Check with "cat *.error", as all 
these files should be empty. Check that your case.clmval and 
case.clmcor are not empty and do not contain NAN. Look at the end of 
the case.output* files to check that the programs really worked.


___
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu 
MURI4D.numis.northwestern.edu 
Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what 
nobody else has thought"

Albert Szent-Gyorgi

On Feb 24, 2015 12:19 PM, "Priyanka Seth" wrote:


Hello all,

I have been trying to run some k-point parallel calculations for some
large structures and have been having problems for versions 12, 13 and
14 on an ifort compilation. In all cases, I am running on the same
number of cores as k vectors. Note that calculations started from the
same input and run on a single core complete without any problems.

v12/v13
=======

This is the output for versions 12 and 13 (I've removed the
node-dependent lines):

LAPW0 END
LAPW1 END
LAPW2 - FERMI; weighs written
LAPW2 END
SUMPARA END
CORE  END
forrtl: severe (59): list-directed I/O syntax error, unit -5, file Internal List-Directed Read
Image      PC            Routine  Line     Source
mixer      0051693D      Unknown  Unknown  Unknown
mixer      00515445      Unknown  Unknown  Unknown
mixer      004BC9E0      Unknown  Unknown  Unknown
mixer      0046F4BA      Unknown  Unknown  Unknown
mixer      0046ECB0      Unknown  Unknown  Unknown
mixer      00492B76      Unknown  Unknown  Unknown
mixer      0049043B      Unknown  Unknown  Unknown
mixer      00407E7E      MAIN__   168      mixer.F
mixer      0040414C      Unknown  Unknown  Unknown
libc.so.6  0037C241D994  Unknown  Unknown  Unknown
mixer      00403FC9      Unknown  Unknown  Unknown

 >   stop error

Looking at the error files, I have "Error in MIXER" in both versions.

The dayfile ends as follows:
1.884u 0.844s 0:09.73 27.9% 0+0k 0+0io 8pf+0w
 >   lcore (09:33:51) 0.046u 0.007s 0:00.14 28.5% 0+0k 0+0io 7pf+0w
 >   mixer (09:33:51) 0.000u 0.005s 0:00.04 0.0% 0+0k 0+0io 8pf+0w
error: command   /home/pseth/SOURCES/WIEN2K_v13/mixer mixer.def failed

 >   stop error


v14
===

I get to the second cycle, but then the calculation crashes with
"Error in LAPW1" in lapw1_*.error:

  LAPW2 END
  SUMPARA END
  CORE  END
  MIXER END
ec cc and fc_conv 0 0 1
in cycle 2    ETEST: 0   CTEST: 0
  LAPW0 END

There is nothing obviously wrong looking at the case.scf1_* files or
at the dayfile, which ends like this:

 >   lapw1  -p   (09:37:40) starting parallel lapw1 at Tue Feb 10 09:37:40 CET 2015
->  starting parallel LAPW1 jobs at Tue Feb 10 09:37:40 CET 2015
running LAPW1 in parallel mode (using .machines)
24 number_of_parallel_jobs
[1] 30405
[2] 30437
[3] 30471
[4] 30507
[5

[Wien] [SPAM?] Re: k point parallel calculations

2015-02-24 Thread Laurence Marks
I am not certain, but it looks like the mixer error for 12/13 is due to a
format error in your case.in0. This may be incorrect; please look at what
is at line 168 of your mixer.F.

In most cases where I have seen errors such as this, it is because something
has gone wrong earlier. Check with "cat *.error", as all these files should
be empty. Check that your case.clmval and case.clmcor are not empty and do
not contain NAN. Look at the end of the case.output* files to check that
the programs really worked.
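
The three checks can also be scripted. What follows is only a rough
sketch: the function names are made up, "case" stands for your own case
name, and the NaN test is a crude text search that would also trigger on
a title line that happens to contain the letters "nan".

```python
import glob
import os


def nonempty_error_files(directory="."):
    """Return the *.error files in directory that are not empty."""
    paths = sorted(glob.glob(os.path.join(directory, "*.error")))
    return [p for p in paths if os.path.getsize(p) > 0]


def density_file_problem(path):
    """Describe what is wrong with a clmval/clmcor file, or return None."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if "nan" in line.lower():  # crude text search, see note above
                return "NaN on line %d" % number
    return None


def last_lines(path, count=5):
    """Return the last count lines of a file, e.g. a case.output* file."""
    with open(path, errors="replace") as handle:
        return handle.readlines()[-count:]
```

Run from the case directory, nonempty_error_files() should return an
empty list, and density_file_problem() should return None for both
case.clmval and case.clmcor.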

___
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu
MURI4D.numis.northwestern.edu
Co-Editor, Acta Cryst A
"Research is to see what everybody else has seen, and to think what nobody
else has thought"
Albert Szent-Gyorgi
On Feb 24, 2015 12:19 PM, "Priyanka Seth" wrote:

> Hello all,
>
> I have been trying to run some k-point parallel calculations for some
> large structures and have been having problems for versions 12, 13 and
> 14 on an ifort compilation. In all cases, I am running on the same
> number of cores as k vectors. Note that calculations started from the same
> input and run on a single core complete without any problems.
>
> v12/v13
> =======
>
> This is the output for versions 12 and 13 (I've removed the
> node-dependent lines):
>
> LAPW0 END
> LAPW1 END
> LAPW2 - FERMI; weighs written
> LAPW2 END
> SUMPARA END
> CORE  END
> forrtl: severe (59): list-directed I/O syntax error, unit -5, file Internal List-Directed Read
> Image      PC            Routine  Line     Source
> mixer      0051693D      Unknown  Unknown  Unknown
> mixer      00515445      Unknown  Unknown  Unknown
> mixer      004BC9E0      Unknown  Unknown  Unknown
> mixer      0046F4BA      Unknown  Unknown  Unknown
> mixer      0046ECB0      Unknown  Unknown  Unknown
> mixer      00492B76      Unknown  Unknown  Unknown
> mixer      0049043B      Unknown  Unknown  Unknown
> mixer      00407E7E      MAIN__   168      mixer.F
> mixer      0040414C      Unknown  Unknown  Unknown
> libc.so.6  0037C241D994  Unknown  Unknown  Unknown
> mixer      00403FC9      Unknown  Unknown  Unknown
>
>  >   stop error
>
> Looking at the error files, I have "Error in MIXER" in both versions.
>
> The dayfile ends as follows:
> 1.884u 0.844s 0:09.73 27.9% 0+0k 0+0io 8pf+0w
>  >   lcore (09:33:51) 0.046u 0.007s 0:00.14 28.5% 0+0k 0+0io 7pf+0w
>  >   mixer (09:33:51) 0.000u 0.005s 0:00.04 0.0% 0+0k 0+0io 8pf+0w
> error: command   /home/pseth/SOURCES/WIEN2K_v13/mixer mixer.def failed
>
>  >   stop error
>
>
> v14
> ===
>
> I get to the second cycle, but then the calculation crashes with "Error
> in LAPW1" in lapw1_*.error:
>
>   LAPW2 END
>   SUMPARA END
>   CORE  END
>   MIXER END
> ec cc and fc_conv 0 0 1
> in cycle 2    ETEST: 0   CTEST: 0
>   LAPW0 END
>
> There is nothing obviously wrong looking at the case.scf1_* files or at
> the dayfile which ends like this:
>
>  >   lapw1  -p   (09:37:40) starting parallel lapw1 at Tue Feb 10 09:37:40 CET 2015
> ->  starting parallel LAPW1 jobs at Tue Feb 10 09:37:40 CET 2015
> running LAPW1 in parallel mode (using .machines)
> 24 number_of_parallel_jobs
> [1] 30405
> [2] 30437
> [3] 30471
> [4] 30507
> [5] 30559
> [6] 30606
> [7] 30653
> [8] 30717
> [9] 30809
> [10] 30916
> [11] 31000
> [12] 31070
> [13] 31192
> [14] 31329
> [15] 31428
> [16] 31504
> [17] 31664
> [18] 31788
> [19] 31871
> [20] 31900
> [21] 31928
> [22] 31956
> [23] 31982
> [24] 32010
> [5]Done  ( ( $remote $machine[$p]  ...
>
>
> I understand that this is not much information to go on, but I don't
> really know where else to look! Has anyone had similar issues? What else
> would help in diagnosing the problem?
>
> Many thanks,
> Priyanka
> ___
> Wien mailing list
> Wien@zeus.theochem.tuwien.ac.at
> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
> SEARCH the MAILING-LIST at:
> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
>