[Wien] SIGSEGV fault error with mBJ

2013-03-19 Thread Jameson Maibam
Dear support,
I tried a simple test calculation on TiC. The SCF cycle completes without
any error, but the mBJ calculation fails with the following error:
LAPW0 END
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image      PC            Routine            Line     Source
lapw0      0040519B      c3fft_1_           119      fftpack_helpers.f
lapw0      00415128      fftpack_mp_c3fft_  397      fft_modules.F
lapw0      0048B865      vresp_             106      vresp.F
lapw0      004A239D      xcpot3_            147      xcpot3.F
lapw0      0046664E      MAIN__             1935     lapw0.F
lapw0      004039BC      Unknown            Unknown  Unknown
libc.so.6  003D1C01EC5D  Unknown            Unknown  Unknown
lapw0      004038B9      Unknown            Unknown  Unknown
>   stop error
My computer is an i3 HP desktop. I used Intel Fortran Composer XE
(l_fcompxe_2013.1.117.tgz) and WIEN2k 12, and my operating system is CentOS 6.
Any help would be appreciated.
Thanks
Yours sincerely
Jameson Maibam


[Wien] IBM AIX error

2013-03-19 Thread Laurence Marks
N.B., unless Peter can do the essl conversions, I can only add changes to the
mixer that will be in the next release (which is better than the current one).

---
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu 1-847-491-3996
"Research is to see what everybody else has seen, and to think what nobody
else has thought"
Albert Szent-Gyorgi


[Wien] IBM AIX error

2013-03-19 Thread Laurence Marks
Hmm, this is tricky. Based upon the links below, it looks like essl uses
non-standard lapack versions.

http://www.cpmd.org:81/pipermail/cpmd-list/2006-December/003584.html
http://cms.mpi.univie.ac.at/vasp-forum/forum_viewtopic.php?2.45

To handle this, I see two options:
a) Someone with access to essl works (I can help) to add "#ifdef essl" to
the mixer routines. Since I have no access to aix/essl I cannot do this.
b) You, and perhaps others, switch to standard lapack for the mixer.

I believe essl should conform to the published standard.

N.B., there may be a problem if essl decides to do its own error handling
with, for instance, eigenvalues of singular matrices. These are supposed to
fail and if essl crashes out the mixer will fail.

N.N.B. In an emergency you can try regressing to MSEC1, although this is not as
good as MSR1 & MSR1a. This will let you know if the other codes are working.



[Wien] IBM AIX error

2013-03-19 Thread Oliver Albertini
Dear WIEN2k users,

I recently compiled 12.1 on AIX (v 6.1) pwr6. Like Luis, I also had to make
some changes to the SRC's in order to finish the compilation. These were mostly
issues with xlf like syntax. 9.2 was the most recent version before this.

To check the program, I ran a NiO 2x2x2 supercell.
init_lapw went well, and upon running runsp_lapw, I got the following output:

# runsp_lapw
hup: Command not found.
STOP  LAPW0 END
STOP  LAPW1 END
STOP  LAPW1 END
STOP  LAPW2 END
syntax error on line 1 stdin
STOP  LAPW2 END
syntax error on line 1 stdin
STOP  CORE  END
STOP  CORE  END
STOP  MIXER END
Sending nohup output to nohup.out.
hup: Command not found.
STOP  LAPW0 END
STOP  LAPW1 END
STOP  LAPW1 END
STOP  LAPW2 END
syntax error on line 1 stdin
STOP  LAPW2 END
syntax error on line 1 stdin
STOP  CORE  END
STOP  CORE  END
STOP  MIXER END
Sending nohup output to nohup.out.
hup: Command not found.
STOP  LAPW0 END
STOP  LAPW1 END
STOP  LAPW1 END
STOP  LAPW2 END
syntax error on line 1 stdin
STOP  LAPW2 END
syntax error on line 1 stdin
STOP  CORE  END
STOP  CORE  END
STOP 1

>   stop error


I ran a few more times with '-NI' and got a few more cycles out. The
energies are reasonable in comparison with other machines. In mixer.error,
the following was printed:

Error in MIXER

Also, the NiO.output2up/dn files have the line 'no read error', and
NiO.outputm says the following:

DGEEV : 2538-2099
End of input argument error reporting. For more information, refer to
Engineering and Scientific Subroutine Library Guide and Reference
(SA22-7904).

DGEEV : 2538-2604
Execution terminating due to error count for error number 2099.

Finally, the dayfile reveals the following error:

error: command   /usr/bin/WIEN2k/12.1/mixer mixer.def   failed

mixer was the last program that I compiled, and I had to install a 64-bit
version of LAPACK to make this work, since the routines dggglm and dgelsy
were coming back as undefined symbols.

I look forward to hearing suggestions.

Sincerely,

Oliver Albertini


[Wien] SIGSEGV fault error with mBJ

2013-03-19 Thread Peter Blaha
Please search the mailing list.

It was mentioned before that you have to fix fftpack (a patch was
included in the mailing list), or you switch to FFTW2/3.


-- 

   P.Blaha
--
Peter BLAHA, Inst.f. Materials Chemistry, TU Vienna, A-1060 Vienna
Phone: +43-1-58801-165300 FAX: +43-1-58801-165982
Email: blaha at theochem.tuwien.ac.at    WWW: http://info.tuwien.ac.at/theochem/
--


[Wien] QTL-B message in scf2 after "x lapw2 -qtl"

2013-03-19 Thread Sanae Fujita
Dear Prof. Blaha,

Thank you very much for your kind answer to my question about the details.

I must say that in the energy range I am most interested in, 0-2 Ry from the
Fermi level, I could get rid of the ghost band by following your first advice.
But I want to get rid of ghost bands up to as high an energy as possible, so
please let me continue.



I tried the suggestion in your last mail and ran into oscillating behavior like
this:

 "After getting rid of the atom2 l=1 ghost band, I got an atom1 l=1 ghost band.
After getting rid of the atom1 l=1 ghost band, I got an atom2 l=0 ghost band.
After getting rid of the atom2 l=0 ghost band, I got an atom2 l=1 ghost band...".



Is there a further strategy to get rid of ghost bands in the 0-3 Ry range?

Would adding APW+lo and LOCAL ORBITALs for l=3,4 make sense?



Details are below:

(I had to set emax to 3.5, not 2.5. But even with emax=3.5 the situation
does not change.)

At first I changed the LO energy parameter of atom2 l=0 to 2.5 Ry as below.

-
WFFIL  EF= 0.5   (WFFIL, WFPRI, ENFIL, SUPWF)
  7.00   10   4 (R-MT*K-MAX; MAX L IN WF, V-NMT
  0.30   4  0  (GLOBAL E-PARAMETER WITH n OTHER CHOICES, global APW/LAPW)
 0    0.30  0.000 CONT 1
 0   -5.57  0.001 STOP 1
 1    0.30  0.000 CONT 1
 1   -3.12  0.001 STOP 1
  0.30   4  0  (G...)
 0   -1.46  0.002 CONT 1
 0    0.30  0.000 CONT 1
 1    0.30  0.000 CONT 1
 1    2.50  0.000 CONT 1
K-VECTORS FROM UNIT:4   -9.0   2.5   18   emin/emax/nband #red
-

But after the DOS calculation, I got a QTL-B value of 7.55419 at 2.07915 Ry for atom1, l=1.

Next I changed the LO energy parameter of atom2 l=0 to 3.0 Ry or 4.0 Ry.

But the result was almost the same.



So next, I changed the atom1 l=0 energy parameters from (0.3 and -3.12)
to (2.0 and -3.12).

After the DOS calculation, I got a QTL-B value of 3.26254 at 2.39181 Ry for atom2 l=0.

I thought this QTL-B value was rather small, so I checked the help032 file and
found the following:



  L= 0   12.72219   9.975   3.263   46.972   -14.698   -9.046



3.263/12.72=26% is larger than a few percent, so it may not be good.

(I saw )



So next, I changed the atom2 l=0 energy parameters from (-1.46 and 1.3) to (-1.46 and 2.3).

The in1_st file at this point is as below.

---
WFFIL  EF= 0.5   (WFFIL, WFPRI, ENFIL, SUPWF)
  7.00   10   4 (R-MT*K-MAX; MAX L IN WF, V-NMT
  0.30   4  0  (GLOBAL E-PARAMETER WITH n OTHER CHOICES, global APW/LAPW)
 0    0.30  0.000 CONT 1
 0   -5.57  0.001 STOP 1
 1    2.00  0.000 CONT 1
 1   -3.12  0.001 STOP 1
  0.30   4  0  (GLOBAL...)
 0   -1.46  0.002 CONT 1
 0    2.30  0.000 CONT 1
 1    0.30  0.000 CONT 1
 1    2.50  0.000 CONT 1
K-VECTORS FROM UNIT:4   -9.0   2.5   18   emin/emax/nband #red
---

After the DOS calculation, I got a QTL-B value of 2.94021 at 1.22167 Ry for
atom2 l=1 and found the following in the help032 file.

---

  L= 1   47.80363  28.378 2.940 2.635 4.587 2.339

---

2.94/47.80=6.2% is still a few percent but below 10%.

The User Guide says: "The few percent message (e.g. up to 10 %) does not
indicate a ghost band, but can happen e.g. in narrow d-bands, where the
linearization reaches its limits. In these cases one can add a local orbital
to improve the flexibility of the basis set."

But I have already added local orbital at atom2 l=0.
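
The two percentages above can be reproduced with a short Python sketch. It assumes, as the arithmetic in this message does, that the first number on each "L=" line of the help032 file is the denominator and the third number is the QTL-B value. The function name and the use of 10 % as a limit (taken from the User Guide passage quoted above) are only illustrative, not part of WIEN2k.

```python
def qtl_b_percent(first_column, qtl_b):
    """Return the QTL-B value as a percentage of the first number on the L= line."""
    if first_column <= 0:
        raise ValueError("first_column must be positive")
    return 100.0 * qtl_b / first_column


# Numbers copied from the two help032 lines quoted in this message.
cases = [
    ("atom2 l=0", 12.72219, 3.263),
    ("atom2 l=1", 47.80363, 2.940),
]

# Assumption: 10 % is used as the limit, following the User Guide wording.
LIMIT_PERCENT = 10.0

for label, first_column, qtl_b in cases:
    percent = qtl_b_percent(first_column, qtl_b)
    verdict = "above" if percent > LIMIT_PERCENT else "within"
    print(f"{label}: {percent:.1f} % ({verdict} the {LIMIT_PERCENT:.0f} % limit)")
```

This only restates the ratio check; it says nothing about which energy parameter to change.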



I changed the atom2 l=1 energy parameters from (0.3 and 2.5) to (0.3 and 1.5).

But I got a QTL-B value of 10.05510 at 2.48523 Ry for atom2 l=1.

I think some oscillating behavior occurred.



Best regards,

S.Fujita


[Wien] Systematic slowing down of calculations with time

2013-03-19 Thread Laurence Marks
I was very lucky; the issue is related to cached memory and running

sync; echo 3 > /proc/sys/vm/drop_caches

solved the problem.

(see http://www.hosting.com/support/linux/clear-memory-cache-on-linux-server
& http://www.linuxinsight.com/proc_sys_vm_drop_caches.html )

No idea why this occurred but obviously something (impi, mkl, ...) is
leading to some combination of clean caches, dentries and inodes
sitting in memory and degrading performance.

I will put an appropriate cron task in; others might want to talk to
their sys_admin if they ever see this.
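
Before scheduling such a cron task, it may help to see how much memory the kernel is holding in caches on a node. The following Python sketch reads /proc/meminfo and adds up the Cached and SReclaimable fields (page cache and reclaimable slab, where dentries and inodes are counted). The function names are made up for illustration, and treating these two fields as the relevant ones for this slowdown is an assumption.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict mapping field name to its number."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def cache_kb(info):
    """Sum page cache and reclaimable slab, in kB.

    Rough figure only: Cached also counts tmpfs/shared memory,
    which drop_caches does not free.
    """
    return info.get("Cached", 0) + info.get("SReclaimable", 0)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as handle:
        meminfo = parse_meminfo(handle.read())
    total = meminfo.get("MemTotal", 0)
    print(f"cache and reclaimable slab: {cache_kb(meminfo)} kB of {total} kB")
```

Logging this figure once per iteration alongside the lapw1 timings would show whether the two grow together.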

On Tue, Mar 19, 2013 at 8:11 AM, Laurence Marks
 wrote:
> I have a reproducible slowing down of calculations which appears to be
> in lapw1 due to something (memory leak?) which is going to be hard to
> track down, so I welcome suggestions.
>
> I first noticed it when one newish E5-2660 node was systematically
> running at ~1/2 the speed of others, reproducibly. After rebooting it
> went back to running at the same speed as others.
>
> I have now reproduced a systematic slowing down of lapw1 (I cannot see
> anything in lapw2) for a long calculation (-it -noHinv, but I don't
> think this matters). It is shown in the attached with the x axis
> iteration, the y axis time in minutes. (The image may get shuffled to
> a link by the listserver software.) Starting from ~7 minutes the
> slowdown is approximately 8 seconds/iteration. This is a fairly big
> calculation with a matrix size of 45456 and 835m/core (virtual)
> running on 64 cores. There is no indication that this is
> communications related, the slowdown is in CPU and WALL remains very
> close to this.
>
> Obviously recompiling with debug on is not going to be a viable
> approach. Also a scatter debug strategy, for instance trying to add
> calls to release memory from mkl calls is going to be very painful as
> we are talking about ~1 day to test. Ideal would be innovative ideas to
> trace down why it has gone slow.
>
> Ideas?
>
> For reference, I am using composer_xe_2013.2.146 and Intel impi. I
> don't see this on older E5410 nodes but I have not run enough
> iterations to notice.
>
> N.B., others might want to look in long recent runs to see if they
> also have evidence for this.


[Wien] Systematic slowing down of calculations with time

2013-03-19 Thread Laurence Marks
Minor correction: the x-axis is iteration*4.



[Wien] QTL-B message in scf2 after "x lapw2 -qtl"

2013-03-19 Thread Peter Blaha
Getting rid of those QTL-B warnings is an iterative process. However, I do not
understand some of your "reactions":

> "After getting rid of the atom2 l=1 ghost band, I got an atom1 l=1 ghost band.
> After getting rid of the atom1 l=1 ghost band, I got an atom2 l=0 ghost band.
> After getting rid of the atom2 l=0 ghost band, I got an atom2 l=1 ghost band...".

> Would adding APW+lo and LOCAL ORBITALs for l=3,4 make sense?

No, this does not make sense.

> At first I changed the LO energy parameter of atom2 l=0 to 2.5 Ry as below.
>
> But after the DOS calculation, I got a QTL-B value of 7.55419 at 2.07915 Ry for atom1, l=1.
>
> Next I changed the LO energy parameter of atom2 l=0 to 3.0 Ry or 4.0 Ry.
>
> But the result was almost the same.

You got a QTL-B on atom 1, l=1. So you need to modify the energy parameter of
atom 1, l=1, not of atom 2, l=0 ?? !!

Always modify the energy parameters of the atom and l where the QTL-Bs occur.
And small values like 2.x for states at high energy are probably OK.

> The in1_st file at this point is as below.

Why case.in1_st ??? It must be case.in1 


-- 
-
Peter Blaha
Inst. Materials Chemistry, TU Vienna
Getreidemarkt 9, A-1060 Vienna, Austria
Tel: +43-1-5880115671
Fax: +43-1-5880115698
email: pblaha at theochem.tuwien.ac.at
-