Re: [Wien] optic program crashed

2016-02-19 Thread Dr. K. C. Bhamu
Sorry to interrupt you again.

I found my scratch directory. I will try again tomorrow and report back with
an update.
Thank you very much.
Bhamu

___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at:  
http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html


Re: [Wien] optic program crashed

2016-02-19 Thread Dr. K. C. Bhamu
Please see my updates on optic:

> ssh: connect to host nid01855 port 204: Connection refused^M

This error is gone now.

> [1]  + Exit 255  ( $remote $machine[$p] "cd $PWD;$t
> $taskset0 $exe ${def}_${loop}.def;rm -f .lock_$lockfile[$p]" ) >>
> .timeop_$loop
> ***  OPTIC crashed!   *
> 0.840u 1.800s 1:50.21 2.3%  0+0k 82495+1135io 4pf+0w
> error: command   /usr/common/software/wien2k-ccm/14.2/opticpara
> optic.def   failed
> ...

These messages remain the same.

I copied the case.inop file from the templates folder and edited it as needed.

What I observed is that *the case.vector file (or case.vector_n, where n is
the processor index) is not in the case directory.*
I looked into why it is missing, since it should be generated during the SCF
cycle and the UG says that "x optic" needs the case.vector file. I concluded
that it may be stored in the scratch directory to save space in the working
directory, but I could not find where the scratch directory is.

This may be the cause of the optic crash.
Please see my updates on the issue and help me.

regards
Bhamu
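
For readers following along, here is a minimal sketch of how one might look for the vector files in more than one place. It assumes the files are named case.vector or case.vector_n, as described above; the helper name list_vectors and the temporary directories are made up for the demonstration and are not part of WIEN2k.

```shell
#!/bin/bash
# list_vectors CASE DIR...: print any CASE.vector* files found in the given directories.
# (Hypothetical helper for illustration, not a WIEN2k command.)
list_vectors() {
    local case_name=$1; shift
    local dir f
    for dir in "$@"; do
        for f in "$dir/$case_name".vector*; do
            # An unmatched glob stays literal, so check that the file exists.
            [ -e "$f" ] && echo "$f"
        done
    done
    return 0
}

# Demo with temporary directories standing in for the case and scratch directories.
case_dir=$(mktemp -d)
scratch_dir=$(mktemp -d)
touch "$scratch_dir/case.vector_1" "$scratch_dir/case.vector_2"

found=$(list_vectors case "$case_dir" "$scratch_dir")
echo "$found"
```

In a real session the call would be something like `list_vectors X "$PWD" "$SCRATCH"`, assuming the SCRATCH environment variable is set in the job's shell and X is the case name.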


Re: [Wien] optic program crashed

2016-02-18 Thread Gavin Abo
At 
http://www.nersc.gov/users/software/applications/materials-science/wien2k/ 
, the last line in the job file has a star (*) after .machine.  It seems 
to be missing from the last line of your job file.  Without it, the old 
.machines file is not removed, and maybe that prevents the new .machines 
file from being created.
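
A small sketch of the difference the star makes, run in a throwaway directory rather than a real case directory. The file names .machine0 and .machine1 are only examples of leftovers that the pattern would match.

```shell
#!/bin/bash
# Illustrative only: works in a temporary directory, not a real case directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-ins for files left behind by a parallel run (example names).
touch .machines .machine0 .machine1

# Without the star, rm looks for a file literally named ".machine";
# -f hides the "no such file" error, so nothing is removed.
rm -fr .machine
left_without_star=$(ls -A | wc -l)

# With the star, the shell expands the pattern to every matching name.
rm -fr .machine*
left_with_star=$(ls -A | wc -l)

echo "left without star: $left_without_star, with star: $left_with_star"
```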


Also, I suggest you talk to the consultant who administers the 
cluster.  They should be able to tell you better why you are getting the 
error "ssh: connect to host nid01855 port 204: Connection refused".  
They might have a firewall set up to block port 204 or might have 
disabled ssh access to node nid01855.


On 2/18/2016 8:31 AM, Dr. K. C. Bhamu wrote:

Dear users and developers,

I ran my job via a Slurm job file on a remote server (2 nodes/64 cores). 
Everything went fine up to DOS, but when I ran "x optic -p" through the 
job file, the following messages appeared:


[1] 1371
ssh: connect to host nid01855 port 204: Connection refused^M
[1]  + Exit 255  ( $remote $machine[$p] "cd 
$PWD;$t $taskset0 $exe ${def}_${loop}.def;rm -f .lock_$lockfile[$p]" ) 
>> .timeop_$loop

[1] 1375
ssh: connect to host nid01855 port 204: Connection refused^M
[1]  + Exit 255  ( $remote $machine[$p] "cd 
$PWD;$t $taskset0 $exe ${def}_${loop}.def;rm -f .lock_$lockfile[$p]" ) 
>> .timeop_$loop

[1] 1379
ssh: connect to host nid01855 port 204: Connection refused^M
[1]  + Exit 255  ( $remote $machine[$p] "cd 
$PWD;$t $taskset0 $exe ${def}_${loop}.def;rm -f .lock_$lockfile[$p]" ) 
>> .timeop_$loop

***  OPTIC crashed!*
0.840u 1.800s 1:50.21 2.3%  0+0k 82495+1135io 4pf+0w
error: command /usr/common/software/wien2k-ccm/14.2/opticpara 
optic.def failed

...

I searched the list and found a couple of related threads, but none of 
them solved the error.


Please take a look.

The job completed successfully on a local two-CPU cluster 
(4 GB RAM each).


The job file was:

#!/bin/bash -l
#SBATCH -N 2
#SBATCH -n 64
#SBATCH -t 00:20:00
#SBATCH -p regular
#SBATCH -J orthorhombic_1
#SBATCH --ccm

#module load wien2k-ccm
#generating .machines file for k-point and mpi parallel lapw1/2
let ntasks_per_kgroup=1
gen.machines -m $ntasks_per_kgroup

#need to disable SLURM envs hereafter
unset `env|grep SLURM_|awk -F= '{print $1}'`

#put your Wien2k command here
x optic -p
#remove leftover .machines file
rm -fr .machine
---

regards
Bhamu


Re: [Wien] optic program crashed

2016-02-18 Thread Gavin Abo

Before you ran the job file, did you create case.inop from the templates:

cp $WIENROOT/SRC_templates/case.inop X.inop

and edit it in a text editor (like nano):

nano X.inop

where X is the name you used for your calculation.

Maybe "x lapw2 -qtl" crashed in an earlier calculation such that QTL did 
not get changed back to TOT [ 
http://www.mail-archive.com/wien%40zeus.theochem.tuwien.ac.at/msg10508.html 
].  In WIEN2k versions older than 14.2, it could also happen due to a 
bug [ 
http://www.mail-archive.com/wien%40zeus.theochem.tuwien.ac.at/msg11165.html 
].
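
As a rough sketch of how to check for this and put TOT back, the snippet below works on a temporary stand-in file rather than a real X.in2. It assumes the switch is the first word on the first line of the file; the sample contents are illustrative and not taken from your calculation.

```shell
#!/bin/bash
# Work on a temporary stand-in file; the sample contents are illustrative only.
in2=$(mktemp)
printf 'QTL             (TOT,FOR,QTL,EFG,FERMI)\n' > "$in2"
printf '(remaining lines of the input file)\n' >> "$in2"

# Read the switch: the first word of the first line (assumed layout).
mode=$(awk 'NR==1 {print $1}' "$in2")
echo "current switch: $mode"

# If it was left at QTL, put TOT back on the first line only.
if [ "$mode" = "QTL" ]; then
    sed '1s/^QTL/TOT/' "$in2" > "$in2.new" && mv "$in2.new" "$in2"
fi

restored=$(awk 'NR==1 {print $1}' "$in2")
echo "switch after restore: $restored"
```

Making a backup copy of the real file before editing it would be prudent.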


On 2/18/2016 12:56 PM, Dr. K. C. Bhamu wrote:

Two more issues:

I did not find a case.inop file, although x lapw2 -fermi -p ran 
successfully.

In case.in2, TOT was automatically changed to QTL.

regards
Bhamu




Re: [Wien] optic program crashed

2016-02-18 Thread Dr. K. C. Bhamu
Two more issues:

I did not find a case.inop file, although x lapw2 -fermi -p ran successfully.
In case.in2, TOT was automatically changed to QTL.

regards
Bhamu





Re: [Wien] optic program crashed

2016-02-18 Thread Dr. K. C. Bhamu
Yes, x lapw1 -p -band and
x lapw2 -band -qtl -p ran successfully.

Regards
Bhamu
On 18-Feb-2016 10:01 pm, "Elias Assmann"  wrote:

> On 02/18/2016 04:31 PM, Dr. K. C. Bhamu wrote:
> > [1] 1371
> > ssh: connect to host nid01855 port 204: Connection refused^M
>
> This is the relevant message, I think.
>
> Two guesses what could be happening:
>
>  1. You have not set up the passwordless ssh login that k-parallel
> Wien2k requires (try the command ‘ssh-copy-id’).
>
>  2. You are on a cluster that permits ssh only to nodes that are
> allocated to your job but you are trying to connect to other nodes (e.g.
> due to a stale ‘.machines’).
>
> Actually, my guess is no. 2 since no. 1 should show a password prompt
> instead.  Do any k-parallel jobs work?


Re: [Wien] optic program crashed

2016-02-18 Thread Elias Assmann
On 02/18/2016 04:31 PM, Dr. K. C. Bhamu wrote:
> [1] 1371
> ssh: connect to host nid01855 port 204: Connection refused^M

This is the relevant message, I think.

Two guesses what could be happening:

 1. You have not set up the passwordless ssh login that k-parallel
Wien2k requires (try the command ‘ssh-copy-id’).

 2. You are on a cluster that permits ssh only to nodes that are
allocated to your job but you are trying to connect to other nodes (e.g.
due to a stale ‘.machines’).

Actually, my guess is no. 2 since no. 1 should show a password prompt
instead.  Do any k-parallel jobs work?
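
One way to test guess no. 2 might look like the sketch below. It assumes the k-point lines in ‘.machines’ have the simple form weight:host or weight:host:n; the function names and the sample host names are made up, and the ssh check is only defined here, not run.

```shell
#!/bin/bash
# hosts_in_machines FILE: print the unique host names found on
# "weight:host[:n]" lines (assumed layout; other lines are ignored).
hosts_in_machines() {
    awk -F: '/^[0-9]+:/ {print $2}' "$1" | sort -u
}

# check_hosts FILE: try a non-interactive ssh to each host.
# BatchMode=yes makes ssh fail instead of asking for a password.
# (Defined for illustration; not called in this demo.)
check_hosts() {
    local host
    for host in $(hosts_in_machines "$1"); do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "$host: ok"
        else
            echo "$host: ssh failed"
        fi
    done
}

# Demo on a sample file; the host names are invented.
sample=$(mktemp)
cat > "$sample" <<'EOF'
granularity:1
1:nodeA
1:nodeA
1:nodeB:2
EOF

hosts=$(hosts_in_machines "$sample")
echo "$hosts"
```

Running `check_hosts .machines` inside the job would then show whether each listed node accepts the connection. Comparing the host list against the nodes actually allocated to the job would reveal a stale file.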


Elias


-- 
Elias Assmann
Institute of Theoretical and Computational Physics
TU Graz   ⟨https://itp.tugraz.at/⟩