On 11.04.2015 at 14:02, Marlies Hankel wrote:

> Dear Reuti,
> 
> No, I did not use ScaLAPACK for now.

Ah, I asked because I never got the ScaLAPACK version of VASP running, only the
traditional parallelization.


> We do not have Intel MPI, and at the moment I just needed to get things going
> to get our new cluster up and usable.
> 
> All our calculations are MPI-based, not just VASP, and my own home-grown code
> does not run through SGE either, so I hope I can find the problem soon....

Does this happen with a simple MPI hello application too?
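
For example (just a sketch, assuming the supplied "orte" PE you mention later
and that mpicc and mpirun come from the same Open MPI install that is in $PATH
inside a job; file name and slot count are only placeholders):

cat > hello.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);                /* start the MPI runtime     */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* rank of this process      */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF
mpicc -o hello hello.c
qsub -cwd -b y -pe orte 40 mpirun -np 40 ./hello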

-- Reuti


> Best wishes
> 
> Marlies
> 
> On 04/11/2015 07:40 PM, Reuti wrote:
>> On 11.04.2015 at 03:16, Marlies Hankel wrote:
>> 
>>> Dear all,
>>> 
>>> Yes, I checked the paths and that looked OK. Also, I made sure that it
>>> finds the right MPI version, the VASP path, etc.
>>> 
>>> I do not think h_vmem is the problem, as I do not get any errors in the
>>> queue logs, for example. Also, in the end I changed h_vmem to be
>>> non-consumable and I also requested a lot, and that made no difference.
>>> 
>>> I will try using Open MPI 1.6.5 and see if that makes any
>>> difference.
>>> 
>>> Would the network scan cause SGE to abort the job?
>> No. But there is a delay in startup.
>> 
>> BTW: Are you using ScaLAPACK for VASP?
>> 
>> -- Reuti
>> 
>> 
>>> I do get some message about the IB interfaces it finds, but I also get that
>>> when I run interactively (ssh to the node, not via a qlogin). I have switched
>>> that off too via MCA to make sure it was not causing trouble.
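>>> 
>>> ("Switching it off" means excluding the openib BTL for a test run, roughly
>>> along these lines - only a sketch, the exact MCA parameters and the binary
>>> name are placeholders:
>>> 
>>>    mpirun --mca btl ^openib -np 40 ./vasp
>>> )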
>>> 
>>> Best wishes
>>> 
>>> Marlies
>>> 
>>> 
>>> On 04/10/2015 08:12 PM, Reuti wrote:
>>>>> On 10.04.2015 at 04:51, Marlies Hankel <[email protected]> wrote:
>>>>> 
>>>>> Dear all,
>>>>> 
>>>>> I have a ROCKS 6.1.1 install and I have also installed the SGE roll. So 
>>>>> the base config was done via the ROCKS install. The only changes I have 
>>>>> made are setting the h_vmem complex to consumable and setting up a 
>>>>> scratch complex. I have also set the h_vmem for all hosts.
>>>> And does the VASP job work without h_vmem? We are using VASP too and have
>>>> no problems with any h_vmem setting.
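>>>> 
>>>> (Just as a sanity check: with a consumable h_vmem the definition and a
>>>> request would look roughly like this - the 4G value is only a placeholder:
>>>> 
>>>>    qconf -sc | grep h_vmem              # check the consumable/default columns
>>>>    qsub -pe orte 40 -l h_vmem=4G job.sh
>>>> )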
>>>> 
>>>> 
>>>>> I can run single CPU jobs fine and can execute simple things like
>>>>> 
>>>>> mpirun -np 40 hostname
>>>>> 
>>>>> but I cannot run proper MPI programs. I get the following error.
>>>>> 
>>>>> mpirun noticed that process rank 0 with PID 27465 on node phi-0-3 exited 
>>>>> on signal 11 (Segmentation fault).
>>>> Are you using the correct `mpiexec` also during the execution of a job, i.e.
>>>> on the nodes - maybe the interactive login has a different $PATH set
>>>> than inside a job script?
>>>> 
>>>> And if it's from Open MPI: was the application compiled with the same
>>>> version of Open MPI whose `mpiexec` is used later on all nodes?
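>>>> 
>>>> (A quick check is to put a few diagnostic lines into the job script right
>>>> before the mpirun call and compare the output with an interactive login,
>>>> e.g.:
>>>> 
>>>>    which mpicc mpiexec mpirun
>>>>    mpiexec --version
>>>>    ompi_info | grep -i grid    # was the SGE support compiled in?
>>>> )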
>>>> 
>>>> 
>>>>> Basically, the queue's error logs on the head node and the execution nodes
>>>>> show nothing (/opt/gridengine/default/spool/../messages), and the .e, .o,
>>>>> .pe and .po files show nothing either. The above error is in the standard
>>>>> output file of the program. I am trying VASP but have also tried a
>>>>> home-grown MPI code. Both of these have been running out of the box via SGE
>>>>> for years on our old cluster (which was not ROCKS). I have tried the
>>>>> supplied orte PE (programs are compiled with Open MPI 1.8.4
>>>> The easiest would be to stay with Open MPI 1.6.5 as long as possible. In the
>>>> 1.8 series they changed some things which might hinder proper use:
>>>> 
>>>> - Core binding is enabled by default in Open MPI 1.8. With two MPI jobs on a
>>>> node they may bind to the same cores and leave others idle. One can use
>>>> "--bind-to none" and leave any binding done by SGE in effect (see the example
>>>> below this list). The behavior differs in that SGE gives a job a set of cores
>>>> and the Linux scheduler is free to move the processes around inside this set,
>>>> whereas the native binding in Open MPI is per process (something SGE can't do,
>>>> of course, as Open MPI forks additional processes after the initial startup of
>>>> `orted`). Sure, the set of cores granted by SGE could be rearranged and handed
>>>> to Open MPI as a list.
>>>> 
>>>> - Open MPI may scan the network before the actual jobs start to get all 
>>>> possible routes between the nodes. Depending on the network setup this may 
>>>> take 1-2 minutes.
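>>>> 
>>>> So as a first test with the 1.8 series, something like this inside the job
>>>> script should behave like the 1.6 series did ($NSLOTS is set by SGE, the
>>>> binary name is only a placeholder):
>>>> 
>>>>    mpirun --bind-to none -np $NSLOTS ./vasp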
>>>> 
>>>> -- Reuti
>>>> 
>>>> 
>>>>> compiled with the Intel compilers and with --with-sge and --with-verbs) and
>>>>> have also tried a PE where I specify "catch rsh" and the startmpi and
>>>>> stopmpi scripts, but it made no difference. It seems as if the program does
>>>>> not even start. I am not even trying to run over several nodes yet.
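>>>>> 
>>>>> (For reference, a tight-integration PE for Open MPI would look roughly like
>>>>> the following in "qconf -sp orte"; whether ours matches this exactly I would
>>>>> have to double-check:
>>>>> 
>>>>>    pe_name            orte
>>>>>    slots              9999
>>>>>    start_proc_args    /bin/true
>>>>>    stop_proc_args     /bin/true
>>>>>    allocation_rule    $fill_up
>>>>>    control_slaves     TRUE
>>>>>    job_is_first_task  FALSE
>>>>> )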
>>>>> 
>>>>> On top of that, I can run the program (VASP) perfectly fine by ssh'ing to a
>>>>> node and just running it from the command line, and also over several nodes
>>>>> via a hostfile. So VASP itself is working fine.
>>>>> 
>>>>> I had a look at env and made sure the ulimits are set OK (VASP needs
>>>>> ulimit -s unlimited to work), and all looks fine.
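>>>>> 
>>>>> (I checked by printing the limits from inside a small test job and by
>>>>> raising the stack limit in the job script itself, roughly - the vasp line
>>>>> is only a placeholder for the actual call:
>>>>> 
>>>>>    ulimit -a               # compare with the interactive values
>>>>>    ulimit -s unlimited     # VASP needs an unlimited stack
>>>>>    mpirun -np $NSLOTS ./vasp
>>>>> )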
>>>>> 
>>>>> Has anyone seen this problem before? Or do you have any suggestion on 
>>>>> what to do to get some info on where it actually goes wrong?
>>>>> 
>>>>> Thanks in advance
>>>>> 
>>>>> Marlies
> 


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
