And also, answering my original problem: setting H_MEMORYLOCKED now has
openmpi-1.8.4 working and VASP running. It seems that openmpi-1.6.5 was a
bit more verbose in reporting the memlock error, while the newer
version just crashed with a segfault.
Thank you all for your help
Marlies
On 04/13/2015 02:54 PM, Marlies Hankel wrote:
Answering my own question: setting
execd_params H_MEMORYLOCKED=unlimited
in qconf -mconf does the trick.
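In a bit more detail, what I did was roughly the following (a sketch; qconf -mconf opens the global cluster configuration in an editor, and an existing execd_params line may already hold other entries that need to be kept):

  # open the global configuration in $EDITOR
  qconf -mconf
  # in the editor, set or extend this line:
  execd_params   H_MEMORYLOCKED=unlimited
  # verify afterwards
  qconf -sconf | grep execd_params

A short test job that just prints `ulimit -l` is an easy way to confirm that newly started jobs see the raised limit.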
Marlies
On 04/13/2015 02:12 PM, Marlies Hankel wrote:
Dear all,
I now have at least a simple hello-world MPI program running with
openmpi-1.6.5 (thanks Reuti). But now I get a new error:
The OpenFabrics (openib) BTL failed to initialize while trying to
allocate some locked memory. This typically can indicate that the
memlock limits are set too low. For most HPC installations, the
memlock limits should be set to "unlimited". The failure occured
here:
Local host: cpu-1-5.local
OMPI source: btl_openib_component.c:1216
Function: ompi_free_list_init_ex_new()
Device: mlx4_0
Memlock limit: 65536
In my normal shell `ulimit -l` is unlimited, but when I go through SGE it
is set to 64k.
How do I change this?
Can I set S_MEMORYLOCKED and H_MEMORYLOCKED to unlimited, or is there
another way to set these so that SGE applies them to the shell of
the job?
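In the meantime, a quick way to see exactly which limits a job really gets is a trivial test job that just prints them, along these lines (a sketch; the submit options are only an example):

  $ cat check_limits.sh
  #!/bin/bash
  #$ -cwd -j y
  echo -n "memlock (kbytes): "; ulimit -l
  echo -n "stack (kbytes):   "; ulimit -s
  $ qsub check_limits.sh

If the job output still shows 64 for memlock, the limit is being imposed on the SGE side (or by PAM on the execution host) rather than by the shell startup files.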
Best wishes
Marlies
On 04/11/2015 11:02 PM, Reuti wrote:
On 11.04.2015 at 14:02, Marlies Hankel wrote:
Dear Reuti,
No, I am not using ScaLAPACK for now.
Aha, I asked as I never got the ScaLAPACK version of VASP running,
only the traditional parallelization.
We do not have Intel MPI, and at the moment I just needed to get things
going so that our new cluster is up and usable.
All our calculations are MPI based, not just VASP, and my own
home-grown code does not run through SGE either, so I hope I can find
the problem soon....
Does this happen with a simple mpihello application too?
-- Reuti
Best wishes
Marlies
On 04/11/2015 07:40 PM, Reuti wrote:
On 11.04.2015 at 03:16, Marlies Hankel wrote:
Dear all,
Yes, I checked the paths and they looked OK. Also, I made sure
that it finds the right MPI version and the VASP path etc.
I do not think h_vmem is the problem, as I do not get any
errors in the queue logs, for example. Also, in the end I changed
h_vmem to be non-consumable and I also requested a lot, and that
made no difference.
I will try a 1.6.5 Open MPI version and see if that makes
any difference.
Would the network scan cause SGE to abort the job?
No. But there is a delay in startup.
BTW: Are you using ScaLAPACK for VASP?
-- Reuti
I do get some message about it finding two IBs, but I also get that
when I run interactively (ssh to a node, not via qlogin). I have
switched that off too, via MCA, to make sure this was not causing
trouble.
Best wishes
Marlies
On 04/10/2015 08:12 PM, Reuti wrote:
On 10.04.2015 at 04:51, Marlies Hankel <[email protected]> wrote:
Dear all,
I have a ROCKS 6.1.1 install and I have also installed the SGE
roll. So the base config was done via the ROCKS install. The
only changes I have made are setting the h_vmem complex to
consumable and setting up a scratch complex. I have also set
the h_vmem for all hosts.
And does the VASP job work without h_vmem? We are using VASP too
and have no problems with h_vmem set.
I can run single-CPU jobs fine and can execute simple things like
mpirun -np 40 hostname
but I cannot run proper MPI programs. I get the following error:
mpirun noticed that process rank 0 with PID 27465 on node
phi-0-3 exited on signal 11 (Segmentation fault).
Are you using the correct `mpiexec` also during execution of a
job, i.e. between the nodes? Maybe the interactive login has a
different $PATH set than inside a job script.
And if it's from Open MPI: was the application compiled with the
same version of Open MPI whose `mpiexec` is used later on
all nodes?
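One quick way to check is to put a few diagnostic lines at the top of the job script and compare their output with an interactive shell on a node, e.g. (a sketch; the binary name is only a placeholder):

  echo "PATH inside job: $PATH"
  which mpiexec
  mpiexec --version
  ldd ./vasp_binary | grep -i mpi   # only informative for a dynamically linked binary

That shows immediately whether the job picks up a different Open MPI installation than the one the application was built against.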
Basically, the queue's error logs on the head node and the
execution nodes show nothing
(/opt/gridengine/default/spool/../messages), and the .e, .o
and .pe, .po files also show nothing. The above error is in the
standard output file of the program. I am trying VASP but have
also tried a home-grown MPI code. Both of these have been
running out of the box via SGE for years on our old cluster
(which was not ROCKS). I have tried the supplied orte PE
(programs are compiled with openmpi 1.8.4
The easiest would be to stay with Open MPI 1.6.5 as long as
possible. In the 1.8 series they changed some things which might
hinder proper use:
- Core binding is enabled by default in Open MPI 1.8. With
two MPI jobs on a node, they may use the same cores and leave
others idle. One can use "--bind-to none" and leave the binding
of SGE in effect (if any); a sketch of the resulting mpirun call
follows after this list. The behavior is different in that
way, as SGE will give a job a set of cores, and the Linux
scheduler is free to move the processes around inside this set.
The native binding in Open MPI is per process (something SGE
can't do of course, as Open MPI opens additional forks after the
initial startup of `orted`). (Sure, the set of cores given by SGE
could be rearranged to give this list to Open MPI.)
- Open MPI may scan the network before the actual jobs start to
get all possible routes between the nodes. Depending on the
network setup this may take 1-2 minutes.
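As a sketch, the flag from the first point just goes onto the mpirun call in the job script; with an Open MPI built --with-sge, the hosts and slots come from the parallel environment, so the $NSLOTS variable set by SGE can be used directly (the binary name is only a placeholder):

  mpirun --bind-to none -np $NSLOTS ./vasp_binary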
-- Reuti
compiled with the Intel compilers and with --with-sge and --with-verbs) and
have also tried one where I specify catch rsh and the startmpi and
stopmpi scripts, but it made no difference. It seems as if the
program does not even start. I am not even trying to run over
several nodes yet.
Adding to that, I can run the program (VASP) perfectly
fine by ssh-ing to a node and just running it from the command line,
and also over several nodes via a hostfile. So VASP itself is
working fine.
I had a look at the environment and made sure the ulimits are set OK
(VASP needs `ulimit -s unlimited` to work), and all looks OK.
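For reference, the submit script I am testing with is essentially of this shape (a sketch; the PE name orte is the supplied one mentioned above, while the job name, slot count and binary name are just placeholders):

  #!/bin/bash
  #$ -N vasp_test
  #$ -cwd -j y
  #$ -pe orte 40
  ulimit -s unlimited
  mpirun -np $NSLOTS ./vasp_binary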
Has anyone seen this problem before? Or do you have any
suggestion on what to do to get some info on where it actually
goes wrong?
Thanks in advance
Marlies
--
------------------
Dr. Marlies Hankel
Research Fellow, Theory and Computation Group
Australian Institute for Bioengineering and Nanotechnology (Bldg 75)
eResearch Analyst, Research Computing Centre and Queensland Cyber
Infrastructure Foundation
The University of Queensland
Qld 4072, Brisbane, Australia
Tel: +61 7 334 63996 | Fax: +61 7 334 63992 | Mobile: 0404262445
Email: [email protected] | www.theory-computation.uq.edu.au
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users