Hi,

Here's another test, this time with Open MPI 1.8.3. With 1.8.1, 32 GB of
registerable memory was detected; now it is just 16 GB:
% mpirun -np 2 /usr/lib64/openmpi-intel/bin/mpitests-osu_get_bw
--------------------------------------------------------------------------
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory.  This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered.  You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel module
parameters:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

  Local host:              pax95
  Registerable memory:     16384 MiB
  Total memory:            49106 MiB

Your MPI job will continue, but may behave poorly and/or hang.
--------------------------------------------------------------------------
# OSU MPI_Get Bandwidth Test v4.3
# Window creation: MPI_Win_allocate
# Synchronization: MPI_Win_flush
# Size      Bandwidth (MB/s)
1                      28.56
2                      58.74


So it wasn't fixed for RHEL 6.6; if anything, the detected limit dropped from
32768 MiB to 16384 MiB.
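
For reference, the FAQ item linked in the warning ties this limit to the
mlx4_core MTT module parameters and suggests raising log_num_mtt until the
computed limit covers about twice the physical RAM. A minimal sketch of that,
assuming the mlx4_core driver and a stack where the parameter can still be set
(on this SL 6.6 kernel it is reported as 0 and read-only, see the quoted output
below); the file name and the value 24 are only examples:

# current values of the mlx4 MTT parameters
cat /sys/module/mlx4_core/parameters/log_num_mtt
cat /sys/module/mlx4_core/parameters/log_mtts_per_seg

# hypothetical example: raise log_num_mtt via a module option, then
# unload/reload mlx4_core (or reboot) so the new value takes effect
echo 'options mlx4_core log_num_mtt=24' >> /etc/modprobe.d/mlx4_core.conf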

Regards, Götz

On Mon, Dec 8, 2014 at 4:00 PM, Götz Waschk <goetz.was...@gmail.com> wrote:

> Hi,
>
> I had tested 1.8.4rc1 and it wasn't fixed there either. I can try again
> though; maybe I made an error.
>
> Regards, Götz Waschk
>
> On Mon, Dec 8, 2014 at 3:17 PM, Joshua Ladd <jladd.m...@gmail.com> wrote:
> > Hi,
> >
> > This should be fixed in OMPI 1.8.3. Is it possible for you to give 1.8.3 a
> > shot?
> >
> > Best,
> >
> > Josh
> >
> > On Mon, Dec 8, 2014 at 8:43 AM, Götz Waschk <goetz.was...@gmail.com> wrote:
> >>
> >> Dear Open-MPI experts,
> >>
> >> I have updated my little cluster from Scientific Linux 6.5 to 6.6; this
> >> included extensive changes in the InfiniBand drivers and a newer
> >> Open MPI version (1.8.1). Now I'm getting this message on all nodes
> >> with more than 32 GB of RAM:
> >>
> >>
> >> WARNING: It appears that your OpenFabrics subsystem is configured to only
> >> allow registering part of your physical memory.  This can cause MPI jobs to
> >> run with erratic performance, hang, and/or crash.
> >>
> >> This may be caused by your OpenFabrics vendor limiting the amount of
> >> physical memory that can be registered.  You should investigate the
> >> relevant Linux kernel module parameters that control how much physical
> >> memory can be registered, and increase them to allow registering all
> >> physical memory on your machine.
> >>
> >> See this Open MPI FAQ item for more information on these Linux kernel
> >> module parameters:
> >>
> >>     http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
> >>
> >>   Local host:              pax98
> >>   Registerable memory:     32768 MiB
> >>   Total memory:            49106 MiB
> >>
> >> Your MPI job will continue, but may behave poorly and/or hang.
> >>
> >>
> >> The issue is similar to the one described in a previous thread about
> >> Ubuntu nodes:
> >> http://www.open-mpi.org/community/lists/users/2014/08/25090.php
> >> But the InfiniBand driver is different: the parameters log_num_mtt and
> >> log_mtts_per_seg both still exist, but they cannot be changed and have
> >> the same values on all configurations:
> >> [pax52] /root # cat /sys/module/mlx4_core/parameters/log_num_mtt
> >> 0
> >> [pax52] /root # cat /sys/module/mlx4_core/parameters/log_mtts_per_seg
> >> 3
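
(Aside: the FAQ linked above computes the limit for this classic mlx4 code path
as max_reg_mem = (2^log_num_mtt) * (2^log_mtts_per_seg) * PAGE_SIZE. A quick
sketch of that arithmetic, assuming 4 KiB pages; with log_num_mtt reported as 0
it yields an obviously meaningless number, which fits the observation that this
kernel sizes the MTT table automatically and ignores the parameter:)

mtt=$(cat /sys/module/mlx4_core/parameters/log_num_mtt)        # 0 here
seg=$(cat /sys/module/mlx4_core/parameters/log_mtts_per_seg)   # 3 here
page=$(getconf PAGE_SIZE)                                      # typically 4096
echo $(( (1 << mtt) * (1 << seg) * page ))   # 32768 bytes, clearly not the real limit
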
> >>
> >> The kernel changelog says that Red Hat has included this commit:
> >> mlx4: Scale size of MTT table with system RAM (Doug Ledford)
> >> so everything should be fine and the buffers should scale automatically.
> >> However, as far as I can see, the wrong value calculated by
> >> calculate_max_reg() is still used in the code, so I don't think I can
> >> simply ignore the warning. A user has also reported a problem with a job,
> >> but I cannot confirm that this warning is the cause.
> >>
> >> My workaround was to simply load the mlx5_core kernel module, as this
> >> is used by calculate_max_reg() to detect OFED 2.0.
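
(In case it is useful to others, that workaround amounts to something like the
following; loading mlx5_core is the part described above, while the
/etc/sysconfig/modules/ snippet is only a sketch of the usual RHEL/SL 6 way to
make it persistent, with a made-up file name:)

# load mlx5_core once so calculate_max_reg() takes the OFED 2.0 path
modprobe mlx5_core

# sketch: RHEL/SL 6 runs executable *.modules scripts from
# /etc/sysconfig/modules/ at boot, so this keeps it across reboots
printf '#!/bin/sh\n/sbin/modprobe mlx5_core >/dev/null 2>&1\n' \
    > /etc/sysconfig/modules/mlx5.modules
chmod +x /etc/sysconfig/modules/mlx5.modules
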
> >>
> >> Regards, Götz Waschk
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >> Link to this post:
> >> http://www.open-mpi.org/community/lists/users/2014/12/25923.php
> >
> >
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> > http://www.open-mpi.org/community/lists/users/2014/12/25924.php
>
>
>
> --
> AL I:40: Do what thou wilt shall be the whole of the Law.
>



-- 
AL I:40: Do what thou wilt shall be the whole of the Law.
