running on different cores, and less
likely to interfere with each other.
-Tom
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Andy Witzig
Sent: Monday, February 06, 2017 8:25 AM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Performance Issues on SMP Workstation
here the MPI ranks get distributed to only a
> small number of processors, which you would see as every rank showing a small
> percentage of CPU.
>
> Just flailing in the dark...
>
> -- bennet
>
On Wed, Feb 1, 2017 at 6:36 PM, Andy Witzig <cap1...@icloud.com> wrote:
> Thanks for the idea. I did the test and only got a single host.
>
> Thanks,
> Andy
>
> On Feb 1, 2017, at 5:04 PM, r...@open-mpi.org wrote:
>
> Simple test: replace your executable with “ho
On Feb 1, 2017, at 2:46 PM, Andy Witzig <cap1...@icloud.com> wrote:
>
> Honestly, I’m not exactly sure what scheme is being used. I am using the
> default template from Penguin Computing for job submission. It looks like:
>
> #PBS -S
> hosts come out on your cluster, then you know why the performance is
> different.
using on the cluster. For some codes we see
noticeable differences between fillup and round robin placement, though not 4x.
Fillup makes more use of shared memory, while round robin uses more InfiniBand.
Doug
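To make the fillup vs round robin point concrete, here is a minimal sketch (not Open MPI's actual mapper) that assigns ranks to nodes under each policy and counts how many neighboring rank pairs land on the same node. The helpers `fillup`, `round_robin`, and `neighbor_pairs_on_same_node` are hypothetical, and the assumption that each rank mostly talks to rank+1 is purely illustrative. In Open MPI these policies roughly correspond to `mpirun --map-by slot` and `mpirun --map-by node`.

```python
from collections import Counter


def _check_capacity(num_ranks, num_nodes, slots_per_node):
    # Both policies below assume every node offers the same number of slots.
    if num_ranks > num_nodes * slots_per_node:
        raise ValueError("not enough slots for the requested number of ranks")


def fillup(num_ranks, num_nodes, slots_per_node):
    """Hypothetical helper: fill every slot on a node before moving to the next."""
    _check_capacity(num_ranks, num_nodes, slots_per_node)
    return [rank // slots_per_node for rank in range(num_ranks)]


def round_robin(num_ranks, num_nodes, slots_per_node):
    """Hypothetical helper: place rank r on node r mod num_nodes."""
    _check_capacity(num_ranks, num_nodes, slots_per_node)
    return [rank % num_nodes for rank in range(num_ranks)]


def neighbor_pairs_on_same_node(placement):
    """Count pairs (r, r+1) on one node, i.e. pairs that could use shared memory."""
    return sum(1 for a, b in zip(placement, placement[1:]) if a == b)


if __name__ == "__main__":
    # 20 ranks on two hypothetical 20-core nodes.
    for name, policy in (("fillup", fillup), ("round robin", round_robin)):
        placement = policy(20, 2, 20)
        print(name, dict(Counter(placement)), neighbor_pairs_on_same_node(placement))
    # prints:
    # fillup {0: 20} 19
    # round robin {0: 10, 1: 10} 0
```

Under these assumptions, fillup keeps all 20 ranks on one node so every neighbor exchange can go through shared memory, while round robin splits every neighbor pair across the two nodes and sends that traffic over the interconnect. That difference is why the two policies can perform differently for communication-heavy codes.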
> On Feb 1, 2017, at 3:25 PM, Andy Witzig <cap1...@icloud.com> wrote:
>
> Hi Tom,
>
> The
formance Issues on SMP Workstation
>
> By the way, the workstation has a total of 36 cores / 72 threads, so using
> mpirun -np 20 is possible (and should be equivalent) on both platforms.
>
> Thanks,
> cap79
>
>> On Feb 1, 2017, at 2:52 PM, Andy Witzig <cap1...@icloud.co
Hi all,
I’m testing my application on an SMP workstation (dual Intel Xeon E5-2697 v4
2.3 GHz Broadwell processors, boost 2.8-3.1 GHz, 128 GB RAM) and am seeing a 4x
performance drop compared to a cluster system with 2.6 GHz Intel Haswell
processors, 20 cores/node, and 128 GB RAM/node. Both