Andy,

What allocation scheme are you using on the cluster? For some codes we see 
noticeable differences between fill-up and round-robin allocation, though not 
4x. Fill-up leans more on shared memory, while round-robin puts more traffic 
on InfiniBand.
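
If it's easy to test, you can compare the two by hand; a rough sketch using 
the 1.6-era mapping flags (assuming your mpirun still accepts them):

  mpirun -np 20 -byslot $EXECUTABLE $INPUT_FILE   # fill-up: pack a node first
  mpirun -np 20 -bynode $EXECUTABLE $INPUT_FILE   # round robin across nodes

(-bynode only changes anything when the job actually spans multiple nodes.)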

Doug
> On Feb 1, 2017, at 3:25 PM, Andy Witzig <cap1...@icloud.com> wrote:
> 
> Hi Tom,
> 
> The cluster uses an InfiniBand interconnect.  On the cluster I’m requesting: 
> #PBS -l walltime=24:00:00,nodes=1:ppn=20.  So technically, the run on the 
> cluster should be SMP on the node, since there are 20 cores/node.  On the 
> workstation I’m just using the command: mpirun -np 20 …. I haven’t finished 
> setting up Torque/PBS yet.
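> 
> For reference, the job script is essentially just this (a sketch; $EXECUTABLE 
> and $INPUT_FILE stand in for the real paths):
> 
>   #!/bin/bash
>   #PBS -l walltime=24:00:00,nodes=1:ppn=20
>   cd $PBS_O_WORKDIR
>   mpirun -np 20 $EXECUTABLE $INPUT_FILE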
> 
> Best regards,
> Andy
> 
> On Feb 1, 2017, at 4:10 PM, Elken, Tom <tom.el...@intel.com> wrote:
> 
> For this case:  " a cluster system with 2.6GHz Intel Haswell with 20 cores / 
> node and 128GB RAM/node.  "
> 
> are you running 5 ranks per node on 4 nodes?
> What interconnect are you using for the cluster?
> 
> -Tom
> 
>> -----Original Message-----
>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Andrew
>> Witzig
>> Sent: Wednesday, February 01, 2017 1:37 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] Performance Issues on SMP Workstation
>> 
>> By the way, the workstation has a total of 36 cores / 72 threads, so using 
>> mpirun -np 20 is possible (and should be equivalent) on both platforms.
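>> 
>> One thing worth checking on the workstation is binding; with 72 hardware 
>> threads, the 20 ranks may be drifting across hyperthreads or sockets.  A 
>> sketch with the 1.6-series binding flags (assuming this build supports them):
>> 
>>   mpirun -np 20 -bysocket -bind-to-core -report-bindings $EXECUTABLE $INPUT_FILE
>> 
>> -report-bindings prints where each rank lands, so shared cores are easy to spot.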
>> 
>> Thanks,
>> cap79
>> 
>>> On Feb 1, 2017, at 2:52 PM, Andy Witzig <cap1...@icloud.com> wrote:
>>> 
>>> Hi all,
>>> 
>>> I’m testing my application on an SMP workstation (dual Intel Xeon E5-2697 v4 
>>> 2.3GHz Broadwell processors, boost 2.8-3.1GHz, 128GB RAM) and am seeing a 4x 
>>> performance drop compared to a cluster system with 2.6GHz Intel Haswell nodes 
>>> (20 cores/node, 128GB RAM/node).  Both applications have been compiled with 
>>> Open MPI 1.6.4.  I have tried running:
>>> 
>>> mpirun -np 20 $EXECUTABLE $INPUT_FILE
>>> mpirun -np 20 --mca btl self,sm $EXECUTABLE $INPUT_FILE
>>> 
>>> and others, but cannot achieve the same performance on the workstation as is 
>>> seen on the cluster.  The workstation outperforms the cluster on other non-MPI 
>>> but multi-threaded applications, so I don’t think it’s a hardware issue.
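>>> 
>>> In case the layout matters, lstopo from hwloc (if it's installed) will show 
>>> the socket/core topology the 20 ranks are spread across:
>>> 
>>>   lstopo --no-io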
>>> 
>>> Any help you can provide would be appreciated.
>>> 
>>> Thanks,
>>> cap79

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users