Thank you, Bennet.  In my testing, the application usually performs better at
much smaller rank counts on the workstation.  On the cluster I do not see the
same behavior (i.e., I see better performance at -np 15 or 20).  The
workstation is not shared and is not doing any other work.  I ran the
application on the workstation with top and confirmed that 20 processes were
fully loaded.

I’ll look into the diagnostics you mentioned and get back to you.

Best regards,
Andy
  
On Feb 1, 2017, at 6:15 PM, Bennet Fauber <ben...@umich.edu> wrote:

How do they compare if you run a much smaller number of ranks, say -np 2 or 4?

Is the workstation shared and doing any other work?

You could insert some diagnostics into your script, for example,
uptime and free, both before and after running your MPI program and
compare.
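
Something like this in the job script would do it (untested sketch; the file
names are just placeholders):

  # record load average and memory use before the run
  uptime  >  diag_before.txt
  free -m >> diag_before.txt

  mpirun -np 20 $EXECUTABLE $INPUT_FILE

  # record them again afterwards, then compare the two files
  uptime  >  diag_after.txt
  free -m >> diag_after.txt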

You could also run top in batch mode in the background for your own
username, then run your MPI program, and compare the results from top.
We've seen instances where the MPI ranks only get distributed to a
small number of processors, which you see if they all have small
percentages of CPU.
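
For example (untested; the interval and log name are arbitrary):

  # log per-process CPU usage for your own user every 10 seconds
  top -b -d 10 -u $USER > top_during_run.log &
  TOP_PID=$!
  mpirun -np 20 $EXECUTABLE $INPUT_FILE
  kill $TOP_PID

  # if each rank shows ~100% CPU in the log, they have their own cores;
  # if they all hover at small percentages, they are piling onto a few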

Just flailing in the dark...

-- bennet



On Wed, Feb 1, 2017 at 6:36 PM, Andy Witzig <cap1...@icloud.com> wrote:
> Thanks for the idea.  I did the test and only got a single host.
> 
> Thanks,
> Andy
> 
> On Feb 1, 2017, at 5:04 PM, r...@open-mpi.org wrote:
> 
> Simple test: replace your executable with “hostname”. If you see multiple
> hosts come out on your cluster, then you know why the performance is
> different.
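> 
> For example, something like:
> 
>   mpirun hostname | sort | uniq -c
> 
> run from the same batch script should show how many hosts (and how many
> slots on each) you are actually getting.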
> 
> On Feb 1, 2017, at 2:46 PM, Andy Witzig <cap1...@icloud.com> wrote:
> 
> Honestly, I’m not exactly sure what scheme is being used.  I am using the
> default template from Penguin Computing for job submission.  It looks like:
> 
> #PBS -S /bin/bash
> #PBS -q T30
> #PBS -l walltime=24:00:00,nodes=1:ppn=20
> #PBS -j oe
> #PBS -N test
> #PBS -r n
> 
> mpirun $EXECUTABLE $INPUT_FILE
> 
> I’m not configuring Open MPI anywhere else. It is possible the Penguin
> Computing folks have pre-configured my MPI environment.  I’ll see what I can
> find.
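> 
> In case it helps, I may add a few checks like these to the job script to see
> what the batch system and MPI are actually giving me (just a guess at what
> might be useful):
> 
>   echo "Allocated nodes:"; cat $PBS_NODEFILE
>   which mpirun; mpirun --version
>   env | grep -i -E 'OMPI|PBS'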
> 
> Best regards,
> Andy
> 
> On Feb 1, 2017, at 4:32 PM, Douglas L Reeder <d...@centurylink.net> wrote:
> 
> Andy,
> 
> What allocation scheme are you using on the cluster?  For some codes we see
> noticeable differences between fill-up and round-robin placement, though not
> 4x.  Fill-up makes more use of shared memory, while round-robin puts more
> traffic over the InfiniBand.
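> 
> If you want to experiment, mpirun can force either placement (option names
> below are from the Open MPI 1.6 series; check mpirun --help on your install):
> 
>   mpirun -np 20 -byslot $EXECUTABLE $INPUT_FILE   # fill up each node first
>   mpirun -np 20 -bynode $EXECUTABLE $INPUT_FILE   # round-robin across nodes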
> 
> Doug
> 
> On Feb 1, 2017, at 3:25 PM, Andy Witzig <cap1...@icloud.com> wrote:
> 
> Hi Tom,
> 
> The cluster uses an InfiniBand interconnect.  On the cluster I’m requesting:
> #PBS -l walltime=24:00:00,nodes=1:ppn=20.  So technically, the run on the
> cluster should be SMP within the node, since there are 20 cores/node.  On the
> workstation I’m just using the command: mpirun -np 20 ….  I haven’t finished
> setting up Torque/PBS yet.
> 
> Best regards,
> Andy
> 
> On Feb 1, 2017, at 4:10 PM, Elken, Tom <tom.el...@intel.com> wrote:
> 
> For this case:  " a cluster system with 2.6GHz Intel Haswell with 20 cores /
> node and 128GB RAM/node.  "
> 
> are you running 5 ranks per node on 4 nodes?
> What interconnect are you using for the cluster?
> 
> -Tom
> 
> -----Original Message-----
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Andrew
> Witzig
> Sent: Wednesday, February 01, 2017 1:37 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] Performance Issues on SMP Workstation
> 
> By the way, the workstation has a total of 36 cores / 72 threads, so using
> mpirun -np 20 is possible (and should be equivalent) on both platforms.
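> 
> That said, with hyperthreading the 20 ranks might not each land on their own
> physical core on the workstation.  I may try something like this to check
> (assuming I have the 1.6-series option names right):
> 
>   mpirun -np 20 --bind-to-core --report-bindings $EXECUTABLE $INPUT_FILE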
> 
> Thanks,
> cap79
> 
> On Feb 1, 2017, at 2:52 PM, Andy Witzig <cap1...@icloud.com> wrote:
> 
> Hi all,
> 
> I’m testing my application on an SMP workstation (dual 2.3 GHz Intel Xeon
> E5-2697 v4 Broadwell processors, boost 2.8-3.1 GHz, 128 GB RAM) and am seeing
> a 4x performance drop compared to a cluster system with 2.6 GHz Intel Haswell
> nodes, 20 cores/node and 128 GB RAM/node.  In both cases the application was
> compiled with Open MPI 1.6.4.  I have tried running:
> 
> mpirun -np 20 $EXECUTABLE $INPUT_FILE
> mpirun -np 20 --mca btl self,sm $EXECUTABLE $INPUT_FILE
> 
> and others, but cannot achieve the same performance on the workstation as is
> seen on the cluster.  The workstation outperforms the cluster on other
> non-MPI but multi-threaded applications, so I don’t think it’s a hardware
> issue.
> 
> 
> Any help you can provide would be appreciated.
> 
> Thanks,
> cap79
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
