On Sat, May 25, 2024 at 9:49 AM Hongyi Zhao <hongyi.z...@gmail.com> wrote:
>
> On Sat, May 25, 2024 at 7:50 AM Hongyi Zhao <hongyi.z...@gmail.com> wrote:
> >
> > On Sat, May 25, 2024 at 12:02 AM Hermann Schwärzler via slurm-users
> > <slurm-users@lists.schedmd.com> wrote:
> > >
> > > Hi Zhao,
> > >
> > > my guess is that in your faster case you are using hyper-threading,
> > > whereas in the Slurm case you are not.
> > >
> > > Can you check what performance you get when you add
> > >
> > > #SBATCH --hint=multithread
> > >
> > > to your Slurm script?
> >
> > I tried adding the above directive to the Slurm script, only to find
> > that the job gets stuck there forever. Here is the output 10 minutes
> > after the job was submitted:
> >
> >
> > werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ cat sub.sh.o6
> > #######################################################
> > date                    = Sat May 25 07:31:31 CST 2024
> > hostname                = x13dai-t
> > pwd                     = /home/werner/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV
> > sbatch                  = /usr/bin/sbatch
> >
> > WORK_DIR                =
> > SLURM_SUBMIT_DIR        = /home/werner/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV
> > SLURM_JOB_NUM_NODES     = 1
> > SLURM_NTASKS            = 36
> > SLURM_NTASKS_PER_NODE   =
> > SLURM_CPUS_PER_TASK     =
> > SLURM_JOBID             = 6
> > SLURM_JOB_NODELIST      = localhost
> > SLURM_NNODES            = 1
> > SLURMTMPDIR             =
> > #######################################################
> >
> >  running   36 mpi-ranks, on    1 nodes
> >  distrk:  each k-point on   36 cores,    1 groups
> >  distr:  one band on    4 cores,    9 groups
> >  vasp.6.4.3 19Mar24 (build May 17 2024 09:27:19) complex
> >
> >  POSCAR found type information on POSCAR Cr
> >  POSCAR found :  1 types and      72 ions
> >  Reading from existing POTCAR
> >  scaLAPACK will be used
> >  Reading from existing POTCAR
> >
> > -----------------------------------------------------------------------------
> > |                                                                             |
> > |               ----> ADVICE to this user running VASP <----                  |
> > |                                                                             |
> > |     You have a (more or less) 'large supercell' and for larger cells it     |
> > |     might be more efficient to use real-space projection operators.         |
> > |     Therefore, try LREAL= Auto in the INCAR file.                           |
> > |     Mind: For very accurate calculation, you might also keep the            |
> > |     reciprocal projection scheme (i.e. LREAL=.FALSE.).                      |
> > |                                                                             |
> > -----------------------------------------------------------------------------
> >
> >  LDA part: xc-table for (Slater+PW92), standard interpolation
> >  POSCAR, INCAR and KPOINTS ok, starting setup
> >  FFT: planning ... GRIDC
> >  FFT: planning ... GRID_SOFT
> >  FFT: planning ... GRID
> >  WAVECAR not read
>
> Ultimately, I found that the cause of the problem was that
> hyper-threading was enabled by default in the BIOS. After disabling
> hyper-threading, the computational efficiency is consistent between
> running the job through Slurm and running mpirun directly. Therefore,
> it appears that hyper-threading should not be enabled in the BIOS when
> using Slurm.

As for the reason, I think the explanation given in [1] is reasonable:

It is recommended to disable processor hyper-threading. For
applications that are compute-intensive rather than I/O-intensive,
enabling hyper-threading is likely to decrease the overall performance
of the server. Intuitively, the physical memory available per core is
reduced once hyper-threading is enabled.

[1] https://gist.github.com/weijianwen/acee3cd49825da8c8dfb4a99365b54c8#%E5%85%B3%E9%97%AD%E5%A4%84%E7%90%86%E5%99%A8%E8%B6%85%E7%BA%BF%E7%A8%8B
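
For reference, here is a minimal sketch of how this can be checked and
tested without going back into the BIOS. It assumes a reasonably recent
Linux kernel that exposes the SMT control files under
/sys/devices/system/cpu/smt; it is only meant as an illustration, not a
recommendation for every site:

  # Show how many hardware threads each physical core exposes.
  lscpu | grep -E 'Thread\(s\) per core|Core\(s\) per socket'

  # 1 means SMT (hyper-threading) is currently active, 0 means it is off.
  cat /sys/devices/system/cpu/smt/active

  # Disable SMT at runtime (root required; lasts until the next reboot),
  # which allows a quick A/B test without touching the BIOS.
  echo off | sudo tee /sys/devices/system/cpu/smt/control

Alternatively, one could leave SMT enabled in hardware and add
`#SBATCH --hint=nomultithread' to the job script; as far as I
understand, Slurm then binds only one task per physical core, which for
a compute-bound code like VASP should behave much like disabling
hyper-threading in the BIOS.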

Regards,
Zhao

> >
> > > Another difference between the two might be
> > > a) the communication channel/interface that is used.
> >
> > I tried `mpirun', `mpiexec', and `srun --mpi=pmi2', and they all
> > show similar behavior to that described above.
> >
> > > b) the number of nodes involved: when using mpirun you might run things
> > > on more than one node.
> >
> > This is a single-node cluster with two sockets.
> >
> > > Regards,
> > > Hermann
> >
> > Regards,
> > Zhao
> >
> > > On 5/24/24 15:32, Hongyi Zhao via slurm-users wrote:
> > > > Dear Slurm Users,
> > > >
> > > > I am experiencing a significant performance discrepancy when running
> > > > the same VASP job through the Slurm scheduler compared to running it
> > > > directly with mpirun. I am hoping for some insights or advice on how
> > > > to resolve this issue.
> > > >
> > > > System Information:
> > > >
> > > > Slurm Version: 21.08.5
> > > > OS: Ubuntu 22.04.4 LTS (Jammy)
> > > >
> > > >
> > > > Job Submission Script:
> > > >
> > > > #!/usr/bin/env bash
> > > > #SBATCH -N 1
> > > > #SBATCH -D .
> > > > #SBATCH --output=%j.out
> > > > #SBATCH --error=%j.err
> > > > ##SBATCH --time=2-00:00:00
> > > > #SBATCH --ntasks=36
> > > > #SBATCH --mem=64G
> > > >
> > > > echo '#######################################################'
> > > > echo "date                    = $(date)"
> > > > echo "hostname                = $(hostname -s)"
> > > > echo "pwd                     = $(pwd)"
> > > > echo "sbatch                  = $(which sbatch | xargs realpath -e)"
> > > > echo ""
> > > > echo "WORK_DIR                = $WORK_DIR"
> > > > echo "SLURM_SUBMIT_DIR        = $SLURM_SUBMIT_DIR"
> > > > echo "SLURM_JOB_NUM_NODES     = $SLURM_JOB_NUM_NODES"
> > > > echo "SLURM_NTASKS            = $SLURM_NTASKS"
> > > > echo "SLURM_NTASKS_PER_NODE   = $SLURM_NTASKS_PER_NODE"
> > > > echo "SLURM_CPUS_PER_TASK     = $SLURM_CPUS_PER_TASK"
> > > > echo "SLURM_JOBID             = $SLURM_JOBID"
> > > > echo "SLURM_JOB_NODELIST      = $SLURM_JOB_NODELIST"
> > > > echo "SLURM_NNODES            = $SLURM_NNODES"
> > > > echo "SLURMTMPDIR             = $SLURMTMPDIR"
> > > > echo '#######################################################'
> > > > echo ""
> > > >
> > > > module purge > /dev/null 2>&1
> > > > module load vasp
> > > > ulimit -s unlimited
> > > > mpirun vasp_std
> > > >
> > > >
> > > > Performance Observation:
> > > >
> > > > When running the job through Slurm:
> > > >
> > > > werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ grep LOOP OUTCAR
> > > >        LOOP:  cpu time     14.4893: real time     14.5049
> > > >        LOOP:  cpu time     14.3538: real time     14.3621
> > > >        LOOP:  cpu time     14.3870: real time     14.3568
> > > >        LOOP:  cpu time     15.9722: real time     15.9018
> > > >        LOOP:  cpu time     16.4527: real time     16.4370
> > > >        LOOP:  cpu time     16.7918: real time     16.7781
> > > >        LOOP:  cpu time     16.9797: real time     16.9961
> > > >        LOOP:  cpu time     15.9762: real time     16.0124
> > > >        LOOP:  cpu time     16.8835: real time     16.9008
> > > >        LOOP:  cpu time     15.2828: real time     15.2921
> > > >       LOOP+:  cpu time    176.0917: real time    176.0755
> > > >
> > > > When running the job directly with mpirun:
> > > >
> > > >
> > > > werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ mpirun -n 36 vasp_std
> > > > werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$ grep LOOP OUTCAR
> > > >        LOOP:  cpu time      9.0072: real time      9.0074
> > > >        LOOP:  cpu time      9.0515: real time      9.0524
> > > >        LOOP:  cpu time      9.1896: real time      9.1907
> > > >        LOOP:  cpu time     10.1467: real time     10.1479
> > > >        LOOP:  cpu time     10.2691: real time     10.2705
> > > >        LOOP:  cpu time     10.4330: real time     10.4340
> > > >        LOOP:  cpu time     10.9049: real time     10.9055
> > > >        LOOP:  cpu time      9.9718: real time      9.9714
> > > >        LOOP:  cpu time     10.4511: real time     10.4470
> > > >        LOOP:  cpu time      9.4621: real time      9.4584
> > > >       LOOP+:  cpu time    110.0790: real time    110.0739
> > > >
> > > >
> > > > Could you provide any insights or suggestions on what might be causing
> > > > this performance issue? Are there any specific configurations or
> > > > settings in Slurm that I should check or adjust to align the
> > > > performance more closely with the direct mpirun execution?
> > > >
> > > > Thank you for your time and assistance.
> > > >
> > > > Best regards,
> > > > Zhao
> > >

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
