Hello,

Thanks for the information; I’ll try to install it as per the instructions 
provided.

Regards,
Ghadiyali Mohammed Kader
Post Doctoral Fellow
King Abdullah University of Science and Technology
On 29 Jun 2022, 11:06 PM +0300, Alberto Garcia <alber...@icmab.es> wrote:
> Hello,
>
> We are writing a section on GPUs in the documentation, but until it is ready 
> you can use the ideas below:
>
>
> There are two ways to take advantage of GPUs (enabled only for the solver 
> stage, which typically takes up the most time):
>
> -- Using the ELPA library and its native interface in Siesta (this method is 
> available for Siesta versions 4.1.5 and up)
>
> -- Using the ELSI library (for the Siesta "MaX" versions; see the Guide to Siesta Versions at
> https://gitlab.com/siesta-project/siesta/-/wikis/Guide-to-Siesta-versions )
>
> In both cases the only special installation steps are enabling GPU support in 
> either ELPA or ELSI, and using the proper options in Siesta.
>
> For the first method the fdf options to enable GPUs are (example):
>
> diag-algorithm elpa-2
> diag-elpa-usegpu T
> diag-blocksize 16
> # Optional
> number-of-eigenstates 17320
> use-tree-timer T
>
>
> For the second (ELSI) method:
>
> solution-method elsi
> elsi-solver elpa
> elsi-elpa-gpu 1
> elsi-elpa-flavor 2
>
> # Optional
> number-of-eigenstates 17320
> use-tree-timer T
> elsi-output-level 3
>
> The installation of ELPA and ELSI with GPU support is system-specific, but 
> you can get inspiration from the following examples:
>
> * ELPA (on Marconi-100 at CINECA, with IBM P9 chips and NVIDIA V100 GPUs, 
> using the gcc compiler):
>
> Script to configure:
>
> #!/bin/sh
>
> # (Need to define properly the symbols used below)
> # Note that the P9 does not use the typical Intel kernels
>
> FC=mpifort CC=mpicc CXX=mpic++ \
> CFLAGS="-O3 -mcpu=native -std=c++11" \
> FCFLAGS="-O3 -mcpu=native -ffree-line-length-none" \
> LDFLAGS="${SCALAPACK_LIBS} ${LAPACK_LIBS}" \
> ../configure \
> --with-cuda-path=${CUDA_HOME} \
> --with-cuda-sdk-path=${CUDA_HOME} \
> --enable-nvidia-gpu --with-NVIDIA-GPU-compute-capability=sm_70 \
> --enable-NVIDIA-gpu-memory-debug --enable-nvtx \
> --disable-sse-assembly --disable-sse --disable-avx --disable-avx2 --disable-avx512 \
> --enable-c-tests=no --prefix=$PRJ/bin/gcc/elpa/2021.05.002.jul22
>
>
> (Adapt the options to your system)
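>
> (To give an idea of the rest of the procedure -- the paths below are only 
> illustrative placeholders, not Marconi-specific values -- the symbols 
> referenced in the configure script could be defined along these lines, 
> followed by the usual build and install steps:)
>
> # Illustrative definitions of the symbols used above (adapt to your system)
> export CUDA_HOME=/path/to/cuda                                # root of the CUDA toolkit
> export SCALAPACK_LIBS="-L/path/to/scalapack/lib -lscalapack"  # ScaLAPACK link line
> export LAPACK_LIBS="-L/path/to/lapack/lib -llapack"           # LAPACK link line
> export PRJ=/path/to/project                                   # base used in --prefix
>
> # From the out-of-tree build directory in which ../configure was run:
> make -j 8
> make install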
>
> * ELSI
>
> SET(CMAKE_INSTALL_PREFIX "$ENV{BASE_DIR}/elsi/2.6.2" CACHE STRING "Installation dir")
>
> SET(CMAKE_Fortran_COMPILER "mpif90" CACHE STRING "MPI Fortran compiler")
> SET(CMAKE_C_COMPILER "mpicc" CACHE STRING "MPI C compiler")
> SET(CMAKE_CXX_COMPILER "mpicxx" CACHE STRING "MPI C++ compiler")
>
> SET(CMAKE_Fortran_FLAGS "-O2 -g -fbacktrace -fdump-core" CACHE STRING "Fortran flags")
> SET(CMAKE_C_FLAGS "-O2 -g -std=c99" CACHE STRING "C flags")
> SET(CMAKE_CXX_FLAGS "-O2 -g -std=c++11" CACHE STRING "C++ flags")
> SET(CMAKE_CUDA_FLAGS "-O3 -arch=sm_70 -std=c++11" CACHE STRING "CUDA flags")
> # Workaround: specify -std=c++11 in CMAKE_CUDA_FLAGS to avoid __ieee128 gcc/cuda bug
>
> SET(USE_GPU_CUDA ON CACHE BOOL "Use CUDA-based GPU acceleration in ELPA")
> SET(ENABLE_PEXSI ON CACHE BOOL "Enable PEXSI")
> SET(ENABLE_TESTS ON CACHE BOOL "Enable tests")
> #SET(ADD_UNDERSCORE OFF CACHE BOOL "Do not suffix C functions with an underscore")
>
> SET(LIB_PATHS "/cineca/prod/opt/libraries/lapack/3.9.0/gnu--8.4.0/lib;/cineca/prod/opt/libraries/scalapack/2.1.0/spectrum_mpi--10.3.1--binary/lib;/cineca/prod/opt/compilers/cuda/11.0/none/lib64;/cineca/prod/opt/libraries/essl/6.2.1/binary/lib64" CACHE STRING "External library paths")
>
> SET(LIBS "scalapack;lapack;essl;cublas;cudart" CACHE STRING "External libraries")
>
> You should adjust the locations and version numbers of these libraries to 
> match your system.
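>
> (As a minimal sketch of how such a cache file is used -- assuming it is saved 
> as initial_cache.cmake in the ELSI source directory; the file and directory 
> names are just placeholders:)
>
> mkdir build && cd build
> # -C pre-loads the SET(... CACHE ...) entries above before configuring
> cmake -C ../initial_cache.cmake ..
> make -j 8
> make install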
>
> Finally, a note about the importance of the proper execution incantation, for 
> "pinning" the MPI ranks to the appropriate GPU:
>
> (There are probably better and more streamlined ways to do this)
>
> For this example I use the 32 cores (2x16) of each Marconi node for MPI tasks, 
> with no OpenMP, and I do not take advantage of the 4x hardware threading (SMT).
>
> The SLURM script I typically use is shown below (gcc_env and the other modules are my own Lmod modules):
> =============================================================================
> #!/bin/bash
> #SBATCH --job-name=test-covid
> #SBATCH --account=Pra19_MaX_1
> #SBATCH --partition=m100_usr_prod
> #SBATCH --output=mpi_%j.out
> #SBATCH --error=mpi_%j.err
> #SBATCH --nodes=8
> #SBATCH --ntasks-per-node=32
> #SBATCH --ntasks-per-socket=16
> #SBATCH --cpus-per-task=4
> #SBATCH --gres=gpu:4
> #SBATCH --time=00:19:00
>
> #
> ml purge
> ml gcc_env
> ml siesta-max/1.0-14
> #
> date
> which siesta
> echo "-------------------"
> #
> export OMP_NUM_THREADS=1
> #
> mpirun --map-by socket:PE=1 --rank-by core --report-bindings \
> -np ${SLURM_NTASKS} ./gpu_bind.sh \
> siesta covid.fdf
> =============================================================================
>
> The crucial part is the gpu_bind.sh script, which contains code to make sure 
> that each socket talks to the right GPUs (1st socket: GPU0/GPU1; 2nd socket: 
> GPU2/GPU3) and that, within each socket, the first 8 tasks use GPU0 (or GPU2) 
> and the second group of 8 tasks uses GPU1 (or GPU3). For this, the tasks have 
> to be ordered (this is specific to Marconi). I found that with the
>
> --map-by socket:PE=1 --rank-by core
>
> incantation I could achieve that ordering.
>
> The contents of gpu_bind.sh are:
>
> ====================================================
> #!/bin/bash
>
> np_node=$OMPI_COMM_WORLD_LOCAL_SIZE   # number of MPI tasks on this node
> rank=$OMPI_COMM_WORLD_LOCAL_RANK      # node-local rank of this task
>
> block=$(( $np_node / 4 )) # We have 4 GPUs
> # If np_node is 32 (typical), then block=8
>
> limit0=$(( $block * 1 ))
> limit1=$(( $block * 2 ))
> limit2=$(( $block * 3 ))
> limit3=$(( $block * 4 ))
>
> #-----------------
>
> # Assign one GPU to each block of consecutive node-local ranks
> if [ $rank -lt $limit0 ]; then
>     export CUDA_VISIBLE_DEVICES=0
> elif [ $rank -lt $limit1 ]; then
>     export CUDA_VISIBLE_DEVICES=1
> elif [ $rank -lt $limit2 ]; then
>     export CUDA_VISIBLE_DEVICES=2
> else
>     export CUDA_VISIBLE_DEVICES=3
> fi
>
> # Run the actual command (e.g. siesta covid.fdf) with the GPU assignment in place
> "$@"
> ====================================================
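>
> (If you want to check the resulting mapping before a production run, a small 
> hypothetical helper such as the one below, launched through gpu_bind.sh in 
> place of siesta, prints which GPU each node-local rank ends up with:)
>
> #!/bin/bash
> # check_bind.sh (illustrative only)
> echo "local rank $OMPI_COMM_WORLD_LOCAL_RANK sees CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
>
> # Example: mpirun --map-by socket:PE=1 --rank-by core -np ${SLURM_NTASKS} ./gpu_bind.sh ./check_bind.sh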
>
>
> I hope this helps.
>
> Best regards,
>
> Alberto
>
>
> ----- On 28 June 2022, at 10:28, Mohammed Ghadiyali 
> mohammed.ghadiy...@kaust.edu.sa wrote:
>
> | Hello,
> |
> | I’ve gone through the Q&A available on the MaX Centre website, and according
> | to it Siesta can use GPUs. However, I’m not able to find any documentation on
> | the installation, so could someone tell me the procedure for installing Siesta
> | with GPU support? Our systems have 8x V100 (32 GB each) with NVLink.
> |
> | Regards,
> | Ghadiyali Mohammed Kader
> | Post Doctoral Fellow
> | King Abdullah University of Science and Technology
> |

-- 

This message and its contents, including attachments are intended solely 
for the original recipient. If you are not the intended recipient or have 
received this message in error, please notify me immediately and delete 
this message from your computer system. Any unauthorized use or 
distribution is prohibited. Please consider the environment before printing 
this email.
-- 
SIESTA is supported by the Spanish Research Agency (AEI) and by the European 
H2020 MaX Centre of Excellence (http://www.max-centre.eu/)
