[slurm-users] why is the performance of mpi allreduce job lower than the expected level?

2020-12-24 Thread hu...@sugon.com
Dear all, We tested an MPI allreduce job in three modes (srun-dtcp, mpirun-slurm, mpirun-ssh) and found that the job's running time in mpirun-ssh mode is shorter than in the other two modes. We have set the following parameters: /usr/lib/systemd/system/slurmd.service: LimitMEMLOCK=i

[slurm-users] pmix and ucx(IB) testing fails with error "Cannot get polling fd"

2019-11-09 Thread hu...@sugon.com
Hi, When I was testing slurm-19.05.3 with openmpi-4.0.1, pmix-3.1.3rc4 and ucx-1.6.1 (with IB), I got an error different from Bug 7646 (https://bugs.schedmd.com/show_bug.cgi?id=7646). At first, a job like "srun --mpi=pmix_v3 xxx" could run with "SLURM_PMIX_DIRECT_CONN=true" and "SLURM_PMIX_DIR

[slurm-users] OverSubscribe=FORCE:1 overloads nodes?

2019-09-08 Thread hu...@sugon.com
Dear all, I have two jobs in my cluster, which has 32 cores per compute node. The first job uses eight nodes and 256 cores, which means it takes up all eight nodes. The second job uses five nodes and 32 cores, which means only some of the cores on five nodes will be used. Slurm, however, alloc

[slurm-users] Fwd: a heterogeneous job terminates unexpectedly

2019-02-27 Thread hu...@sugon.com
Dear all, I have a cluster with 9 nodes (cmbc[1530-1538]); each node has 2 CPUs and each CPU has 32 cores. When I submitted a heterogeneous job twice, the second job terminated unexpectedly. This problem has been bothering me all day. The Slurm version is 18.08.5 and here is the job:

Re: [slurm-users] How to get the CPU usage of history jobs at each compute node?

2019-02-15 Thread hu...@sugon.com
ology Unit University of Cambridge Cambridge, CB2 0XY United Kingdom On 15 Feb 2019, at 10:05, hu...@sugon.com wrote: Dear all, How can I view the CPU usage of historical jobs on each compute node? This command (scontrol show jobs jobid --detail) can only get the CPU usage of the currently runn

[slurm-users] How to get the CPU usage of history jobs at each compute node?

2019-02-15 Thread hu...@sugon.com
Dear all, How can I view the CPU usage of historical jobs on each compute node? This command (scontrol show jobs jobid --detail) can only get the CPU usage of the currently running job on each compute node: Appreciatively, Menglong