Dear all,

there have been some discussions about this already, but the issue is still there (in 1.4.4).

When running SLURM jobs with the --cpus-per-task parameter set (e.g. when running hybrid Open MPI/OpenMP jobs, so that --cpus-per-task corresponds to the number of OpenMP threads per rank), you get the "All nodes which are allocated for this job are already filled." error if SLURM_CPUS_PER_TASK > SLURM_TASKS_PER_NODE.

In ras_slurm_module.c, the number of slots is divided by the SLURM_CPUS_PER_TASK value. This is an integer division, so in that case the result is truncated to 0 and the node ends up with no usable slots.

The following patch seems to work for our cluster:

--- a/orte/mca/ras/slurm/ras_slurm_module.c 2009-12-08 21:36:38.000000000 +0100
+++ b/orte/mca/ras/slurm/ras_slurm_module.c 2011-11-25 12:28:55.000000000 +0100
@@ -353,7 +353,8 @@
         node->state = ORTE_NODE_STATE_UP;
         node->slots_inuse = 0;
         node->slots_max = 0;
-        node->slots = slots[i] / cpus_per_task;
+        /* Don't divide by cpus_per_task */
+        node->slots = slots[i];
         opal_list_append(nodelist, &node->super);
     }
     free(slots);

Are there situations where this might not work?

Best regards
Igor

-- 
Igor Geier
--------------------------------------
Center for Scientific Computing (CSC)
University of Frankfurt
Max-von-Laue-Straße 1
60438 Frankfurt am Main
+49(0)69/798-47353
ge...@csc.uni-frankfurt.de
http://csc.uni-frankfurt.de/
--------------------------------------