Hi, I'm investigating some job issues and would like to determine the exact CPU distribution of a job from the accounting info. Right now SLURM does not offer anything like the "exec_host" field in Torque, which makes this task difficult. I can only make a guess from the NodeList, AllocCPUS, and Layout fields, but after some testing I find this approach extremely unreliable. For example, on a shared cluster, if I acquire resources with "--ntasks=13", I get 8 processes running on node 0 and 5 on node 1. However, when I examine the Layout of the job, it is registered as "Cyclic" instead of "Block" as I would imagine. If nodes are partially used, the job may even end up with a 3/5/5 distribution, so I have no idea how many processes were actually launched on each node.
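To make the problem concrete, here is a minimal sketch of the naive reconstruction I'm describing: parse the NodeList and AllocCPUS fields (as reported by sacct) and spread the CPUs evenly across the nodes. The node names and the even-split rule are assumptions for illustration, and the example shows exactly why the guess fails for the 13-task case above:

```python
import re

def expand_nodelist(nodelist):
    """Expand a compact Slurm nodelist like 'n[0-1]' into ['n0', 'n1'].
    Minimal sketch: handles only a single bracketed numeric range."""
    m = re.match(r"^([^\[]+)\[(\d+)-(\d+)\]$", nodelist)
    if not m:
        return [nodelist]
    prefix, lo, hi = m.group(1), int(m.group(2)), int(m.group(3))
    return [f"{prefix}{i}" for i in range(lo, hi + 1)]

def guess_cpu_distribution(nodelist, alloc_cpus):
    """Naive guess: divide AllocCPUS evenly across the nodes.
    This is the unreliable reconstruction described above -- the real
    layout (8+5 here, or 3+5+5 on partially used nodes) cannot be
    recovered from these two fields alone."""
    nodes = expand_nodelist(nodelist)
    base, extra = divmod(alloc_cpus, len(nodes))
    return {n: base + (1 if i < extra else 0) for i, n in enumerate(nodes)}

# The 13-task job from above on two nodes:
print(guess_cpu_distribution("n[0-1]", 13))  # {'n0': 7, 'n1': 6} -- not the real 8/5
```

The even split yields 7/6, while the scheduler actually placed 8 and 5 processes, so any tool built on these fields alone will misreport the placement.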
So my question is: how can one reconstruct the CPU distribution of a job from the accounting info? This would be extremely useful for debugging a job in a shared environment after something bad has happened. If there is no way to do this under the current framework, would it be possible to add it as an extra field in the accounting info? Thanks, Yong Qin