Folks,
I am seeing some odd behavior and am wondering whether others have seen
it and might be able to explain it.
The behavior is the following message:
[2015-11-12T08:00:14.415] debug: Job 28447 still has 1 active steps
showing up in the slurmctld logs long (many minutes) after job 28447 has
completed and no longer appears in the squeue output. There is no
trace of the job on either of the nodes it ran on, and those nodes are
idle, but the debug message keeps appearing.
I am running a locally modified Slurm 15.08.1. My local modifications
should not have anything to do with this, but I am open to the
possibility that they do.
Mostly I am curious whether this is a known behavior, and, if so,
whether there is a workaround or fix for it.
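In case it helps anyone compare against their own logs, here is a rough
sketch of how one might count these messages per job and see how long
they persist. It assumes the log lines look exactly like the one quoted
above; the function name and the regular expression are my own
illustration, not anything that ships with Slurm.

```python
import re
from datetime import datetime

# Assumption: lines match the format of the message quoted above, e.g.
# [2015-11-12T08:00:14.415] debug: Job 28447 still has 1 active steps
STEP_MSG = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})\] "
    r"debug:\s+Job (?P<job>\d+) still has (?P<steps>\d+) active steps"
)


def summarize_active_step_messages(lines):
    """Return {job_id: (count, first_seen, last_seen)} for matching lines.

    Hypothetical helper: `lines` is any iterable of log lines (strings).
    Lines that do not match the pattern are ignored.
    """
    summary = {}
    for line in lines:
        match = STEP_MSG.match(line.strip())
        if match is None:
            continue
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
        job = int(match.group("job"))
        count, first, last = summary.get(job, (0, ts, ts))
        summary[job] = (count + 1, min(first, ts), max(last, ts))
    return summary
```

Passing it the open slurmctld log file (wherever SlurmctldLogFile points
in your slurm.conf) gives, for each job ID, the number of such messages
and the first and last timestamps, which can then be compared with the
job's end time.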
Thanks!
Eric