I was able to narrow down my problem further.  I managed to stop all slurm
daemons on the head and slave nodes.  On the head node, if I start the new
slurm with
/.../new-slurm/etc/init.d/slurm start
the slurmctld daemon runs OK, but with error messages in the
/var/log/slurmctld file about incompatible versions.  If I run "ps -u root
-F |grep slurm", I can see why: periodically, the /../old-slurm/scontrol
and squeue commands run.  It seems that, somehow, "new-slurm/etc/init.d/slurm
start" is triggering these old binaries to run.  I checked my $PATH,
slurm.conf, and the /../new-slurm/etc/init.d/slurm file, and none of them
refers to the old installation.  I would appreciate any guidance on
resolving this.
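
One way to see what is launching them might be to walk up the parent chain
of one of the recurring processes.  Below is a minimal sketch, assuming a
procps-style ps that supports -o and -p; the function name "ancestry" is
made up for illustration and is not part of slurm.

```shell
# Print a process and each of its ancestors, one per line
# (pid, parent pid, full command line), up to PID 1.
ancestry() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ] 2>/dev/null; do
        # Show this process; stop if it has already exited.
        ps -o pid=,ppid=,args= -p "$pid" || break
        # Move up to the parent; tr strips the padding ps adds.
        pid=$(ps -o ppid= -p "$pid" | tr -d ' ')
    done
}

# Hypothetical usage: pass a PID taken from the ps listing, e.g.
#   ancestry 12345
```

Because the old scontrol and squeue processes are short-lived, the PID has
to be caught while the process is still running.  The last few lines of the
output should show whether the parent is cron, a cluster manager daemon, or
something else.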

Regards
Andrew

On Wed, Jun 3, 2015 at 4:52 PM, Andrew Petersen <aapet...@ncsu.edu> wrote:

> Hello
>
> I installed a new version of slurm, 14.11.3.  It works fine.  However I
> noticed that my log file /var/log/slurmctld shows
> error: slurm_receive_msg: Incompatible versions of client and server code
> This led me to discover that old slurm scontrol, squeue and sacct are
> still running on the head node, using
> ps -u root -F |grep slurm
>
> I have tried to kill these every which way, but they won't die; they keep
> reappearing with different PIDs.  I tried
>  /old-slurm-version/bin/scontrol shutdown
> but this gives me
> slurm_shutdown error: Zero Bytes were transmitted or received
>
> It seems like something is automatically restarting the old slurm.  I am
> using Bright Cluster Manager, and I set it so that it does NOT auto-start
> or run the slurm daemon, but that did not help.
>
> Can someone help me kill this thing?  It is causing the creation of big
> log zip files, and using up cpu capacity on the head node.
>
> Regards
> Andrew Petersen
>
>
