I was able to narrow down my problem further. I stopped all Slurm daemons on the head and slave nodes. On the head node, if I start the new Slurm with

  /.../new-slurm/etc/init.d/slurm start

the slurmctld daemon runs OK, but /var/log/slurmctld fills with "Incompatible versions" error messages. Running

  ps -u root -F | grep slurm

shows why: the old /../old-slurm/scontrol and squeue are run periodically. It seems that "new-slurm/etc/init.d/slurm start" is somehow triggering these old commands. I checked my $PATH, slurm.conf, and the /../new-slurm/etc/init.d/slurm file, and none of them points to the old installation.

I would appreciate any guidance on resolving this.
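One way to find out what is actually launching the old binaries is to look at each old scontrol/squeue process's parent chain. Because these commands live only for a moment, a single ps snapshot can miss them, so the minimal sketch below polls repeatedly. It assumes a Linux /proc filesystem and should be run as root on the head node. The script name and the way the pattern is passed are hypothetical; the pattern should be the real old install prefix, which is elided above.

```python
import os
import sys
import time


def find_matching(pattern):
    """Return PIDs whose full command line contains `pattern` (reads Linux /proc)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                # Arguments are NUL-separated; join them with spaces for matching.
                cmd = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # the process exited or is not readable
        if pattern in cmd:
            pids.append(int(entry))
    return pids


def parent_chain(pid):
    """Follow PPid links from `pid` upward, returning (pid, name) pairs."""
    chain = []
    while pid > 0:
        try:
            with open(f"/proc/{pid}/status") as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
        except OSError:
            break  # a process in the chain exited while we were reading it
        chain.append((pid, fields["Name"].strip()))
        pid = int(fields["PPid"])  # PID 1 reports PPid 0, which ends the loop
    return chain


def watch(pattern, seconds=60.0, interval=0.5):
    """Poll repeatedly, since the short-lived commands are easy to miss."""
    me = os.getpid()
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        for pid in find_matching(pattern):
            if pid == me:
                continue  # our own argv contains the pattern
            print(" <- ".join(f"{p}:{name}" for p, name in parent_chain(pid)))
        time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 trace_parents.py <old-slurm install prefix>
    watch(sys.argv[1])
```

Each printed line starts at the matching process and walks up toward PID 1, so the entries immediately after the scontrol/squeue PID name whatever started it (for example a cron job, a shell script, or a management daemon).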
Regards
Andrew

On Wed, Jun 3, 2015 at 4:52 PM, Andrew Petersen <aapet...@ncsu.edu> wrote:
> Hello
>
> I installed a new version of Slurm, 14.11.3. It works fine. However, I
> noticed that my log file /var/log/slurmctld shows
>   error: slurm_receive_msg: Incompatible versions of client and server code
> This led me to discover, using
>   ps -u root -F | grep slurm
> that the old Slurm scontrol, squeue and sacct are still running on the
> head node.
>
> I have tried to kill them every which way, but they won't die; they keep
> resurrecting with different PIDs. I tried
>   /old-slurm-version/bin/scontrol shutdown
> but this gives me
>   slurm_shutdown error: Zero Bytes were transmitted or received
>
> It seems like something is automatically restarting the old Slurm. I am
> using Bright Cluster Manager, and I configured it NOT to auto-start
> or run the Slurm daemon, but that did not help.
>
> Can someone help me kill this thing? It is creating large zipped log
> files and using up CPU capacity on the head node.
>
> Regards
> Andrew Petersen