Re: [slurm-users] [EXT] slurmctld error

2021-04-08 Thread Sean Crosby
... On Behalf Of Sean Crosby
Sent: Thursday, April 8, 2021 10:18 AM
To: Slurm User Community List
Subject: Re: [slurm-users] [EXT] slurmctld error
The reason why your nodes are drained is "Low RealMemory". This reason is ...
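Slurm drains a node with the reason "Low RealMemory" when the memory its slurmd reports at registration is lower than the RealMemory value configured for that node in slurm.conf. A minimal sketch of that comparison follows, assuming you have the NodeName line from slurm.conf and the NodeName line printed by `slurmd -C` on the node. The helper names and the 4000 MB configured value are hypothetical; 3934 echoes the memory column in the sinfo listings of this thread.

```python
def parse_node_line(line: str) -> dict:
    """Split 'NodeName=wn029 CPUs=2 RealMemory=3934 ...' into a key/value dict."""
    fields = {}
    for token in line.split():
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
    return fields


def low_realmemory(configured_line: str, detected_line: str) -> bool:
    """True if slurm.conf promises more memory (MB) than slurmd -C detected."""
    # RealMemory in slurm.conf defaults to 1 MB when it is not set.
    configured = int(parse_node_line(configured_line).get("RealMemory", "1"))
    detected = int(parse_node_line(detected_line)["RealMemory"])
    return detected < configured


# Hypothetical values: slurm.conf says 4000 MB, slurmd -C on the node reports 3934.
conf_line = "NodeName=wn029 CPUs=2 RealMemory=4000 State=UNKNOWN"
detected_line = ("NodeName=wn029 CPUs=2 Boards=1 SocketsPerBoard=1 "
                 "CoresPerSocket=2 ThreadsPerCore=1 RealMemory=3934")
print(low_realmemory(conf_line, detected_line))  # True -> the node would be drained
```

Lowering RealMemory in slurm.conf to (or slightly below) the value `slurmd -C` reports, then restarting the daemons and resuming the nodes, is the usual way out.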

Re: [slurm-users] [EXT] slurmctld error

2021-04-08 Thread Sean Crosby
... Reason=Low RealMemory [root@2021-04-...
From: slurm-users On Behalf Of Sean Crosby
Sent: Tuesday, April 6, 2021 2:11 PM
To: Slurm User Community List
Subject: Re: [slurm-users] [EXT] slurmctld error
I just checked my cluster ...

Re: [slurm-users] [EXT] slurmctld error

2021-04-08 Thread Ioannis Botsis
...
Cluster  ControlHost  ControlPort  RPC   Share
tuc      127.0.0.1    6817         8704  1
From: slurm-users On Behalf Of Sean Crosby
Sent: Tuesday, April 6, 2021 2:11 PM
To: Slurm User Community List
Subject: Re: [slurm-users] [EXT] slurmctld error
I just checked my ...

Re: [slurm-users] [EXT] slurmctld error

2021-04-08 Thread Ioannis Botsis
... Reason=Low RealMemory [root@2021-04-...
From: slurm-users On Behalf Of Sean Crosby
Sent: Tuesday, April 6, 2021 2:11 PM
To: Slurm User Community List
Subject: Re: [slurm-users] [EXT] slurmctld error
I just checked my cluster and my spool dir is SlurmdSpoolDir=/var/spool/s...

Re: [slurm-users] [EXT] slurmctld error

2021-04-06 Thread Sean Crosby
...
wn029 drained 0/0/2/2 3934 TUC* up
wn030 drained 0/0/2/2 3934 TUC* up
wn031 drained 0/0/2/2 3934 TUC* up
wn032 drained 0/0/2/2 3934 TUC* up
wn033 drained 0/0...

Re: [slurm-users] [EXT] slurmctld error

2021-04-06 Thread Sean Crosby
...35 drained 0/0/2/2 3934 TUC* up
wn036 drained 0/0/2/2 3934 TUC* up
wn037 drained 0/0/2/2 3934 TUC* up
wn038 drained 0/0/2/2 3934 TUC* up
wn039 drained 0/0/2/2 3934 TUC* up
wn040 drained 0/0/2/2 3934 TUC* ...

Re: [slurm-users] [EXT] slurmctld error

2021-04-06 Thread Sean Crosby
... drained 0/0/2/2 3934 TUC* up
wn039 drained 0/0/2/2 3934 TUC* up
wn040 drained 0/0/2/2 3934 TUC* up
wn041 drained 0/0/2/2 3934 TUC* up
wn042 drained 0/0/2/2 3934 TUC* up
wn043 drained 0/0/2/2 3934 TUC* up
wn044 drained 0/0/2/2 3934 TUC* ...
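Once the RealMemory mismatch is fixed, drained nodes still have to be resumed, for example with `scontrol update NodeName=<hostlist> State=RESUME`. The sketch below assumes sinfo rows shaped like the ones above (node name first, state second). It collects the drained names and folds consecutive numbers into a Slurm hostlist expression such as `wn[029-033]`. The function names are hypothetical, and the folding only handles names made of a letter prefix plus a fixed-width number.

```python
import re


def drained_nodes(sinfo_text: str) -> list:
    """Return node names whose state column (2nd field) starts with 'drain'."""
    names = []
    for row in sinfo_text.splitlines():
        fields = row.split()
        if len(fields) >= 2 and fields[1].startswith("drain"):
            names.append(fields[0])
    return names


def to_hostlist(names: list) -> str:
    """Fold e.g. wn029, wn030, wn031, wn035 into 'wn[029-031,035]'."""
    if not names:
        raise ValueError("no node names given")
    parsed = []
    for name in names:
        m = re.fullmatch(r"([A-Za-z]+)(\d+)", name)
        if m is None:
            raise ValueError(f"unsupported node name: {name}")
        parsed.append((m.group(1), m.group(2)))
    prefixes = {p for p, _ in parsed}
    if len(prefixes) != 1:
        raise ValueError("expected a single common prefix")
    prefix = prefixes.pop()
    width = len(parsed[0][1])  # assumes zero-padded numbers of equal width
    numbers = sorted({int(d) for _, d in parsed})

    ranges = []
    start = prev = numbers[0]
    for n in numbers[1:] + [None]:
        if n is not None and n == prev + 1:
            prev = n  # still inside a consecutive run
            continue
        # Close the run [start, prev].
        if start == prev:
            ranges.append(f"{start:0{width}d}")
        else:
            ranges.append(f"{start:0{width}d}-{prev:0{width}d}")
        if n is not None:
            start = prev = n
    return f"{prefix}[{','.join(ranges)}]"


rows = """wn029 drained 0/0/2/2 3934 TUC* up
wn030 drained 0/0/2/2 3934 TUC* up
wn031 drained 0/0/2/2 3934 TUC* up"""
print(f"scontrol update NodeName={to_hostlist(drained_nodes(rows))} State=RESUME")
# scontrol update NodeName=wn[029-031] State=RESUME
```

Printing the command instead of running it leaves a chance to review the node list before resuming anything.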

Re: [slurm-users] [EXT] slurmctld error

2021-04-06 Thread ibotsis
...Subject: Re: [slurm-users] [EXT] slurmctld error
It looks like your attachment of sinfo -R didn't come through. It also looks like your dbd isn't set up correctly. Can you also show the output of sacctmgr list cluster and scontrol show config | grep ClusterName?
Sean

Re: [slurm-users] [EXT] slurmctld error

2021-04-06 Thread ibotsis
From: slurm-users On Behalf Of Sean Crosby
Sent: Tuesday, April 6, 2021 12:47 PM
To: Slurm User Community List
Subject: Re: [slurm-users] [EXT] slurmctld error
It looks like your attachment of sinfo -R didn't come through. It also looks like your dbd isn't set up correctly. Can you also show ...
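One thing these two commands let you compare is whether the ClusterName slurmctld runs with is actually registered in the accounting database. A sketch, assuming the text of `scontrol show config` (lines like `ClusterName = tuc`) and of `sacctmgr -n -P list cluster format=Cluster` (one name per line) has already been captured. The helper names are hypothetical.

```python
def config_value(show_config: str, key: str):
    """Return the value for `key` from 'Key = Value' lines of scontrol show config."""
    for line in show_config.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None


def cluster_registered(show_config: str, sacctmgr_clusters: str) -> bool:
    """True if slurmctld's ClusterName appears in the sacctmgr cluster list."""
    name = config_value(show_config, "ClusterName")
    registered = {line.strip() for line in sacctmgr_clusters.splitlines() if line.strip()}
    return name is not None and name in registered


config = "AccountingStoragePort   = 6819\nClusterName             = tuc\n"
clusters = "tuc\n"  # e.g. captured from: sacctmgr -n -P list cluster format=Cluster
print(cluster_registered(config, clusters))  # True
```

If this returns False, `sacctmgr add cluster <name>` is the usual way to register the cluster.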

Re: [slurm-users] [EXT] slurmctld error

2021-04-06 Thread Sean Crosby
[2021-04-06T12:10:35.702] debug: backfill: beginning
[2021-04-06T12:10:35.702] debug: backfill: no jobs to backfill
[2021-04-06T12:10:37.001] debug: slurmdbd: PERSIST_RC is -1 from DBD_FLUSH_JOBS(1408): (null)
Attached sinfo -R
Any hint?
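The last log line appears to show the DBD_FLUSH_JOBS request to slurmdbd not succeeding. In a long debug-level log, a filter like the one below pulls out just the accounting-related lines. It is a sketch that assumes the `[timestamp] message` layout shown above. The keyword list is an assumption, not an exhaustive set of slurmdbd messages.

```python
import re

LOG_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<msg>.*)$")
# Assumed keywords; extend as needed.
DBD_KEYWORDS = ("slurmdbd", "persist_conn", "DBD_")


def dbd_lines(log_text: str) -> list:
    """Return (timestamp, message) pairs for log lines that mention slurmdbd."""
    hits = []
    for line in log_text.splitlines():
        m = LOG_LINE.match(line)
        if m and any(k in m.group("msg") for k in DBD_KEYWORDS):
            hits.append((m.group("ts"), m.group("msg")))
    return hits


log = """[2021-04-06T12:10:35.702] debug: backfill: no jobs to backfill
[2021-04-06T12:10:37.001] debug: slurmdbd: PERSIST_RC is -1 from DBD_FLUSH_JOBS(1408): (null)"""
for ts, msg in dbd_lines(log):
    print(ts, msg)
# 2021-04-06T12:10:37.001 debug: slurmdbd: PERSIST_RC is -1 from DBD_FLUSH_JOBS(1408): (null)
```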

Re: [slurm-users] [EXT] slurmctld error

2021-04-06 Thread Ioannis Botsis
... On Behalf Of Sean Crosby
Sent: Tuesday, April 6, 2021 12:49 AM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] [EXT] slurmctld error
What's the output of ss -lntp | grep $(pidof sl...
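`ss -lntp` shows which TCP ports each process is actually listening on. By default slurmctld listens on 6817 and slurmdbd on 6819. A sketch that extracts those ports for one process name, assuming ss lines of the form in the sample below (local address in the 4th column, process info as `users:(("name",pid=...,fd=...))`). The function name is hypothetical and the sample PIDs are made up.

```python
import re

# Matches the process name inside users:(("slurmdbd",pid=1234,fd=7))
PROC = re.compile(r'\("(?P<name>[^"]+)",pid=(?P<pid>\d+)')


def listening_ports(ss_output: str, process: str) -> set:
    """Return TCP ports that `process` is listening on, from `ss -lntp` text."""
    ports = set()
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] != "LISTEN":
            continue  # skips the header and anything that is not a listener
        names = {m.group("name") for m in PROC.finditer(line)}
        if process in names:
            # Local address is the 4th column, e.g. 0.0.0.0:6819 or [::]:6819.
            ports.add(int(fields[3].rpartition(":")[2]))
    return ports


sample = """State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0      128    0.0.0.0:6817       0.0.0.0:*         users:(("slurmctld",pid=1201,fd=3))
LISTEN 0      80     127.0.0.1:3306     0.0.0.0:*         users:(("mysqld",pid=950,fd=21))"""
print(listening_ports(sample, "slurmctld"))  # {6817}
print(listening_ports(sample, "slurmdbd"))   # set()
```

An empty result for slurmdbd means nothing owned by slurmdbd is accepting connections, whatever port slurmctld is configured to use.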

Re: [slurm-users] [EXT] slurmctld error

2021-04-06 Thread Ole Holm Nielsen
Hi Ioannis,
On 06-04-2021 07:56, Ioannis Botsis wrote: "slurmctld is active and running, but on system reboot it doesn't start automatically. I have to start it manually."
Maybe you will find my Slurm Wiki pages of use for setting up your Slurm system: https://wiki.fysik.dtu.dk/niflheim/SLURM

Re: [slurm-users] [EXT] slurmctld error

2021-04-05 Thread Ioannis Botsis
Hi Sean,
slurmctld is active and running, but on system reboot it doesn't start automatically. I have to start it manually.
jb
From: slurm-users On Behalf Of Sean Crosby
Sent: Tuesday, April 6, 2021 7:54 AM
To: Slurm User Community List
Subject: Re: [slurm-users] [EXT] slurmctld error

Re: [slurm-users] [EXT] slurmctld error

2021-04-05 Thread Ioannis Botsis
I set DbdAddr and DbdHost to localhost, and now slurmctld is active and running.
Thanks,
jb
From: slurm-users On Behalf Of Sean Crosby
Sent: Tuesday, April 6, 2021 7:54 AM
To: Slurm User Community List
Subject: Re: [slurm-users] [EXT] slurmctld error
The other thing I ...

Re: [slurm-users] [EXT] slurmctld error

2021-04-05 Thread Sean Crosby
...se01.grid.tuc.gr systemd[1]: Starting Slurm DBD accounting daemon...
Apr 05 13:52:35 se01.grid.tuc.gr systemd[1]: slurmdbd.service: Can't open PID file /run/slurmdbd.pid (yet?) after start: Operation not permitted
Apr 05 13:52:35 se01.grid.tuc.gr s...

Re: [slurm-users] [EXT] slurmctld error

2021-04-05 Thread Sean Crosby
...Apr 2021 at 05:00, wrote:
Hi Sean,
10.0.0.100 is the dbd and ctld host, with name se01. The firewall is inactive...

Re: [slurm-users] [EXT] slurmctld error

2021-04-05 Thread ibotsis
...]: Started Slurm DBD accounting daemon.
The file /run/slurmdbd.pid exists and contains the same value as pidof slurmdbd.
From: slurm-users On Behalf Of Sean Crosby
Sent: Tuesday, April 6, 2021 12:49 AM
To: Slurm User Community List
Subject: Re: [slurm-users] [EXT] slurmctld error
What's the output ...
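The check Ioannis did by hand (comparing /run/slurmdbd.pid with `pidof slurmdbd`) can be scripted: read the PID from the file, ask the kernel whether that process exists, and look up its command name. A sketch assuming a Linux host (it reads `/proc/<pid>/comm`). The helper names are hypothetical. Sending signal 0 with `os.kill` only checks that the process exists; nothing is delivered to it.

```python
import os


def read_pid_file(path: str) -> int:
    """Read the PID a daemon such as slurmdbd wrote to its PID file."""
    with open(path) as f:
        return int(f.read().strip())


def pid_alive(pid: int) -> bool:
    """Signal 0 performs error checking only: no signal is actually sent."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user (e.g. slurm or root).
        return True
    return True


def process_name(pid: int) -> str:
    """Linux-specific: the command name the kernel records for `pid`."""
    with open(f"/proc/{pid}/comm") as f:
        return f.read().strip()


PID_FILE = "/run/slurmdbd.pid"
if os.path.exists(PID_FILE):
    pid = read_pid_file(PID_FILE)
    # On a healthy host the name printed here should be slurmdbd.
    print(pid, pid_alive(pid), process_name(pid) if pid_alive(pid) else "-")
```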

Re: [slurm-users] [EXT] slurmctld error

2021-04-05 Thread Sean Crosby
...ection not working
give me back: Connection not working
jb
From: slurm-users On Behalf Of Sean Crosby
Sent: Monday, April 5, 2021 2:52 PM
To: Slurm User Community List
Subject: Re: [slurm-users]...

Re: [slurm-users] [EXT] slurmctld error

2021-04-05 Thread ibotsis
...User Community List
Subject: Re: [slurm-users] [EXT] slurmctld error
The error shows:
slurmctld: debug2: Error connecting slurm stream socket at 10.0.0.100:6819: Connection refused
slurmctld: error: slurm_persist_conn_open_without_init: failed to open pers...
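"Connection refused" usually means the TCP connection reached 10.0.0.100 but nothing was listening on port 6819. A firewall that silently drops packets would more typically produce a timeout. The same kind of plain TCP connection slurmctld opens can be reproduced with the standard library alone. The function name and the 3-second timeout below are assumptions.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Try a plain TCP connect to host:port and report whether it succeeded."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, timeouts and DNS failures are all OSError subclasses.
        return False
```

Running `can_connect("10.0.0.100", 6819)` on the slurmctld host and comparing it with `can_connect("10.0.0.100", 3306)` quickly shows whether slurmdbd or only MySQL is reachable.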

Re: [slurm-users] [EXT] slurmctld error

2021-04-05 Thread Sean Crosby
...you for your prompt response. I made the changes you suggested, but slurmctld refuses to run. Please find attached the new slurmctld -D output.
jb
From: slurm-users On Behalf Of Sean Crosby
Sent: Monday, April 5, 2021 11:46 AM
To: ...

Re: [slurm-users] [EXT] slurmctld error

2021-04-05 Thread Ioannis Botsis
...-users] [EXT] slurmctld error
Hi Jb,
You have set AccountingStoragePort to 3306 in slurm.conf, which is the port of the MySQL server running on the DBD host. AccountingStoragePort is the port for the slurmdbd service, not for MySQL. Change AccountingStoragePort to 6819 and it should fix your ...

Re: [slurm-users] [EXT] slurmctld error

2021-04-05 Thread Sean Crosby
Hi Jb,
You have set AccountingStoragePort to 3306 in slurm.conf, which is the port of the MySQL server running on the DBD host. AccountingStoragePort is the port for the slurmdbd service, not for MySQL. Change AccountingStoragePort to 6819 and it should fix your issues.
I also think you should comment ...
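Sean's diagnosis can be turned into a small sanity check. When slurm.conf sends accounting to slurmdbd (`AccountingStorageType=accounting_storage/slurmdbd`), AccountingStoragePort must be the port slurmdbd listens on (DbdPort in slurmdbd.conf, default 6819). It must not be the MySQL port 3306, which only slurmdbd itself uses (StoragePort in slurmdbd.conf). A sketch assuming simple `Key=Value` lines. It is not a full slurm.conf parser, and the function names are hypothetical.

```python
MYSQL_DEFAULT_PORT = 3306
DBD_DEFAULT_PORT = 6819


def parse_conf(text: str) -> dict:
    """Minimal Key=Value reader; keys lower-cased since Slurm keys are case-insensitive."""
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if "=" in line and " " not in line.split("=", 1)[0]:
            key, value = line.split("=", 1)
            conf[key.strip().lower()] = value.strip()
    return conf


def accounting_port_problems(slurm_conf: str, slurmdbd_conf: str = "") -> list:
    """Return human-readable problems with AccountingStoragePort, if any."""
    sc = parse_conf(slurm_conf)
    dbd = parse_conf(slurmdbd_conf)
    problems = []
    if sc.get("accountingstoragetype") != "accounting_storage/slurmdbd":
        return problems  # this check only applies when slurmdbd is used
    port = int(sc.get("accountingstorageport", DBD_DEFAULT_PORT))
    dbd_port = int(dbd.get("dbdport", DBD_DEFAULT_PORT))
    if port == MYSQL_DEFAULT_PORT:
        problems.append("AccountingStoragePort=3306 is the MySQL port; "
                        "slurmctld must talk to slurmdbd instead")
    if port != dbd_port:
        problems.append(f"AccountingStoragePort={port} does not match "
                        f"slurmdbd's DbdPort={dbd_port}")
    return problems


slurm_conf = """AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=se01
AccountingStoragePort=3306"""
for p in accounting_port_problems(slurm_conf):
    print(p)
```

Passing the contents of slurmdbd.conf as the second argument also catches a non-default DbdPort that slurm.conf does not match.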