Re: [slurm-users] [External] ERROR: slurmctld: auth/munge: _print_cred: DECODED

2022-12-03 Thread Nousheen

Re: [slurm-users] [External] ERROR: slurmctld: auth/munge: _print_cred: DECODED

2022-12-02 Thread Michael Robbert

Re: [slurm-users] [External] ERROR: slurmctld: auth/munge: _print_cred: DECODED

2022-12-02 Thread Nousheen
Dear Ole, Thank you so much for your response. I have now adjusted the RealMemory in slurm.conf, which had previously been left at its default. Your insight was really helpful. Now, when I submit the job, it runs on three nodes, but one node (104) is not responding. The details of some commands are

Re: [slurm-users] [External] ERROR: slurmctld: auth/munge: _print_cred: DECODED

2022-12-01 Thread Ole Holm Nielsen
Hi Nousheen, It seems that you have configured the nodes incorrectly in slurm.conf. I notice this: RealMemory=1 This means 1 Megabyte of RAM, which we only had with IBM PCs back in the 1980s :-) See how to configure nodes in https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_confi
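
For anyone hitting the same RealMemory=1 problem: RealMemory is given in megabytes, so a sensible value can be derived from the MemTotal line of /proc/meminfo on the compute node (running `slurmd -C` on the node should also print a ready-made node line). Below is a minimal sketch of that conversion, not an official Slurm tool. The sample meminfo text and the node name `node01` are made up for illustration.

```python
def real_memory_mb(meminfo_text: str) -> int:
    """Return MemTotal from /proc/meminfo-style text, converted from kB to MB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # second field is the size in kB
            return kib // 1024          # round down so Slurm never over-counts
    raise ValueError("MemTotal line not found")


# Hypothetical sample; on a real node, read the text from /proc/meminfo instead.
sample = "MemTotal:       16384000 kB\nMemFree:         1234567 kB\n"

# node01 is a placeholder node name, not one from this thread.
print(f"NodeName=node01 RealMemory={real_memory_mb(sample)}")
```

Rounding down (or setting the value slightly below what is reported) leaves some headroom, so the node is not flagged for having less memory than configured.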

Re: [slurm-users] [External] ERROR: slurmctld: auth/munge: _print_cred: DECODED

2022-12-01 Thread Nousheen
Dear Robbert, Thank you so much for your response. I was so focused on syncing the time that I missed the date on one of the nodes, which was 1 day behind as you said. I have corrected it and now I get the following output in status. (base) [nousheen@nousheen slurm]$ systemctl status slurmctld.service

Re: [slurm-users] [External] ERROR: slurmctld: auth/munge: _print_cred: DECODED

2022-12-01 Thread Michael Robbert
I believe that the error you need to pay attention to for this issue is this line: Dec 01 16:17:19 nousheen slurmctld[1631]: slurmctld: error: Check for out of sync clocks It looks like your compute node's clock is a full day ahead of your controller node: Dec. 2 instead of Dec. 1. The clo
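
To make the clock check concrete, here is a small sketch that compares the full date and time reported by the controller and by a compute node (for example, the output of `date '+%Y-%m-%d %H:%M:%S'` on each host). Comparing only the time of day would miss a whole-day offset like the one described above. The two timestamps are illustrative, and the 300-second limit is an assumption based on the usual MUNGE credential lifetime, so check your own setup.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def clock_skew(controller_time: str, node_time: str) -> timedelta:
    """Return node time minus controller time, including the date part."""
    return datetime.strptime(node_time, FMT) - datetime.strptime(controller_time, FMT)


# Assumed tolerance; not taken from this thread.
MAX_SKEW = timedelta(seconds=300)

# Illustrative values: same time of day, but the node is one day ahead.
skew = clock_skew("2022-12-01 16:17:19", "2022-12-02 16:17:19")
print(f"skew={skew}, out of sync: {abs(skew) > MAX_SKEW}")
```

Once the skew is within tolerance on every node, keeping all hosts on the same NTP or chrony source is the usual way to stop it from drifting again.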