Re: [slurm-users] Munge decode failing on new node

2020-04-17 Thread Dean Schulze
Just noticed this. On the problem node the munged.log file has an entry every 1:40:
2020-04-17 15:31:02 -0600 Info: Invalid credential
2020-04-17 15:32:42 -0600 Info: Invalid credential
2020-04-17 15:34:22 -0600 Info: Invalid credential
This happens on the failed node and two
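The 1:40 spacing can be checked directly from the three timestamps quoted in the message. This is only a minimal sketch, assuming the log lines keep the date, time, and UTC-offset prefix shown there; the name `entry_intervals` is made up for illustration and is not part of munge.

```python
from datetime import datetime

# Lines copied from the munged.log excerpt in the message.
LOG_LINES = [
    "2020-04-17 15:31:02 -0600 Info: Invalid credential",
    "2020-04-17 15:32:42 -0600 Info: Invalid credential",
    "2020-04-17 15:34:22 -0600 Info: Invalid credential",
]


def entry_intervals(lines):
    """Return the gaps, in seconds, between consecutive log entries."""
    stamps = []
    for line in lines:
        # The first three whitespace-separated fields are date, time, UTC offset.
        date, time, offset = line.split()[:3]
        stamps.append(
            datetime.strptime(f"{date} {time} {offset}", "%Y-%m-%d %H:%M:%S %z")
        )
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


print(entry_intervals(LOG_LINES))
```

A constant gap would be consistent with something retrying on a timer, though the excerpt alone does not say what is presenting the credential.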

Re: [slurm-users] Munge decode failing on new node

2020-04-17 Thread Dean Schulze
Both work. The only discrepancy is that the output on the slurm controller had these two lines:
UID: ??? (1000)
GID: ??? (1000)
It looks as if the controller doesn't know the username for UID 1000. But it returned success (0). On Fri, Apr 17, 2020 at 2:00 PM Riebs, Andy wrote: > A
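The `???` presumably means that the numeric ID could not be mapped to a name on the controller. One way to check that assumption is to ask the host's own account database; the sketch below uses Python's standard `pwd` and `grp` modules, and the helper name `resolve_ids` is hypothetical.

```python
import grp
import pwd


def resolve_ids(uid, gid):
    """Map numeric IDs to names, returning None where the host has no entry."""
    try:
        user = pwd.getpwuid(uid).pw_name
    except KeyError:
        user = None
    try:
        group = grp.getgrgid(gid).gr_name
    except KeyError:
        group = None
    return user, group


# 1000/1000 are the IDs from the message; run this on the controller.
print(resolve_ids(1000, 1000))
```

If this prints `None` for the user on the controller but a name on the compute nodes, the account simply is not defined on the controller. Whether that matters for the decode failure is a separate question.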

Re: [slurm-users] Munge decode failing on new node

2020-04-17 Thread Riebs, Andy
A couple of quick checks to see if the problem is munge:
1. On the problem node, try
   $ echo foo | munge | unmunge
2. If (1) works, try this from the node running slurmctld to the problem node
   slurm-node$ echo foo | ssh node munge | unmunge
From: slurm-users

[slurm-users] Alternative to munge for use with slurm?

2020-04-17 Thread Dean Schulze
Is there an alternative to munge when running slurm? Munge issues are a common problem in slurm, and munge doesn't give any useful information when a problem occurs. An alternative that at least gave some useful information when a problem occurs would be a big improvement. Thanks.

Re: [slurm-users] Munge decode failing on new node

2020-04-17 Thread Dean Schulze
There is no ntp service running on any of my nodes, and all but this one are working. I haven't heard that ntp is a requirement for slurm, just that the time must be synchronized across the cluster. And it is. On Wed, Apr 15, 2020 at 12:17 PM Carlos Fenoy wrote: > I’d check ntp as your encoding
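Without ntp, "synchronized" is worth confirming with numbers. The sketch below assumes epoch seconds have already been collected from each host (for example with `date +%s` over ssh) and only does the comparison; the function name, node names, sample values, and the 60-second threshold are all made up for illustration and are not taken from the thread or from munge.

```python
def clock_skew(readings, reference):
    """Return each node's offset in seconds from the reference node."""
    base = readings[reference]
    return {node: ts - base for node, ts in readings.items() if node != reference}


# Hypothetical sample readings, not real measurements.
readings = {"controller": 1587159062, "node01": 1587159063, "node02": 1587159400}
max_skew = 60  # illustrative threshold, an assumption

skew = clock_skew(readings, "controller")
suspects = sorted(node for node, s in skew.items() if abs(s) > max_skew)
print(skew, suspects)
```

Readings gathered one host at a time include ssh latency, so offsets of a second or two are noise; only a large gap would point at the clock.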

Re: [slurm-users] follow-up: [Still broken]CentOS 7 CUDA 8.0 can't find plugin cons_tres

2020-04-17 Thread Lisa Kay Weihl
I went back and built the slurm-19.05.6 rpms using:
rpmbuild -ta slurm-19.05.6.tar.bz2
It still failed with:
Error: Package: slurm-19.05.6-1.el7.x86_64
Requires: libnvidia-ml.so.1()(64bit)
Now I remember why I went back to 18.08. It was because this post

Re: [slurm-users] [EXTERNAL] Follow-up-slurm-users Digest, Vol 30, Issue 32

2020-04-17 Thread Renfro, Michael
Can’t speak for everyone, but I went to Slurm 19.05 some months back, and haven't had any problems with CUDA 10.0 or 10.1 (or 8.0, 9.0, or 9.1). > On Apr 17, 2020, at 8:46 AM, Lisa Kay Weihl wrote:

Re: [slurm-users] [EXTERNAL] Follow-up-slurm-users Digest, Vol 30, Issue 32

2020-04-17 Thread Lisa Kay Weihl
Wow. I did not catch that version issue. I saw that there were issues with the newest Slurm and how CUDA 10+ installs, so I avoided that even though we have CUDA 8. I did have Slurm 19 downloaded, so I'm thinking I ran into an issue with that and went back to 18, but now that I have more

Re: [slurm-users] slurm-20.02.1-1 failed rpmbuild with error File not found

2020-04-17 Thread Ole Holm Nielsen
On 17-04-2020 11:47, Ole Holm Nielsen wrote: On 17-04-2020 10:38, Christian Anthon wrote: It would be neat to have these build requirements / install requirements built into the spec file. I agree with you, and it seems that the SchedMD pages no longer list the build prerequisites (I think

Re: [slurm-users] slurm-20.02.1-1 failed rpmbuild with error File not found

2020-04-17 Thread Ole Holm Nielsen
On 17-04-2020 10:38, Christian Anthon wrote: It would be neat to have these build requirements / install requirements built into the spec file. I agree with you, and it seems that the SchedMD pages no longer list the build prerequisites (I think there was some information in the past). Try

Re: [slurm-users] slurm-20.02.1-1 failed rpmbuild with error File not found

2020-04-17 Thread Felix Farcas
Hello, I installed mariadb-server and mariadb-devel and everything worked fine. Thank you, Felix On 4/17/2020 11:38 AM, Christian Anthon wrote: It would be neat to have these build requirements / install requirements built into the spec file. Cheers, Christian. On 17/04/2020 10.08, Ole Holm

Re: [slurm-users] slurm-20.02.1-1 failed rpmbuild with error File not found

2020-04-17 Thread Christian Anthon
It would be neat to have these build requirements / install requirements built into the spec file. Cheers, Christian. On 17/04/2020 10.08, Ole Holm Nielsen wrote: Hi Felix, Please make sure to install all prerequisite packages on the Slurm build host. I have summarized this information in

Re: [slurm-users] slurm-20.02.1-1 failed rpmbuild with error File not found

2020-04-17 Thread Ole Holm Nielsen
Hi Felix, Please make sure to install all prerequisite packages on the Slurm build host. I have summarized this information in my Slurm Wiki page: https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#build-slurm-rpms /Ole On 17-04-2020 09:11, Felix Farcas wrote: I am trying to build a

Re: [slurm-users] slurm-20.02.1-1 failed rpmbuild with error File not found

2020-04-17 Thread Christian Anthon
As such it is a mistake in the rpm spec file. But you just need mariadb-devel, or possibly mysql-devel, installed. Cheers, Christian. On 17/04/2020 09.11, Felix Farcas wrote: Hello I am trying to build a rpm for a new server and I get the following error:
Requires(interp): /bin/sh /bin/sh

[slurm-users] slurm-20.02.1-1 failed rpmbuild with error File not found

2020-04-17 Thread Felix Farcas
Hello, I am trying to build an rpm for a new server and I get the following error:
Requires(interp): /bin/sh /bin/sh /bin/sh
Requires(rpmlib): rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1 rpmlib(CompressedFileNames) <= 3.0.4-1
Requires(post): /bin/sh
Requires(preun):