[slurm-users] Error running jobs with srun

2017-11-08 Thread Elisabetta Falivene
I get this message whenever I try to run any job on my cluster (node01 is the first of my eight nodes, and it is up and running). Trying a simple Python script: *root@mycluster:/tmp# srun python test.py * *slurmd[node01]: error: task/cgroup: unable to build job physical cores* */usr/b

Re: [slurm-users] Error running jobs with srun

2017-11-08 Thread Elisabetta Falivene
Wow, thank you. Is there a way to check which directories the master and the nodes share? On Wednesday 8 November 2017, Lachlan Musicman wrote: > On 9 November 2017 at 09:19, Elisabetta Falivene > wrote: > >> I'm getting this message anytime I try to exec

Re: [slurm-users] Error running jobs with srun

2017-11-08 Thread Elisabetta Falivene
I am the admin and I have no documentation :D I'll try the third option. Thank you very much. On Thursday 9 November 2017, Lachlan Musicman wrote: > On 9 November 2017 at 10:35, Elisabetta Falivene > wrote: > >> Wow, thank you. There's a way to check which director

Re: [slurm-users] Error running jobs with srun

2017-11-09 Thread Elisabetta Falivene
raised before the execution of the job. What does it mean? Thank you, thank you, thank you! 2017-11-09 1:07 GMT+01:00 Lachlan Musicman : > On 9 November 2017 at 10:54, Elisabetta Falivene > wrote: > >> I am the admin and I have no documentation :D I'll try The third option

[slurm-users] Cluster not booting after upgrade to debian jessie

2018-01-08 Thread Elisabetta Falivene
Here I am again. In the end, I upgraded from Debian 7 wheezy to Debian 8 jessie in order to update Slurm and solve some issues with it. It all seemed to go well; even the Slurm problem seemed solved. Then I rebooted the machine and the problems began. I can't boot the master anymore returning

Re: [slurm-users] Cluster not booting after upgrade to debian jessie

2018-01-09 Thread Elisabetta Falivene
yboard so i'm truly able to do anything. 2018-01-08 12:26 GMT+01:00 Markus Köberl : > On Monday, 8 January 2018 11:39:32 CET Elisabetta Falivene wrote: > > Here I am again. > > In the end, I did the upgrade from debian 7 wheezy to debian 8 jessie in > > order to update Slur

Re: [slurm-users] Cluster not booting after upgrade to debian jessie

2018-01-09 Thread Elisabetta Falivene
al ramdisk and make sure it has the modules you need. > > So boot the system in kernel 3.2 and then run: > mkinitrd 3.16.0-4-amd64 > > > How was the kernel version 3.16.0-4-amd64 installed? > > > On 9 January 2018 at 13:16, Elisabetta Falivene > wrote: > >>

Re: [slurm-users] Cluster not booting after upgrade to debian jessie

2018-01-09 Thread Elisabetta Falivene
> > > Let me guess: you're running multi-socket systems, and the kernel > version behind that "3.16.0-4" label is 3.16.51-2, not 3.16.43-2? > Nope. On the nodes the version is 3.16.43-2, and on the master dpkg reports that the unloaded kernel is 3.16.43-2+deb8u5 > There seems to be an issue with

Re: [slurm-users] Cluster not booting after upgrade to debian jessie

2018-01-09 Thread Elisabetta Falivene
> Ciao Elisabetta, > Ciao Gennaro! :) > > On Tue, Jan 09, 2018 at 01:40:19PM +0100, Elisabetta Falivene wrote: > > The new kernel was installed during an upgrade from Debian 7 Wheezy to > > Debian 8 Jessie. The upgrade went ok on the 8 nodes of the cluster, but > not

[slurm-users] Slurm not starting

2018-01-15 Thread Elisabetta Falivene
I did an upgrade from wheezy to jessie (automatically, with a normal dist-upgrade) on a cluster with 8 nodes (up, running and reachable) and from Slurm 2.3.4 to 14.03.9. After overcoming some problems booting the kernel (thank you very much to Gennaro Oliva, btw), now the system is running correctly with kernel

Re: [slurm-users] Slurm not starting

2018-01-15 Thread Elisabetta Falivene
> Anyway I suggest to update the operating system to stretch and fix your > configuration under a more recent version of slurm. I think I'll get to that soon :) b 2018-01-15 14:08 GMT+01:00 Gennaro Oliva : > Ciao Elisabetta, > > On Mon, Jan 15, 2018 at 01:13:27PM +0100,

Re: [slurm-users] Slurm not starting

2018-01-15 Thread Elisabetta Falivene
log or running "slurmd -Dvvv" > > > On Jan 15, 2018 06:42, "Elisabetta Falivene" > wrote: > >> > Anyway I suggest to update the operating system to stretch and fix your >> > configuration under a more recent version of slurm. >> >> I

Re: [slurm-users] Slurm not starting

2018-01-15 Thread Elisabetta Falivene
15 16:43 GMT+01:00 Carlos Fenoy : > Are you trying to start the slurmd in the headnode or a compute node? > > Can you provide the slurm.conf file? > > Regards, > Carlos > > On Mon, Jan 15, 2018 at 4:30 PM, Elisabetta Falivene < > e.faliv...@ilabroma.com> wrote: >

Re: [slurm-users] Slurm not starting

2018-01-15 Thread Elisabetta Falivene
atch queueing system is due to hostname resolution" >> >> >> On 15 January 2018 at 16:30, Elisabetta Falivene > > wrote: >> >>> slurmd -Dvvv says >>> >>> slurmd: fatal: Unable to determine this slurmd's NodeName >>> >&g

Re: [slurm-users] Slurm not starting

2018-01-15 Thread Elisabetta Falivene
Fenoy : > Hi, > > you can not start the slurmd on the headnode. Try running the same command > on the compute nodes and check the output. If there is any issue it should > display the reason. > > Regards, > Carlos > > On Mon, Jan 15, 2018 at 4:50 PM, Elisabetta Faliv
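
As far as I understand it, slurmd looks for its own hostname among the node definitions in slurm.conf, so the "Unable to determine this slurmd's NodeName" error is what one would expect on a head node that is not listed there. A minimal sketch of that check in Python, assuming the simple `node[01-08]` range form that appears later in this thread. `expand_simple_range` and `host_is_compute_node` are hypothetical helpers written for illustration, not Slurm APIs, and real Slurm hostlist syntax is richer than this:

```python
import re
import socket


def expand_simple_range(expr):
    """Expand 'node[01-08]' into ['node01', ..., 'node08'].

    Hypothetical helper: handles one zero-padded numeric range only.
    Any expression without such a range is returned unchanged.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", expr)
    if match is None:
        return [expr]
    prefix, start, end, suffix = match.groups()
    width = len(start)  # keep the zero padding of the first bound
    return [
        f"{prefix}{n:0{width}d}{suffix}"
        for n in range(int(start), int(end) + 1)
    ]


def host_is_compute_node(hostname, node_expr):
    """Return True if the short hostname is one of the expanded node names."""
    short_name = hostname.split(".")[0]
    return short_name in expand_simple_range(node_expr)


if __name__ == "__main__":
    # "node[01-08]" is the NodeName range quoted in this thread.
    print(host_is_compute_node(socket.gethostname(), "node[01-08]"))
```

On the head node this would be expected to print False, and True on any of the eight compute nodes, provided their hostnames really are node01 to node08.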

Re: [slurm-users] Slurm not starting

2018-01-16 Thread Elisabetta Falivene
> > slurmd: debug2: _slurm_connect failed: Connection refused >> slurmd: debug2: Error connecting slurm stream socket at 192.168.1.1:6817: >> Connection refused >> > > This sounds like the compute node cannot connect back to > slurmctld on the management node, you should check that the > IP address
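
A quick way to test the "Connection refused" symptom independently of Slurm is to try opening a TCP connection from the compute node to the address in the log. This is only a sketch: `can_connect` is a hypothetical helper, and the address 192.168.1.1:6817 is simply the one quoted in the log lines above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Address and port taken from the slurmd debug output quoted above.
    print(can_connect("192.168.1.1", 6817))
```

If this returns False when run on a compute node, nothing is accepting connections at that address and port, which would suggest checking whether slurmctld is actually running on the management node or whether a firewall is in the way.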

Re: [slurm-users] Slurm not starting

2018-01-16 Thread Elisabetta Falivene
> It seems like the pidfile in systemd and slurm.conf are different. Check > if they are the same and if not adjust the slurm.conf pid files. That > should prevent systemd from killing slurm. > Emh, sorry, how can I do this? > On Mon, 15 Jan 2018, 18:24 Elisabetta Falivene, &g
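
One way to do the comparison suggested above is to read the `PIDFile=` line from the systemd unit and the `SlurmdPidFile=` (or `SlurmctldPidFile=`) line from slurm.conf and see whether they agree. A minimal Python sketch, assuming both files use plain `key=value` lines. `find_setting` and `pidfiles_match` are hypothetical helpers, and the paths in the example are made up for illustration, since the real ones are not shown in this thread:

```python
import re


def find_setting(text, key):
    """Return the value of the first uncommented 'key=value' line, or None."""
    pattern = re.compile(
        rf"^\s*{re.escape(key)}\s*=\s*(\S+)",
        re.IGNORECASE | re.MULTILINE,
    )
    match = pattern.search(text)
    return match.group(1) if match else None


def pidfiles_match(unit_text, conf_text, conf_key="SlurmdPidFile"):
    """Compare PIDFile in a systemd unit with the pid file set in slurm.conf."""
    unit_pid = find_setting(unit_text, "PIDFile")
    conf_pid = find_setting(conf_text, conf_key)
    return unit_pid is not None and unit_pid == conf_pid


if __name__ == "__main__":
    # Made-up file contents; substitute the text of the real unit and slurm.conf.
    unit = "[Service]\nType=forking\nPIDFile=/var/run/example/slurmd.pid\n"
    conf = "SlurmdPidFile=/var/run/slurmd.pid\n"
    print(pidfiles_match(unit, conf))  # the two paths differ
```

If the two values differ, the advice quoted above is to change the pid file path in slurm.conf so that it matches the one systemd expects.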

Re: [slurm-users] Slurm not starting

2018-01-16 Thread Elisabetta Falivene
-01-16 13:25 GMT+01:00 Elisabetta Falivene : > > It seems like the pidfile in systemd and slurm.conf are different. Check >> if they are the same and if not adjust the slurm.conf pid files. That >> should prevent systemd from killing slurm. >> > Emh, sorry, how I can d

Re: [slurm-users] Slurm not starting

2018-01-17 Thread Elisabetta Falivene
Ciao Gennaro! > > *NodeName=node[01-08] CPUs=16 RealMemory=16000 State=UNKNOWN* > > to > > *NodeName=node[01-08] CPUs=16 RealMemory=15999 State=UNKNOWN* > > > > Now, slurm works and the nodes are running. There is only one minor > problem > > > > *error: Node node04 has low real_memory size (7984
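
The RealMemory value in slurm.conf is given in megabytes, and a node that reports less memory than configured triggers the "low real_memory size" error quoted above. A rough sketch of how to estimate a safe value on each node from /proc/meminfo. `mem_total_mb` is a hypothetical helper, and the number Slurm itself detects may differ slightly, so treat the result as an approximation:

```python
import re


def mem_total_mb(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text, in whole megabytes."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    # /proc/meminfo reports kB; round down so the value is never too high.
    return int(match.group(1)) // 1024


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:
        print(mem_total_mb(handle.read()))
```

Running this on every node (or comparing with the output of `slurmd -C`, which prints the configuration slurmd detects) shows which nodes have less memory than the shared NodeName line claims. Such a node would need its own NodeName line with a lower RealMemory.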

[slurm-users] Slurm and available libraries

2018-01-17 Thread Elisabetta Falivene
Hi, let's say I need to execute a python script with slurm. The script requires a particular library installed on the system, like numpy. If the library is not installed on the system, it is necessary to install it on the master AND the nodes, right? This has to be done on each machine separately or
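
To see where a library such as numpy is actually available, a small script can be run through Slurm so that each node reports for itself. A minimal sketch: `has_module` is a hypothetical helper, and it only checks whether the interpreter on that machine can find the module, not which version is installed.

```python
import importlib.util
import socket


def has_module(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print(socket.gethostname(), "numpy available:", has_module("numpy"))
```

Saved under some name in a directory the nodes can read (say `check_numpy.py`, a made-up name) and launched with something like `srun -N 8 python check_numpy.py`, this should print one line per node, assuming the default of one task per node.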

Re: [slurm-users] Slurm and available libraries

2018-01-18 Thread Elisabetta Falivene
So EasyBuild + Lmod seems the best solution. I'll try. :) Thank you all! betta 2018-01-17 17:53 GMT+01:00 Christopher Samuel : > On 18/01/18 03:50, Patrick Goetz wrote: > > Can anyone shed some light on the situation? I'm very surprised that >> a module script isn't just an explicit command that