Hi Carlos, is there a way to test that? Are there certain ports that need to be
open? Thanks.

From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of 
Carlos Fenoy
Sent: Tuesday, July 17, 2018 11:55 AM
To: Slurm User Community List
Subject: Re: [slurm-users] 'srun hostname' hangs on the command line

The communication from the compute nodes to the login nodes may be blocked by the
firewall. That will prevent srun from running properly.
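
A minimal sketch of one way to test this, assuming Python is available on the compute node. srun listens on ports on the submitting (login) host and the compute node connects back to them, so a plain TCP connection attempt from the compute node to the login node shows whether a firewall is in the way. The function names `can_connect` and `check_ports` are made up for illustration; the host name and port numbers you pass in depend on your site (for example, the range set by SrunPortRange in slurm.conf, if you use it).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host: str, ports, timeout: float = 3.0) -> dict:
    """Hypothetical helper: map each port to True (reachable) or False."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

Run from a compute node against the login node, a result of False for a port that something is listening on suggests the traffic is being filtered. A port with no listener also reports False, so start a listener on the login node first (or test while an srun is waiting).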

On 17 Jul 2018, at 10:16, John Hearns <hear...@googlemail.com> wrote:
Ronan, as far as I can see this means that you cannot launch a job.

What state are the compute nodes in when you run sinfo?


On 17 July 2018 at 10:08, Buckley, Ronan <ronan.buck...@dell.com> wrote:
Yes, srun just hangs. Commands like sinfo and squeue run fine.
I also have no Slurm logs in /var/log.
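
A small sketch for locating the logs, assuming the configuration lives in a readable slurm.conf (often /etc/slurm/slurm.conf, but that path is an assumption here). The log locations are set by the SlurmctldLogFile and SlurmdLogFile parameters; if they are not set, the daemons log to syslog instead, which would explain finding no Slurm files under /var/log. The function name `find_log_paths` is made up for illustration.

```python
LOG_KEYS = ("SlurmctldLogFile", "SlurmdLogFile")


def find_log_paths(conf_text: str) -> dict:
    """Map each log-file parameter found in slurm.conf text to its value."""
    paths = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        for known in LOG_KEYS:
            # Compare case-insensitively so differently cased keys still match.
            if key.strip().lower() == known.lower():
                paths[known] = value.strip()
    return paths
```

For example, `find_log_paths(open("/etc/slurm/slurm.conf").read())` returns an empty dict when neither parameter is set.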

From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of 
John Hearns
Sent: Tuesday, July 17, 2018 8:57 AM
To: Slurm User Community List
Subject: Re: [slurm-users] 'srun hostname' hangs on the command line

Ronan, sorry to ask but this is a bit unclear.

Are you unable to launch ANY sessions with srun?
If so, you need to look at the logs to see why the job is not being
scheduled.

Is it only the hostname command which fails?

I would guess you have already run ssh into a node and run the
hostname command manually.



On 17 July 2018 at 09:50, Buckley, Ronan <ronan.buck...@dell.com> wrote:
Yes I do.

From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of 
Williams, Gareth (IM&T, Clayton)
Sent: Tuesday, July 17, 2018 12:33 AM
To: Slurm User Community List
Subject: Re: [slurm-users] 'srun hostname' hangs on the command line

Do you get the same problem as a non-root user?

From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of 
Buckley, Ronan
Sent: Tuesday, 17 July 2018 12:53 AM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] 'srun hostname' hangs on the command line

Hi All,

Verbose mode doesn’t show much.
I hashed out the hostnames.
Any ideas/suggestions?

# srun hostname
^Csrun: interrupt (one more within 1 sec to abort)
srun: task 0: unknown
^Z
[1]+  Stopped                 srun hostname
#

# srun -v hostname
srun: defined options for program `srun'
srun: --------------- ---------------------
srun: user           : `root'
srun: uid            : 0
srun: gid            : 0
srun: cwd            : /root
srun: ntasks         : 1 (default)
srun: nodes          : 1 (default)
srun: jobid          : 4294967294 (default)
srun: partition      : default
srun: profile        : `NotSet'
srun: job name       : `(null)'
srun: reservation    : `(null)'
srun: burst_buffer   : `(null)'
srun: wckey          : `(null)'
srun: cpu_freq_min   : 4294967294
srun: cpu_freq_max   : 4294967294
srun: cpu_freq_gov   : 4294967294
srun: switches       : -1
srun: wait-for-switches : -1
srun: distribution   : unknown
srun: cpu_bind       : default (0)
srun: mem_bind       : default (0)
srun: verbose        : 1
srun: slurmd_debug   : 0
srun: immediate      : false
srun: label output   : false
srun: unbuffered IO  : false
srun: overcommit     : false
srun: threads        : 60
srun: checkpoint_dir : /var/slurm/checkpoint
srun: wait           : 0
srun: nice           : -2
srun: account        : (null)
srun: comment        : (null)
srun: dependency     : (null)
srun: exclusive      : false
srun: bcast          : false
srun: qos            : (null)
srun: constraints    :
srun: geometry       : (null)
srun: reboot         : yes
srun: rotate         : no
srun: preserve_env   : false
srun: network        : (null)
srun: propagate      : NONE
srun: prolog         : (null)
srun: epilog         : (null)
srun: mail_type      : NONE
srun: mail_user      : (null)
srun: task_prolog    : (null)
srun: task_epilog    : (null)
srun: multi_prog     : no
srun: sockets-per-node  : -2
srun: cores-per-socket  : -2
srun: threads-per-core  : -2
srun: ntasks-per-node   : -2
srun: ntasks-per-socket : -2
srun: ntasks-per-core   : -2
srun: plane_size        : 4294967294
srun: core-spec         : NA
srun: power             :
srun: remote command    : `hostname'
srun: Waiting for nodes to boot (delay looping 450 times @ 0.100000 secs x index)
srun: Nodes ####### are ready for job
srun: jobid 50871: nodes(1):`#######', cpu counts: 64(x1)
srun: launching 50871.0 on host #######, 1 tasks: 0
srun: route default plugin loaded
srun: error: timeout waiting for task launch, started 0 of 1 tasks
srun: Job step 50871.0 aborted before step completely launched.
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: Timed out waiting for job step to complete
#

Rgds


