Hi Everyone,

Will try to keep this brief - we are testing slurm 2.6.2 on our Cray test 
system (a TDS for our XE6).  We have slurmctld running on the sdb and slurmd 
running on a set of service nodes.  Users log into an external server (an 
esLogin node) and can successfully run batch jobs with sbatch.

Our trouble is with interactive jobs.  These jobs run as expected when 
submitted from the sdb, which is internal to the Cray mainframe.  But from the 
external esLogin host, there is a problem with the Cray job service:

esLogin> salloc --partition=debug --nodes=1 --time=1:00 --exclusive hostname
Can't open proc file /proc/job
salloc: error: No SGI job container ID detected - please enable the Cray job 
service via /etc/init.d/job
salloc: Granted job allocation 90
esLogin
salloc: Relinquishing job allocation 90
salloc: Job allocation 90 has been revoked.

salloc is trying to grant an allocation on the esLogin server itself (which is 
not, and should not be, running the Cray job service) instead of on one of the 
internal nodes running slurmd.

Should this be able to work?  What am I missing?

Thanks, and apologies in advance if this is a slurm 101 mistake,
James