Hi everyone,

I will try to keep this brief. We are testing slurm 2.6.2 on our Cray test system (a TDS for our XE6). We have slurmctld running on the sdb and slurmd running on a set of service nodes. Users log into an external server (an esLogin node) and can successfully run batch jobs with sbatch.

Our trouble is with interactive jobs. These jobs run as expected when submitted from the sdb, which is internal to the Cray mainframe. But from the external esLogin host, there is a problem with the Cray job service:

    esLogin> salloc --partition=debug --nodes=1 --time=1:00 --exclusive hostname
    Can't open proc file /proc/job
    salloc: error: No SGI job container ID detected - please enable the Cray job service via /etc/init.d/job
    salloc: Granted job allocation 90
    esLogin
    salloc: Relinquishing job allocation 90
    salloc: Job allocation 90 has been revoked.

salloc is trying to grant an allocation on the esLogin server itself (which is not, and should not be, running the Cray job service) instead of on one of the internal nodes running slurmd.

Should this be able to work? What am I missing?

Thanks, and apologies in advance if this is a slurm 101 mistake,
James