My issue today is that I can't get interactive jobs to run when they are started from the login nodes. They run fine when started from the head node, but users are not allowed to log in there directly.

The issue seems to be that SLURM_SUBMIT_HOST is getting set to the login node's public network hostname instead of its private network hostname. For some reason this isn't an issue with batch jobs. Any idea how best to control this behavior? From the compute nodes' point of view, the login node's public address is not reachable.

I am not certain that this variable is the direct culprit, but I get an error message from ORTE that suggests MPI is trying to use the external address. I also get bunches of martians logged on the head node, coming from the login node kernels too.
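In case it helps anyone reproduce this, here is a minimal sketch of a check that could be run from inside an interactive job on a compute node, to see what SLURM_SUBMIT_HOST is set to and which addresses that name resolves to from there. The function name resolve_submit_host is just something made up for illustration, and the sketch only looks at name resolution, not at whether the address is actually routable.

```python
import os
import socket


def resolve_submit_host(env=None):
    """Return (hostname, addresses) for SLURM_SUBMIT_HOST.

    hostname is None if the variable is unset; addresses is an empty
    list if the name does not resolve from this node.
    """
    env = os.environ if env is None else env
    host = env.get("SLURM_SUBMIT_HOST")
    if not host:
        return None, []
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        # The name is set but cannot be resolved from this node.
        return host, []
    # Each entry's sockaddr starts with the address string; deduplicate.
    addrs = sorted({info[4][0] for info in infos})
    return host, addrs


if __name__ == "__main__":
    host, addrs = resolve_submit_host()
    print("SLURM_SUBMIT_HOST =", host)
    print("resolves to:", ", ".join(addrs) or "(nothing)")
```

If the addresses printed on a compute node belong to the public network, that would at least be consistent with the theory above, though it would not prove that the variable is what MPI is using.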
I'm really new to SLURM. We have been on Torque/Maui for the past 3 years.
