Dear SLURM,

I am running into this error (srun: debug3: got eof-stdout msg on _server_read header), which I believe is the root cause of an "interactive" job being terminated immediately after it is allocated resources. As far as SLURM and the job are concerned, the job's status shows as completed. I have run out of options: I have debugged the PAM stack and everything else I could think of, but I could not get the remote bash shell to launch. The next obvious step was to add debug statements to the slurmstepd source files and see if I can interpret what is going on, but I wanted to see if I can get help here first.

I also wanted to mention that I have another set of nodes with exactly the same setup, and interactive jobs work perfectly fine there. The only difference between the two node types is that one is RHEL 6.5 and the other is RHEL 6.7.
Any help is greatly appreciated.

Thank you,
Amit

Here is what I tried and what I see:

#srun -d9 -p gpu --exclusive --pty $SHELL
Or
#srun -d9 -p gpu -n1 --x11=first --pty $SHELL
srun: job 12908567 queued and waiting for resources
srun: job 12908567 has been allocated resources
srun: error: x11: job has no allocated nodes defined
login01:/users/ahkumar $

With additional debug flags, I see that something is causing the task's stdout to return an EOF message. I do have the slurm-spank-x11 plugin correctly installed. Below is a snippet from SlurmdLogFile:

[2016-10-02T21:43:39.825] [12894576.0] Handling REQUEST_SIGNAL_CONTAINER
[2016-10-02T21:43:39.825] [12894576.0] _handle_signal_container for step=12894576.0 uid=0 signal=995
[2016-10-02T21:43:39.825] [12894576.0] Leaving _handle_request: SLURM_SUCCESS
[2016-10-02T21:43:39.825] [12894576.0] Entering _handle_request
[2016-10-02T21:43:39.825] [12894576.0] Leaving _handle_accept
[2016-10-02T21:43:39.841] [12894576.0] Entering _task_read for obj 23a7240
[2016-10-02T21:43:39.841] [12894576.0] error in _task_read: Input/output error
[2016-10-02T21:43:39.841] [12894576.0] got eof on task
[2016-10-02T21:43:39.841] [12894576.0] ************************ -1 bytes read from task STDOUT
[2016-10-02T21:43:39.841] [12894576.0] Entering _send_eof_msg
[2016-10-02T21:43:39.841] [12894576.0] Myname in build_hashtbl: (slurmstepd)
[2016-10-02T21:43:39.841] [12894576.0] ======================== Enqueued eof message
[2016-10-02T21:43:39.841] [12894576.0] Leaving _send_eof_msg