Hi,

Just some additional testing results ...

Our IT guy turned off the firewall on a Submit Host and Execution Host for 
experimental purposes.  That got me further but not all the way.  Here is the 
verbose log from qrsh:

waiting for interactive job to be scheduled ...
Your interactive job 460937 has been successfully scheduled.
Establishing /usr/bin/ssh -X session to host sim.domain.com ...
ssh_exchange_identification: Connection closed by remote host
/usr/bin/ssh -X exited with exit code 255
reading exit code from shepherd ... 129

We aren't yet able to get around the ssh -X error.  Any ideas?

But even if we could, we still need to figure out which ports of the firewall 
need to be opened up.  Every time we ran an experiment, the port number that 
was used for SSH was different.  I hope we don't have to open up too big a 
range of ports.
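
In case it helps with narrowing this down, here is a minimal sketch (not a definitive tool) for checking from the Submit Host whether a given TCP port on the Execution Host is reachable. The helper name port_is_reachable and the command-line usage are my own assumptions; the host and port you pass would come from whatever the qrsh -verbose output or the firewall logs show for a given run.

```python
import socket
import sys


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    A False result means the connection was refused, timed out, or the
    host name could not be resolved; it does not say which one.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical usage: python check_port.py <host> <port>
if __name__ == "__main__" and len(sys.argv) == 3:
    target_host, target_port = sys.argv[1], int(sys.argv[2])
    state = "reachable" if port_is_reachable(target_host, target_port) else "NOT reachable"
    print(f"{target_host}:{target_port} is {state}")
```

Running it once with the firewall on and once with it off, against the port reported for a failing session, should at least show whether the firewall is what closes the connection.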

Feedback would be welcomed.

Best regards,

-- 
Mun



> -----Original Message-----
> Hi William, et al.,
> 
> > On Mon, May 11, 2020 at 09:30:14PM +0000, Mun Johl wrote:
> > > Hi William, et al.,
> > > [Mun] Thanks for the tip; I'm still trying to get back to where I can
> > > launch qrsh again.  Even after I put the requisite /etc/pam.d/sshd
> > > line at the head of the file I'm still getting the "Your "qrsh" request
> > > could not be scheduled, try again later." message for some reason.
> > > But I will continue to debug that issue.
> >
> > The pam_sge-qrsh-setup.so shouldn't have anything to do with this since
> > the message occurs before any attempt to launch the job.  You could try
> > running a qrsh -w p and/or qrsh -w v to get a report on why the qrsh
> > isn't being scheduled.  They aren't always easy to read and -w v doesn't
> > reliably ignore exclusive vars in use but can nevertheless be helpful.
> 
> [Mun] With 'qrsh -w p' and 'qrsh -w v' I got the following output:
> verification: found suitable queue(s)
> 
> I then replaced the -w option with -verbose which produced the following 
> output:
> 
> waiting for interactive job to be scheduled ...timeout (54 s) expired while 
> waiting on socket fd 4
> Your "qrsh" request could not be scheduled, try again later.
> 
> I have no idea what is meant by "socket fd 4"; but that leads me to believe 
> we have some sort of blocked port or something.
> 
> Are there any additional ports that need to be opened up in order to use 
> 'qrsh & ssh -X' ?
> 
> One last noteworthy item that recently occurred to me is that when SGE was
> initially installed on our servers, we had a different domain name.  Late
> last year we were acquired and our domain changed.  However, our /etc/hosts
> still has the old domain simply because SGE couldn't deal with the change
> in the domain--or rather, it was the easiest course of action for me to
> take and keep SGE working.  I wonder if that is in some way interfering
> with 'qrsh & ssh -X'?
> 
> I am going to try to do some additional debugging today and will report any
> progress.
> 
> Thank you and regards,
> 
> --
> Mun

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
