And of course the error came up again right after I sent the previous email... However, I can report that this issue is not SSH-related. I tried the 'builtin' option for the rsh and rlogin commands and I still see the same error.

Any other ideas?

Thanks,
Brendan

________________________________________
From: Brendan Moloney
Sent: Friday, November 09, 2012 3:31 PM
To: Reuti
Cc: [email protected]
Subject: RE: [gridengine users] Intermittent commlib errors with MPI jobs

I spent some time researching this issue in the context of OpenSSH and found some mentions of similar problems due to the initial handshake packet being too large (http://serverfault.com/questions/265244/ssh-client-problem-connection-reset-by-peer). I was dubious that this was my problem, but after manually specifying the cipher to use ('-c aes256-ctr') I haven't seen the problem again. With the number of submissions I have done now I would expect to have seen the issue several times, so I am fairly sure it is fixed. Will keep an eye on it of course.

>>>> Sometimes I get "Connection reset by peer"
>
>After a long time or instantly? There are some settings in ssh to avoid a
>timeout in ssh_config resp. ~/.ssh/config:
>
>Host *
>    Compression yes
>    ServerAliveInterval 900

Seems to happen fast enough that it is not a timeout issue.

>> I am indeed using SSH with a wrapper script for adding the group ID:
>>
>> qlogin_command   /usr/global/bin/qlogin-wrapper
>> qlogin_daemon    /usr/global/bin/rshd-wrapper
>> rlogin_command   /usr/bin/ssh
>> rlogin_daemon    /usr/global/bin/rshd-wrapper
>> rsh_command      /usr/bin/ssh
>> rsh_daemon       /usr/global/bin/rshd-wrapper

> It's also possible to set different methods for each of the three pairs. So,
> rsh_command/rsh_daemon could be set to builtin and the others left as they
> are. Would this be appropriate for your intended setup of X11 forwarding?

So using the builtin option would still allow enforcement of memory/time limits on parallel jobs?

Thanks,
Brendan
