And of course the error comes up again after sending the previous email...

However, I can report that this issue is not SSH-related. I tried the 'builtin' 
option for the rsh and rlogin commands and I still see the same error.

Any other ideas?

Thanks,
Brendan

________________________________________
From: Brendan Moloney
Sent: Friday, November 09, 2012 3:31 PM
To: Reuti
Cc: [email protected]
Subject: RE: [gridengine users] Intermittent commlib errors with MPI jobs

I spent some time researching this issue in the context of OpenSSH and found 
some mentions of similar problems caused by the initial handshake packet being 
too large 
(http://serverfault.com/questions/265244/ssh-client-problem-connection-reset-by-peer).
I was dubious that this was my problem, but after manually specifying the 
cipher to use ('-c aes256-ctr') I haven't seen the problem again. With the 
number of submissions I have done now I would expect to have seen the issue 
several times, so I am fairly sure it is fixed. Will keep an eye on it, of 
course.
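
For reference, the same cipher selection can also be made in ssh_config or 
~/.ssh/config instead of on the command line. This is only a sketch; the 
'Host *' scope is my assumption and could be narrowed to the compute nodes:

    # Restrict the client to one cipher (equivalent to 'ssh -c aes256-ctr')
    Host *
        Ciphers aes256-ctr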

>>>> Sometimes I get "Connection reset by peer"
>
>After a long time or instantly? There are some settings in ssh to avoid a 
>timeout, in ssh_config or ~/.ssh/config:
>
>Host *
>    Compression yes
>    ServerAliveInterval 900

Seems to happen fast enough that it is not a timeout issue.

>> I am indeed using SSH with a wrapper script for adding the group ID:
>>
>> qlogin_command               /usr/global/bin/qlogin-wrapper
>> qlogin_daemon                /usr/global/bin/rshd-wrapper
>> rlogin_command               /usr/bin/ssh
>> rlogin_daemon                /usr/global/bin/rshd-wrapper
>> rsh_command                  /usr/bin/ssh
>> rsh_daemon                   /usr/global/bin/rshd-wrapper

> It's also possible to set different methods for each of the three pairs. So, 
> rsh_command/rsh_daemon could be set to builtin and the others left as they 
> are. Would this be appropriate for your intended setup of X11 forwarding?
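
If I understand the suggestion correctly, the mixed setup would look roughly 
like this, with only the rsh pair changed from my current values (a sketch, 
I have not run this exact combination):

    qlogin_command               /usr/global/bin/qlogin-wrapper
    qlogin_daemon                /usr/global/bin/rshd-wrapper
    rlogin_command               /usr/bin/ssh
    rlogin_daemon                /usr/global/bin/rshd-wrapper
    rsh_command                  builtin
    rsh_daemon                   builtin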

So using the builtin option would still allow enforcement of memory/time limits 
on parallel jobs?

Thanks,
Brendan

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
