Whoops, I should clarify: no specific ssh command from VCL is hanging. VCL
times out while waiting for the computer to respond to ssh. Any manual ssh
commands that I run from the management node hang unless I run them with
the ConnectTimeout option.
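
For anyone who wants to reproduce the manual test, here is a rough Python
sketch of how an ssh call can be bounded both by ssh's own ConnectTimeout
option and by an overall time limit on the process. The helper names
build_ssh_command and ssh_with_timeouts, the root login, and the default
timeout values are assumptions made up for illustration; none of this is
VCL code.

```python
import subprocess


def build_ssh_command(host, remote_command, connect_timeout=10, user="root"):
    """Return an ssh argument list that includes -o ConnectTimeout.

    The user name and default timeout are illustrative assumptions.
    """
    return [
        "ssh",
        "-o", f"ConnectTimeout={connect_timeout}",
        "-o", "BatchMode=yes",  # fail instead of prompting for a password
        f"{user}@{host}",
        remote_command,
    ]


def ssh_with_timeouts(host, remote_command, connect_timeout=10, overall_timeout=60):
    """Run the command; return (exit_code, stdout), or None if it was killed.

    overall_timeout is enforced by subprocess, so a session that connects
    but then stalls is still terminated.
    """
    argv = build_ssh_command(host, remote_command, connect_timeout)
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=overall_timeout
        )
    except subprocess.TimeoutExpired:
        return None
    return result.returncode, result.stdout
```

Calling ssh_with_timeouts("some-vm", "hostname") against an affected VM
should then return instead of hanging indefinitely.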

Cameron


On Tue, Jan 28, 2014 at 4:04 PM, Aaron Coburn <[email protected]> wrote:

>  I encountered something similar a while ago -- though I believe it was in
> version 2.2.1. Basically, if vcld sent an ssh command at a particular
> moment while sshd was first starting up on the Windows VM, the command could
> hang and derail the entire workflow (image capture, image load, etc). This
> hasn't happened in a while, and I believe that it was fixed in version 2.3.
>
>  Well, at least, there is code in version 2.3 and later that can kill any
> ssh commands if they exceed a certain length of time (by using the
> 'timeout' option in the run_ssh_command() function call).
>
>  Are you able to figure out what the ssh command is? Does it vary? (Not
> all commands are sent with timeout values.) If you encounter a hung ssh
> command, you can usually find it by examining the process list on the
> management node and then checking whether that call was executed with a
> timeout value. You may also want to verify that the ssh option -o
> ConnectTimeout=X is part of the command passed to the VM.
>
>
>
>  Aaron
>
>
>   On Jan 28, 2014, at 4:25 PM, Cameron Mann <[email protected]>
> wrote:
>
>  Hi Aaron,
>
>  I haven't seen a case of one becoming unresponsive after running for a
> while; it's always been from the moment they come online.
>
>  We're running VCL 2.3.
>
>  Cameron
>
>
> On Tue, Jan 28, 2014 at 12:37 PM, Aaron Coburn <[email protected]> wrote:
>
>>  Cameron,
>>
>>  When this issue emerges, is it with VMs that have been running for a
>> while and then become unresponsive, or are they unresponsive from the
>> moment they come online?
>>
>>  Also, which version of the VCL are you using?
>>
>>  Aaron
>>
>>
>>
>>
>> --
>> Aaron Coburn
>> System Administrator / Programmer
>> Web Services, Amherst College
>>
>>
>>
>>
>>  On Jan 28, 2014, at 1:30 PM, Cameron Mann <[email protected]>
>> wrote:
>>
>>  Hi all,
>>
>>  We've been running into an issue intermittently with sshd on some of
>> our Windows images where it appears to be running but stops responding.
>>
>>  Symptoms:
>> - VM is pingable
>> - ssh attempts hang, no error message
>> - packet capture on the VM shows SYN from client, SYN-ACK from sshd, ACK
>> from client, then nothing
>> - sshd.log appears normal
>> - sshd process does not respond to stop/restart and must be killed
>> manually, but starts accepting connections after being started again (full
>> reboot also works)
>>
>>  There's no apparent pattern to the failures that I've been able to
>> find; even with the same image, the failure doesn't happen reliably. I also
>> haven't been able to isolate the problem to a specific subset of our images,
>> so I haven't been able to compare a broken installation with a working one.
>> I've also tried updating all the Cygwin packages and re-running the
>> cygwin-sshd-config.sh script, which made no difference.
>>
>>  Has anyone run into something similar?
>>
>>  Thanks,
>> Cameron
>>
>>
>>
>
>
