Whoops, I should clarify: no specific ssh command from VCL is hanging. VCL times out while waiting for the computer to respond to ssh. Any ssh commands that I run manually from the management node hang unless I run them with the ConnectTimeout option.
Cameron

On Tue, Jan 28, 2014 at 4:04 PM, Aaron Coburn <[email protected]> wrote:

> I encountered something similar a while ago -- though I believe it was in
> version 2.2.1. Basically, if vcld sent an ssh command at a particular
> moment as sshd is first starting up on the Windows VM, the command could
> hang and derail the entire workflow (image capture, image load, etc). This
> hasn't happened in a while, and I believe that it was fixed in version 2.3.
>
> Well, at least, there is code in version 2.3 and later that can kill any
> ssh commands if they exceed a certain length of time (by using the
> 'timeout' option in the run_ssh_command() function call).
>
> Are you able to figure out what the ssh command is? Does it vary? (Not
> all commands are sent with timeout values.) If you encounter a hung ssh
> command, you can usually find it by examining the process list on the
> management node and then make sure that that call was executed with a
> timeout value. You may also want to verify that the ssh option -o
> ConnectTimeout=X is part of the command passed to the VM.
>
> Aaron
>
> On Jan 28, 2014, at 4:25 PM, Cameron Mann <[email protected]>
> wrote:
>
> Hi Aaron,
>
> I haven't seen a case of one becoming unresponsive after running for a
> while; it's always been from the moment they come online.
>
> We're running VCL 2.3.
>
> Cameron
>
> On Tue, Jan 28, 2014 at 12:37 PM, Aaron Coburn <[email protected]> wrote:
>
>> Cameron,
>>
>> When this issue emerges, is it with VMs that have been running for a
>> while and then become unresponsive, or are they unresponsive from the
>> moment they come online?
>>
>> Also, which version of the VCL are you using?
>>
>> Aaron
>>
>> --
>> Aaron Coburn
>> System Administrator / Programmer
>> Web Services, Amherst College
>>
>> On Jan 28, 2014, at 1:30 PM, Cameron Mann <[email protected]>
>> wrote:
>>
>> Hi all,
>>
>> We've intermittently been running into an issue with sshd on some of
>> our Windows images where it appears to be running but stops responding.
>>
>> Symptoms:
>> - VM is pingable
>> - ssh attempts hang with no error message
>> - packet capture on the VM shows SYN from client, SYN-ACK from sshd, ACK
>>   from client, then nothing
>> - sshd.log appears normal
>> - sshd process does not respond to stop/restart and must be killed
>>   manually, but starts accepting connections after being started again
>>   (a full reboot also works)
>>
>> There's no apparent pattern to the failures that I've been able to
>> find; even with the same image, the failure doesn't happen reliably. I
>> also haven't been able to isolate the problem to a specific subset of our
>> images, so I haven't been able to compare a broken installation with a
>> working one. I've also tried updating all the Cygwin packages and
>> re-running the cygwin-sshd-config.sh script, which made no difference.
>>
>> Has anyone run into something similar?
>>
>> Thanks,
>> Cameron
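The packet capture in the original report (TCP handshake completes, then silence) points to a check one step past ping: an SSH server is expected to send an identification line such as `SSH-2.0-...` (RFC 4253) right after the connection is established. Below is a minimal sketch, assuming Python 3 on the management node. `read_ssh_banner` is a hypothetical helper, not part of VCL. It treats a completed TCP connection with no banner within `timeout` seconds as the hung-sshd state. It reads only the first line the server sends, which is a simplification: the RFC allows a server to send other lines before the version string.

```python
import socket
from typing import Optional


def read_ssh_banner(host: str, port: int = 22, timeout: float = 10.0) -> Optional[str]:
    """Return the server's SSH identification line, or None if none arrives.

    A TCP handshake that succeeds but is followed by silence (the hung-sshd
    symptom) returns None once `timeout` seconds pass without data.
    """
    data = b""
    try:
        # create_connection applies `timeout` to the connect and to later recv calls.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Identification lines end in CRLF and are at most 255 bytes long.
            while b"\n" not in data and len(data) < 255:
                chunk = sock.recv(256)
                if not chunk:  # server closed the connection
                    break
                data += chunk
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return None
    line = data.split(b"\n", 1)[0].rstrip(b"\r").decode("ascii", errors="replace")
    return line if line.startswith("SSH-") else None
```

Polling something like `read_ssh_banner(vm_ip, timeout=5)` (where `vm_ip` is a placeholder) after a VM comes online would separate "port open but sshd silent" from "sshd ready" without leaving a hung ssh client behind.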

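Cameron's workaround (adding ConnectTimeout to manual ssh commands) and Aaron's description of the 'timeout' option in run_ssh_command() are two layers of one idea. The first bounds how long ssh waits for connection setup. The second kills the whole command if it runs too long. Below is a minimal sketch of both layers, assuming Python 3.7+ and an OpenSSH client on the management node. `build_ssh_command` and `run_with_timeout` are hypothetical helpers for illustration, not VCL's run_ssh_command() (VCL itself is Perl), and `BatchMode=yes` is an extra choice not mentioned in the thread. Current OpenSSH documentation says ConnectTimeout also covers the initial protocol handshake, but older clients may differ, so the outer timeout acts as a backstop.

```python
import subprocess
from typing import List, Optional, Sequence


def build_ssh_command(host: str, remote_command: str,
                      connect_timeout: int = 10,
                      user: Optional[str] = None) -> List[str]:
    """Build an ssh argv that gives up on a stalled connection (hypothetical helper)."""
    target = f"{user}@{host}" if user else host
    return [
        "ssh",
        # Bound connection setup instead of waiting on the system TCP timeout.
        "-o", f"ConnectTimeout={connect_timeout}",
        # Fail instead of prompting for a password in non-interactive use.
        "-o", "BatchMode=yes",
        target,
        remote_command,
    ]


def run_with_timeout(argv: Sequence[str],
                     timeout: float) -> Optional[subprocess.CompletedProcess]:
    """Run argv; kill it and return None if it runs longer than `timeout` seconds."""
    try:
        # On timeout, subprocess.run kills the child and waits for it before
        # re-raising TimeoutExpired, so the child process does not linger.
        return subprocess.run(list(argv), capture_output=True, text=True,
                              timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
```

For example, `run_with_timeout(build_ssh_command("vm01", "hostname"), timeout=30)` returns either a `CompletedProcess` (ssh exits with 255 on its own errors, so check `returncode`) or `None` if the outer limit was hit. It does not hang. `vm01` and the 30-second budget are placeholders.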