Hi Florian, Neil, DuDu,

You are quite right: one possible reason for this error message is OpenNebula attempting two simultaneous monitoring actions on the same host. One possible solution is to increase HOST_MONITORING_INTERVAL (in our latest development revision of OpenNebula we have already increased it to 10 minutes, i.e. 600 seconds). And, of course, the snmp driver has also proved to be a great solution for this scalability issue.
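
As a rough illustration of the first option, the interval is set in oned.conf. The fragment below is only a sketch: the value 600 is the one mentioned above, the location of the file depends on how OpenNebula was installed (an assumption, not something stated in this thread), and oned presumably needs a restart before the new value is picked up.

```
# oned.conf (location depends on the installation; assumed, not from this thread)
# Seconds between host monitoring cycles; 600 is the value mentioned above.
HOST_MONITORING_INTERVAL = 600
```

A longer interval makes it less likely that a new monitoring action starts on a host while the previous one is still running there, at the cost of less up-to-date host information.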
Hope it helps,

-Tino

--
Constantino Vázquez Blanco | dsa-research.org/tinova
Virtualization Technology Engineer / Researcher
OpenNebula Toolkit | opennebula.org

On Mon, Jul 19, 2010 at 6:18 PM, Floris Sluiter <floris.slui...@sara.nl> wrote:
> Hi Dudu, Tino and all,
>
> We saw the exact same message (Command execution fail and bad
> interpreter: Text file busy) on our cluster last week, when we expanded it
> from 12 to 16 hosts (with add host) and deployed 10 virtual machines at the
> same time. We did not have multiple instances of OpenNebula running; we only
> added hosts to a running one, so it is unlikely that was the issue (the
> cluster had already been running stably for a while). We investigated and
> concluded it was a timing issue, with the monitoring (ssh) driver set to
> 60 seconds and many hosts and many VMs.
>
> We had started using the ssh monitoring driver again after the latest update
> to OpenNebula; before that we used our in-house developed snmp monitoring
> driver.
>
> When we deployed our snmp driver, the error message stopped, and for the
> last week we have had a stable cloud again, now with 16 hosts…
>
> For people who think they see the same timing issues as we did, the
> snmp_driver is available in the ecosystem (but make sure you know what snmp
> is before you try ;-)): http://opennebula.org/software:ecosystem:snmp_im_driver
>
> Regards,
>
> Floris
> HPC project leader
> Sara
>
> From: users-boun...@lists.opennebula.org
> [mailto:users-boun...@lists.opennebula.org] On Behalf Of Tino Vazquez
> Sent: Monday, 19 July 2010 16:15
> To: DuDu
> Cc: users@lists.opennebula.org
> Subject: Re: [one-users] oned hang
>
> Dear DuDu,
>
> This happens when two monitoring actions take place at the same time.
>
> First of all, which OpenNebula version are you using?
>
> Are you by any chance running two OpenNebula instances? Did you change the
> host polling time?
>
> Regards,
>
> -Tino
>
> --
> Constantino Vázquez Blanco | dsa-research.org/tinova
> Virtualization Technology Engineer / Researcher
> OpenNebula Toolkit | opennebula.org
>
> On Wed, Jul 14, 2010 at 3:13 PM, DuDu <black...@gmail.com> wrote:
>
> Hi,
>
> We deployed a small OpenNebula cluster with 8 hosts. It is the default
> OpenNebula installation. However, we found that after several days of
> running, oned hung. All CLI commands hang too. No new entries are written to
> one_xmlrpc.log, and there are quite a few error messages like the following
> in oned.log:
>
> [r...@vm-container-31-0 logdir]# tail oned.log
> Wed Jul 14 14:51:02 2010 [InM][I]: Warning: untrusted X11 forwarding setup failed: xauth key data not generated
> Wed Jul 14 14:51:02 2010 [InM][I]: Warning: No xauth data; using fake authentication data for X11 forwarding.
> Wed Jul 14 14:51:02 2010 [InM][I]: bash: /tmp/one-im//one_im-c4718299a313d89398ea693104dcce5f: /bin/sh: bad interpreter: Text file busy
> Wed Jul 14 14:51:02 2010 [InM][I]: ExitCode: 126
> Wed Jul 14 14:51:02 2010 [InM][I]: Command execution fail: 'mkdir -p /tmp/one-im/; cat > /tmp/one-im//one_im-f3817715aa24450225bafb4c19b23822; if [ "x$?" != "x0" ]; then exit -1; fi; chmod +x /tmp/one-im//one_im-f3817715aa24450225bafb4c19b23822; /tmp/one-im//one_im-f3817715aa24450225bafb4c19b23822'
> Wed Jul 14 14:51:02 2010 [InM][I]: STDERR follows.
> Wed Jul 14 14:51:02 2010 [InM][I]: Warning: untrusted X11 forwarding setup failed: xauth key data not generated
> Wed Jul 14 14:51:02 2010 [InM][I]: Warning: No xauth data; using fake authentication data for X11 forwarding.
> Wed Jul 14 14:51:02 2010 [InM][I]: bash: /tmp/one-im//one_im-f3817715aa24450225bafb4c19b23822: /bin/sh: bad interpreter: Text file busy
> Wed Jul 14 14:51:02 2010 [InM][I]: ExitCode: 126
>
> We have to SIGKILL oned and restart it, which solves all the problems.
>
> Any idea what causes this?
>
> Thanks!
>
> _______________________________________________
> Users mailing list
> Users@lists.opennebula.org
> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org