Hi Florian, Neil, DuDu,

You are right: one possible reason for this error message is OpenNebula
attempting two simultaneous monitoring actions on the same host. One possible
solution is to increase HOST_MONITORING_INTERVAL (in our latest development
revision of OpenNebula we have already increased it to 600 seconds, i.e. 10
minutes). And, of course, using the SNMP driver has also proved to be a great
solution for this scalability issue.
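A minimal sketch of that change, assuming HOST_MONITORING_INTERVAL is set in seconds in oned.conf, and that the file lives at $ONE_LOCATION/etc/oned.conf or /etc/one/oned.conf depending on the installation mode. The helper name set_host_monitoring_interval is hypothetical, not part of OpenNebula.

```shell
# Hypothetical helper: set HOST_MONITORING_INTERVAL (seconds) in an oned.conf
# file, keeping a backup copy next to it as <file>.bak.
set_host_monitoring_interval() {
  conf=$1
  seconds=$2
  cp "$conf" "$conf.bak" || return 1
  if grep -q '^[[:space:]]*HOST_MONITORING_INTERVAL' "$conf.bak"; then
    # Rewrite the existing (uncommented) assignment from the backup copy.
    sed "s/^[[:space:]]*HOST_MONITORING_INTERVAL.*/HOST_MONITORING_INTERVAL = $seconds/" \
      "$conf.bak" > "$conf"
  else
    # No assignment present: append one at the end of the file.
    printf 'HOST_MONITORING_INTERVAL = %s\n' "$seconds" >> "$conf"
  fi
}

# Usage (path depends on the installation mode):
#   set_host_monitoring_interval "$ONE_LOCATION/etc/oned.conf" 600
```

oned has to be restarted afterwards for the new value to take effect. The sketch avoids `sed -i` because its syntax differs between GNU and BSD sed.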

Hope it helps,

-Tino

--
Constantino Vázquez Blanco | dsa-research.org/tinova
Virtualization Technology Engineer / Researcher
OpenNebula Toolkit | opennebula.org


On Mon, Jul 19, 2010 at 6:18 PM, Floris Sluiter <floris.slui...@sara.nl> wrote:

> Hi DuDu, Tino and all,
>
> We saw exactly the same messages ("Command execution fail" and "bad
> interpreter: Text file busy") on our cluster last week, when we expanded it
> from 12 to 16 hosts (with add host) and deployed 10 virtual machines at the
> same time. We did not have multiple instances of OpenNebula running; we only
> added hosts to a running one, so that is unlikely to have been the issue
> (the cluster had already been running stably for a while). After
> investigating, we think it was a timing issue: the monitoring (ssh) driver
> was set to 60 seconds while the cluster had many hosts and many VMs.
>
> We had started using the ssh monitoring driver again after the latest
> OpenNebula update; before that we used our in-house SNMP monitoring driver.
>
> When we deployed our SNMP driver again, the error messages stopped, and for
> the last week we have had a stable cloud again, now with 16 hosts.
>
> For anyone who thinks they are seeing the same timing issues, the SNMP
> driver is available in the ecosystem (but make sure you know what SNMP is
> before you try it ;-)): http://opennebula.org/software:ecosystem:snmp_im_driver
>
> Regards,
>
> Floris
> HPC project leader
> Sara
>
> *From:* users-boun...@lists.opennebula.org [mailto:users-boun...@lists.opennebula.org] *On Behalf Of* Tino Vazquez
> *Sent:* Monday, 19 July 2010 16:15
> *To:* DuDu
> *Cc:* users@lists.opennebula.org
> *Subject:* Re: [one-users] oned hang
>
> Dear DuDu,
>
> This happens when two monitoring actions take place on the same host at the
> same time.
>
> First, which OpenNebula version are you using?
>
> Are you by any chance running two OpenNebula instances? Did you change the
> host polling interval?
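Both questions can be checked from the front-end with a couple of standard commands. A minimal sketch, assuming the daemon's process name is oned and that the polling interval is the HOST_MONITORING_INTERVAL line in oned.conf; both helper names are hypothetical.

```shell
# Hypothetical helpers for the two checks asked about above.

# How many oned processes are running? More than one suggests two
# OpenNebula instances monitoring the same hosts.
count_oned_processes() {
  # pgrep -x matches the exact process name "oned"; wc -l counts the PIDs.
  pgrep -x oned | wc -l | tr -d ' '
}

# Which host polling interval is configured? Prints the uncommented
# HOST_MONITORING_INTERVAL line(s) of the given oned.conf file.
show_host_monitoring_interval() {
  grep '^[[:space:]]*HOST_MONITORING_INTERVAL' "$1"
}

# Usage:
#   count_oned_processes
#   show_host_monitoring_interval "$ONE_LOCATION/etc/oned.conf"
```

Commented-out lines (starting with #) are deliberately ignored, since oned does not use them.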
>
> Regards,
>
> -Tino
>
>
> --
> Constantino Vázquez Blanco | dsa-research.org/tinova
> Virtualization Technology Engineer / Researcher
> OpenNebula Toolkit | opennebula.org
>
> On Wed, Jul 14, 2010 at 3:13 PM, DuDu <black...@gmail.com> wrote:
>
> Hi,
>
> We deployed a small OpenNebula cluster with 8 hosts, using the default
> installation. However, after several days of running, oned hung. All CLI
> commands hung too, no new entries appeared in one_xmlrpc.log, and oned.log
> contained quite a few error messages like the following:
>
> [r...@vm-container-31-0 logdir]# tail oned.log
> Wed Jul 14 14:51:02 2010 [InM][I]: Warning: untrusted X11 forwarding setup failed: xauth key data not generated
> Wed Jul 14 14:51:02 2010 [InM][I]: Warning: No xauth data; using fake authentication data for X11 forwarding.
> Wed Jul 14 14:51:02 2010 [InM][I]: bash: /tmp/one-im//one_im-c4718299a313d89398ea693104dcce5f: /bin/sh: bad interpreter: Text file busy
> Wed Jul 14 14:51:02 2010 [InM][I]: ExitCode: 126
> Wed Jul 14 14:51:02 2010 [InM][I]: Command execution fail: 'mkdir -p /tmp/one-im/; cat > /tmp/one-im//one_im-f3817715aa24450225bafb4c19b23822; if [ "x$?" != "x0" ]; then exit -1; fi; chmod +x /tmp/one-im//one_im-f3817715aa24450225bafb4c19b23822; /tmp/one-im//one_im-f3817715aa24450225bafb4c19b23822'
> Wed Jul 14 14:51:02 2010 [InM][I]: STDERR follows.
> Wed Jul 14 14:51:02 2010 [InM][I]: Warning: untrusted X11 forwarding setup failed: xauth key data not generated
> Wed Jul 14 14:51:02 2010 [InM][I]: Warning: No xauth data; using fake authentication data for X11 forwarding.
> Wed Jul 14 14:51:02 2010 [InM][I]: bash: /tmp/one-im//one_im-f3817715aa24450225bafb4c19b23822: /bin/sh: bad interpreter: Text file busy
> Wed Jul 14 14:51:02 2010 [InM][I]: ExitCode: 126
>
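To see whether these failures line up with the monitoring cycle, one option is to count them per minute: bursts that recur at roughly the polling interval would be consistent with overlapping monitoring runs. A minimal sketch, assuming each oned.log entry is a single line starting with a timestamp like the ones in the excerpt above; the function name is hypothetical.

```shell
# Hypothetical helper: count "Text file busy" failures per minute in oned.log.
# Assumes every entry is one line starting with a timestamp such as
# "Wed Jul 14 14:51:02 2010", so its first 16 characters ("Wed Jul 14 14:51")
# identify the minute.
busy_errors_per_minute() {
  # uniq -c counts adjacent identical lines; log lines are already in time order.
  grep 'Text file busy' "$1" | cut -c1-16 | uniq -c
}

# Usage:
#   busy_errors_per_minute oned.log
```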
>
> We had to kill oned with SIGKILL and restart it, which solved all the
> problems.
>
> Any idea what causes this?
>
> Thanks!
>
> _______________________________________________
> Users mailing list
> Users@lists.opennebula.org
> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org