Hi Javier,

See my previous email. Another scenario is when "/tmp/one-collectd-client.pid" does not exist due to issues with /tmp.

A change seems to have been made to put a pid file in /tmp instead of /run or /var/run.

        Regards,
          Gerry


On 20/01/2014 17:44, Javier Fontan wrote:
I've been trying to reproduce the problem, that is, making OpenNebula
start a large number of collectd-client processes. The only way I was
able to do it is when the file "/tmp/one-collectd-client.pid" exists
and has the wrong permissions. Can you check the ownership and permissions
of that file?
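
A minimal sketch of that check, assuming Python 3 is available on the host. The path is the one quoted above; the function name describe_pid_file is only illustrative and is not part of OpenNebula. It reports whether the file exists, who owns it, and its permission bits, so the result can be compared with the user that runs the probes (oneadmin in the ps output in this thread).

```python
import os
import pwd
import stat

# Path quoted in this thread; adjust if your installation differs.
PID_FILE = "/tmp/one-collectd-client.pid"


def describe_pid_file(path=PID_FILE):
    """Return owner and mode of the pid file, or None if it does not exist."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        # The uid has no passwd entry; fall back to the numeric id.
        owner = str(st.st_uid)
    return {
        "owner": owner,
        "mode": oct(stat.S_IMODE(st.st_mode)),
        "owner_can_write": bool(st.st_mode & stat.S_IWUSR),
    }


if __name__ == "__main__":
    print(describe_pid_file())
```

A result of None corresponds to the "file does not exist" case; an owner other than the monitoring user, or owner_can_write being False, corresponds to the wrong-permissions case.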

On Mon, Jan 20, 2014 at 4:15 PM, Javier Fontan <jfon...@opennebula.org> wrote:
The problem seems to be the large number of collectd processes running.
Try killing all "collectd-client.rb" processes. There should be only
one running per host.
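
To see how many are running before killing them, something like the following could be used. It is only a sketch: it assumes the usual 'ps -ef' column layout (UID PID PPID C STIME TTY TIME CMD) with one process per line, and the function name collectd_client_pids is made up for illustration.

```python
def collectd_client_pids(ps_output):
    """Return the PIDs of collectd-client.rb processes found in 'ps -ef' text."""
    pids = []
    for line in ps_output.splitlines():
        # Seven fixed columns, then the full command line as the eighth field.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        cmd = fields[7]
        # Skip a 'grep collectd-client.rb' that may show up in a pipeline.
        if "collectd-client.rb" in cmd and not cmd.startswith("grep"):
            pids.append(int(fields[1]))
    return pids
```

The text can come from subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout. More than one PID in the result for a single host means duplicates are running.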

In case you want to use the old method of monitoring you can follow this guide:

http://docs.opennebula.org/stable/administration/monitoring/imsshpullg.html#imsshpullg

On Mon, Jan 20, 2014 at 2:17 PM, Gerry O'Brien <ge...@scss.tcd.ie> wrote:
Hi Ruben,

     Below is the output of 'ps -ef | grep one' on a host that has been
disabled, rebooted and enabled. There are multiple instances of
collectd-client.rb kvm running.


     We discovered today a serious issue that is having an adverse
effect on our DNS system. When the machine below was enabled, our DNS
server was immediately flooded with requests from the host (see a sample below).
      Our logs show that this has only started happening since the upgrade to
4.4. If we don't get a fix for this we will have to go back to 4.2, which is
something I really don't want to do.

         Regards,
             Gerry




oneadmin  3628     1  0 13:04 ?        00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin  4600     1  0 13:05 ?        00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin  6400     1  0 13:07 ?        00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin  9003     1  0 13:08 ?        00:00:00 ruby /var/tmp/one/im/kvm.d/collectd-client.rb kvm /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12953  3628  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12955  6400  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12969 12953  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12970 12969  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12972 12955  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 12973 12972  0 13:10 ?        00:00:00 /bin/bash /var/tmp/one/im/kvm.d/../run_probes kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 13029 12973  0 13:10 ?        00:00:00 /bin/bash ./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie
oneadmin 13030 12970  0 13:10 ?        00:00:00 /bin/bash ./monitor_ds.sh kvm-probes /var/lib/one//datastores 4124 20 0 host101.scss.tcd.ie



-2014 13:14:26.675 client 134.226.59.101#52314: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:26.680 client 134.226.59.101#51356: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.680 client 134.226.59.101#51356: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:26.822 client 134.226.59.101#47870: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.822 client 134.226.59.101#47870: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:26.824 client 134.226.59.101#58734: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.825 client 134.226.59.101#58734: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:26.952 client 134.226.59.101#39659: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.952 client 134.226.59.101#39659: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:26.952 client 134.226.59.101#53975: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:26.953 client 134.226.59.101#53975: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:27.108 client 134.226.59.101#36294: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.108 client 134.226.59.101#36294: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:27.109 client 134.226.59.101#59277: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.109 client 134.226.59.101#59277: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:27.347 client 134.226.59.101#49614: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.348 client 134.226.59.101#49614: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:27.350 client 134.226.59.101#44058: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.357 client 134.226.59.101#44058: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:27.458 client 134.226.59.101#51830: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.458 client 134.226.59.101#51830: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:27.461 client 134.226.59.101#38419: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:27.461 client 134.226.59.101#38419: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:31.184 client 134.226.59.101#38617: query: host101.scss.tcd.ie IN A + (134.226.32.57)
20-Jan-2014 13:14:31.184 client 134.226.59.101#38617: query: host101.scss.tcd.ie IN AAAA + (134.226.32.57)
20-Jan-2014 13:14:31.302 client 134.226
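
To put a number on the flood, a query log in the format shown above could be tallied per client and record type. This is a rough sketch only: the regular expression is fitted to the sample lines in this message and not to every log format the DNS server can produce, and count_queries is an illustrative name.

```python
import re
from collections import Counter

# Fitted to the sample above: "client IP#port: query: NAME IN TYPE ...".
# \s+ also matches a line break, so entries wrapped by a mail client still count.
QUERY_RE = re.compile(
    r"client\s+(\d{1,3}(?:\.\d{1,3}){3})#\d+:\s+query:\s+(\S+)\s+IN\s+(\w+)"
)


def count_queries(log_text):
    """Count queries per (client IP, queried name, record type)."""
    return Counter(QUERY_RE.findall(log_text))
```

Calling count_queries(text).most_common(5) on a log excerpt lists the busiest client, name and record type combinations first.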







On 17/01/2014 17:45, Ruben S. Montero wrote:
Hi Gerry

Just to check, are you using 4.4 Final? We've seen this in the betas and
"thought" we had fixed it for the final version. Also, could you check that
there is just one monitoring process on each host (collectd-client.sh, or
equivalent, should be the name of the process)?

Also, could you send us the lines from oned.log between Thu Jan 16 16:56:25
2014 and Thu Jan 16 17:25:43 2014, plus the first lines, which include your
oned.conf values (we are especially interested in those related to the
monitoring interval)?


Cheers

Ruben




On Fri, Jan 17, 2014 at 2:27 PM, Gerry O'Brien <ge...@scss.tcd.ie> wrote:

Hi,

      Below is a truncated log file for a VM. The monitor continually
cycles between finding the machine in state RUNNING and state UNKNOWN.
This occurs for many machines at the same time. All machines were created
by a script.

      The VMs are Microsoft Windows 7 64-bit Enterprise. Individual context
is created by a startup script. They run fine but eventually /var/log/one
is going to overflow.

      Restarting oned seems to fix the problem but this is hardly a
long-term solution.

      Any suggestions on what could be causing this?

          Regards,
              Gerry




Thu Jan 16 16:56:21 2014 [DiM][I]: New VM state is ACTIVE.
Thu Jan 16 16:56:22 2014 [LCM][I]: New VM state is PROLOG.
Thu Jan 16 16:56:22 2014 [VM][I]: Virtual Machine has no context
Thu Jan 16 16:56:22 2014 [LCM][I]: New VM state is BOOT
Thu Jan 16 16:56:22 2014 [VMM][I]: Generating deployment file: /var/lib/one/vms/1788/deployment.0
Thu Jan 16 16:56:23 2014 [VMM][I]: ExitCode: 0
Thu Jan 16 16:56:23 2014 [VMM][I]: Successfully execute network driver operation: pre.
Thu Jan 16 16:56:25 2014 [VMM][I]: ExitCode: 0
Thu Jan 16 16:56:25 2014 [VMM][I]: Successfully execute virtualization driver operation: deploy.
Thu Jan 16 16:56:25 2014 [VMM][I]: ExitCode: 0
Thu Jan 16 16:56:25 2014 [VMM][I]: Successfully execute network driver operation: post.
Thu Jan 16 16:56:25 2014 [LCM][I]: New VM state is RUNNING
Thu Jan 16 16:56:51 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 16:59:01 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 16:59:23 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:01:41 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:01:58 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:04:18 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:04:39 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:06:55 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:07:06 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:09:31 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:09:31 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:12:22 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:12:27 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:15:11 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:15:22 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:17:49 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:18:00 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:20:27 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:20:34 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:23:04 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:23:08 2014 [LCM][I]: New VM state is UNKNOWN
Thu Jan 16 17:25:41 2014 [VMM][I]: VM found again, state is RUNNING
Thu Jan 16 17:25:43 2014 [LCM][I]: New VM state is UNKNOWN
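
The RUNNING/UNKNOWN cycle in a log like the one above can be summarised with a short script. This is only a sketch under assumptions: the regular expression is fitted to the two messages shown here ("New VM state is ..." and "VM found again, state is ..."), timestamps are parsed with English day and month names, and summarise_flaps is an illustrative name, not an OpenNebula tool.

```python
import re
from datetime import datetime

# Matches only the two state messages that appear in the log above.
STATE_RE = re.compile(
    r"^(\w{3} \w{3} +\d+ [\d:]+ \d{4}) \[\w+\]\[\w\]: "
    r"(?:New VM state is|VM found again, state is) (RUNNING|UNKNOWN)\s*$"
)


def summarise_flaps(lines):
    """Return (number of RUNNING->UNKNOWN transitions, seconds spent UNKNOWN each time)."""
    flaps = 0
    unknown_durations = []
    prev_state = None
    prev_time = None
    for line in lines:
        m = STATE_RE.match(line)
        if not m:
            continue
        when = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
        state = m.group(2)
        if prev_state == "RUNNING" and state == "UNKNOWN":
            flaps += 1
        elif prev_state == "UNKNOWN" and state == "RUNNING":
            unknown_durations.append((when - prev_time).total_seconds())
        prev_state, prev_time = state, when
    return flaps, unknown_durations
```

Running it over each file under /var/log/one would show which VMs are affected and how long each one stays in UNKNOWN before being found again.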

--
Gerry O'Brien

Systems Manager
School of Computer Science and Statistics
Trinity College Dublin
Dublin 2
IRELAND

00 353 1 896 1341

_______________________________________________
Users mailing list
Users@lists.opennebula.org
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org





--
Javier Fontán Muiños
Developer
OpenNebula - The Open Source Toolkit for Data Center Virtualization
www.OpenNebula.org | @OpenNebula | github.com/jfontan




