[one-users] Fixing high cpu load due to leap second bug (Was: Sunstone does not load any stats)

Jhon Masschelein Thu, 05 Jul 2012 01:24:14 -0700

Hi,

As this is apparently not yet general knowledge on this list: rebooting is usually not required to fix the leap second bug (unless the system has become unresponsive and you need to reset it, of course).

On all my servers (cloud and others) I was able to fix this by issuing the command:


date -s "`date`"

(Credit for this fix goes to the tweakers.net site, but I cannot find the exact page link anymore.)
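
A minimal sketch of how the command above might be wrapped in a script, assuming GNU date (the -s option) and root privileges for the real run. The reset_clock function and the DRY_RUN variable are hypothetical names introduced here for illustration and are not part of the original fix; with DRY_RUN left at its default, the script only prints what it would do.

```shell
#!/bin/sh
# Sketch: re-set the system clock to its own current value.
# reset_clock and DRY_RUN are hypothetical names used for illustration.
reset_clock() {
    now="$(date)"    # capture the current time once
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: only show the command, change nothing.
        printf 'would run: date -s "%s"\n' "$now"
    else
        # Needs root; same effect as: date -s "`date`"
        date -s "$now"
    fi
}

reset_clock
```

Running it with DRY_RUN=0 as root performs the actual clock reset.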


Wkr,

Jhon

On 07/05/2012 09:24 AM, Tao Craig wrote:

Nice catch, Rolandas!

I'm sorry to clog up the list with such simple fixes, but I guess this just goes to show anyone who may be reading this that sometimes the fix really is as simple as a reboot. I should have done that a long time ago, but I got distracted with all the details.

Anyway, I manually forced ntp to resync and when that had no effect, I rebooted both the private and public cloud controllers. Sure enough, everything is nice and fast and stable now... ruby CPU usage went from 100+ percent down to 0.3 percent and all my graphs are online.


Thanks again, everyone!

----- Original Message ----- From: "Rolandas Naujikas" <rolandas.nauji...@mif.vu.lt>
To: <users@lists.opennebula.org>
Cc: "Tao Craig" <t...@leadmesh.com>; "Hector Sanjuan" <hsanj...@opennebula.org>
Sent: Wednesday, July 04, 2012 10:34 PM
Subject: Re: [one-users] Sunstone does not load any stats. (Users Digest, Vol 53, Issue 14)

Hi,

If this started on Monday, after Saturday's leap second, then it could be related to
http://it.slashdot.org/story/12/07/01/1920217/leap-second-bug-causes-crashes
Our OpenNebula server also had increased load (to ~20) after that. A reboot helped.


Regards, Rolandas Naujikas

On 2012-07-05 05:48, Tao Craig wrote:

Hi Hector,

At first, I see the orange spinning balls... then after some time, this
is replaced with "undefined". I also noticed that ruby seems to be
fairly stable until I try to load these graphs. Then one ruby script
will jump to 100+ percent CPU usage and pretty much stay there. Sometimes, I
will get a "Could not connect..." alert during this time and ruby will
return to normal.

Seeing stuff like this when I trace the PID of the ruby script:

rt_sigreturn(0x1a)                      = 121
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---

sunstone.log:

Wed Jul 04 19:32:04 2012 [I]: xx.xxx.xxx.xxx - - [04/Jul/2012 19:32:04]
"GET /vmtemplate?timeout=true HTTP/1.1" 200 3907 53.3096
Wed Jul 04 19:32:08 2012 [I]: xx.xxx.xxx.xxx - - [04/Jul/2012 19:32:07]
"GET /acl?timeout=true HTTP/1.1" 200 377 33.8135
Wed Jul 04 19:32:12 2012 [I]: xx.xxx.xxx.xxx - - [04/Jul/2012 19:32:11]
"GET /vnet?timeout=true HTTP/1.1" 200 649 39.9651
Wed Jul 04 19:32:22 2012 [I]: xx.xxx.xxx.xxx - - [04/Jul/2012 19:32:22]
"GET /datastore?timeout=true HTTP/1.1" 200 2335 55.5165
Wed Jul 04 19:32:28 2012 [I]: xx.xxx.xxx.xxx - - [04/Jul/2012 19:32:28]
"GET /user?timeout=true HTTP/1.1" 200 1505 44.8972
Wed Jul 04 19:32:41 2012 [I]: xx.xxx.xxx.xxx - - [04/Jul/2012 19:32:41]
"GET /cluster?timeout=true HTTP/1.1" 200 27 26.3875
Wed Jul 04 19:32:44 2012 [I]: xx.xxx.xxx.xxx - - [04/Jul/2012 19:32:44]
"GET /image?timeout=true HTTP/1.1" 200 2957 65.6978
Wed Jul 04 19:32:48 2012 [I]: xx.xxx.xxx.xxx - - [04/Jul/2012 19:32:48]
"GET /host?timeout=true HTTP/1.1" 200 11110 139.9275
Wed Jul 04 19:32:51 2012 [I]: xx.xxx.xxx.xxx - - [04/Jul/2012 19:32:51]
"GET /group?timeout=true HTTP/1.1" 200 796 36.0109
Wed Jul 04 19:32:58 2012 [I]: xx.xxx.xxx.xxx - - [04/Jul/2012 19:32:58]
"GET /datastore?timeout=true HTTP/1.1" 200 2335 56.8328

sunstone.error is empty, except for this:

== Sinatra/1.3.2 has taken the stage on 9869 for development with backup from Thin

"gem list"

addressable (2.2.8)
amazon-ec2 (0.9.17)
bcrypt-ruby (3.0.1)
curb (0.8.0)
daemons (1.1.8)
data_mapper (1.2.0)
data_objects (0.10.8)
dm-aggregates (1.2.0)
dm-constraints (1.2.0)
dm-core (1.2.0)
dm-do-adapter (1.2.0)
dm-migrations (1.2.0)
dm-mysql-adapter (1.2.0)
dm-serializer (1.2.1)
dm-sqlite-adapter (1.2.0)
dm-timestamps (1.2.0)
dm-transactions (1.2.0)
dm-types (1.2.1)
dm-validations (1.2.0)
do_mysql (0.10.8)
do_sqlite3 (0.10.8)
eventmachine (0.12.10)
fastercsv (1.5.4)
json (1.7.0, 1.6.7)
json_pure (1.6.7)
multi_json (1.0.4)
mysql (2.8.1)
net-ldap (0.3.1)
nokogiri (1.5.2)
rack (1.4.1)
rack-protection (1.2.0)
rake (0.8.7)
sequel (3.35.0)
sinatra (1.3.2)
sqlite3 (1.3.6)
stringex (1.3.3)
thin (1.3.1)
tilt (1.3.3)
uuidtools (2.1.2)
xml-simple (1.1.1)

I am running sunstone and opennebula on the same box... it does seem to
be ruby related, but I never had a problem until Monday. Prior to that,
nothing had changed on my end. I just came into the office on Monday and
discovered I could not log in to the older, private cloud and the public
cloud was very slow. Upgrading the public cloud seemed to help (aside
from the issues mentioned above), but I can't upgrade the private cloud
just yet and I would rather identify the source of this problem first.

The CLI is very fast and responsive and other tools, such as the VNC
console, work fine.

I haven't really been using the self-service portal (although I would
like to in the future), but when I try to start it, I get the following
error:

Wed Jul 04 19:43:17 2012 [E]: Error initializing authentication system
Wed Jul 04 19:43:17 2012 [E]: [UserPoolInfo] User couldn't be
authenticated, aborting call.

Thanks again for your help.

----- Original Message ----- From: "Hector Sanjuan"
<hsanj...@opennebula.org>
To: <users@lists.opennebula.org>; "Tao Craig" <t...@leadmesh.com>
Sent: Wednesday, July 04, 2012 4:06 PM
Subject: Re: [one-users] Sunstone does not load any stats.


Hi,

monitoring graphs on my hosts and virtual machines no longer appear.

Is there an empty graph in place or is there an error message? If you can
attach sunstone.log and sunstone.error (if not empty) after trying to see
those graphs etc., perhaps I will see something...

It's not normal that the dashboard takes 30 secs to load. I guess the CLI
is not so slow when issuing a listing command (onehost list, onevm list
etc.), right?

And what is the ruby script consuming 100% exactly? (grep the pid from 'ps
aux' or press 'c' during the execution of 'top' to find the full command).
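
One way to answer the question above from a shell, as a sketch: list the processes using the most CPU together with their full command lines. It assumes a ps that accepts the -e and -o options with the pid, pcpu and args columns. top_cpu is a hypothetical helper name, and the default of five rows is arbitrary.

```shell
#!/bin/sh
# Sketch: show the N busiest processes with their full command lines.
# top_cpu is a hypothetical helper; N defaults to 5.
top_cpu() {
    n="${1:-5}"
    # Columns: pid, CPU percentage, full command. The trailing "=" on each
    # column suppresses the header line so only process rows are sorted.
    # Note: ps typically reports CPU averaged over the process lifetime,
    # whereas top shows recent usage.
    ps -e -o pid= -o pcpu= -o args= | sort -k2,2 -nr | head -n "$n"
}

top_cpu 5
```

The first column of the matching row is the PID to trace or inspect further.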


If you have this long-wait problem in two different clouds and ruby is
consuming so much CPU, I would suspect an issue with your boxes'
configuration, perhaps related to ruby. What's the output of 'gem list'?
Are you running sunstone and opennebula on the same box? Have you tried
the Self-Service interface? Is it as slow as well?

Hector

On Thu, 05 Jul 2012 00:40:19 +0200, Tao Craig <t...@leadmesh.com> wrote:

Hector,

Thanks for the prompt reply. I am ashamed to admit that browser cache
was the problem in this case. The dashboard still takes about 30
seconds to load, but at least it is loading now. I noticed a few
other minor issues though that I cannot track down in my logs. For
example, the monitoring graphs on my hosts and virtual machines no
longer appear.

... any advice?

Part of the reason I didn't catch the browser cache issue earlier is
because I have a second CentOS/KVM cloud running version 3.2.0 and the
dashboard recently stopped loading on it as well. This was not fixed
by clearing my browser cache. Eventually, I get a "Could not
connect..." alert and the page never finishes loading. During this
time, there is a ruby script consuming 100+ percent of CPU resources.
When I kill this script, the cloud is still functional but Sunstone is
no longer running.

The logs all appear normal as far as I can tell and all CLI commands
work without error. Any suggestions here would be greatly appreciated
as well.

Thanks.
----- Original Message ----- From: "Hector Sanjuan"
<hsanj...@opennebula.org>
To: <users@lists.opennebula.org>
Sent: Wednesday, July 04, 2012 4:17 AM
Subject: Re: [one-users] Sunstone does not load any stats.

Hello,

can you try to remove browsers cache and see if that fixes it?

Hector

On Wed, 04 Jul 2012 02:10:34 +0200, Tao Craig <t...@leadmesh.com>
wrote:

Hi everybody,

I recently upgraded my CentOS OpenNebula installation from 3.4 to
3.6 (Lagoon).

Prior to the upgrade, I noticed my Sunstone dashboard was loading
slowly on login (the page would load fine, but it took a while to
load the graphs, number of hosts, etc). I saw there were some
improvements with the Sunstone dashboard with this upgrade, so I
applied it hoping it would help. Now, my Sunstone dashboard doesn't
load any stats or graphs... I just see those spinning orange dots
and the rest of the Sunstone interface does not work either (I'm
assuming because this information is never gathered).

There are no errors in my logs anywhere that I can find. The only
thing I am noticing is that ruby scripts are consuming a large
amount of CPU resources.

If it helps, I am currently running 13 virtual machines on 9 hosts
and all "one" CLI commands work fine.

Any help would be appreciated.

Thanks.



-- Hector Sanjuan
OpenNebula Developer
_______________________________________________
Users mailing list
Users@lists.opennebula.org
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org




--
Jhon Masschelein
Senior Systeemprogrammeur
SARA - HPCV

Science Park 140
1098 XG Amsterdam
T +31 (0)20 592 8099
F +31 (0)20 668 3167
M +31 (0)6 4748 9328
E jhon.masschel...@sara.nl
http://www.sara.nl


