Title: Re: [smokeping-users] Severe lag when restarting Smokeping and webpage timing out.
[Tue Mar 04 15:15:36 2014] [warn] [client 192.168.1.66] mod_fcgid: read data timeout in 40 seconds, referer: http://pipeline/

Looks like the smokeping cgi times out reading data. 
       Is this box I/O bound?
       What does top show when you try to get a web-page from SP? [load averages in particular]
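
For the load-average question, a minimal Python sketch (not part of Smokeping) can put the numbers from top in context by dividing the load average by the CPU count. The function names and the threshold of 1.0 per CPU are assumptions chosen for illustration. On Linux, processes blocked on disk I/O count toward the load average, so a high load with mostly idle CPUs points toward an I/O-bound box.

```python
import os


def load_per_cpu(load_avg, cpu_count):
    """Return the load average divided by the number of CPUs (at least 1)."""
    return load_avg / max(cpu_count, 1)


def looks_overloaded(load_avg, cpu_count, threshold=1.0):
    """Rough heuristic: load above `threshold` per CPU suggests a busy box.

    The default threshold of 1.0 is an assumption, not a Smokeping setting.
    """
    return load_per_cpu(load_avg, cpu_count) > threshold


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
    one, five, fifteen = os.getloadavg()
    cpus = os.cpu_count() or 1
    for label, value in (("1 min", one), ("5 min", five), ("15 min", fifteen)):
        flag = "high" if looks_overloaded(value, cpus) else "ok"
        print(f"{label}: load {value:.2f} on {cpus} CPU(s) -> {flag}")
```

Running it while requesting a Smokeping page gives roughly the same information as the load averages in top, already scaled by core count.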

In any case, you need to figure out why the CGI is failing to read the data in the allowed time of 40 secs.
Changing the default time-out might help if the box is I/O bound, but not totally buried. [And I'm not sure where that might be.]

However, if the box is seriously overloaded I/O-wise, then waiting longer won't really solve your problem - it will just push the box further under water.
[And this all gets back to: how many RRDs are there, and how big are they? See the database section. Are there slaves? If so, how many?]
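
To answer the "how many and how big" question, something like the following Python sketch could count the RRD files under the Smokeping data directory and total their size. The default path /opt/smokeping/data is only a guess based on the /opt/smokeping/bin prompt elsewhere in this thread; use the datadir from your own config.

```python
import os
import sys


def rrd_summary(data_dir):
    """Walk data_dir and return (number of .rrd files, total size in bytes)."""
    count = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name.endswith(".rrd"):
                count += 1
                total_bytes += os.path.getsize(os.path.join(root, name))
    return count, total_bytes


if __name__ == "__main__":
    # Hypothetical default location; pass the real datadir as the first argument.
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "/opt/smokeping/data"
    count, total_bytes = rrd_summary(data_dir)
    print(f"{count} RRD files, {total_bytes / 1024 / 1024:.1f} MiB in {data_dir}")
```

If the count or the total size is far larger than expected for the number of targets, that would support the I/O theory.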

Finally:
>Is fping being run as soon as the cgi script is executed from the webserver?

You appear to misunderstand how SP works. The daemon runs fping, logs the results, and writes them to the RRDs. The CGI pulls data from the RRDs and generates graphs for the HTTP output.

It appears from the SP debug log that writing the data went fine. [At least for the small subset of targets.]
However, generating the graphs appears to fail or time out when reading the RRDs. [Or reading something - in any case.]

Is SELinux or AppArmor running? If so, then stop them or run them in permissive mode and see if that helps.


-Greg


Forgot to add the smoke.log:

http://pastebin.com/20UbvJVx

At the bottom of the log you can see that I also tried timing fping (the same command that smokeping was running), and it took 19.3 seconds to run for a small number of machines. Would that cause it to time out? Is fping being run as soon as the CGI script is executed from the webserver?



On Tue, Mar 4, 2014 at 4:10 PM, Brett Bronson <brett.bron...@bigblockla.com> wrote:
Here is the apache error log that is listing smokeping:
http://pastebin.com/Knm1Cmw1

As for debug mode, here's my output:
http://pastebin.com/8txnhnkv

The host names do resolve; here's an example:
[04:07 PM]superuser@pipeline[/opt/smokeping/bin] > time fping larender001a
larender001a is alive

real    0m0.014s
user    0m0.000s
sys     0m0.000s



On Tue, Mar 4, 2014 at 3:32 PM, Brett Bronson <brett.bron...@bigblockla.com> wrote:
Also, it looks like the version I have running is actually the latest; I had assumed it would report its version as 2.6.9 rather than 2.006009. Sorry.


On Tue, Mar 4, 2014 at 3:29 PM, Brett Bronson <brett.bron...@bigblockla.com> wrote:
Okay, it looks like I was actually using an older version of smokeping. I've removed it and installed the latest version on the site and my config is as follows: 
http://pastebin.com/ZsLE8uCp

Before, I was able to get smokeping to work fine up until I added the section:

+ nodes
menu = Render Node Latency
title = Render Node Latency (ICMP Pings)

++ larender001a
host = larender001a
++ larender001b
host = larender001b
++ larender001c
host = larender001c
++ larender001d
host = larender001d

++ larender002a
host = larender002a
++ larender002b
host = larender002b
++ larender002c
host = larender002c
++ larender002d
host = larender002d



Now that I look at the logs, it looks like it's still using the old version....
[ ... ]
Tue Mar  4 15:03:05 2014 - FPing: probing 5 targets with step 300 s and offset 116 s.
Tue Mar  4 15:16:01 2014 - Smokeping version 2.006009 successfully launched.
Tue Mar  4 15:16:01 2014 - Not entering multiprocess mode for just a single probe.
Tue Mar  4 15:16:01 2014 - FPing: probing 13 targets with step 300 s and offset 163 s.
Tue Mar  4 15:25:59 2014 - Smokeping version 2.006009 successfully launched.
Tue Mar  4 15:25:59 2014 - Not entering multiprocess mode for just a single probe.
Tue Mar  4 15:25:59 2014 - FPing: probing 13 targets with step 300 s and offset 159 s.

I originally installed it with sudo apt-get install smokeping and later removed it with sudo apt-get remove smokeping, but it looks like that didn't remove the old version. Any idea how I can resolve this so that the newer version is loaded?





On Tue, Mar 4, 2014 at 2:28 PM, Gregory Sloop <gr...@sloop.net> wrote:
I don't see a database section, so I assume it's somewhere else. [Nothing looks obviously wrong - but that was just a quick glance.]

But when you first start SP after adding a bunch of targets, it's going to have to allocate/create the RRD for each of the targets.
[Also, are there slaves? If so, it will create X * 60 new RRDs, where X is how many slave SP instances you have, in addition to the master RRDs.]

I wouldn't think that would take 10 minutes, but I can't see how much data you're stuffing into each RRD or whether you have slaves, either of which might help explain it.
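
The RRD count described above can be written out as a small Python calculation. It follows the rule of thumb in this message (one RRD per target on the master, plus one per target for each slave); the function name is made up for illustration.

```python
def rrd_files_to_create(new_targets, slaves=0):
    """One RRD per target on the master, plus one per target for each slave."""
    return new_targets * (1 + slaves)


if __name__ == "__main__":
    # About 60 new render nodes, as in the original question.
    print(rrd_files_to_create(60))            # no slaves: 60
    print(rrd_files_to_create(60, slaves=2))  # two slaves: 180
```

With no slaves, adding 60 targets means 60 new files on first start; each slave adds another 60.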

As to why web-pages won't work, I'm not sure. Have you looked at the Apache logs to see what they say? Or run SP in debug mode? [smokeping --debug, IIRC]

-Greg


Hello,

I recently updated my smokeping Target configuration to include about 60 of our machines in our render farm and noticed that restarting the smokeping service took about 10 minutes, and now our webpage will not load.

Any ideas?

My config:
http://pastebin.com/ibNmGhAF


-- 
Brett Bronson
Big Block | Pipeline TD
http://www.bigblockla.com
[m] 805-338-6520








-- 
Gregory Sloop, Principal: Sloop Network & Computer Consulting
Voice: 503.251.0452 x82
EMail: gr...@sloop.net
http://www.sloop.net
---
_______________________________________________
smokeping-users mailing list
smokeping-users@lists.oetiker.ch
https://lists.oetiker.ch/cgi-bin/listinfo/smokeping-users
