Stephen Nelson-Smith wrote:
> Hi,
>
> I want to write a little script that parses an apache mod_status page.
>
> I want it to return simple the number of page requests a second and
> the number of connections.
>
> It seems this is very complicated... I can do it in a shell one-liner:
>
> curl 10.1.2.201/server-status 2>&1 | grep -i request | grep dt | {
> IFS='> ' read _ rps _; IFS='> ' read _ currRequests _ _ _ _
> idleWorkers _; echo $rps $currRequests $idleWorkers ; }
>
> But that's horrid.
>
> So is:
>
> $ eval `printf '<dt>3 requests currently being processed, 17 idle
> workers</dt>\n <dt>2.82 requests/sec - 28.1 kB/second - 10.0
> kB/request</dt>\n' | sed -nr '/<dt>/ { N;
> s@<dt>([0-9]*)[^,]*,([0-9]*).*<dt>([0-9.]*)[EMAIL PROTECTED]((\1+\2));[EMAIL
> PROTECTED];
> }'`
> $ echo "workers: $workers reqs/secs $requests"
> workers: 20 reqs/sec 2.82
>
> The page looks like this:
>
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
> <html><head>
> <title>Apache Status</title>
> </head><body>
> <h1>Apache Server Status for 10.1.2.201</h1>
>
> <dl><dt>Server Version: Apache/2.0.46 (Red Hat)</dt>
> <dt>Server Built: Aug 1 2006 09:25:45
> </dt></dl><hr /><dl>
> <dt>Current Time: Monday, 21-Apr-2008 14:29:44 BST</dt>
> <dt>Restart Time: Monday, 21-Apr-2008 13:32:46 BST</dt>
> <dt>Parent Server Generation: 0</dt>
> <dt>Server uptime: 56 minutes 58 seconds</dt>
> <dt>Total accesses: 10661 - Total Traffic: 101.5 MB</dt>
> <dt>CPU Usage: u6.03 s2.15 cu0 cs0 - .239% CPU load</dt>
> <dt>3.12 requests/sec - 30.4 kB/second - 9.7 kB/request</dt>
> <dt>9 requests currently being processed, 11 idle workers</dt>
> </body></html>
>
> How can/should I do this?
>
> S.
> _______________________________________________
> Tutor maillist - Tutor@python.org
> http://mail.python.org/mailman/listinfo/tutor
>

I don't know how you get the page HTML, but let's assume each line is in an iterable, named html. It seems very straightforward to code:

lines = [line.strip() for line in html]  # a list, so it can be sliced below
no_requests_sec = no_connections = None
lineno = -1
for lineno, line in enumerate(lines):
    x = line.find("requests/sec")
    if x >= 0:
        no_requests_sec = line[4:x].strip()  # [4:] skips the leading "<dt>"
        break
# Continue from the line after the requests/sec match.
for line in lines[lineno + 1:]:
    x = line.find("requests currently being processed")
    if x >= 0:
        no_connections = line[4:x].strip()
        break

That makes certain assumptions about the file format: each line of interest starts with "<dt>", the matching text is exactly as shown, and the connections line comes somewhere after the requests/sec line. It does not assume that connections is the first line after requests/sec. Both values come back as strings (or None if nothing matched), so wrap them in float() and int() if you need numbers.

-- 
Bob Gailer
919-636-4239 Chapel Hill, NC
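
For comparison, the same extraction can be sketched with regular expressions, which drops the dependence on line order and on the exact column where the number starts. This is only a sketch under assumptions: parse_status and the two pattern names are made up for illustration, and the patterns assume the <dt> layout of the sample page quoted above.

```python
import re

# Hypothetical patterns, written against the sample page in this thread.
REQS_PER_SEC = re.compile(r"<dt>\s*([0-9.]+) requests/sec")
BUSY = re.compile(r"<dt>\s*(\d+) requests currently being processed")


def parse_status(page):
    """Return (requests_per_sec, connections); a value is None if not found."""
    rps = REQS_PER_SEC.search(page)
    busy = BUSY.search(page)
    return (float(rps.group(1)) if rps else None,
            int(busy.group(1)) if busy else None)


# The relevant lines from the page posted in the question.
sample = """<dt>CPU Usage: u6.03 s2.15 cu0 cs0 - .239% CPU load</dt>
<dt>3.12 requests/sec - 30.4 kB/second - 9.7 kB/request</dt>
<dt>9 requests currently being processed, 11 idle workers</dt>"""

print(parse_status(sample))
```

The function takes the whole page as one string, so it does not matter how the HTML was fetched or split. In Python 3, something like urllib.request.urlopen(url).read().decode() would be one way to get that string from the server-status URL.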