Stephen Nelson-Smith wrote:
Hi,

I want to write a little script that parses an apache mod_status page.

I want it to return simply the number of page requests per second and
the number of connections.

The page looks like this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html><head>
<title>Apache Status</title>
</head><body>
<h1>Apache Server Status for 10.1.2.201</h1>

<dl><dt>Server Version: Apache/2.0.46 (Red Hat)</dt>
<dt>Server Built: Aug  1 2006 09:25:45
</dt></dl><hr /><dl>
<dt>Current Time: Monday, 21-Apr-2008 14:29:44 BST</dt>
<dt>Restart Time: Monday, 21-Apr-2008 13:32:46 BST</dt>
<dt>Parent Server Generation: 0</dt>
<dt>Server uptime:  56 minutes 58 seconds</dt>
<dt>Total accesses: 10661 - Total Traffic: 101.5 MB</dt>
<dt>CPU Usage: u6.03 s2.15 cu0 cs0 - .239% CPU load</dt>
<dt>3.12 requests/sec - 30.4 kB/second - 9.7 kB/request</dt>
<dt>9 requests currently being processed, 11 idle workers</dt>
</body></html>

How can/should I do this?

For data this predictable, simple regex matching will probably work fine.

If 'data' is the above text, then this seems to get what you want:

In [17]: import re
In [18]: re.search(r'[\d.]+ requests/sec', data).group()
Out[18]: '3.12 requests/sec'
In [19]: re.search(r'\d+ requests currently being processed', data).group()
Out[19]: '9 requests currently being processed'
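
To get the numbers themselves rather than the matched phrases, one option is to put capture groups around the digits and convert the results. This is only a minimal sketch, assuming the page text is already in a string. The function name parse_status and the two-line sample string are made up for illustration and are not part of the original reply:

```python
import re

def parse_status(data):
    """Return (requests_per_sec, busy_requests) from mod_status page text."""
    # Same patterns as above, with a group around the numeric part.
    rate = re.search(r'([\d.]+) requests/sec', data)
    busy = re.search(r'(\d+) requests currently being processed', data)
    if rate is None or busy is None:
        # The page layout differs from the sample, so fail loudly.
        raise ValueError("status fields not found")
    return float(rate.group(1)), int(busy.group(1))

# Hypothetical sample: just the two relevant lines from the page above.
sample = """<dt>3.12 requests/sec - 30.4 kB/second - 9.7 kB/request</dt>
<dt>9 requests currently being processed, 11 idle workers</dt>"""

result = parse_status(sample)
print(result)
```

Raising ValueError when a field is missing is a design choice made here for the sketch; returning None would work as well if the caller prefers to check for that.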

Kent
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor