I've got a large amount of data in the form of 3 apache and 3 varnish logfiles from 3 different machines. They are rotated at 0400. The logfiles are pretty big - maybe 6G per server, uncompressed.
I've got to produce a combined logfile covering 0000-2359 for a given day, with a bit of filtering (removing lines based on a text match, and a bit of substitution). I've inherited a nasty shell script that does this, but it is very slow and not clean to read or understand. I'd like to reimplement it in Python.

Initial questions:

* How does Python compare in performance to shell, awk etc. in a big pipeline? The shell script kills the CPU.
* What's the best way to extract the data for a given time range, e.g. 0000-2359 yesterday?

Any advice or experiences?

S.

--
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com

_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
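For the time-extraction question, one streaming approach is to parse the bracketed timestamp on each log line and compare its date against yesterday's. This is only a minimal sketch, assuming Apache combined log format (e.g. `[10/Oct/2009:13:55:36 +0000]`); the function names and the `today` parameter are hypothetical, introduced here for illustration:

```python
import re
from datetime import date, timedelta

# Matches the day/month/year part of an Apache-style timestamp,
# e.g. "[10/Oct/2009:13:55:36 +0000]" (assumed combined log format).
TS_RE = re.compile(r'\[(\d{2})/(\w{3})/(\d{4}):')
MONTHS = {m: i for i, m in enumerate(
    'Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec'.split(), 1)}

def line_date(line):
    """Return the date of a log line's timestamp, or None if absent."""
    m = TS_RE.search(line)
    if not m:
        return None
    day, mon, year = m.groups()
    return date(int(year), MONTHS[mon], int(day))

def filter_yesterday(lines, today=None):
    """Yield only the lines whose timestamp falls on yesterday."""
    target = (today or date.today()) - timedelta(days=1)
    for line in lines:
        if line_date(line) == target:
            yield line
```

Because it iterates line by line rather than slurping whole files, memory use stays flat even on multi-gigabyte logs; the per-line regex work is typically what dominates, much as it would in an awk pipeline.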