On Sun, Nov 8, 2009 at 11:41 PM, Stephen Nelson-Smith <sanel...@gmail.com> wrote:

> I've got a large amount of data in the form of 3 apache and 3 varnish
> logfiles from 3 different machines.  They are rotated at 0400.  The
> logfiles are pretty big - maybe 6G per server, uncompressed.
>
> I've got to produce a combined logfile for 0000-2359 for a given day,
> with a bit of filtering (removing lines based on text match, bit of
> substitution).
>
> I've inherited a nasty shell script that does this but it is very slow
> and not clean to read or understand.
>
> I'd like to reimplement this in python.
>
> Initial questions:
>
> * How does Python compare in performance to shell, awk etc in a big
> pipeline?  The shell script kills the CPU
> * What's the best way to extract the data for a given time, eg 0000 -
> 2359 yesterday?
>
> Any advice or experiences?
>
>
Go here and download the PDF:
http://www.dabeaz.com/generators-uk/

Someone posted this the other day; I read through it and played around a
bit, and it's exactly what you're looking for. It even has a slide
comparing Python and awk performance.

I think you'll find the pdf highly useful and right on.
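To give a flavour of the approach the PDF teaches, here is a minimal sketch of a generator pipeline for your case. It assumes Apache-style timestamps like [08/Nov/2009:13:42:17 +0000]; the filenames and the filter pattern are made up for illustration:

```python
import re
from datetime import datetime

# Timestamp as it appears in Apache common/combined log format,
# e.g. [08/Nov/2009:13:42:17 +0000]
TS_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})')

def lines_from(paths):
    """Chain the lines of several logfiles into one lazy stream."""
    for path in paths:
        with open(path) as f:
            for line in f:
                yield line

def in_window(lines, start, end):
    """Keep only lines whose timestamp falls in [start, end)."""
    for line in lines:
        m = TS_RE.search(line)
        if m:
            ts = datetime.strptime(m.group(1), '%d/%b/%Y:%H:%M:%S')
            if start <= ts < end:
                yield line

def without(lines, pattern):
    """Drop lines matching an unwanted pattern (the filtering step)."""
    rx = re.compile(pattern)
    return (line for line in lines if not rx.search(line))

# Usage (hypothetical filenames and pattern):
# pipeline = without(
#     in_window(lines_from(['access1.log', 'access2.log']),
#               datetime(2009, 11, 8), datetime(2009, 11, 9)),
#     r'internal-healthcheck')
# for line in pipeline:
#     sys.stdout.write(line)
```

Because each stage is a generator, the whole pipeline streams the 6G files line by line instead of loading them into memory, which is what makes this style fast enough for big logs.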

HTH,
Wayne
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
