On Sun, Nov 8, 2009 at 11:41 PM, Stephen Nelson-Smith <sanel...@gmail.com> wrote:
> I've got a large amount of data in the form of 3 apache and 3 varnish
> logfiles from 3 different machines. They are rotated at 0400. The
> logfiles are pretty big - maybe 6G per server, uncompressed.
>
> I've got to produce a combined logfile for 0000-2359 for a given day,
> with a bit of filtering (removing lines based on text match, bit of
> substitution).
>
> I've inherited a nasty shell script that does this but it is very slow
> and not clean to read or understand.
>
> I'd like to reimplement this in python.
>
> Initial questions:
>
> * How does Python compare in performance to shell, awk etc in a big
> pipeline? The shell script kills the CPU
> * What's the best way to extract the data for a given time, eg 0000 -
> 2359 yesterday?
>
> Any advice or experiences?

Go here and download the PDF! http://www.dabeaz.com/generators-uk/

Someone posted this the other day, and I read through it and played around with it a bit. It's exactly what you're looking for, plus it has a slide comparing Python and awk. I think you'll find the PDF highly useful and right on target.

HTH,
Wayne
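To give a rough idea of what the generator-pipeline style from those slides might look like for this job, here is a minimal sketch, not a tested replacement for the shell script. It assumes both the Apache and Varnish logs use the common/combined format with timestamps like [08/Nov/2009:23:41:00 +0000], that all servers log in the same timezone (the offset is ignored), and that each server's files are roughly in time order. The function names, the drop pattern and the substitution are hypothetical placeholders.

```python
import gzip
import heapq
import re
from datetime import datetime, timedelta
from itertools import chain

# Assumption: timestamps look like [08/Nov/2009:23:41:00 +0000];
# the timezone offset is not captured and therefore ignored.
TS_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})")
TS_FMT = "%d/%b/%Y:%H:%M:%S"

def open_log(path):
    """Yield lines from a plain or gzip-compressed log file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        for line in f:
            yield line

def with_timestamps(lines):
    """Pair each line with its parsed timestamp; skip lines without one."""
    for line in lines:
        m = TS_RE.search(line)
        if m:
            yield datetime.strptime(m.group(1), TS_FMT), line

def in_window(stamped, start, end):
    """Keep (timestamp, line) pairs with start <= timestamp < end."""
    for ts, line in stamped:
        if start <= ts < end:
            yield ts, line

def drop_matching(stamped, text):
    """Drop lines containing `text` (hypothetical filter rule)."""
    for ts, line in stamped:
        if text not in line:
            yield ts, line

def substitute(stamped, regex, repl):
    """Apply a regex substitution to every line (hypothetical rule)."""
    for ts, line in stamped:
        yield ts, regex.sub(repl, line)

def combined_log(paths_per_server, day, drop_text, sub_re, sub_repl):
    """Yield the filtered lines for 00:00-23:59 of `day`, merged by time.

    `paths_per_server` is a list with one list of file paths per server.
    """
    start = datetime(day.year, day.month, day.day)
    end = start + timedelta(days=1)
    streams = []
    for paths in paths_per_server:
        lines = chain.from_iterable(open_log(p) for p in paths)
        s = in_window(with_timestamps(lines), start, end)
        s = drop_matching(s, drop_text)
        s = substitute(s, sub_re, sub_repl)
        streams.append(s)
    # heapq.merge lazily merges the per-server streams by timestamp,
    # assuming each stream is already (roughly) sorted.
    for ts, line in heapq.merge(*streams, key=lambda pair: pair[0]):
        yield line
```

Every stage is a generator, so only a few lines are held in memory at a time regardless of the 6G file sizes. Because of the 0400 rotation, each server's path list would need two files: the one covering 0400 the previous day to 0400 on the target day, and the one covering 0400 on the target day to 0400 the next day.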
_______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor