I've got a large amount of data in the form of 3 apache and 3 varnish logfiles from 3 different machines. They are rotated at 0400. The logfiles are pretty big - maybe 6G per server, uncompressed.
I've got to produce a combined logfile covering 0000-2359 for a given day, with a bit of filtering (removing lines based on a text match, and a bit of substitution). I've inherited a nasty shell script that does this, but it is very slow and not clean to read or understand. I'd like to reimplement it in Python.

Initial questions:

* How does Python compare in performance to shell, awk etc. in a big pipeline? The shell script kills the CPU.
* What's the best way to extract the data for a given time range, e.g. 0000-2359 yesterday?

Any advice or experiences?

S.

--
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com

_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
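For the time-extraction question, one streaming approach is to parse the bracketed timestamp on each log line and compare its date against yesterday's. This is only a minimal sketch, assuming Apache combined log format (e.g. `[10/Oct/2009:13:55:36 +0000]`); the function names and the `today` parameter are hypothetical, introduced here for illustration:

```python
import re
from datetime import date, timedelta

# Matches the day/month/year part of an Apache-style timestamp,
# e.g. "[10/Oct/2009:13:55:36 +0000]" (assumed combined log format).
TS_RE = re.compile(r'\[(\d{2})/(\w{3})/(\d{4}):')
MONTHS = {m: i for i, m in enumerate(
    'Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec'.split(), 1)}

def line_date(line):
    """Return the date of a log line's timestamp, or None if absent."""
    m = TS_RE.search(line)
    if not m:
        return None
    day, mon, year = m.groups()
    return date(int(year), MONTHS[mon], int(day))

def filter_yesterday(lines, today=None):
    """Yield only the lines whose timestamp falls on yesterday."""
    target = (today or date.today()) - timedelta(days=1)
    for line in lines:
        if line_date(line) == target:
            yield line
```

Because it iterates line by line rather than slurping whole files, memory use stays flat even on multi-gigabyte logs; the per-line regex work is typically what dominates, much as it would in an awk pipeline.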