And the problem I have with the below is that I've discovered that the input logfiles aren't strictly ordered - i.e. there is variance of a second or so in some of the entries.
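One way to cope with near-sorted input without a full sort is a small heap-based reordering buffer. This is only a sketch, assuming no entry is displaced by more than a fixed window of lines, and that timestamps within a single day's file sort correctly as strings (the `[05/Nov/2009:HH:MM:SS` prefix does, since only the time part varies):

```python
import heapq

def reorder(lines, window=64):
    """Yield lines in timestamp order, assuming no line is displaced
    by more than `window` positions from its sorted location."""
    heap = []
    for line in lines:
        # Fields 3:5 hold the Apache timestamp, e.g. "[05/Nov/2009:13:00:01 -0000]"
        stamp = " ".join(line.split()[3:5])
        heapq.heappush(heap, (stamp, line))
        # Once the buffer is full, the smallest stamp can no longer be
        # displaced by anything still unread, so it is safe to emit.
        if len(heap) > window:
            yield heapq.heappop(heap)[1]
    # Drain whatever remains at end of input.
    while heap:
        yield heapq.heappop(heap)[1]
```

This streams in O(n log window) instead of sorting 800M at once; if an entry can drift by a second or so, a window of a few hundred lines should be ample.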
I can sort the biggest logfile (800M) using unix sort in about 1.5 mins
on my workstation.  That's not really fast enough, with potentially 12
other files....  Hrm...

S.

On Mon, Nov 9, 2009 at 1:35 PM, Stephen Nelson-Smith
<sanel...@gmail.com> wrote:
> Hi,
>
>> If you create iterators from the files that yield (timestamp, entry)
>> pairs, you can merge the iterators using one of these recipes:
>> http://code.activestate.com/recipes/491285/
>> http://code.activestate.com/recipes/535160/
>
> Could you show me how I might do that?
>
> So far I'm at the stage of being able to produce loglines:
>
> #! /usr/bin/env python
> import gzip
>
> class LogFile:
>     def __init__(self, filename, date):
>         self.f = gzip.open(filename, "r")
>         # Skip forward to the first line for the requested date.
>         for logline in self.f:
>             self.line = logline
>             self.stamp = " ".join(self.line.split()[3:5])
>             if self.stamp.startswith(date):
>                 break
>
>     def getline(self):
>         ret = self.line
>         self.line = self.f.readline()
>         self.stamp = " ".join(self.line.split()[3:5])
>         return ret
>
> logs = [LogFile("a/access_log-20091105.gz", "[05/Nov/2009"),
>         LogFile("b/access_log-20091105.gz", "[05/Nov/2009"),
>         LogFile("c/access_log-20091105.gz", "[05/Nov/2009")]
>
> while True:
>     print [x.stamp for x in logs]
>     nextline = min((x.stamp, x) for x in logs)
>     print nextline[1].getline()
>
> --
> Stephen Nelson-Smith
> Technical Director
> Atalanta Systems Ltd
> www.atalanta-systems.com

--
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
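For comparison, the `min()`-based merge loop in the quoted code (which never terminates once a file runs out) can be replaced with `heapq.merge` from the standard library, which the ActiveState recipes above are variants of. A sketch, assuming each file is already in timestamp order; the filenames and date prefix follow the thread, and the `"rt"` gzip mode is the Python 3 spelling:

```python
import gzip
import heapq

def stamped(filename, date):
    # Yield (timestamp, line) pairs, skipping lines that do not
    # belong to the requested date.
    with gzip.open(filename, "rt") as f:
        for line in f:
            stamp = " ".join(line.split()[3:5])
            if stamp.startswith(date):
                yield (stamp, line)

def merged(filenames, date):
    # heapq.merge lazily merges the already-sorted iterators,
    # holding only one pending line per file in memory, and ends
    # cleanly when every file is exhausted.
    streams = [stamped(fn, date) for fn in filenames]
    for stamp, line in heapq.merge(*streams):
        yield line

# Usage, per the thread:
# for line in merged(["a/access_log-20091105.gz",
#                     "b/access_log-20091105.gz",
#                     "c/access_log-20091105.gz"], "[05/Nov/2009"):
#     print(line, end="")
```

Because the pairs are `(stamp, line)` tuples, ties on the timestamp fall back to comparing the line strings, so equal stamps never raise a comparison error.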