Forwarding for list visibility

----- Original Message -----
From: "Brian Gustin" <[EMAIL PROTECTED]>
To: "Alan Gauld" <[EMAIL PROTECTED]>
Sent: Monday, February 20, 2006 2:23 AM
Subject: Re: [Tutor] File handling: open a file at specified byte?

> > look at the file tell() and seek() methods.
> >
> > They will tell you the current location and allow you to move to a
> > specific location.
> >
> OK. I did try using seek and tell, and couldn't get working code to do
> what I needed it to. However, it did lead me to discover the fileinput
> module. I've tested it on my test file, and it works quite well. I'd
> like to see if you can offer any better suggestions, keeping in mind a
> log file can grow to as large as 3 GB, so memory management will be
> important, as will execution time (I will need this parser to execute on
> a file as large as 3 - 4 GB in under 10 minutes, ideally shooting
> for less than 1 minute).
>
> Code follows:
> ##START CODE ##########
> #!/usr/bin/python
> #for testing of tux parser
> # read "live" log file and parse it into separate domain files
> import string
> import re
> import fileinput
>
> myfiles={}
> line=1
> last=0
> try:
>     bkmk = open('bookmark','r')
>     last = bkmk.readline()
>     bkmk.close()
> except:
>     pass
> for outputdata in fileinput.input('./testfile.tuxlog'):
>     #sourcelist.sort()
>     #print outputdata
>     if fileinput.filelineno() < int(last):
>         continue
>     else:
>         info = re.search('(?<=GET )([a-zA-Z0-9\-\.]+)', outputdata)
>         try:
>             namecheck = info.group(0)
>         except AttributeError:
>             continue
>         try:
>             namecheck=namecheck.replace('www.','')
>             check = re.search('(\.[a-z]+$)',namecheck)
>             if check == None:
>                 domain = 'Errors'
>             else:
>                 res = re.search('(\ (301|404|403|302)\ 0)',outputdata)
>                 if res == None:
>                     domain = namecheck
>                 else:
>                     domain = '404_301errors'
>             outputdata=outputdata.replace(' '+domain+'/',' /')
>             if myfiles.has_key(domain):
>                 domhandle = myfiles.get(domain)
>             else:
>                 domhandle=open('/var/log/tuxp/'+domain+'-access.log.1','w+')
>                 myfiles[domain] = domhandle
>
>             domhandle.write(outputdata)
>         except:
>             continue
>     #get the last line no handled. could this instead be run just
>     #before closing the handle?
>     bookmark = fileinput.lineno()
> rel = open('./bookmark','w')
> rel.write(str(bookmark))
> rel.close()
> #print "BOOKMARK: %s"%bookmark,
> #print domain+' - ',
> #print namecheck,
> # line +=1
> #print str(line)+"\n"
> #print fileinput.filelineno()
> fileinput.close()
>
> ############ END CODE############
>
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
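
For comparison, here is a minimal sketch of the tell()/seek() approach mentioned at the top of the thread, which bookmarks a byte offset instead of a line number so a rerun does not have to re-read lines it has already handled. It is not taken from the original post: the names load_offset, process_new_lines and handle_line are hypothetical, the bookmark file is assumed to hold a single integer, and the per-domain splitting is left to whatever callable is passed in as handle_line. It is written to run on current Python versions and reads the log in binary mode, so lines arrive as bytes.

```python
import os


def load_offset(bookmark_path):
    """Return the saved byte offset, or 0 if no usable bookmark exists."""
    try:
        with open(bookmark_path) as bkmk:
            return int(bkmk.read().strip())
    except (IOError, OSError, ValueError):
        return 0


def process_new_lines(log_path, bookmark_path, handle_line):
    """Pass each complete line after the bookmark to handle_line().

    Returns the byte offset that was written back to the bookmark file.
    """
    offset = load_offset(bookmark_path)
    # A bookmark beyond the end of the file suggests the log was rotated
    # or truncated, so start again from the beginning.
    if offset > os.path.getsize(log_path):
        offset = 0
    with open(log_path, 'rb') as log:
        log.seek(offset)
        # readline() is used rather than iterating over the file object,
        # so that tell() reports the true position after every line.
        for line in iter(log.readline, b''):
            if not line.endswith(b'\n'):
                break  # partial line still being written; leave it for the next run
            handle_line(line)
            offset = log.tell()
    # Save the position once, after the loop, rather than once per line.
    with open(bookmark_path, 'w') as bkmk:
        bkmk.write(str(offset))
    return offset
```

Only one line is held in memory at a time, so the size of the log should not matter much for memory use. A call such as process_new_lines('./testfile.tuxlog', './bookmark', handler) would resume where the previous run stopped, with handler standing in for the regex and file-writing logic above.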