richard kappler wrote:

> I'm writing a script that reads from an in-service log file in xml format
> that can grow to a couple gigs in 24 hours, then gets zipped out and
> restarts at zero. My script must check to see if new entries have been
> made, find specific lines based on 2 different start tags, and from those
> lines extract data between the start and end tags (hopefully including the
> tags) and write it to a file. I've got the script to read the file, see if
> it's grown, find the appropriate lines and write them to a file. I still
> need to strip out just the data I need (between the open and close tags)
> instead of writing the entire line, and also to reset eof when the nightly
> zip / new log file creation occurs. I could use some guidance on stripping
> out the data, at the moment I'm pretty lost, and I've got an idea about
> the nightly reset but any comments about that would be welcome as well.
> Oh, and the painful bit is that I can't use any modules that aren't
> included in the initial Python install. My code is appended below.
You'll probably end up with something closer to your original idea and Steven's suggestions, but let me just point out that lines and XML live in two distinct worlds: an element can span several lines, and a single line can hold several elements. Here's my (sketchy) attempt to treat XML as XML rather than as lines in a text file. (I got some hints here: http://effbot.org/zone/element-iterparse.htm).

    import os
    import sys
    import time

    from xml.etree.ElementTree import iterparse, tostring


    class Rollover(Exception):
        """Raised when the log file has shrunk, i.e. was replaced by a new one."""


    class File:
        """File-like wrapper whose read() blocks until new data arrives."""

        def __init__(self, filename, sleepinterval=1):
            self.size = 0
            self.sleepinterval = sleepinterval
            self.filename = filename
            self.file = open(filename, "rb")

        def __enter__(self):
            return self

        def __exit__(self, etype, evalue, traceback):
            self.file.close()

        def read(self, size=-1):
            # iterparse() calls read() with a chunk size; keep polling until
            # the writer has appended something or the file was rolled over.
            while True:
                s = self.file.read(size)
                if s:
                    return s
                time.sleep(self.sleepinterval)
                self.check_rollover()

        def check_rollover(self):
            # A file that is smaller than last time must be a fresh log.
            newsize = os.path.getsize(self.filename)
            if newsize < self.size:
                raise Rollover()
            self.size = newsize


    WANTED_TAGS = {"usertag1", "SeMsg"}

    while True:
        try:
            with File("log.txt") as f:
                context = iterparse(f, events=("start", "end"))
                # The first event is the start of the root element.
                event, root = next(context)
                wanted_count = 0  # number of wanted elements currently open
                for event, element in context:
                    if event == "start":
                        if element.tag in WANTED_TAGS:
                            wanted_count += 1
                    else:
                        assert event == "end"
                        if element.tag in WANTED_TAGS:
                            wanted_count -= 1
                            print("LOGGING")
                            print(tostring(element))
                        if wanted_count == 0:
                            # Nothing of interest is open: free processed elements.
                            root.clear()
        except Rollover as err:
            # Reopen the file and start parsing the new log from the top.
            print("doing a rollover", file=sys.stderr)
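If you want to try the extraction part without a live log file, here is a minimal sketch of the same iterparse idea run against an in-memory document. The helper name `extract_wanted` and the sample XML (including the `<log>`, `<noise>` and `<code>` tags) are made up for illustration; only the two wanted tag names come from the script above. It assumes the wanted elements have no text trailing their end tags that you care about.

```python
import io
from xml.etree.ElementTree import iterparse, tostring

WANTED_TAGS = {"usertag1", "SeMsg"}


def extract_wanted(source, wanted=WANTED_TAGS):
    """Yield every wanted element, serialized with its own tags, as a str.

    `source` is a filename or any object with a read() method.
    (Hypothetical helper, not part of the standard library.)
    """
    context = iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event: start of the root element
    open_wanted = 0          # number of wanted elements currently open
    for event, element in context:
        if event == "start":
            if element.tag in wanted:
                open_wanted += 1
        elif element.tag in wanted:
            open_wanted -= 1
            element.tail = None  # do not serialize text after the end tag
            yield tostring(element, encoding="unicode")
        if event == "end" and open_wanted == 0:
            root.clear()     # drop elements that are already processed


# Made-up sample data standing in for the real log.
sample = io.BytesIO(
    b"<log>"
    b"<noise>skip me</noise>"
    b"<usertag1>first</usertag1>"
    b"<SeMsg><code>42</code></SeMsg>"
    b"</log>"
)

results = list(extract_wanted(sample))
for item in results:
    print(item)
# <usertag1>first</usertag1>
# <SeMsg><code>42</code></SeMsg>
```

Passing `encoding="unicode"` makes `tostring()` return a `str` instead of `bytes`, which is handy if the results go straight into a text file. Since `iterparse()` accepts any object with a `read()` method, the same generator should also work with a blocking wrapper like the `File` class.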
