Hi Gabriel,

Your remarks fixed my problem. My code now looks as shown below and behaves as expected.
Thanks, Gabriel.
Merry Christmas and Happy Hanukkah,
Ron.

$ cat generator.py
#!/usr/bin/env python

import gzip
from Debug import _line as line

class LogStream():

    def __init__(self, filename):
        self.filename = filename
        self.input_file = self.open_file(filename)

    def open_file(self, in_file):
        try:
            f = gzip.GzipFile(in_file, "r")
            f.readline()
        except IOError:
            f = open(in_file, "r")
            f.readline()
        f.seek(0)
        return(f)

    def line_generator(self):
        print line()+". self.input_file.tell()==",self.input_file.tell()
        while True:
            line_ = self.input_file.readline()
            print line()+". self.input_file.tell()==",self.input_file.tell()
            if not line_:
                break
            yield line_.strip()

if __name__ == "__main__":
    filename = "sac.log.50lines"
    log_stream = LogStream(filename)
    log_stream.input_file.seek(0)
    line_generator = log_stream.line_generator()
    line_ = line_generator.next()

$ python generator.py
23. self.input_file.tell()== 0
26. self.input_file.tell()== 247
$ !wc
wc -c sac.log.50lines
6623 sac.log.50lines
$

-----Original Message-----
From: MRAB [mailto:goo...@mrabarnett.plus.com]
Sent: Wednesday, December 24, 2008 20:00
To: python-list@python.org
Subject: Re: How to change a generator ?

Gabriel Genellina wrote:
> On Wed, 24 Dec 2008 15:03:58 -0200, MRAB <goo...@mrabarnett.plus.com>
> wrote:
>
>>> I have a generator whose aim is to return consecutive lines from a
>>> file (the listing below is a simplified version).
>>> However, as it is written now, the generator method moves the text
>>> file pointer to the end of the file after the first invocation.
>>> Namely, the file pointer changes from 0 to 6623 on line 24.
>>>
>> It might be that the generator method of self.input_file is reading
>> the file a chunk at a time for efficiency even though it's yielding a
>> line at a time.
>
> I think this is the case too.
> I can think of 3 alternatives:
>
> a) open the file unbuffered (bufsize=0). But I think this would
> greatly decrease performance.
>
> b) keep track of the file position internally (by adding up the length
> of each line). The file should be opened in binary mode in this case
> (to avoid any '\n' translation).
>
> c) return line numbers only, instead of file positions. Seeking to a
> certain line number requires re-reading the whole file from the start;
> depending on how often this is required, and how big the file is, this
> might be acceptable.
>
readline() appears to work as expected, leaving the file position at the
start of the next line.
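
A minimal sketch of alternative (b) from the quoted message: the file position is tracked by adding up line lengths, so tell() is never called and any read-ahead buffering inside the file object stops mattering. The names lines_with_offsets and demo, and the sample data, are made up for illustration and do not come from the thread. Unlike the Python 2 listing above, this is written for Python 3 (it should also run on Python 2.6 or later), and it assumes a plain, uncompressed file.

```python
import os
import tempfile


def lines_with_offsets(path):
    """Yield (offset, line) pairs for each line of the file at `path`.

    The offset is computed by summing line lengths instead of calling
    tell(), so buffering inside the file object does not affect it.
    """
    offset = 0
    # Binary mode: no newline translation, so len(raw) is the byte count on disk.
    with open(path, "rb") as f:
        for raw in f:
            yield offset, raw
            offset += len(raw)


def demo():
    """Write a small sample file and return [(offset, line), ...] for it."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(b"first\nsecond line\nthird\n")
        pairs = list(lines_with_offsets(path))
        # Sanity check: seeking to a recorded offset lands on that line's start.
        with open(path, "rb") as f:
            for offset, raw in pairs:
                f.seek(offset)
                assert f.readline() == raw
        return pairs
    finally:
        os.remove(path)
```

On the three-line sample, demo() returns the offsets 0, 6 and 18. With a real log file the recorded offsets could be stored and handed to seek() later, which is what option (b) is after.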