Thomas A. Schmitz wrote:
> On Oct 11, 2006, at 12:06 PM, Kent Johnson wrote:
>
>> I would take out the join in this, at least, and return a list of
>> lines. You don't really have a paragraph, you have structured data.
>> There is no need to throw away the structure.
>>
>> It might be even more useful to return a dictionary that maps field
>> names to values. Also there doesn't seem to be any reason to make
>> FileIterator a class, you can use just a generator function (Dick
>> Moores take notice!):
>>
>> def readparagraphs(fw):
>>     data = {}
>>     for line in fw:
>>         if line.isspace():
>>             # a blank line ends the current paragraph
>>             if data:
>>                 yield data
>>                 data = {}
>>         else:
>>             # strip the newline; split only on the first ' : '
>>             key, value = line.strip().split(' : ', 1)
>>             data[key] = value
>>     # the last paragraph may not be followed by a blank line
>>     if data:
>>         yield data
>>
>> Now you don't need a regexp, you have usable data directly from the
>> iterator.
>>
>
> Thank you for your help, Kent! But I'm not sure if this is
> practicable. As I said, a line-by-line approach does not work,

What I have outlined is not a line-by-line approach. It still returns
data one paragraph at a time, but in a more usable format than your
original iterator. Try printing out the values you get from the
iterator, the same way you did with your original paragraph iterator.

> for
> two reasons:
> 1. I want to combine and translate the results from two lines;

You can do that with this approach.

> 2. in the file, there are lines of the form
> Publication : Denver, University of Colorado Press, 1776
> from which I need to extract three values (address, publisher, date),
> and I may need to discard some other stuff from other lines. So I do
> need a regex, I think. Unfortunately, the structure is not strong
> enough to make a one-to-one translation viable, so I do need to
> extract the values...

Ok, so the dict from the iterator is still raw data that needs some
processing. Something like

for para in readparagraphs(open('mydata.txt')):
    # Your previous example
    if para.get('Type de notice') == 'monographie':
        print "@Book{,"

    # publication data
    pubData = para.get('Publication')
    if pubData:
        address, publisher, date = pubData.split(', ')
        # do something with address, etc.

Kent

PS Please reply on-list.
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
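
For anyone who wants to try the paragraph-to-dict idea together with
the Publication line, here is a minimal, self-contained sketch. It is
not code from the thread: the names read_paragraphs, parse_publication
and PUBLICATION_RE are hypothetical, the sample records are made up
around the one Publication line quoted above, and the pattern assumes
the line always ends in a four-digit year. It is written for Python 3.
A plain split(', ') raises ValueError as soon as a publisher name
contains a comma, which is the one place a small regex seems to help.

```python
import re

# Hypothetical pattern (an assumption, not from the thread):
# "address, publisher, year". The publisher part may contain commas;
# the year is assumed to be exactly four digits at the end of the line.
PUBLICATION_RE = re.compile(
    r'^(?P<address>[^,]+),\s*(?P<publisher>.+),\s*(?P<date>\d{4})\s*$'
)


def read_paragraphs(lines):
    """Yield one dict (field name -> value) per blank-line-separated paragraph."""
    data = {}
    for line in lines:
        if not line.strip():
            # A blank line ends the current paragraph.
            if data:
                yield data
                data = {}
            continue
        key, sep, value = line.partition(' : ')
        if not sep:
            # No ' : ' separator: skip the line instead of failing.
            continue
        data[key.strip()] = value.strip()
    # The last paragraph may not be followed by a blank line.
    if data:
        yield data


def parse_publication(text):
    """Return (address, publisher, date), or None if the text does not fit."""
    match = PUBLICATION_RE.match(text)
    if match is None:
        return None
    return (match.group('address').strip(),
            match.group('publisher').strip(),
            match.group('date'))


# Made-up sample input standing in for the lines of the real data file.
sample = [
    'Type de notice : monographie\n',
    'Publication : Denver, University of Colorado Press, 1776\n',
    '\n',
    'Type de notice : article\n',
]

records = list(read_paragraphs(sample))
for para in records:
    pub = para.get('Publication')
    if pub:
        print(parse_publication(pub))
```

The iterator still hands back raw field values; parse_publication is
the only place that knows about the layout of one particular field, so
other fields can get their own small parsers (or none) without touching
the reading loop. Returning None for lines that do not fit leaves the
caller to decide whether to skip or report them.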