On Tue, 2 Mar 2010 05:22:43 pm Andrew Fithian wrote:
> Hi tutor,
>
> I have a large text file that has chunks of data like this:
>
> headerA n1
> line 1
> line 2
> ...
> line n1
> headerB n2
> line 1
> line 2
> ...
> line n2
>
> Where each chunk is a header and the lines that follow it (up to the
> next header). A header has the number of lines in the chunk as its
> second field.

And what happens if the header is wrong? How do you handle situations
like missing headers and empty sections, header lines which are wrong,
and duplicate headers?

line 1
line 2
headerB 0
headerC 1
line 1
headerD 2
line 1
line 2
line 3
line 4
headerE 23
line 1
line 2
headerB 1
line 1

This is a policy decision: do you try to recover, raise an exception,
raise a warning, pad missing lines as blank, throw away excess lines,
or what?

> I would like to turn this file into a dictionary like:
> dict = {'headerA':[line 1, line 2, ... , line n1], 'headerB':[line1,
> line 2, ... , line n2]}
>
> Is there a way to do this with a dictionary comprehension or do I
> have to iterate over the file with a "while 1" loop?

I wouldn't do either. I would treat this as a pipe-line problem: you
have a series of lines that need to be processed. You can feed them
through a pipe-line of filters:

def skip_blanks(lines):
    """Remove leading and trailing whitespace, ignore blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield line

def collate_sections(lines):
    """Yield (header, list_of_lines) pairs, one per section.

    Lines that appear before the first header are yielded under the
    empty header "". The header is the whole header line, including
    the count field.
    """
    current_header = ""
    accumulator = []
    for line in lines:
        if line.startswith("header"):
            # Don't emit an empty section before the very first header.
            if current_header or accumulator:
                yield (current_header, accumulator)
            current_header = line
            accumulator = []
        else:
            accumulator.append(line)
    if current_header or accumulator:
        yield (current_header, accumulator)

Then put them together like this:

data = {}  # don't shadow the built-in dict
with open("my_file.dat", "r") as fp:
    non_blank_lines = skip_blanks(fp)
    sections = collate_sections(non_blank_lines)
    for (header, lines) in sections:
        data[header] = lines

Note that a duplicate header silently replaces the earlier section in
data. Of course you can add your own error checking.

-- 
Steven D'Aprano
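
As one possible shape for the error checking mentioned above, here is a minimal sketch that takes the strictest policy and raises ValueError on anything suspicious. The helper names parse_header and build_dict are made up for illustration, and the choice to raise rather than recover is an assumption, not the only reasonable answer. It expects (header_line, lines) pairs like the ones the section-collating generator produces, and it uses only the header name as the dictionary key.

```python
def parse_header(line):
    """Split 'headerA 3' into ('headerA', 3); raise ValueError if malformed."""
    fields = line.split()
    if len(fields) != 2 or not fields[1].isdigit():
        raise ValueError("malformed header line: %r" % line)
    return fields[0], int(fields[1])


def build_dict(sections):
    """Build {name: lines} from (header_line, lines) pairs, checking counts."""
    data = {}
    for header_line, lines in sections:
        if not header_line:
            # Section with no header: harmless if empty, an error otherwise.
            if lines:
                raise ValueError(
                    "%d line(s) found before the first header" % len(lines))
            continue
        name, expected = parse_header(header_line)
        if name in data:
            raise ValueError("duplicate header: %r" % name)
        if len(lines) != expected:
            raise ValueError("%s: expected %d line(s), found %d"
                             % (name, expected, len(lines)))
        data[name] = lines
    return data


# Example with well-formed sections (hypothetical data):
sections = [("headerA 2", ["line 1", "line 2"]), ("headerB 0", [])]
data = build_dict(sections)
```

If you would rather recover than fail, the same three checks are the places to pad, truncate, or warn instead of raising.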