Hello Steven,
Is there a big difference if I write your first function as below? I ask
because I am not familiar with the yield keyword.
def skip_blanks(lines):
    """Remove leading and trailing whitespace, ignore blank lines."""
    return [line.strip() for line in lines if line.strip()]
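Written as `[line.strip() for line in lines if line.strip()]`, the comprehension gives the same lines as the `yield` version; the difference is when the work happens. A small comparison sketch, where `skip_blanks_list`, `skip_blanks_gen` and the sample data are made up for illustration: the list version processes every line up front, while a generator (a `yield` function, or the generator expression used here) hands out one line at a time as it is asked for.

```python
def skip_blanks_list(lines):
    """Eager version: builds the whole list in memory at once."""
    return [line.strip() for line in lines if line.strip()]


def skip_blanks_gen(lines):
    """Lazy version: a generator expression produces one line at a time."""
    return (line.strip() for line in lines if line.strip())


sample = ["  alpha \n", "\n", "beta\n", "   \n", "gamma"]

eager = skip_blanks_list(sample)  # a list, fully built right away
lazy = skip_blanks_gen(sample)    # a generator, nothing computed yet

print(eager)       # ['alpha', 'beta', 'gamma']
print(next(lazy))  # alpha (only the first item has been produced so far)
print(list(lazy))  # ['beta', 'gamma'] (the rest, produced on demand)
```

For a small file either works. For a very large file the generator avoids holding every line in memory at once; the trade-off runs the other way too, since a list can be looped over repeatedly whereas a generator is used up after one pass.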
I also tried to write the second function this way, but it is not as
straightforward.
I am beginning to understand the use of yield in it.
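The second function can be written without `yield` too: instead of yielding each finished section, append it to a list and return the list at the end. A minimal sketch, where the name `collate_sections_list` is hypothetical and, as in Steven's version, a header is assumed to be any line starting with "header":

```python
def collate_sections_list(lines):
    """Return a list of (header, lines) pairs, built without yield."""
    sections = []
    current_header = ""
    accumulator = []
    for line in lines:
        if line.startswith("header"):
            # Close off the previous section (if any) before starting a new one.
            if current_header or accumulator:
                sections.append((current_header, accumulator))
            current_header = line
            accumulator = []
        else:
            accumulator.append(line)
    # The last section is not followed by a header, so add it explicitly
    # once the input runs out.
    if current_header or accumulator:
        sections.append((current_header, accumulator))
    return sections


print(collate_sections_list(["headerA 2", "line 1", "line 2", "headerB 1", "line 1"]))
# [('headerA 2', ['line 1', 'line 2']), ('headerB 1', ['line 1'])]
```

The price is that all sections are held in memory at once; for a file that fits comfortably in memory that hardly matters.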
Regards
Karim
Steven D'Aprano wrote:
On Tue, 2 Mar 2010 05:22:43 pm Andrew Fithian wrote:
Hi tutor,
I have a large text file that has chunks of data like this:
headerA n1
line 1
line 2
...
line n1
headerB n2
line 1
line 2
...
line n2
Each chunk consists of a header and the lines that follow it (up to the
next header). The second field of a header is the number of lines in the
chunk.
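Since the second field is the line count, splitting the header on whitespace yields both the name and the count. A small sketch, assuming every header has exactly two fields; `parse_header` is a hypothetical helper, not something from the thread:

```python
def parse_header(line):
    """Split a header line such as "headerA 3" into ("headerA", 3).

    Assumes exactly two whitespace-separated fields, the second an integer;
    anything else raises ValueError.
    """
    fields = line.split()
    if len(fields) != 2:
        raise ValueError("malformed header: %r" % line)
    name, count = fields
    return name, int(count)  # int() itself raises ValueError on non-digits


print(parse_header("headerA 3"))  # ('headerA', 3)
```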
And what happens if the header is wrong? How do you handle situations
like missing headers and empty sections, header lines which are wrong,
and duplicate headers?
line 1
line 2
headerB 0
headerC 1
line 1
headerD 2
line 1
line 2
line 3
line 4
headerE 23
line 1
line 2
headerB 1
line 1
This is a policy decision: do you try to recover, raise an exception,
raise a warning, pad missing lines as blank, throw away excess lines,
or what?
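As one illustration of such a policy (an assumption, not the only reasonable choice), the sketch below warns when a section's length disagrees with its header's count, then pads with blank lines or drops the excess. `check_section` is a hypothetical helper, and headers are assumed to look like "headerD 2":

```python
import warnings


def check_section(header, lines):
    """Compare a section's declared line count with what was actually read.

    Policy assumed here: warn, then pad missing lines with blanks or
    throw away excess lines so the count matches the header.
    """
    name, count = header.split()
    expected = int(count)
    if len(lines) != expected:
        warnings.warn("%s: expected %d lines, got %d" % (name, expected, len(lines)))
        if len(lines) < expected:
            lines = lines + [""] * (expected - len(lines))  # pad with blanks
        else:
            lines = lines[:expected]  # throw away the excess
    return name, lines


print(check_section("headerD 2", ["line 1", "line 2", "line 3", "line 4"]))
# ('headerD', ['line 1', 'line 2'])  (a UserWarning is also emitted)
```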
I would like to turn this file into a dictionary like:
dict = {'headerA': [line 1, line 2, ... , line n1], 'headerB': [line 1,
line 2, ... , line n2]}
Is there a way to do this with a dictionary comprehension or do I
have to iterate over the file with a "while 1" loop?
I wouldn't do either. I would treat this as a pipe-line problem: you
have a series of lines that need to be processed. You can feed them
through a pipe-line of filters:
def skip_blanks(lines):
    """Remove leading and trailing whitespace, ignore blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield line
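Because `skip_blanks` only loops over its argument, it works on any iterable of strings, not just an open file. A quick check using `io.StringIO` as a stand-in for the file (the sample text is invented):

```python
import io


def skip_blanks(lines):
    """Remove leading and trailing whitespace, ignore blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


# io.StringIO stands in for a real file: both are iterables of lines.
fake_file = io.StringIO("headerA 2\n  line 1  \n\n   \nline 2\n")
print(list(skip_blanks(fake_file)))
# ['headerA 2', 'line 1', 'line 2']
```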
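def collate_sections(lines):
    """Yield (header, lines) pairs, one for each section."""
    current_header = ""
    accumulator = []
    for line in lines:
        if line.startswith("header"):
            # Don't emit an empty ("", []) section before the first header.
            if current_header or accumulator:
                yield (current_header, accumulator)
            current_header = line
            accumulator = []
        else:
            accumulator.append(line)
    # The last section is never followed by a header, so emit it here.
    if current_header or accumulator:
        yield (current_header, accumulator)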
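Running the collating step on a short list, including the empty `headerB 0` section from the examples above, shows which pairs come out. This sketch includes a guard so that no empty `("", [])` pair is produced ahead of the first header (without it, the first pair out would be `('', [])`); the input list is made up:

```python
def collate_sections(lines):
    """Yield (header, lines) pairs, one for each section."""
    current_header = ""
    accumulator = []
    for line in lines:
        if line.startswith("header"):
            # Skip the empty pseudo-section that precedes the first header.
            if current_header or accumulator:
                yield (current_header, accumulator)
            current_header = line
            accumulator = []
        else:
            accumulator.append(line)
    if current_header or accumulator:
        yield (current_header, accumulator)


for header, body in collate_sections(
        ["headerA 2", "line 1", "line 2", "headerB 0", "headerC 1", "line 1"]):
    print(header, body)
# headerA 2 ['line 1', 'line 2']
# headerB 0 []
# headerC 1 ['line 1']
```

An empty section still comes through as a pair with an empty list, so deciding what to do with it stays with the caller.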
Then put them together like this:
with open("my_file.dat", "r") as fp:
    data = {}  # don't shadow the built-in dict
    # The generators read lazily, so the file must stay open while looping.
    non_blank_lines = skip_blanks(fp)
    sections = collate_sections(non_blank_lines)
    for (header, lines) in sections:
        data[header] = lines
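On the dictionary comprehension question: once the generators have done the grouping, a comprehension can build the dict in one line. A self-contained sketch of the whole pipeline, with `io.StringIO` standing in for `my_file.dat` and keys trimmed to the first header field so they look like `'headerA'` in Andrew's example (that trimming is an assumption; the loop above keys on the full header line):

```python
import io


def skip_blanks(lines):
    """Remove leading and trailing whitespace, ignore blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def collate_sections(lines):
    """Yield (header, lines) pairs, one for each section."""
    current_header = ""
    accumulator = []
    for line in lines:
        if line.startswith("header"):
            if current_header or accumulator:
                yield (current_header, accumulator)
            current_header = line
            accumulator = []
        else:
            accumulator.append(line)
    if current_header or accumulator:
        yield (current_header, accumulator)


# A small in-memory stand-in for my_file.dat.
fake_file = io.StringIO("headerA 2\nline 1\nline 2\n\nheaderB 1\nline 1\n")

sections = collate_sections(skip_blanks(fake_file))
# Keep only the first header field, so "headerA 2" becomes "headerA".
data = {header.split()[0]: lines for header, lines in sections}
print(data)
# {'headerA': ['line 1', 'line 2'], 'headerB': ['line 1']}
```

Note that a repeated header silently overwrites the earlier section, and any lines before the first header would get an empty header string, on which `split()[0]` fails with an IndexError.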
Of course you can add your own error checking.
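As one example of such error checking, a sketch that builds the dict from the `(header, lines)` pairs but refuses duplicate headers and lines that appear before any header. `build_dict` is hypothetical, and raising `ValueError` is just one of the policies mentioned earlier:

```python
def build_dict(sections):
    """Build a dict from (header, lines) pairs, refusing duplicate headers."""
    data = {}
    for header, lines in sections:
        fields = header.split()
        if not fields:
            # An empty header means lines appeared before the first header.
            raise ValueError("lines found before the first header: %r" % lines)
        name = fields[0]  # "headerB 1" -> "headerB"
        if name in data:
            raise ValueError("duplicate header: %s" % name)
        data[name] = lines
    return data


pairs = [("headerA 1", ["line 1"]), ("headerB 0", []), ("headerB 1", ["line 1"])]
try:
    build_dict(pairs)
except ValueError as err:
    print(err)  # duplicate header: headerB
```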
_______________________________________________
Tutor maillist - Tutor@python.org