Alan Gauld, 21.12.2010 15:11:
"Stefan Behnel" wrote
And I thought a 1G file was extreme... Do these people stop to think that
with XML, as much as 80% of their "data" is just description (i.e. the tags)?
As I already said, it compresses well. In run-length compressed XML
files, the tags can easily take up a negligible amount of space compared
to the more widely varying data content.
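To make the point concrete, here is a quick sketch (the element names are
made up, not from this thread) showing how much gzip can squeeze out of a
tag-heavy document, since the repeated tags compress almost for free:

```python
import gzip

# Build a small XML document where most bytes are repetitive tags
# (the names "record", "name", "value" are purely illustrative).
records = "".join(
    "<record><name>row%d</name><value>%d</value></record>" % (i, i)
    for i in range(10000)
)
doc = ("<data>%s</data>" % records).encode("ascii")

compressed = gzip.compress(doc)
print("raw bytes:     ", len(doc))
print("gzipped bytes: ", len(compressed))
print("ratio: %.1fx" % (len(doc) / len(compressed)))
```

On data like this the ratio is typically well over 10x, precisely because
the tag overhead is so repetitive.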
I understand how compression helps with the data transmission aspect.
XML files compress rather well. And depending on how fast your underlying
storage is, decompressing and parsing the file may still be faster than
parsing a huge uncompressed file directly.
But I don't understand how uncompressing a file before parsing it can
be faster than parsing the original uncompressed file.
I didn't say "uncompressing a file *before* parsing it". I meant
uncompressing the data *while* parsing it. Just as the parser already has to
decode the data, decompression is simply one more step before decoding.
Depending on the relative speeds of I/O and decompression, it can be faster
to load the compressed data and decompress it into the parser on the fly.
lxml.etree (or rather libxml2) does this for you internally, for example,
when it detects compressed input while parsing from a file.
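A rough stdlib sketch of the decompress-while-parsing idea (using gzip plus
ElementTree rather than lxml, with an in-memory buffer standing in for a real
.xml.gz file on disk):

```python
import gzip
import io
import xml.etree.ElementTree as ET

# Prepare a gzip-compressed XML document in memory (tag names are
# illustrative); in practice this would be an .xml.gz file on disk.
data = b"<items>" + b"".join(
    b"<item>%d</item>" % i for i in range(1000)
) + b"</items>"
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(data)
buf.seek(0)

# Decompress *while* parsing: iterparse pulls decompressed chunks from
# the GzipFile stream on demand, so the full uncompressed document never
# has to exist on disk or in memory at once.
count = 0
with gzip.GzipFile(fileobj=buf, mode="rb") as stream:
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "item":
            count += 1
            elem.clear()  # discard the element to keep memory flat
print("parsed items:", count)
```

Because iterparse accepts any file-like object, the gzip stream drops in
transparently; no temporary uncompressed file is needed.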
Note that these performance differences are tricky to prove in benchmarks,
as repeating the benchmark usually means that the file is already cached in
memory after the first run, so the decompression overhead will dominate in
the second run. That's not what you will see in a clean run or for huge
files, though.
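One way to observe the caching effect described above is to time several
consecutive parses of the same file: on a cold cache the first run pays for
disk I/O, while later runs hit the OS page cache. The absolute timings vary
by machine, so this sketch (with a throwaway file path) only prints them:

```python
import os
import tempfile
import time
import xml.etree.ElementTree as ET

# Create a throwaway XML file to benchmark (path is illustrative).
path = os.path.join(tempfile.gettempdir(), "bench.xml")
with open(path, "wb") as f:
    f.write(b"<r>" + b"<e>x</e>" * 200000 + b"</r>")

# Time repeated parses: after the first run the OS usually has the file
# in its page cache, so later runs measure parsing alone, not disk I/O.
timings = []
for run in range(3):
    start = time.perf_counter()
    root = ET.parse(path).getroot()
    timings.append(time.perf_counter() - start)

print(["%.4fs" % t for t in timings])
os.remove(path)
```

For a clean-cache measurement you would need to drop the OS cache (or reboot)
between runs, which is exactly why repeated benchmarks understate I/O cost.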
Stefan
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor