On Tue, Dec 21, 2010 at 10:03 AM, Stefan Behnel <stefan...@behnel.de> wrote:
> Alan Gauld, 21.12.2010 15:11:
>>
>> "Stefan Behnel" wrote
>>>>
>>>> And I thought a 1G file was extreme... Do these people stop to think
>>>> that with XML as much as 80% of their "data" is just description (ie
>>>> the tags).
>>>
>>> As I already said, it compresses well. In run-length compressed XML
>>> files, the tags can easily take up a negligible amount of space compared
>>> to the more widely varying data content
>>
>> I understand how compression helps with the data transmission aspect.
>>
>>> compress rather well). And depending on how fast your underlying storage
>>> is, decompressing and parsing the file may still be faster than parsing
>>> a huge uncompressed file directly.
>>
>> But I don't understand how uncompressing a file before parsing it can
>> be faster than parsing the original uncompressed file?
>
> I didn't say "uncompressing a file *before* parsing it".
He didn't say he was utilizing code below Python either, but others will
argue that the microseconds matter, and if that's YOUR standard, then keep
it for client and self.

> I meant uncompressing the data *while* parsing it. Just like you have to
> decode it for parsing, it's just an additional step to decompress it
> before decoding. Depending on the performance relation between I/O speed
> and decompression speed, it can be faster to load the compressed data and
> decompress it into the parser on the fly. lxml.etree (or rather libxml2)
> internally does that for you, for example, if it detects compressed input
> when parsing from a file.
>
> Note that these performance differences are tricky to prove in benchmarks,

Tricky yet proven? Then tell me (this is in reference to a recent C++
discussion): what real-time systems is Python actually used in, and how
could it be utilized in, say, an aviation system to avoid a collision when
milliseconds are on the line?

> as repeating the benchmark usually means that the file is already cached
> in memory after the first run, so the decompression overhead will
> dominate in the second run. That's not what you will see in a clean run
> or for huge files, though.
>
> Stefan

--
They're installing the breathalyzer on my email account next week.

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
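
For anyone curious what "uncompressing the data *while* parsing it" looks like in practice, here is a minimal stdlib-only sketch (gzip + ElementTree's iterparse; lxml/libxml2 does the equivalent for you in C). The document is built in memory as a stand-in for a huge file on disk:

```python
import gzip
import io
import xml.etree.ElementTree as ET

# Build a small gzip-compressed XML document in memory
# (a stand-in for a huge compressed file on disk).
xml_bytes = (
    b"<root>"
    + b"".join(b"<item id='%d'>value</item>" % i for i in range(100))
    + b"</root>"
)
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(xml_bytes)
buf.seek(0)

# Decompress *while* parsing: iterparse pulls chunks from the gzip
# stream as it needs them, so the full uncompressed document never
# has to exist on disk or in memory at once.
count = 0
with gzip.GzipFile(fileobj=buf, mode="rb") as f:
    for event, elem in ET.iterparse(f, events=("end",)):
        if elem.tag == "item":
            count += 1
            elem.clear()  # release parsed elements to keep memory flat
print(count)  # prints 100
```

The same pattern works for a real file by passing `gzip.open(path, "rb")` to iterparse instead of the in-memory buffer.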
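
And to illustrate Stefan's benchmarking caveat, here is a rough timing sketch (file names and sizes are arbitrary choices for the example). After the first run, the OS page cache holds the file contents, so repeated timings measure decompression overhead rather than real I/O; whether gzip wins depends entirely on your storage speed:

```python
import gzip
import os
import tempfile
import time
import xml.etree.ElementTree as ET

# Write the same document both uncompressed and gzip-compressed.
xml_bytes = b"<root>" + b"<row>data</row>" * 10000 + b"</root>"
plain = tempfile.NamedTemporaryFile(delete=False, suffix=".xml")
plain.write(xml_bytes)
plain.close()
gz_path = plain.name + ".gz"
with gzip.open(gz_path, "wb") as f:
    f.write(xml_bytes)

def parse_plain(path):
    # Parse the uncompressed file directly.
    return sum(1 for _, e in ET.iterparse(path) if e.tag == "row")

def parse_gz(path):
    # Decompress into the parser on the fly.
    with gzip.open(path, "rb") as f:
        return sum(1 for _, e in ET.iterparse(f) if e.tag == "row")

t0 = time.perf_counter()
n1 = parse_plain(plain.name)
t1 = time.perf_counter()
n2 = parse_gz(gz_path)
t2 = time.perf_counter()

print(n1, n2)  # both 10000
# Caution: once the files sit in the OS page cache, these numbers no
# longer reflect cold-disk I/O, which is exactly the trap described above.
print("plain: %.4fs  gzip: %.4fs" % (t1 - t0, t2 - t1))

os.unlink(plain.name)
os.unlink(gz_path)
```

For an honest comparison you would need cold caches (e.g. a fresh boot or explicitly dropped caches) and files large enough that I/O dominates.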