On Tue, Dec 21, 2010 at 5:49 AM, Stefan Behnel <stefan...@behnel.de> wrote:
> David Hutto, 21.12.2010 11:29:
>> On Tue, Dec 21, 2010 at 5:19 AM, Stefan Behnel wrote:
>>> Alan Gauld, 21.12.2010 10:58:
>>>>> 22 Jan 2009 ... Stripping Illegal Characters from XML in Python
>>>>
>>>> ... I'd be asking Python to process 6.4 gigabytes of CSV into
>>>> 6.5 gigabytes of XML 1. ..... In fact, what happened was that
>>>> the parsing didn't work and the whole db was ...
>>>>
>>>> And I thought a 1G file was extreme... Do these people stop to think
>>>> that with XML as much as 80% of their "data" is just description
>>>> (ie the tags).
>>>
>>> As I already said, it compresses well. In run-length compressed XML
>>> files, the tags can easily take up a negligible amount of space
>>> compared to the more widely varying data content (although that also
>>> commonly tends to compress rather well). And depending on how fast
>>> your underlying storage is, decompressing and parsing the file may
>>> still be faster than parsing a huge uncompressed file directly. So,
>>> again, the sheer uncompressed file size is *not* a very interesting
>>> argument.
>>
>> However, could they (as mentioned elsewhere, and by others in another
>> form) mitigate the damage by using smaller tags exclusively?
>
> Why should that have a (noticeable) impact on the compressed file? It's
> the inherent nature of compression to reduce redundancy, which in XML
> files usually includes the redundancy of repeated tag names (even if the
> compression is not specifically XML aware).
>
> It's a very bad idea to use short and obfuscated tag names to reduce the
> storage size.
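The claim about tag names and compression is easy to check with a rough sketch. Everything here is an assumption made for illustration: the tag names (LONG_TAGS, SHORT_TAGS), the flat document layout, and the random 8-letter "words" standing in for real data. It uses only the standard zlib module, which is general-purpose and not XML aware.

```python
import random
import zlib

# Hypothetical tag sets for the same document: descriptive vs. abbreviated.
LONG_TAGS = ("dictionary", "entry", "word", "antonym")
SHORT_TAGS = ("d", "e", "w", "a")


def make_rows(count, seed=0):
    """Return `count` pairs of made-up 8-letter 'words' (stand-in data)."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"

    def word():
        return "".join(rng.choice(letters) for _ in range(8))

    return [(word(), word()) for _ in range(count)]


def build_xml(tags, rows):
    """Serialize the rows as a flat XML document using the given tag names."""
    root, entry, word, antonym = tags
    parts = ["<%s>" % root]
    for w, a in rows:
        parts.append(
            "<{e}><{w}>{wv}</{w}><{a}>{av}</{a}></{e}>".format(
                e=entry, w=word, a=antonym, wv=w, av=a
            )
        )
    parts.append("</%s>" % root)
    return "".join(parts).encode("ascii")


def sizes(tags, rows):
    """Return (raw_bytes, compressed_bytes) for one choice of tag names."""
    raw = build_xml(tags, rows)
    packed = zlib.compress(raw, 9)
    # Lossless: decompressing gives back exactly the bytes we started with.
    assert zlib.decompress(packed) == raw
    return len(raw), len(packed)


rows = make_rows(5000)
for label, tags in (("long tags", LONG_TAGS), ("short tags", SHORT_TAGS)):
    raw_size, packed_size = sizes(tags, rows)
    print("%-10s raw=%d bytes  compressed=%d bytes" % (label, raw_size, packed_size))
```

On data like this the raw sizes differ a lot, while the compressed sizes should come out much closer together, since the repeated tag names are exactly the kind of redundancy the compressor removes. The exact numbers depend on the data, so run it against a sample of the real file before drawing conclusions.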
Maybe my style is an example of bad coding in some areas (present company
excepted). For example, I have a dictionary that has codes within a text
file that point to other lines for verbs, adjectives, nouns, etc. So <a>
doesn't have to mean "a"; it could mean <a> = <antonym>. But would that
make the initial usage of <a> in the XML file faster or slower, given that
I'd parse for <a> and then relate <a> to <antonym>?

> That's like coding in assembler to reduce the size of the
> source code.

Haven't gotten to assembler yet, almost there.

> Just use compression for storage, or buy a larger hard disk for
> your NAS.
>
>> And also compressed is formatted, even for the tags, correct?
>
> The (lossless) compression doesn't change the content.

Google search later, I promise.

> Stefan

--
They're installing the breathalyzer on my email account next week.
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor