Re: [Tutor] Trying to parse a HUGE(1gb) xml file in python

Stefan Behnel Tue, 21 Dec 2010 01:13:05 -0800

David Hutto, 21.12.2010 09:55:

On Tue, Dec 21, 2010 at 3:52 AM, Stefan Behnel wrote:

Chris Fuller, 21.12.2010 03:27:


This isn't XML, it's an abomination of XML.  Best to not treat it as XML.
Good thing you're only after one class of tags.  Here's what I'd do.  I'll
give a general solution, but there are two parameters / four cases that could
make the code simpler, I'll just point them out at the end.

Iterate over the file descriptor, reading in line-by-line.  This will be slow
on a huge file, but probably not so bad if you're only doing it once.


Note that it's not unlikely that this is actually *slower* than using a real
XML parser:


Or a 'real' language like C or C++ maybe to increase, or in Python's
case, bypass, the interpreter?

While this may be a little faster than Python code (although I suspect that benchmarking is needed to prove either way), I doubt that it's worth the overhead in code writing. If I can write a couple of lines of Python code that are easy to validate and almost as fast as C code, why would I want to write and debug hundreds of lines of code in C or C++, just to see that I need to tune my benchmark to notice the difference?
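
To give an idea of what "a couple of lines of Python code" can look like, here is a minimal sketch that uses the standard library's xml.etree.ElementTree.iterparse() to stream through a document without loading it all into memory. The function name count_elements, the tag name "item" and the sample input are made up for illustration; they are not taken from the original poster's file, and this only works on input that is well-formed XML.

```python
import io
import xml.etree.ElementTree as ET


def count_elements(source, tag):
    """Stream through an XML source and count the elements named `tag`.

    `source` may be a file name or a binary file object. The name
    `count_elements` is a hypothetical helper, not part of any library.
    """
    count = 0
    # "end" events fire once an element has been parsed completely.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
        # Drop the text and children of elements we are done with,
        # so memory use stays small on very large inputs.
        elem.clear()
    return count


# Tiny in-memory stand-in for the real (huge) file.
sample = io.BytesIO(b"<root><item>a</item><item>b</item><other/></root>")
print(count_elements(sample, "item"))
```

For a real file you would pass the file name instead of the BytesIO object. Whether this beats a hand-written line scanner on a particular input is something only a benchmark can tell.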

But then, people even write XML handling code in Java, where neither performance nor code size is a suitable argument.


Stefan

