On Wed, Nov 21, 2007 at 09:02:47AM -0800, Srinivas Iyyer wrote:
> Dear tutors,
>
> I use ElementTree for XML works. I have a 1.3GB file
> to parse.
>
>
> It takes a lot of time to open my input XML file.
>
> Is that because of my hardware limitations, or am I
> using a blunt method to load the file?
>
> my computer config:
> Intel(R)
> Pentium(R)4 CPU 2.80GHz
> 2.79GHz, 0.99GB of RAM
>
> from elementtree import ElementTree
> myfile = open('myXML.out','r')
>
> Can you suggest any tips to get around the file-opening
> problem?
If time is the problem, you might want to look at:
- cElementTree -- See notes about cElementTree on this page:
http://effbot.org/zone/elementtree-13-intro.htm
- lxml -- http://codespeak.net/lxml/
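A minimal sketch of staying with the ElementTree API while using the faster C implementation. It also uses iterparse() with elem.clear(), an ElementTree technique that streams the file instead of building the whole tree at once. The file name 'myXML.out' comes from the question. The tag name "record" and the function count_records() are hypothetical placeholders. On modern Python 3, xml.etree.ElementTree uses its C accelerator automatically, and xml.etree.cElementTree no longer exists (removed in 3.9), so the import falls back.

```python
import io

# Prefer the C implementation where it exists (older Pythons); otherwise use
# xml.etree.ElementTree, which is C-accelerated on Python 3.3+ anyway.
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET


def count_records(source, tag="record"):
    """Count <tag> elements while streaming through the XML.

    `source` may be a file name such as 'myXML.out' or a binary file object.
    The default tag "record" is a hypothetical placeholder.
    """
    count = 0
    # iterparse yields each element once its end tag has been read,
    # so the whole document never has to be parsed up front.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            # Drop this element's children and text to release most of
            # the memory it used.
            elem.clear()
    return count


# Usage with a tiny in-memory document; for the real file you would call
# count_records('myXML.out').
sample = b"<root><record>a</record><record>b</record><other/></root>"
print(count_records(io.BytesIO(sample)))  # prints 2
```

Note that cleared elements stay attached to the root as empty shells, so a very long file may still benefit from also removing them from their parent.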
If size (that is, memory) is the issue, as it almost certainly is
for you with a 1.3GB file and less than 1GB of RAM, then SAX can be
a solution. However, switching to SAX requires a radical redesign
of your application.
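A minimal SAX sketch, to show why the redesign is needed: instead of walking a tree, your logic lives in callbacks on a handler object and you keep your own state. The class RecordCounter and the tag name "record" are hypothetical. For the real file you would call xml.sax.parse('myXML.out', handler) instead of parseString().

```python
import xml.sax


class RecordCounter(xml.sax.ContentHandler):
    """Count <record> elements and collect their text (hypothetical tag)."""

    def __init__(self, tag="record"):
        super().__init__()
        self.tag = tag
        self.count = 0
        self.texts = []
        self._in_tag = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == self.tag:
            self._in_tag = True
            self._buffer = []

    def characters(self, content):
        # The parser may deliver one text node in several pieces, so buffer.
        if self._in_tag:
            self._buffer.append(content)

    def endElement(self, name):
        if name == self.tag:
            self.count += 1
            self.texts.append("".join(self._buffer))
            self._in_tag = False


handler = RecordCounter()
xml.sax.parseString(b"<root><record>a</record><record>b</record></root>", handler)
print(handler.count, handler.texts)  # prints 2 ['a', 'b']
```

Memory use stays small because nothing is kept except what the handler chooses to store.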
You might also want to investigate pulldom. It's in the Python
standard library (xml.dom.pulldom). A quote:
"PullDOM has 80% of the speed of SAX and 80% of the convenience
of the DOM. There are still circumstances where you might need
SAX (speed freak!) or DOM (complete random access). But IMO
there are a lot more circumstances where the PullDOM middle
ground is exactly what you need."
The standard Python documentation on pulldom is nearly nonexistent,
but here are several links:
http://www.prescod.net/python/pulldom.html
http://www.ibm.com/developerworks/xml/library/x-tipulldom.html
http://www.idealliance.org/papers/dx_xml03/papers/06-02-03/06-02-03.html
http://www.idealliance.org/papers/dx_xml03/papers/06-02-03/06-02-03.html#pull
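A minimal pulldom sketch of that middle ground: you pull events as with SAX, but when you reach an element you care about, expandNode() turns just that element into a small DOM subtree you can navigate normally. The generator iter_records() and the tag name "record" are hypothetical; for the real file you would pass 'myXML.out' as the source.

```python
import io
from xml.dom import pulldom


def iter_records(source, tag="record"):
    """Yield each <tag> element as a DOM node, one at a time.

    `source` may be a file name such as 'myXML.out' or a file object.
    """
    events = pulldom.parse(source)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            # Build a DOM subtree for this element only, rather than a
            # DOM for the whole document.
            events.expandNode(node)
            yield node


data = b"<root><record id='1'>a</record><record id='2'>b</record></root>"
for rec in iter_records(io.BytesIO(data)):
    print(rec.getAttribute("id"), rec.firstChild.data)
# prints:
# 1 a
# 2 b
```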
Hope this helps.
Dave
--
Dave Kuhlman
http://www.rexx.com/~dkuhlman
_______________________________________________
Tutor maillist - [email protected]
http://mail.python.org/mailman/listinfo/tutor