On Monday 11 September 2006 12:59, Kent Johnson wrote:
> Tiago Saboga wrote:
> > On Monday 11 September 2006 12:24, Kent Johnson wrote:
> >> Tiago Saboga wrote:
> >>> On Monday 11 September 2006 11:15, Kent Johnson wrote:
> >>>> Tiago Saboga wrote:
> >>>> How big is the XML? 25 seconds is a long time...I would look at
> >>>> cElementTree (implementation of ElementTree in C), it is pretty fast.
> >>>> http://effbot.org/zone/celementtree.htm
> >>>
> >>> It's about 10k. Hey, it seems easy, but I'd like not to start over
> >>> again. Of course, if it's the only solution... 25 (28, in fact, for the
> >>> cp man page) isn't really acceptable.
> >>
> >> That's tiny! No way it should take 25 seconds to parse a 10k file.
> >>
> >> Have you tried saving the file separately and parsing from disk? That
> >> would help determine if the interprocess pipe is the problem.
> >
> > Just tried, and - incredible - it took even longer: 46s. But in the
> > second run it came back to 25s. I really don't understand what's going
> > on. I did some other tests, and I found that all the code before
> > "parser.parse(stout)" runs almost instantly; it then takes all the
> > running somewhere between this call and the first event; and the rest is
> > almost instantly again. Any ideas?
>
> What did you try, buffering or reading from a file? If parsing from a
> file takes 25 secs, I am amazed...
I read from a file, and before you ask: no, I'm not working on a 286 while compiling my kernel at the same time... ;-)

Actually, I decided to strip down both my code and the XML file. I stripped the code to almost nothing and still got 23s. The same happened with the XML file... until I cut out the second line, the one with the DTD [1]. Surprise: the time was fine.

So I put it all back together, but with one caveat: there is now an error that was not raised before:

    Traceback (most recent call last):
      File "./liftopy.py", line 130, in ?
        parser.parse(stout)
      File "/usr/lib/python2.3/site-packages/_xmlplus/sax/expatreader.py", line 109, in parse
        xmlreader.IncrementalParser.parse(self, source)
      File "/usr/lib/python2.3/site-packages/_xmlplus/sax/xmlreader.py", line 123, in parse
        self.feed(buffer)
      File "/usr/lib/python2.3/site-packages/_xmlplus/sax/expatreader.py", line 220, in feed
        self._err_handler.fatalError(exc)
      File "/usr/lib/python2.3/site-packages/_xmlplus/sax/handler.py", line 38, in fatalError
        raise exception
    xml.sax._exceptions.SAXParseException: /home/tiago/Computador/python/opy/manraw/doclift/cp.1.xml.stripped:279:16: undefined entity

OK, the guilty line (279) contains a "&copy;", which was presumably defined in the DTD; but now the parser doesn't know which DTD to use...

But wait... how does Python read the DTD? Does it fetch it from the net? I tried it while disconnected, and the answer is yes: it fetches it from the net. So that's the problem! But how do I avoid it? I'll search, but if you can spare me some time, you'll make me a little happier.

[1] - The line is as follows:

    <!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">

Thanks!
Tiago.

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
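A minimal sketch of one way to avoid the network fetch, assuming Python 3's standard-library xml.sax rather than the Python 2.3/PyXML setup shown in the traceback: switch off the feature_external_ges feature before parsing. In the expat-based reader this also stops the external DTD subset from being loaded, so nothing is requested from oasis-open.org. Entities that only that DTD defines, such as &copy;, are then passed to the handler's skippedEntity() method instead of raising "undefined entity". The sample document, RecordingHandler and parse_offline are hypothetical names used only for illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# Hypothetical DocBook-style sample: the DOCTYPE points at an external DTD by
# URL, and the body uses &copy;, which only that DTD declares.
SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<refentry><para>Copyright &copy; 2006</para></refentry>
"""


class RecordingHandler(ContentHandler):
    """Collects element names, character data and skipped entity names."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self.text = []
        self.skipped = []

    def startElement(self, name, attrs):
        self.elements.append(name)

    def characters(self, content):
        self.text.append(content)

    def skippedEntity(self, name):
        # Called instead of an "undefined entity" error because the
        # external DTD that would declare the entity was never read.
        self.skipped.append(name)


def parse_offline(data):
    """Parse bytes with a SAX parser that never loads external entities."""
    parser = xml.sax.make_parser()
    # Do not resolve external entities; with expat this includes the
    # external DTD subset named in the DOCTYPE, so no download happens.
    parser.setFeature(feature_external_ges, False)
    handler = RecordingHandler()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler


if __name__ == "__main__":
    result = parse_offline(SAMPLE)
    print(result.elements)  # ['refentry', 'para']
    print(result.skipped)   # ['copy']
```

Python 3.7.1 and later already leave this feature off by default; setting it explicitly just makes the behaviour the same on older versions. If the entity text itself matters, another option is to keep the feature on and install an xml.sax.handler.EntityResolver whose resolveEntity(publicId, systemId) returns the path of a local copy of the DocBook DTD.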