The file size is 112 KB. Most lines look like this:

<parameter name="roty" type="Parameter" sourceclassname="nosource">

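For reference, here is a minimal sketch of how lines like the one above could be read with ElementTree instead of a regex. It assumes the `xml.etree.ElementTree` module that ships with current Python versions. Only the first `<parameter>` line comes from my file; the `<root>` wrapper, the second element, the closing tags and the helper name `find_parameters` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: only the first <parameter> line comes from
# the message; the <root> wrapper, the closing tags and the second element
# (with its attributes in a different order) are invented for illustration.
SAMPLE = """\
<root>
  <parameter name="roty" type="Parameter" sourceclassname="nosource"></parameter>
  <parameter type="Parameter" sourceclassname="nosource" name="rotz"></parameter>
</root>
"""


def find_parameters(xml_text, name=None):
    """Return the attribute dicts of all <parameter> elements.

    If name is given, keep only elements whose "name" attribute matches it.
    """
    root = ET.fromstring(xml_text)
    found = []
    # iter() walks the whole tree, so nesting depth does not matter.
    for elem in root.iter("parameter"):
        if name is None or elem.get("name") == name:
            found.append(dict(elem.attrib))
    return found


if __name__ == "__main__":
    for attrs in find_parameters(SAMPLE):
        print(attrs["name"], attrs["type"], attrs["sourceclassname"])
```

Attribute order makes no difference to the parser, which is one of the cases where a regex tends to break. For a file on disk, `ET.parse(path).getroot()` would take the place of `ET.fromstring(xml_text)`.
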
I'll give a try to ElementTree.

Bernard

On 9/14/05, Kent Johnson <[EMAIL PROTECTED]> wrote:
> Bernard Lebel wrote:
> > Thanks for that pointer Kent, I'll check it out. Also thanks for
> > letting me know I'm not nuts! :-)
> >
> > Alan's suggestion about BeautifulSoup is actually excellent. The
> > documentation is nice and the tool is very easy to use.
> >
> > However is it normal that to parse a 2618 lines xml file it takes
> > 20-30 seconds or so?
>
> That seems slow to me unless the lines are really long! How many bytes is the
> file? But I don't have much experience with BeautifulSoup.
>
> ElementTree is fast and cElementTree (the C implementation) is really fast. I
> have used it to read, process and write a 28 MB XML file, it took about 10
> seconds.
>
> Kent
>
> >
> > Thanks
> > Bernard
> >
> > On 9/14/05, Kent Johnson <[EMAIL PROTECTED]> wrote:
> >
> >>Bernard Lebel wrote:
> >>
> >>>Thanks Alan,
> >>>
> >>>I'll check BeautifulSoup asap.
> >>>
> >>>I'm using regex simply because I have no clue where to start to parse
> >>>XML. I have read the various xml tools available in the Python
> >>>library, however I'm at a complete loss at what to make out of them. Many
> >>>of them seem to use some programming standards, which I am completely
> >>>unfamiliar with (this is the first time that I dig into XML writing
> >>>and parsing).
> >>>
> >>>I don't know where to start to learn about all these standards, and as
> >>>usual with new programming things, the documentation is hard to
> >>>swallow (it usually is written more as a reference than a proper user
> >>>guide/tutorial). I have to admit this is very frustrating, so if I'm
> >>>looking at things from a wrong perspective please advise me, I need
> >>>it.
> >>
> >>I agree that the Python XML story is confusing even for the files in the
> >>standard library. Worse, the (IMO) best solutions are not to be found in
> >>the standard lib or PyXML at all.
> >>
> >>The std lib and PyXML are based on the DOM and SAX standards. These
> >>standards were designed to be "language-neutral" - there are
> >>implementations in Python, Java and other languages. The good side of this
> >>is, if you learn how to use them, the knowledge is pretty portable to other
> >>languages. The bad side is, the APIs defined by the standard are IMO clunky
> >>and painful to use, especially in Python.
> >>
> >>There is a current thread on comp.lang.python discussing this with good
> >>suggestions and pointers to more info:
> >>http://groups.google.com/group/comp.lang.python/browse_frm/thread/a48891aa645ead13/dcd8fdc20b4b191b?hl=en#dcd8fdc20b4b191b
> >>
> >>My personal preference is ElementTree. Beautiful Soup is good too though I
> >>have only tried it with HTML. If I was running on Linux I would try lxml
> >>which uses the ElementTree API and adds full XPath support. Amara looks
> >>like the Cadillac solution - big and cushy. I haven't tried it. Uche's
> >>articles (referenced in the thread above) have pointers to many other
> >>choices but these seem to be the most popular.
> >>
> >>My favorite XML lib is actually dom4j which is in Java. It works great with
> >>Jython.
> >>
> >>Kent
> >>
> >>>So right now I'm just taking a shortcut and using ultra-simple
> >>>re-based parser to retrieve the tags I'm looking for. I know it will
> >>>probably be slow, but hopefully I'll get familiar with sophisticated
> >>>parsing in the future and improve my code. As it stands right now,
> >>>even the re syntax is not super easy to learn.
> >>
> >>For what you are doing re seems fine to me. You can get in trouble using
> >>re's with XML because of nested tags, variations in spelling and order,
> >>probably a bunch of other things. But for simple stuff it can work fine.
> >>
> >>Kent
> >>
> >>>
> >>>Kent: That works (of course!). Thanks a bunch once again!
> >>>
> >>>Thanks
> >>>Bernard
> >>>
> >>>On 9/14/05, Alan G <[EMAIL PROTECTED]> wrote:
> >>>
> >>>>Hi Bernard,
> >>>>
> >>>>>Hello, yet another regular expression question :-)
> >>>>>
> >>>>>So I have this xml file that I'm trying to find a
> >>>>>specific tag in.
> >>>>
> >>>>I'm always suspicious when I see regular expression
> >>>>and xml/html in the same context. regex are not good
> >>>>for parsing xml/html files and it's usually much easier
> >>>>to use a proper parser - such as beautiful soup.
> >>>>
> >>>>http://www.crummy.com/software/BeautifulSoup/
> >>>>
> >>>>Is there any special reason why you are using a regex
> >>>>sledgehammer to crack this particular nut? Or is it
> >>>>just to gain experience using regex?
> >>>>
> >>>>Alan G.
> >>>>
> >>>
> >>>_______________________________________________
> >>>Tutor maillist - Tutor@python.org
> >>>http://mail.python.org/mailman/listinfo/tutor