Re: [Tutor] Another regular expression question

Bernard Lebel Wed, 14 Sep 2005 08:03:07 -0700

The file size is 112 KB. Most lines look like this:

<parameter name="roty" type="Parameter" sourceclassname="nosource">



I'll give ElementTree a try.


Bernard
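
For readers following the thread, here is a minimal sketch of what the
ElementTree approach might look like for lines like the one above. It is
an illustration under assumptions, not Bernard's or Kent's actual code:
the <root> wrapper, the second parameter, the element text and the helper
name find_parameters are all made up. The import path is the one in the
standard library since Python 2.5; at the time of this thread ElementTree
was a separate download.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the <parameter ...> start tag comes from the
# message above. The <root> wrapper, the second parameter and the element
# text are invented for illustration.
SAMPLE = """\
<root>
  <parameter name="roty" type="Parameter" sourceclassname="nosource">0.0</parameter>
  <parameter name="rotz" type="Parameter" sourceclassname="nosource">90.0</parameter>
</root>
"""


def find_parameters(xml_text, wanted_name):
    """Return every <parameter> element whose name attribute equals wanted_name."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so nesting depth does not matter.
    return [el for el in root.iter("parameter") if el.get("name") == wanted_name]


matches = find_parameters(SAMPLE, "roty")
for el in matches:
    # Attributes are available as a dict, whatever order they were written in.
    print(el.tag, el.attrib["type"], el.text)
```

For a file on disk, ET.parse(path).getroot() would take the place of
ET.fromstring(); the rest of the sketch stays the same.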



On 9/14/05, Kent Johnson <[EMAIL PROTECTED]> wrote:
> Bernard Lebel wrote:
> > Thanks for that pointer Kent, I'll check it out. Also thanks for
> > letting me know I'm not nuts! :-)
> >
> > Alan's suggestion about BeautifulSoup is actually excellent. The
> > documentation is nice and the tool is very easy to use.
> >
> > However, is it normal that parsing a 2618-line XML file takes
> > 20-30 seconds or so?
> 
> That seems slow to me unless the lines are really long! How many bytes is the 
> file? But I don't have much experience with BeautifulSoup.
> 
> ElementTree is fast and cElementTree (the C implementation) is really fast. I 
> have used it to read, process and write a 28 MB XML file; it took about 10 
> seconds.
> 
> Kent
> 
> >
> >
> > Thanks
> > Bernard
> >
> >
> >
> > On 9/14/05, Kent Johnson <[EMAIL PROTECTED]> wrote:
> >
> >>Bernard Lebel wrote:
> >>
> >>>Thanks Alan,
> >>>
> >>>I'll check BeautifulSoup asap.
> >>>
> >>>I'm using regex simply because I have no clue where to start to parse
> >>>XML. I have read the various xml tools available in the Python
> >>>library, however I'm at a complete loss as to what to make of them. Many
> >>>of them seem to use some programming standards, which I am completely
> >>>unfamiliar with (this is the first time that I dig into XML writing
> >>>and parsing).
> >>>
> >>>I don't know where to start to learn about all these standards, and as
> >>>usual with new programming things, the documentation is hard to
> >>>swallow (it usually is written more as a reference than a proper user
> >>>guide/tutorial). I have to admit this is very frustrating, so if I'm
> >>>looking at things from a wrong perspective please advise me, I need
> >>>it.
> >>
> >>I agree that the Python XML story is confusing even for the files in the 
> >>standard library. Worse, the (IMO) best solutions are not to be found in 
> >>the standard lib or PyXML at all.
> >>
> >>The std lib and PyXML are based on the DOM and SAX standards. These 
> >>standards were designed to be "language-neutral" - there are 
> >>implementations in Python, Java and other languages. The good side of this 
> >>is, if you learn how to use them, the knowledge is pretty portable to other 
> >>languages. The bad side is, the APIs defined by the standard are IMO clunky 
> >>and painful to use, especially in Python.
> >>
> >>There is a current thread on comp.lang.python discussing this with good 
> >>suggestions and pointers to more info:
> >>http://groups.google.com/group/comp.lang.python/browse_frm/thread/a48891aa645ead13/dcd8fdc20b4b191b?hl=en#dcd8fdc20b4b191b
> >>
> >>My personal preference is ElementTree. Beautiful Soup is good too though I 
> >>have only tried it with HTML. If I was running on Linux I would try lxml 
> >>which uses the ElementTree API and adds full XPath support. Amara looks 
> >>like the Cadillac solution - big and cushy. I haven't tried it. Uche's 
> >>articles (referenced in the thread above) have pointers to many other 
> >>choices but these seem to be the most popular.
> >>
> >>My favorite XML lib is actually dom4j which is in Java. It works great with 
> >>Jython.
> >>
> >>Kent
> >>
> >>
> >>>So right now I'm just taking a shortcut and using an ultra-simple
> >>>re-based parser to retrieve the tags I'm looking for. I know it will
> >>>probably be slow, but hopefully I'll get familiar with sophisticated
> >>>parsing in the future and improve my code. As it stands right now,
> >>>even the re syntax is not super easy to learn.
> >>
> >>For what you are doing re seems fine to me. You can get in trouble using 
> >>re's with XML because of nested tags, variations in spelling and order, 
> >>probably a bunch of other things. But for simple stuff it can work fine.
> >>
> >>Kent
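
To make the point about variations in order concrete, here is a small
sketch of how a simple re-based lookup can go wrong. It is not the
pattern Bernard actually used (that is not shown in this message); the
patterns, the helper names first_name and tolerant_name, and the
reordered line_b are assumptions made up for illustration.

```python
import re

# Naive pattern: matches only when name is the first attribute and uses
# double quotes.
NAME_FIRST = re.compile(r'<parameter name="([^"]*)"')

# The sample line from the thread.
line_a = '<parameter name="roty" type="Parameter" sourceclassname="nosource">'
# Hypothetical variation: same element, attributes in a different order.
line_b = '<parameter type="Parameter" name="roty" sourceclassname="nosource">'


def first_name(line):
    """Return the name attribute if it directly follows the tag name, else None."""
    m = NAME_FIRST.search(line)
    return m.group(1) if m else None


# More tolerant pattern: allows other attributes before name and accepts
# either quote style. \bname avoids matching inside "sourceclassname".
TOLERANT = re.compile(r'''<parameter\b[^>]*?\bname\s*=\s*(["'])(.*?)\1''')


def tolerant_name(line):
    """Return the name attribute wherever it sits inside the start tag, else None."""
    m = TOLERANT.search(line)
    return m.group(2) if m else None


print(first_name(line_a), first_name(line_b))
print(tolerant_name(line_a), tolerant_name(line_b))
```

Even the tolerant pattern only looks at one start tag at a time, so it
still knows nothing about nesting, comments or CDATA sections, which is
where a real parser earns its keep.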
> >>
> >>
> >>>
> >>>Kent: That works (of course!). Thanks a bunch once again!
> >>>
> >>>
> >>>Thanks
> >>>Bernard
> >>>
> >>>On 9/14/05, Alan G <[EMAIL PROTECTED]> wrote:
> >>>
> >>>
> >>>>Hi Bernard,
> >>>>
> >>>>
> >>>>
> >>>>>Hello, yet another regular expression question :-)
> >>>>>
> >>>>>So I have this xml file that I'm trying to find a
> >>>>>specific tag in.
> >>>>
> >>>>I'm always suspicious when I see regular expression
> >>>>and xml/html in the same context. regex are not good
> >>>>for parsing xml/html files and it's usually much easier
> >>>>to use a proper parser - such as beautiful soup.
> >>>>
> >>>>http://www.crummy.com/software/BeautifulSoup/
> >>>>
> >>>>Is there any special reason why you are using a regex
> >>>>sledgehammer to crack this particular nut? Or is it
> >>>>just to gain experience using regex?
> >>>>
> >>>>Alan G.
> >>>>
> >>>
> >>>_______________________________________________
> >>>Tutor maillist  -  [email protected]
> >>>http://mail.python.org/mailman/listinfo/tutor
> >>>
> >>>
