On 26/04/06, Liam Clarke <[EMAIL PROTECTED]> wrote:
> Hi Frank, just bear in mind that the pattern:
>
> patObj = re.compile("<title>.*</title>", re.DOTALL)
>
> will match
>
> <title>
> This is my title
> </title>
>
> But, it'll also match
>
> <title>
> This is my title
> </title>
> <p>Some content here</p>
> <title>
> Another title; not going to happen with a title tag in HTML, but
> more an illustration
> </title>
>
> All of that.
>
> Got to watch .* with re.DOTALL; try using .*? instead, it makes it
> non-greedy. Functionality for your current use case won't change, but
> you won't spend ages when you have a different use case trying to
> figure out why half your data is matching. >_<
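
A minimal sketch of the greedy vs. non-greedy difference described above, using the standard `re` module on a sample string modelled on the quoted illustration (the sample text and variable names are just for demonstration):

```python
import re

# Sample HTML containing two <title> blocks separated by other content,
# mirroring the illustration in the quoted message.
html = """<title>
This is my title
</title>
<p>Some content here</p>
<title>
Another title
</title>"""

# Greedy: .* runs as far as it can, so the match ends at the LAST </title>.
greedy = re.compile("<title>.*</title>", re.DOTALL)

# Non-greedy: .*? stops at the FIRST </title> after each <title>.
lazy = re.compile("<title>.*?</title>", re.DOTALL)

greedy_matches = greedy.findall(html)
lazy_matches = lazy.findall(html)

print(len(greedy_matches))  # 1: a single match spanning both titles and the <p>
print(len(lazy_matches))    # 2: one match per <title>...</title> block
```

With only one title in the input, both patterns return the same result, which is why the greedy version can seem to work until the input changes.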
When you only want a tag like <title> with no nested tags, I sometimes use:

<title>[^<]*</title>

though for anything but the most trivial cases, it's often better to use BeautifulSoup.

Ed

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
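
The negated-character-class pattern from Ed's reply can be tried out with a short sketch like the one below (the sample input and the capturing group around `[^<]*` are illustrative additions). It also shows the limitation that makes a real HTML parser such as BeautifulSoup the better choice once tags can nest:

```python
import re

# Negated character class: [^<]* matches any run of characters that
# contains no '<', so the match cannot run past the next tag.
# [^<] also matches newlines, so re.DOTALL is not needed here.
title_pat = re.compile(r"<title>([^<]*)</title>")

html = """<title>
This is my title
</title>
<p>Some content here</p>
<title>Another title</title>"""

# Each <title>...</title> block is matched separately.
titles = [t.strip() for t in title_pat.findall(html)]
print(titles)  # ['This is my title', 'Another title']

# Limitation: a title containing a nested tag is not matched at all,
# because the '<' of <b> stops [^<]* before it reaches </title>.
nested = "<title>My <b>bold</b> title</title>"
print(title_pat.findall(nested))  # []
```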