Re: [2.5] Regex doesn't support MULTILINE?

2007-07-23 Thread Gilles Ganault
On Sun, 22 Jul 2007 05:34:17 -0300, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
>Try to avoid using ".*" and ".+" (even the non greedy forms); in this
>case, I think you want the scan to stop when it reaches the ending
>or any other tag, so use: [^<]* instead.
>
>BTW, better to use a raw st
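
A minimal sketch of the negated-class suggestion quoted above, written for current Python 3 rather than 2.5. The sample markup, the `<td>` tag, and the variable names are assumptions made up for illustration; the original page and pattern are not shown in the thread. `[^<]*` matches any run of characters other than `<`, newlines included, so no extra flag is needed and the scan cannot run past the next tag.

```python
import re

# Hypothetical sample input: two table cells, the first spanning a CRLF.
html = "<td>first\r\ncell</td><td>second</td>"

# Raw string, so backslashes would reach the regex engine untouched.
# [^<]* stops at the next "<", which keeps each match inside one element.
cell_pattern = re.compile(r"<td>([^<]*)</td>", re.IGNORECASE)

cells = cell_pattern.findall(html)
print(cells)
```

Because the character class excludes `<`, the engine has far fewer ways to retry a failed match than with `.+?` under `re.DOTALL`, which is one plausible reason the suggestion helps on large inputs.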

Re: [2.5] Regex doesn't support MULTILINE?

2007-07-22 Thread Gabriel Genellina
On Sun, 22 Jul 2007 01:56:32 -0300, Gilles Ganault <[EMAIL PROTECTED]> wrote:
> Incidentally, as far as using Re alone is concerned, it appears that
> re.MULTILINE isn't enough to get Re to include newlines: re.DOTALL
> must be added.
>
> Problem is, when I add re.DOTALL, the search takes les
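
For reference, a small sketch of the difference between the two flags, in Python 3 syntax. In the `re` module, `re.MULTILINE` only changes where `^` and `$` can match, while `re.DOTALL` (alias `re.S`) is the flag that lets `.` match a newline. The `<b>` markup and the sample string are assumptions for illustration, not the poster's actual data.

```python
import re

# Hypothetical input: text between tags that contains a CRLF.
response = "<b>Bla\r\nbla</b>"
pattern = r"<b>(.+?)</b>"

# MULTILINE affects only ^ and $, so "." still stops at "\n".
without_dotall = re.search(pattern, response, re.IGNORECASE | re.MULTILINE)

# DOTALL makes "." match every character, newline included.
with_dotall = re.search(pattern, response, re.IGNORECASE | re.DOTALL)

print(without_dotall)
print(repr(with_dotall.group(1)))
```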

Re: [2.5] Regex doesn't support MULTILINE?

2007-07-21 Thread Jay Loden
Gilles Ganault wrote:
> Problem is, when I add re.DOTALL, the search takes less than a second
> for a 500KB file... and about 1 min 30 s for a file that's 1MB, with both
> files holding similar contents.
>
> Why such a huge difference in performance?
>
> = Using Re =
> import re
>

Re: [2.5] Regex doesn't support MULTILINE?

2007-07-21 Thread Gilles Ganault
On Sat, 21 Jul 2007 22:18:56 -0400, Carsten Haese <[EMAIL PROTECTED]> wrote:
>That's your problem right there. RE is not the right tool for that job.
>Use an actual HTML parser such as BeautifulSoup
Thanks a lot for the tip. I tried it, and it does look interesting, although I've been unsuccessful

Re: [2.5] Regex doesn't support MULTILINE?

2007-07-21 Thread Carsten Haese
On Sat, 2007-07-21 at 19:22 -0700, Paul Rubin wrote:
> Carsten Haese <[EMAIL PROTECTED]> writes:
> > Use an actual HTML parser such as BeautifulSoup
> > (http://www.crummy.com/software/BeautifulSoup/) and your life will be
> > much easier.
>
> BeautifulSoup is a lot simpler to use than RE's but a

Re: [2.5] Regex doesn't support MULTILINE?

2007-07-21 Thread Paul Rubin
Carsten Haese <[EMAIL PROTECTED]> writes:
> Use an actual HTML parser such as BeautifulSoup
> (http://www.crummy.com/software/BeautifulSoup/) and your life will be
> much easier.
BeautifulSoup is a lot simpler to use than RE's but a heck of a lot slower. I ended up having to use RE's last time I

Re: [2.5] Regex doesn't support MULTILINE?

2007-07-21 Thread Carsten Haese
On Sun, 2007-07-22 at 04:09 +0200, Gilles Ganault wrote:
> Hello
>
> I'm trying to extract information from a web page using the Re module,
That's your problem right there. RE is not the right tool for that job. Use an actual HTML parser such as BeautifulSoup (http://www.crummy.com/software/Beaut
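
To illustrate the "use a parser instead of a regex" advice without depending on a third-party install, here is a rough sketch that swaps in the standard library's `html.parser.HTMLParser` (Python 3) in place of BeautifulSoup. It is not BeautifulSoup's API; the `CellCollector` class, the `<td>` target, and the sample markup are all hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Hypothetical helper: gather the text found inside each <td>."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased, so <TD> and <td> are treated alike.
        if tag == "td":
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Text may arrive in several pieces, so append to the current cell.
        if self.in_cell:
            self.cells[-1] += data


# Assumed sample page; newlines inside a cell need no special handling.
page = "<table><tr><td>first\ncell</td><TD>second</TD></tr></table>"
collector = CellCollector()
collector.feed(page)
collector.close()
print(collector.cells)
```

The parser tracks element boundaries itself, so line breaks and tag case stop being pattern-matching concerns.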

[2.5] Regex doesn't support MULTILINE?

2007-07-21 Thread Gilles Ganault
Hello
I'm trying to extract information from a web page using the Re module, but it doesn't seem to support MULTILINE:
=
import re
#NO CRLF : works
response = "Blablabla"
#CRLF : doesn't work
response = "Blablabla\r\n"
pattern = "Bla.+?"
p = re.compile(pattern,re.IGNORECASE|re.MULT