On Sun, 22 Jul 2007 05:34:17 -0300, "Gabriel Genellina"
<[EMAIL PROTECTED]> wrote:
>Try to avoid using ".*" and ".+" (even the non-greedy forms); in this
>case, I think you want the scan to stop when it reaches the ending tag
>or any other tag, so use [^<]* instead.
>
>BTW, better to use a raw st
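Gabriel's two suggestions can be sketched as follows. The markup in the quoted messages was lost, so the `<td>` tags and the sample string below are hypothetical stand-ins. The pattern is written as a raw string so backslash escapes reach the regex engine intact. `[^<]*` replaces `.*?`, so the scan stops at the next tag, and because a negated character class also matches `\r` and `\n`, no extra flag is needed.

```python
import re

# Hypothetical sample; the real tags from the thread are not shown.
html = "<td>Blablabla\r\n</td><td>second cell</td>"

# Raw string: r"\b" reaches the regex engine as a word boundary.
# In a normal string literal "\b" is a backspace character (\x08).
assert re.search("\bBla", "Bla") is None
assert re.search(r"\bBla", "Bla") is not None

# [^<]* stops at the next tag and, being a negated class, also matches
# "\r\n", so re.DOTALL is not needed here.
cells = re.findall(r"<td>([^<]*)</td>", html)
print(cells)  # ['Blablabla\r\n', 'second cell']

# For comparison: ".*?" without DOTALL cannot cross "\n", so the first
# cell is silently skipped.
print(re.findall(r"<td>(.*?)</td>", html))  # ['second cell']
```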
On Sun, 22 Jul 2007 01:56:32 -0300, Gilles Ganault <[EMAIL PROTECTED]>
wrote:
> Incidentally, as far as using re alone is concerned, it appears that
> re.MULTILINE isn't enough to get re to include newlines: re.DOTALL
> must be added.
>
> Problem is, when I add re.DOTALL, the search takes les
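For reference, the flag is spelled `re.DOTALL` (short form `re.S`); `re.MULTILINE` only changes what `^` and `$` match, not what `.` matches. A small check using a made-up sample string:

```python
import re

text = "Blablabla\r\nsecond line"

# re.MULTILINE only changes ^ and $ (start/end of each line);
# "." still refuses to match "\n".
assert re.search(r"Bla.*line", text, re.MULTILINE) is None

# re.DOTALL is the flag that lets "." match "\n" as well.
m = re.search(r"Bla.*line", text, re.DOTALL)
print(repr(m.group()))  # 'Blablabla\r\nsecond line'

# MULTILINE in action: ^ now matches after every newline.
heads = re.findall(r"^\w+", "first\nsecond\nthird", re.MULTILINE)
print(heads)  # ['first', 'second', 'third']
```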
Gilles Ganault wrote:
> Problem is, when I add re.DOTALL, the search takes less than a second
> for a 500KB file... and about 1 min 30 s for a 1MB file, with both
> files holding similar contents.
>
> Why such a huge difference in performance?
>
> = Using re =
> import re
>
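The full pattern is cut off above, so the cause cannot be pinned down from this thread. One common culprit, offered only as a hypothesis: under re.DOTALL a lazy `.*?` may run all the way to the end of the file whenever the rest of the pattern fails, and the engine repeats that scan from every candidate start position, so the work grows roughly with the square of the input size (worse when several `.*?` are chained). The sketch below uses synthetic input (hypothetical, not Gilles's data) in which the closing `</tr>` never appears, and times a lazy-dot pattern against the `[^<]*` form Gabriel suggested.

```python
import re
import time

def make_page(rows):
    # Hypothetical synthetic input: many cells but no "</tr>" anywhere,
    # so every match attempt ultimately fails.
    return "<td>cell</td>\n" * rows

# Lazy dot with DOTALL: each failed attempt can scan to the end of the text.
lazy_dot = re.compile(r"<td>(.*?)</tr>", re.DOTALL)
# Negated class: each attempt gives up at the next "<".
neg_class = re.compile(r"<td>([^<]*)</tr>")

for rows in (1000, 2000):
    page = make_page(rows)
    for name, pattern in (("lazy .*? + DOTALL", lazy_dot), ("[^<]*", neg_class)):
        start = time.perf_counter()
        found = pattern.findall(page)
        elapsed = time.perf_counter() - start
        print(f"{rows:5d} rows  {name:18s} {elapsed:.4f} s  matches={len(found)}")
```

Doubling the row count should make the first timing grow much faster than the second, though the exact figures depend on the machine and Python version.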
On Sat, 21 Jul 2007 22:18:56 -0400, Carsten Haese
<[EMAIL PROTECTED]> wrote:
>That's your problem right there. RE is not the right tool for that job.
>Use an actual HTML parser such as BeautifulSoup
Thanks a lot for the tip. I tried it, and it does look interesting,
although I've been unsuccessful
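BeautifulSoup is a third-party package, so this sketch swaps in the standard library's `html.parser.HTMLParser` to illustrate the same parser-based approach Carsten recommends. The `<td>` target and the `CellCollector` class name are hypothetical; the real page structure is not shown in the thread.

```python
from html.parser import HTMLParser

class CellCollector(HTMLParser):
    """Collect the text inside every <td> element (hypothetical target)."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased, so <TD> and <td> are treated alike.
        if tag == "td":
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Newlines inside a cell need no special flag, unlike re's ".".
        if self.in_cell:
            self.cells[-1] += data

page = "<TABLE><TR><TD>Blablabla\r\n</TD><td class='x'>second &amp; last</td></tr></table>"
parser = CellCollector()
parser.feed(page)
parser.close()
print([cell.strip() for cell in parser.cells])  # ['Blablabla', 'second & last']
```

A parser copes with case, attributes and entities without extra pattern work, though, as Paul notes in the thread, it may be slower than a tight regex.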
On Sat, 2007-07-21 at 19:22 -0700, Paul Rubin wrote:
> Carsten Haese <[EMAIL PROTECTED]> writes:
> > Use an actual HTML parser such as BeautifulSoup
> > (http://www.crummy.com/software/BeautifulSoup/) and your life will be
> > much easier.
>
> BeautifulSoup is a lot simpler to use than RE's but a
Carsten Haese <[EMAIL PROTECTED]> writes:
> Use an actual HTML parser such as BeautifulSoup
> (http://www.crummy.com/software/BeautifulSoup/) and your life will be
> much easier.
BeautifulSoup is a lot simpler to use than RE's but a heck of a lot
slower. I ended up having to use RE's last time I
On Sun, 2007-07-22 at 04:09 +0200, Gilles Ganault wrote:
> Hello
>
> I'm trying to extract information from a web page using the re module,
That's your problem right there. RE is not the right tool for that job.
Use an actual HTML parser such as BeautifulSoup
(http://www.crummy.com/software/Beaut
Hello
I'm trying to extract information from a web page using the re module,
but it doesn't seem to support MULTILINE:
=
import re
# No CRLF: works
response = "Blablabla"
# CRLF: doesn't work
response = "Blablabla\r\n"
pattern = "Bla.+?"
p = re.compile(pattern,re.IGNORECASE|re.MULT