Michel Bouwmans wrote:
> I don't think HTMLParser was doing anything wrong here. I needed to parse a
> HTML document, but it contained script-blocks with document.write's in
> them. I only care for the content outside these blocks but HTMLParser will
> choke on such a block when it isn't encapsulat
>> Subject: RE: Stripping scripts from HTML with regular expressions
>>
>>
>> Thanks! That did the trick. :) I was trying to use HTMLParser but that
>> choked on the script-blocks that didn't contain comment-indicators.
>> Guess I
>> can now move on
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:python-
> [EMAIL PROTECTED] On Behalf Of Michel Bouwmans
> Sent: Wednesday, April 09, 2008 5:44 PM
> To: python-list@python.org
> Subject: RE: Stripping scripts from HTML with regular expressions
>
>
>
On Apr 9, 2:38 pm, Michel Bouwmans <[EMAIL PROTECTED]> wrote:
> Hey everyone,
>
> I'm trying to strip all script-blocks from a HTML-file using regex.
>
> I tried the following in Python:
>
> testfile = open('testfile')
> testhtml = testfile.read()
> regex = re.compile(']*>(.*?)', re.DOTALL)
> resul
> > To: python-list@python.org
> > Subject: Stripping scripts from HTML with regular expressions
> >
> > Hey everyone,
> >
> > I'm trying to strip all script-blocks from a HTML-file using regex.
> >
>
> [Insert obligatory comment abo
Reedick, Andrew wrote:
>
>
>> -Original Message-
>> From: [EMAIL PROTECTED] [mailto:python-
>> [EMAIL PROTECTED] On Behalf Of Michel Bouwmans
>> Sent: Wednesday, April 09, 2008 3:38 PM
>> To: python-list@python.org
>> Subject: Stripping
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:python-
> [EMAIL PROTECTED] On Behalf Of Michel Bouwmans
> Sent: Wednesday, April 09, 2008 3:38 PM
> To: python-list@python.org
> Subject: Stripping scripts from HTML with regular expressions
>
> Hey ever
Michel Bouwmans wrote:
> I'm trying to strip all script-blocks from a HTML-file using regex.
You might want to take a look at lxml.html instead, which comes with an HTML
cleaner module:
http://codespeak.net/lxml/lxmlhtml.html#cleaning-up-html
Stefan
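
For comparison with the parser-based route, here is a minimal sketch that uses only the standard library's html.parser (Python 3) rather than the lxml cleaner mentioned above. It collects the text found outside script blocks. The names TextOutsideScripts and text_outside_scripts are hypothetical, chosen for illustration; they do not come from the thread or from any library.

```python
from html.parser import HTMLParser


class TextOutsideScripts(HTMLParser):
    """Collect text nodes that are not inside a <script> element.

    Hypothetical helper for illustration; not part of lxml or the thread.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []         # text found outside script blocks
        self._in_script = False  # True while between <script> and </script>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # The parser hands script bodies to handle_data as raw text,
        # so they are skipped here instead of being parsed as markup.
        if not self._in_script:
            self.chunks.append(data)


def text_outside_scripts(html):
    """Return the concatenated text of html, ignoring script contents."""
    parser = TextOutsideScripts()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


sample = (
    "<p>before</p>"
    '<script>document.write("<b>ignored</b>");</script>'
    "<p>after</p>"
)
print(text_outside_scripts(sample))
```

This only extracts text; it does not rebuild the cleaned HTML, which is what a dedicated cleaner would do.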
Hey everyone,
I'm trying to strip all script-blocks from a HTML-file using regex.
I tried the following in Python:
testfile = open('testfile')
testhtml = testfile.read()
regex = re.compile(']*>(.*?)', re.DOTALL)
result = regex.sub('', testhtml)
print result
This strips far more away than just the
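
For reference, a minimal sketch of a regex-based approach in Python 3. The pattern below is an assumption: the tag portions of the pattern in the post above did not survive archiving, so this is not necessarily the poster's regex. The names SCRIPT_RE, strip_scripts and sample are illustrative only. A raw string is used so that `\b` is read as a word boundary rather than a backspace character.

```python
import re

# Assumed pattern, for illustration only:
#   <script\b[^>]*>   opening tag with optional attributes
#   .*?               shortest possible body (DOTALL lets "." span newlines)
#   </script\s*>      closing tag
SCRIPT_RE = re.compile(
    r"<script\b[^>]*>.*?</script\s*>",
    re.DOTALL | re.IGNORECASE,
)


def strip_scripts(html):
    """Return html with every <script>...</script> block removed."""
    return SCRIPT_RE.sub("", html)


sample = (
    "<html><body><p>before</p>"
    '<script type="text/javascript">document.write("<b>x</b>");</script>'
    "<p>after</p></body></html>"
)
print(strip_scripts(sample))
# <html><body><p>before</p><p>after</p></body></html>
```

A regex like this is fragile on real-world HTML. For example, a literal `</script>` inside a JavaScript string would end the match early, which is one reason a parser-based cleaner is often suggested instead.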