John Nagle wrote:
I've been parsing existing HTML with BeautifulSoup, and occasionally
hit content which has something like "Design & Advertising", that is,
an "&" instead of an "&amp;". Is there some way I can get BeautifulSoup
to clean those up? There are various parsing options related to
On 26 Dec 2006 04:22:38 -0800, placid wrote:
So do you want to remove "&" or replace them with "&amp;"? If you want
to replace it try the following;
I think he wants to replace them, but just the invalid ones. I.e.,
This & this &amp; that
would become
This &amp; this &amp; that
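
A minimal sketch of that replacement, for illustration only: it is not the code posted in this thread, and the names `escape_bare_ampersands` and `KNOWN_ENTITIES` are made up for the example. It assumes Python 3's `html.entities` module (in 2006 this lived in `htmlentitydefs`) and treats "apos" as a known entity by hand. It runs on the raw text before it is handed to BeautifulSoup.

```python
import re
from html.entities import name2codepoint

# Entity names accepted as valid. "apos" is added by hand (an assumption:
# it is an XML entity that many browsers also accept in HTML).
KNOWN_ENTITIES = set(name2codepoint) | {"apos"}

# An "&", optionally followed by something shaped like an entity reference:
# decimal (&#38;), hexadecimal (&#x26;) or named (&amp;).
AMP_RE = re.compile(r"&(#[0-9]+;|#[xX][0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)?")


def escape_bare_ampersands(text):
    """Replace "&" with "&amp;" unless it starts a recognised entity."""
    def fix(match):
        ref = match.group(1)
        if ref is None:
            # Bare ampersand with no entity-like text after it.
            return "&amp;"
        if ref.startswith("#") or ref[:-1] in KNOWN_ENTITIES:
            # Numeric reference or known named entity: leave untouched.
            return match.group(0)
        # Looks like an entity, but the name is unknown: escape the "&".
        return "&amp;" + ref
    return AMP_RE.sub(fix, text)


print(escape_bare_ampersands("This & this &amp; that"))
```

Unknown names such as `&bogus;` are escaped as well; whether that is the right call depends on the content being cleaned.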
Felipe Almeida Lessa wrote:
On 26 Dec 2006 04:22:38 -0800, placid wrote:
So do you want to remove "&" or replace them with "&amp;"? If you
want to replace it try the following;
I think he wants to replace them, but just the invalid ones. I.e.,
This & this
Felipe Almeida Lessa wrote:
On 26 Dec 2006 04:22:38 -0800, placid wrote:
So do you want to remove "&" or replace them with "&amp;"? If you want
to replace it try the following;
I think he wants to replace them, but just the invalid ones. I.e.,
This & this &amp; that
Duncan Booth wrote:
Felipe Almeida Lessa wrote:
On 26 Dec 2006 04:22:38 -0800, placid wrote:
So do you want to remove "&" or replace them with "&amp;"? If you
want to replace it try the following;
I think he wants to replace them, but just
Andreas Lysdal wrote:
P.S. apos is handled specially as it isn't technically a
valid html entity (and Python doesn't include it in its entity
list), but it is an xml entity and recognised by many browsers so some
people might use it in html.
Hey, I found this site:
John Nagle wrote:
Felipe Almeida Lessa wrote:
On 26 Dec 2006 04:22:38 -0800, placid wrote:
So do you want to remove "&" or replace them with "&amp;"? If you want
to replace it try the following;
I think he wants to replace them, but just the invalid ones. I.e.,
I've been parsing existing HTML with BeautifulSoup, and occasionally
hit content which has something like "Design & Advertising", that is,
an "&" instead of an "&amp;". Is there some way I can get BeautifulSoup
to clean those up? There are various parsing options related to
"&" handling, but none of them