"Sebastien Noel" <[EMAIL PROTECTED]> wrote
> comments = soup.findAll(text=" ")
> [comment.extract() for comment in comments]
Umm, why comments here and not langcanada?
Just curious...
> # Add some class attributes
> for h1s in range(len(soup.findAll("h1"))):
>     le_h1 = soup.findAll("h1")[h1s]
>     le_h1["class"] = "heading1_main"
>
> for h2s in range(len(soup.findAll("h2"))):
>     le_h2 = soup.findAll("h2")[h2s]
>     le_h2["class"] = "heading2_main"
You could abstract this into a function with a few parameters
and put it into a loop, and thus save a load of typing!
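Something along these lines is what I have in mind. It is only a
sketch: add_class, add_heading_classes and HEADING_CLASSES are names
I made up, and it assumes soup is the BeautifulSoup object from your
code, so that findAll() returns tags you can assign attributes to.

```python
def add_class(soup, tag_name, class_name):
    """Set the class attribute on every <tag_name> tag in the tree."""
    # Iterate over the tags directly instead of indexing by position.
    for tag in soup.findAll(tag_name):
        tag["class"] = class_name


# (tag name, class name) pairs taken from the quoted code.
HEADING_CLASSES = [
    ("h1", "heading1_main"),
    ("h2", "heading2_main"),
]


def add_heading_classes(soup):
    """Apply every pair in HEADING_CLASSES to the tree."""
    for tag_name, class_name in HEADING_CLASSES:
        add_class(soup, tag_name, class_name)
```

Adding another heading level is then one more pair in the list, and
findAll() runs once per tag name rather than once per loop pass.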
OK, too much code to go through in detail. Can you do a simple example
where you try to remove some tags and it doesn't work? Also, did you
look at the replaceWith method? That may help you if you use
something like a SPAN or DIV tag...
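One way to read the replaceWith idea, as a rough sketch only:
replace_tags is a made-up helper, and I am assuming the tags returned
by findAll() have a replaceWith() method that accepts a plain string,
which is worth checking against the version you have installed.

```python
def replace_tags(soup, tag_name, replacement):
    """Swap every <tag_name> tag in the tree for the replacement text."""
    # findAll() returns a list, so changing the tree inside the loop
    # does not disturb the iteration.
    for tag in soup.findAll(tag_name):
        tag.replaceWith(replacement)
```

Unlike extract(), which just pulls the tag out, this leaves something
in the tag's old position in the tree.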
I didn't see you writing anything back in that code, but then I was
just scanning it and may have missed it... You extract them from the
parse tree, but do you ever write the modified tree out?
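By writing the tree out I mean something like the following. Again a
sketch: write_tree is a made-up name, the output path is whatever you
choose, and it assumes str(soup) gives you the markup of the modified
tree as text (Python 3 style file handling).

```python
def write_tree(soup, path):
    """Save the current state of the parse tree to a file."""
    # str(soup) renders the tree as it stands now, after any
    # extract() calls or attribute changes.
    with open(path, "w", encoding="utf-8") as out:
        out.write(str(soup))
```

Until you do something like this the changes only exist in memory and
the original file is untouched.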
Alan G
_______________________________________________
Tutor maillist - [email protected]
http://mail.python.org/mailman/listinfo/tutor