Re: clean up html document created by Word

2007-03-30 Thread jd
Wow, thanks for all the great responses! Here's my summary: - demoroniser (from John Walker) is designed to solve some very particular problems that could be considered bugs. However, it doesn't remove the unnecessary html generated by Word. http://www.fourmilab.ch/webtools/demoroniser/ - The

Re: clean up html document created by Word

2007-03-30 Thread Claudio Grondi
jd wrote: > I am looking for python code (working or sample code) that can take an > html document created by Microsoft Word and clean it up (if you've > never had to look at a Word-generated html document, consider yourself > lucky ;-) Alternatively, if you know of a non-python solution, I'd > li

Re: clean up html document created by Word

2007-03-30 Thread Laurent Pointal
jd wrote: > I am looking for python code (working or sample code) that can take an > html document created by Microsoft Word and clean it up (if you've > never had to look at a Word-generated html document, consider yourself > lucky ;-) Alternatively, if you know of a non-python solution, I'd > l

Re: clean up html document created by Word

2007-03-30 Thread bearophileHUGS
jd: > I am looking for python code (working or sample code) that can take an > html document created by Microsoft Word and clean it up (if you've > never had to look at a Word-generated html document, consider yourself > lucky ;-) Alternatively, if you know of a non-python solution, I'd > like to

Re: clean up html document created by Word

2007-03-30 Thread Peter Otten
jkn wrote: > IIUC, the original poster is asking about 'cleaning up' in the sense > of removing the swathes of unnecessary and/or redundant 'cruft' that > Word puts in there, rather than making valid HTML out of invalid HTML. > Again, IIUC, HTMLtidy does not do this. From that very page I linked

Re: clean up html document created by Word

2007-03-30 Thread Shane Geiger
Tidy can now perform wonders on HTML saved from Microsoft Word 2000! Word bulks out HTML files with stuff for round-tripping presentation between HTML and Word. If you are more concerned about using HTML on the Web, check out Tidy's "Word-2000"

Re: clean up html document created by Word

2007-03-30 Thread jkn
IIUC, the original poster is asking about 'cleaning up' in the sense of removing the swathes of unnecessary and/or redundant 'cruft' that Word puts in there, rather than making valid HTML out of invalid HTML. Again, IIUC, HTMLtidy does not do this. If Beautiful Soup does, then I'm interested!

Re: clean up html document created by Word

2007-03-30 Thread Peter Otten
jd wrote: > I am looking for python code (working or sample code) that can take an > html document created by Microsoft Word and clean it up (if you've > never had to look at a Word-generated html document, consider yourself > lucky ;-) Alternatively, if you know of a non-python solution, I'd > l

Re: clean up html document created by Word

2007-03-30 Thread kyosohma
On Mar 30, 12:20 pm, "jd" <[EMAIL PROTECTED]> wrote: > I am looking for python code (working or sample code) that can take an > html document created by Microsoft Word and clean it up (if you've > never had to look at a Word-generated html document, consider yourself > lucky ;-) Alternatively, if

clean up html document created by Word

2007-03-30 Thread jd
I am looking for python code (working or sample code) that can take an html document created by Microsoft Word and clean it up (if you've never had to look at a Word-generated html document, consider yourself lucky ;-) Alternatively, if you know of a non-python solution, I'd like to hear about it.
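
For readers wondering what "cleaning up" might look like in practice, here is a minimal sketch using only the standard library's re module. It is not taken from any reply in this thread: the function name strip_word_cruft and the choice of patterns (conditional comments, Office-namespaced tags such as <o:p>, Mso* class names, and style attributes containing mso- declarations) are assumptions about the kind of cruft Word typically emits. Regular expressions are a blunt tool for HTML, so treat this as an illustration rather than a robust cleaner.

```python
import re


def strip_word_cruft(html: str) -> str:
    """Hypothetical helper: remove some common Word-specific markup.

    The patterns below are assumptions about typical Word output,
    not an exhaustive or authoritative list.
    """
    # Conditional comments, e.g. <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    # Office-namespaced tags, e.g. <o:p> and </o:p> (inner text is kept)
    html = re.sub(r"</?[ovw]:[^>]*>", "", html, flags=re.IGNORECASE)
    # class attributes naming Word styles, e.g. class="MsoNormal"
    html = re.sub(r"""\s+class=(?:"Mso[^"]*"|'Mso[^']*'|Mso[^\s>]*)""", "",
                  html, flags=re.IGNORECASE)
    # Quoted style attributes containing any mso- declaration; note that
    # this drops the whole attribute, including non-Word declarations.
    html = re.sub(r"""\s+style=(?:"[^"]*mso-[^"]*"|'[^']*mso-[^']*')""", "",
                  html, flags=re.IGNORECASE)
    return html


sample = (
    '<p class="MsoNormal" style="mso-margin-top-alt:auto">Hello<o:p></o:p></p>'
    '<!--[if gte mso 9]><xml>junk</xml><![endif]-->'
)
cleaned = strip_word_cruft(sample)
print(cleaned)
```

A parser-based approach (or Tidy's Word-specific option mentioned in the replies) would likely cope better with real documents; the sketch only shows the general idea of stripping markup that carries no meaning outside Word.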