A.T.Hofkamp wrote:
> Dinesh B Vadhia wrote:
>> I'm processing tens of thousands of html files and a few of them
>> contain mismatched tags and ElementTree throws the error:
>>
>> "Unexpected error opening J:/F2/663/blahblah.html: mismatched tag:
>> line 124, column 8"
>>
>> I now want to scan each file and simply identify each mismatched or
>> unpaired tags (by line number) in each file. I've read the ElementTree
>> docs and cannot see anything obvious how to do this. I know this is a
>> common problem but feeling a bit clueless here - any ideas?
>>
>
> Don't use elementTree, use BeautifulSoup instead.
>
> elementTree expects perfect input, typically generated by another computer.
> BeautifulSoup is designed to handle your everyday HTML page, filled with
> errors of all possible kinds.
But BeautifulSoup also changes the markup by default: the HTML it gives
back is not the same as the source, because it adds missing closing tags
and makes other repairs. That's important to know, I suppose, if you
intend to write back the HTML files you parse with BeautifulSoup.

Also, unless you're running Python 3.0 or greater, use the 3.0.x series
of BeautifulSoup; otherwise BeautifulSoup itself may stop with a parse
error on badly broken HTML, which is the same issue you hit with
ElementTree:
http://www.crummy.com/software/BeautifulSoup/3.1-problems.html

HTH,
Marty
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
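As a rough sketch for the original question (listing every mismatched or unpaired tag by line number, instead of stopping at the first one as ElementTree does), here is one approach that uses the Python 3 standard library's html.parser module rather than BeautifulSoup. The names TagBalanceChecker and find_unbalanced_tags are hypothetical, made up for this example. It keeps a stack of open tags and takes line numbers from HTMLParser.getpos(). The check is strict and XML-style: end tags that HTML lets you omit (such as </p> or </li>) will be reported, while void elements like <br> and <img> are deliberately skipped.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML; they are not tracked as "open".
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}


class TagBalanceChecker(HTMLParser):
    """Hypothetical helper: records unbalanced tags along with their line numbers."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []     # (tag, line) for every element opened but not yet closed
        self.problems = []  # human-readable descriptions of what was found

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            line, _col = self.getpos()  # 1-based line where the start tag begins
            self.stack.append((tag, line))

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: there is nothing to balance.
        pass

    def handle_endtag(self, tag):
        line, _col = self.getpos()
        if tag not in (name for name, _ in self.stack):
            self.problems.append(f"line {line}: </{tag}> has no matching open tag")
            return
        # Pop back to the matching open tag; anything popped on the way was never closed.
        while self.stack:
            name, opened_at = self.stack.pop()
            if name == tag:
                break
            self.problems.append(
                f"line {opened_at}: <{name}> is not closed before </{tag}> at line {line}"
            )

    def close(self):
        super().close()  # flush any buffered input first
        for name, opened_at in self.stack:
            self.problems.append(f"line {opened_at}: <{name}> is never closed")
        self.stack = []


def find_unbalanced_tags(html_text):
    """Return a list of problem descriptions for one HTML document."""
    checker = TagBalanceChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems
```

To scan many files, you could read each one (for example with open(path, encoding="utf-8", errors="replace"), assuming the files are UTF-8) and print whatever find_unbalanced_tags returns for it. A stack is used because a single misplaced end tag often leaves several elements unclosed, and popping back to the matching tag reports each of them separately.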