On Mon, Feb 22, 2010 at 1:06 AM, Lao Mao <laomao1...@googlemail.com> wrote:
> Hi,
>
> I have an html file, with xml style comments in:
>
> <!--
> Some comments here
> Blah
> ...
> -->
>
> I'd like to extract only the comments.  My sense of smell suggests that
> there's probably a library (maybe an xml library) that does this already.
Take a look at BeautifulSoup:
http://www.crummy.com/software/BeautifulSoup/documentation.html

Your code will look something like this (untested):

    from BeautifulSoup import BeautifulSoup, Comment

    # Read the whole file and parse it.
    data = open('myfile.html').read()
    soup = BeautifulSoup(data)

    # Walk every node in document order and print the comments.
    current = soup
    while current is not None:
        if isinstance(current, Comment):
            print current.string
        current = current.next

> Otherwise, my current algorithm looks a bit like this:
>
> * Iterate over file
> * If current line contains <!--
>   - Toggle 'is_comment' to yes
> * If is_comment is yes, print the line
> * If current line contains -->
>   - Toggle 'is_comment' to no
>
> This feels crude, but is it effective, or ok?

It will break on comments like

<!-- This is a comment <!-- still the same comment -->

It will also print too much whenever a comment does not begin at the start
of a line and end at the end of a line, because whole lines are printed.

Kent

> Thanks,
>
> Laomao
>
> _______________________________________________
> Tutor maillist  -  tu...@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
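
If installing a third-party library is not an option, the standard library can probably do the same job. What follows is only a minimal sketch under stated assumptions: it targets Python 3 and its html.parser module rather than BeautifulSoup, and the names CommentCollector and extract_comments are hypothetical, chosen for illustration. It relies on HTMLParser.handle_comment, which the parser calls with the text between <!-- and -->, so a comment that shares a line with other markup, or that contains a stray <!--, should be reported once and without the surrounding text.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the text of every <!-- ... --> comment in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # Called once per comment with the text between <!-- and -->.
        self.comments.append(data)


def extract_comments(html_text):
    """Return a list with the raw text of each comment, in document order."""
    parser = CommentCollector()
    parser.feed(html_text)
    parser.close()
    return parser.comments


if __name__ == "__main__":
    # Hypothetical sample input covering the two cases discussed above.
    sample = (
        "<p>before <!-- inline --> after</p>\n"
        "<!-- This is a comment <!-- still the same comment -->\n"
    )
    for comment in extract_comments(sample):
        print(comment.strip())
```

To use it on a file, read the file into a string first, for example extract_comments(open('myfile.html').read()). The comment text is returned unstripped, so leading and trailing whitespace or newlines inside the comment are preserved unless you strip them yourself.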