I've been working through some of the early Python challenges [1] and feel enthused enough to scratch a current itch. However, before I start coding I want to sound out my idea and get a perspective on the direction I should take.
I've recently bought a media player that also displays .txt files. My itch is to write a script that periodically visits a news website and 'scrapes' the relevant information from it. One of my favourites would be the Guardian [2]. The Guardian provides RSS feeds, so I would like to grab an RSS list and then download the content for the 10 or so items in it.

However, here's where I need direction. Obviously, my preferred delivery is .txt without all the <html> tags. Is there a quick and easy way to strip out the HTML tags and be left with just the content? And, to be even pickier, would it be possible to strip out the navigation content and be left with just the bare bones of the story?

Any pointers to particular libraries I should be looking at would be very helpful. I've already had a quick play with feedparser [3], which was intuitive and easy to program with. What about stripping the HTML?

TIA
Adam

[1] http://www.pythonchallenge.com
[2] http://www.guardian.co.uk
[3] http://feedparser.org/

--
http://www.monkeez.org
PGP key: 0x7111B833
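
P.S. To make the question concrete, here is a rough sketch of the kind of tag stripping I mean, using only the standard library's html.parser module. The names TextExtractor and strip_tags are made up for illustration, the list of block-level tags is just a guess at what matters for spacing, and it makes no attempt at the harder part (dropping navigation content), so please treat it as a starting point rather than a proper solution.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>.

    Hypothetical helper for illustration only.
    """

    SKIP = {"script", "style"}
    # Assumed set of tags that should separate words in the output.
    BLOCK = {"p", "div", "br", "li", "tr",
             "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into text.
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            if self._skip_depth > 0:
                self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.chunks.append(" ")

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def strip_tags(html):
    """Return the text content of an HTML string as a single line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so the .txt output stays tidy.
    return " ".join("".join(parser.chunks).split())


if __name__ == "__main__":
    sample = "<h1>Title</h1><p>Hello &amp; <b>world</b></p>"
    print(strip_tags(sample))
```

The idea would be to run strip_tags over each downloaded story before writing it out as a .txt file.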