On 2/2/07, Dave Kuhlman <[EMAIL PROTECTED]> wrote:
> I'd like to implement and explore tools for analyzing Web pages. I
> have in mind things like:
>
> - Tracing links from a Web page. Building a tree structure of
>   links to a specified depth.
>
> - Tracing links to a Web page. Showing incoming links to a
>   specified depth.
>
> - Word count, word frequency analysis, words in context, etc.
>
> - Etc.
>
> Basically, I'm interested in looking at the structure of the Web
> and trying to help make it useful.

Sounds like an interesting project.

> So, my question: Are there existing tools (in Python) of course for
> this kind of thing. I'd like (1) not to reinvent what is already
> there and (2) to make use of what already exists.

Well, for your analysis phase, I would look at the Natural Language
Toolkit (NLTK) [1]. I haven't used it personally, but I have always
wanted to try it out. The documentation is great.

> I've done a few Web searches, but have not found that much of
> interest.
>
> I plan to start with BeautifulSoup.py at a minimum.

Maybe urllib2.urlopen + BeautifulSoup + nltk will be enough to get you
going. Post back with any cool results.

Christian
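
For anyone who wants to try the fetch, parse, and count pipeline suggested above, here is a rough sketch of the link-extraction and word-frequency steps. It is only an illustration under a few assumptions: it uses the standard library's html.parser as a stand-in for BeautifulSoup and collections.Counter as a stand-in for NLTK, so it runs with no extra installs. The names LinkAndTextParser, analyze, and SAMPLE are made up for this example, and the sample HTML is invented.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collect absolute link targets and visible text from an HTML page.

    Hypothetical helper for illustration; stands in for BeautifulSoup.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def analyze(html, base_url):
    """Return (outgoing links, word-frequency Counter) for one page."""
    parser = LinkAndTextParser(base_url)
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts).lower()
    words = re.findall(r"[a-z']+", text)  # crude tokenizer, not NLTK's
    return parser.links, Counter(words)


# Invented sample page, so the sketch runs without network access.
SAMPLE = """<html><body>
<h1>Web tools</h1>
<p>Python tools for the Web. See <a href="/sigs/web-sig">Web SIG</a>
and <a href="http://example.org/nltk">NLTK</a>.</p>
<script>var ignored = "web web web";</script>
</body></html>"""

links, freq = analyze(SAMPLE, "http://www.python.org/")
print(links)
print(freq.most_common(3))
```

To run this against a live page, the HTML string could come from urllib.request.urlopen(url).read() (the Python 3 home of urllib2.urlopen), decoded to text. Building the link tree to a given depth would then amount to calling analyze again on each returned link, while keeping a set of already-visited URLs to avoid loops.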