I'm trying to build a search engine in Python and am stuck at the point where I
parse HTML to get the useful text. Ideally, one should be able to extract the
text (outside the HTML tags) along with each word's position (for phrase
searches) and font size (to weight words appropriately).
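
To make the goal concrete, here is a rough sketch of the kind of output I'm
after, using only the standard library's html.parser. The names
WeightedTextParser, extract_words, TAG_WEIGHTS and SKIPPED_TAGS are made up for
illustration, and the weights are arbitrary stand-ins for font size: this
looks only at tag names and ignores CSS entirely, which is exactly the part
that gets hard.

```python
from html.parser import HTMLParser

# Assumption: arbitrary per-tag weights standing in for real font sizes.
TAG_WEIGHTS = {"h1": 6, "h2": 5, "h3": 4, "b": 2, "strong": 2}
# Tags whose contents are never visible text.
SKIPPED_TAGS = {"script", "style"}


class WeightedTextParser(HTMLParser):
    """Collect (position, word, weight) triples from visible text."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of open tags we care about
        self.words = []      # (position, word, weight)

    def handle_starttag(self, tag, attrs):
        # Track only tags that affect weighting or visibility.
        if tag in TAG_WEIGHTS or tag in SKIPPED_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Tolerate bad HTML: unwind to the most recent matching open tag
        # and ignore stray closing tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if any(t in SKIPPED_TAGS for t in self.open_tags):
            return
        # A word takes the heaviest weight among its enclosing tags.
        weight = max((TAG_WEIGHTS[t] for t in self.open_tags), default=1)
        for word in data.split():
            self.words.append((len(self.words), word, weight))


def extract_words(html):
    """Return a list of (position, word, weight) for the given HTML string."""
    parser = WeightedTextParser()
    parser.feed(html)
    parser.close()
    return parser.words
```

Feeding it "<h1>Big title</h1><p>plain <b>bold</b> text</p>" gives the words
numbered 0 to 4, with the heading words weighted 6, "bold" weighted 2, and the
rest weighted 1.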

However, this part gets very tedious (especially with bad HTML and CSS), and
my code is already unwieldy. It seems to me that this task should be part of
any semi-sophisticated Python-based screen scraper and that it would be a
commonly solved problem. Yet no amount of googling has turned up anything
useful.

Any ideas?
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig