Girish Redekar wrote:
> I'm trying to build a search engine in Python and am stuck at the point where I parse HTML to get useful text. Ideally, one should be able to extract the text (outside the HTML tags) along with its position (for phrase searches) and font size (to weight words appropriately).
Word weighting should be based on semantics, not style.

However, if you really need it, for CSS parsing there is the cssutils package.

I'm writing a CSS parser, too:
http://hg.mperillo.ath.cx/pdfimg/file/tip/pdfimg/style/css/

It uses PLY, so it should be easy to read/modify. It is still at a very early stage.

> [...]

Regards
Manlio Perillo

_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
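
To illustrate the point about weighting by semantics rather than style, here is a rough sketch using only the standard library's html.parser. It records each word with its position (for phrase searches) and a weight taken from the enclosing tags. The names WeightedTextParser, TAG_WEIGHTS, SKIPPED and VOID, and the weight values themselves, are assumptions made up for this example, not part of any package mentioned above.

```python
from html.parser import HTMLParser

# Hypothetical per-tag weights; the numbers are illustrative assumptions.
TAG_WEIGHTS = {"title": 5.0, "h1": 4.0, "h2": 3.0, "h3": 2.0,
               "strong": 1.5, "b": 1.5, "em": 1.5}
# Elements whose content is not visible text.
SKIPPED = {"script", "style"}
# Elements that never have a closing tag, so they are not tracked.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class WeightedTextParser(HTMLParser):
    """Collect (position, word, weight) tuples from an HTML document."""

    def __init__(self):
        super().__init__()
        self.stack = []   # tags that are currently open
        self.words = []   # (position, lowercased word, weight)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to the matching tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if any(t in SKIPPED for t in self.stack):
            return
        # A word takes the highest weight among its enclosing tags.
        weight = max([TAG_WEIGHTS.get(t, 1.0) for t in self.stack],
                     default=1.0)
        for word in data.split():
            self.words.append((len(self.words), word.lower(), weight))
```

Usage would be along the lines of: create a WeightedTextParser, call feed() with the page source, call close(), then read the words attribute. Positions are word offsets in document order, so a phrase search can look for consecutive positions.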