Girish Redekar wrote:
I'm trying to build a search engine in Python and am stuck at the point where I parse HTML to get useful text. Ideally, one should be able to extract the text (outside the HTML tags) along with its position (for phrase searches) and font size (to weight words appropriately).


Words should be weighted by semantics, not by style.
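
To make the idea concrete, here is a rough sketch (not a finished indexer) that uses only the standard library's html.parser. It records each word with its position and a weight taken from the enclosing element instead of the font size. The TAG_WEIGHTS table, the WeightedTextParser class and the index_html helper are names invented for this example, and the weight values are arbitrary assumptions.

```python
import re
from html.parser import HTMLParser

# Hypothetical weights per semantic element; the values are illustrative only.
TAG_WEIGHTS = {"title": 5.0, "h1": 4.0, "h2": 3.0, "h3": 2.0,
               "strong": 1.5, "em": 1.5}
# Elements whose content is not visible text.
SKIPPED_TAGS = {"script", "style"}
# Common void elements: they never get an end tag, so they are not tracked.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class WeightedTextParser(HTMLParser):
    """Collect (position, word, weight) tuples from an HTML document."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open elements
        self.words = []      # (position, word, weight)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag, tolerating unclosed elements.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if any(tag in SKIPPED_TAGS for tag in self.open_tags):
            return
        # A word takes the highest weight among its enclosing elements.
        weight = max((TAG_WEIGHTS.get(tag, 1.0) for tag in self.open_tags),
                     default=1.0)
        for word in re.findall(r"\w+", data.lower()):
            self.words.append((len(self.words), word, weight))


def index_html(html):
    parser = WeightedTextParser()
    parser.feed(html)
    parser.close()
    return parser.words


sample = ("<html><head><title>Python Search</title>"
          "<style>p {font-size: 40px}</style></head>"
          "<body><h1>Parsing HTML</h1>"
          "<p>Plain <strong>bold</strong> text<br>here</p></body></html>")

for position, word, weight in index_html(sample):
    print(position, word, weight)
```

The position is simply the running word count, which is enough for phrase searches: two words are adjacent when their positions differ by one.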

However, if you really need it, there is the cssutils package for CSS parsing.
I'm writing a CSS parser, too:
http://hg.mperillo.ath.cx/pdfimg/file/tip/pdfimg/style/css/

It uses PLY, so it should be easy to read and modify.
It is still at a very early stage.



> [...]


Regards  Manlio Perillo
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
