[web2py] Re: in trunk - scraping utils

2010-05-25 Thread mdipierro
The entire code is 40 lines and uses the Python built-in HTML parser. It will not be a problem to maintain it. Actually, we could even use this to simplify both XML(...,sanitize) and gluon.contrib.markdown.WIKI.
On May 25, 12:50 am, Thadeus Burgess thade...@thadeusb.com wrote:
> So why our own?

[web2py] Re: in trunk - scraping utils

2010-05-25 Thread mdipierro
Yet a better syntax and more API:
1) no more web2pyHTMLParser, use TAG(...) instead, and flatten (remove tags):
>>> a=TAG('<div>Hello<span>world</span></div>')
>>> print a
<div>Hello<span>world</span></div>
>>> print a.element('span')
<span>world</span>
>>> print a.flatten()
Helloworld
2) search by multiple conditions,
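For readers without a web2py checkout, here is a rough standalone sketch of the same idea (parse, find an element, flatten) on top of the standard-library parser. It is not web2py's implementation: `Node`, `TreeBuilder` and `parse` are hypothetical names made up for this illustration, it is written for Python 3 (`html.parser`) rather than the Python 2 of the thread, and it assumes well-formed input with no void tags such as `<br>`.

```python
from html.parser import HTMLParser


class Node:
    """Hypothetical tree node for illustration; not web2py's TAG helper."""

    def __init__(self, name):
        self.name = name      # tag name, or None for the synthetic root
        self.children = []    # mix of Node objects and plain text strings

    def element(self, name):
        """Return the first descendant tag called `name`, or None."""
        for child in self.children:
            if isinstance(child, Node):
                if child.name == name:
                    return child
                found = child.element(name)
                if found is not None:
                    return found
        return None

    def flatten(self):
        """Concatenate all text content, dropping the tags."""
        return ''.join(
            c.flatten() if isinstance(c, Node) else c for c in self.children
        )

    def __str__(self):
        # Sketch only: attributes are not kept and text is not re-escaped.
        inner = ''.join(str(c) for c in self.children)
        if self.name is None:
            return inner
        return '<%s>%s</%s>' % (self.name, inner, self.name)


class TreeBuilder(HTMLParser):
    """Builds a Node tree from the parser's start/end/data events."""

    def __init__(self):
        super().__init__()
        self.root = Node(None)
        self.stack = [self.root]  # currently open tags, innermost last

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Assumption: well-formed input. Only close the innermost open
        # tag when the names match; other end tags are ignored.
        if len(self.stack) > 1 and self.stack[-1].name == tag:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


a = parse('<div>Hello<span>world</span></div>')
print(a)                  # <div>Hello<span>world</span></div>
print(a.element('span'))  # <span>world</span>
print(a.flatten())        # Helloworld
```

The stack of open tags is the simplest design that works for balanced markup; anything smarter (attribute search, void tags, recovery from bad nesting) is left out on purpose.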

[web2py] Re: in trunk - scraping utils

2010-05-25 Thread Richard
Was going to say web2pyHTMLParser is too cumbersome - glad you changed to TAG. I do some scraping with lxml so am also wary about including this, but the examples look very convenient.
On May 26, 1:11 am, mdipierro mdipie...@cs.depaul.edu wrote:
> Here is a one liner to remove all tags from a

[web2py] Re: in trunk - scraping utils

2010-05-25 Thread mdipierro
It makes assumptions. It fails if Python HTMLParser fails. For example:
from gluon.html import TAG
print TAG('c/bddd/aeee') c/c/bddd/aeee
print TAG('c/bdddeee') c/c/bdddeee/a
print TAG('b x=bbbc/bdddeee') /a
print TAG('b bbbc
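As a hedged illustration of why any tree builder on top of the standard parser has to make assumptions: the parser only reports a stream of start, end and data events and does not check nesting, so the code consuming those events decides what to do with a mismatched end tag. The sketch below uses Python 3's `html.parser` (the 2010-era Python 2 module could also raise HTMLParseError on some malformed input); `EventLogger`, `events` and the sample string are made up for this example and are not the inputs from the message above.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Hypothetical helper: records parser events in arrival order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(('start', tag))

    def handle_endtag(self, tag):
        self.events.append(('end', tag))

    def handle_data(self, data):
        self.events.append(('data', data))


def events(html):
    logger = EventLogger()
    logger.feed(html)
    logger.close()
    return logger.events


# </a> arrives while <b> is still open: the parser reports it as-is,
# leaving the repair strategy to whoever builds the tree.
mismatched = events('<a>aaa<b>bbb</a>ccc</b>')
for event in mismatched:
    print(event)
```

Whether to close `<b>` implicitly, drop the stray `</a>`, or give up is exactly the kind of assumption the message refers to.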

Re: [web2py] Re: in trunk - scraping utils

2010-05-25 Thread Álvaro Justen
On Tue, May 25, 2010 at 12:11, mdipierro mdipie...@cs.depaul.edu wrote:
> Here is a one liner to remove all tags from some html text:
> >>> html = '<div>hello<span>world</span></div>'
> >>> print TAG(html).flatten()
> helloworld
Very good!
--
Álvaro Justen - Turicas http://blog.justen.eng.br/ 21 9898-0141

[web2py] Re: in trunk - scraping utils

2010-05-25 Thread mdipierro
There are docstrings. I will write something more asap.
On May 25, 10:28 pm, weheh richard_gor...@verizon.net wrote:
> This is very nice. I think Thadeus' point is well made. I agree it's useful. It is fringe, but I absolutely need this and will be using it on my current project. Where's the

[web2py] Re: in trunk - scraping utils

2010-05-24 Thread Kevin Bowling
Hmm, I wonder if this is worth the possible maintenance cost? It also transcends the role of a web framework and now you are getting into network programming. I have a currently deployed screen scraping app and found PyQuery to be more than adequate. There is also lxml directly, or Beautiful

Re: [web2py] Re: in trunk - scraping utils

2010-05-24 Thread Thadeus Burgess
So why our own? Because it converts it into web2py helpers. And you don't have to deal with installing anything other than web2py.
-- Thadeus
On Tue, May 25, 2010 at 12:14 AM, Kevin Bowling kevin.bowl...@gmail.com wrote:
> Hmm, I wonder if this is worth the possible maintenance cost? It