[web2py] Encoded string and parsing with TAG() helper error

amphisia pui Wed, 31 Aug 2011 04:28:57 -0700

Hi to All,

I need to write a "wrapper" between a web2py application and a PHP one.
The PHP application sits behind a login form, so I would like to:


1) get login form
2) parse login form
3) post username, password and data obtained in 2)
4) use web2py the way a user would use a browser
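
As a rough sketch of step 2, the form fields could be collected with the standard-library HTML parser alone. Everything here is an assumption for illustration: the sample form, its field names, the credentials, and the FormFieldCollector helper are made up, not taken from the real PHP application.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2, as used in this thread


class FormFieldCollector(HTMLParser):
    """Hypothetical helper: collect name/value pairs of <input> tags."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attrs = dict(attrs)
            if "name" in attrs:
                # Inputs without a value attribute get an empty string.
                self.fields[attrs["name"]] = attrs.get("value") or ""


# Made-up login form standing in for the page fetched in step 1.
SAMPLE_FORM = """
<form action="/login.php" method="post">
  <input type="hidden" name="token" value="abc123">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""

collector = FormFieldCollector()
collector.feed(SAMPLE_FORM)

# Step 3 would post this dict: hidden fields from step 2 plus credentials.
payload = dict(collector.fields, username="me", password="secret")
```

The payload keeps whatever hidden fields the form carries (here the hypothetical "token") and overrides only the credentials.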


This is the plan, but I got stuck on 2) because of this error:

>>> import requests
>>> url = "http://www.google.com"
>>> page = requests.get(url)
>>> page.status_code
200
>>> type(page.content)
<type 'str'>
>>> page_parsed = TAG(page.content)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Studio/web2py/gluon/html.py", line 1037, in __call__
    return web2pyHTMLParser(decoder.decoder(html)).tree
  File "/Studio/web2py/gluon/decoder.py", line 74, in decoder
    return buffer.decode(encoding).encode('utf8')
  File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe0 in position 6561:
invalid continuation byte
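
The traceback can probably be reproduced without web2py or requests. A minimal sketch, assuming the page is actually encoded as ISO-8859-1 (a guess, not confirmed by the response): byte 0xe0 is "à" in that encoding, and on its own it is not valid UTF-8. The to_utf8 helper and the sample bytes are hypothetical.

```python
# Assumption: the page is ISO-8859-1 (Latin-1), where byte 0xe0 is "a grave".
raw = b"<p>citt\xe0</p>"  # stand-in for page.content


def to_utf8(raw_bytes, source_encoding="iso-8859-1"):
    """Hypothetical helper: re-encode a byte string as UTF-8."""
    return raw_bytes.decode(source_encoding).encode("utf-8")


# Decoding the raw bytes as UTF-8 fails, like the decoder call in the traceback.
try:
    raw.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Decoding with the assumed source encoding first yields valid UTF-8 bytes.
utf8_html = to_utf8(raw)
```

If the assumption holds, passing the re-encoded string to TAG() might get past the decoder, but that is untested here.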

What is the best way to parse non-ASCII HTML strings using web2py tools?

I know about Beautiful Soup, but I'd rather keep a slim Python installation
if I can.

Thanks

AP
