Hello everybody.
I am using mod_python, and I am confronted with a problem I don't know
how to solve in an elegant way...
The problem is that I don't know the encoding of the
<req.unparsed_uri> strings...
My script runs in China, and I receive requests encoded in both "utf-8"
and "gb18030"...
The way I handle that is the following:
    uri = req.unparsed_uri
    found_encoding = False
    try:
        # A valid UTF-8 byte string must survive a decode/encode round trip.
        uri_utf8 = uri.decode("utf-8").encode("utf-8")
        found_encoding = (uri_utf8 == uri)
    except UnicodeError:
        found_encoding = False
    if not found_encoding:
        try:
            # Fall back to GB18030 only when UTF-8 did not match.
            uri_gb18030 = uri.decode("gb18030").encode("gb18030")
            found_encoding = (uri_gb18030 == uri)
        except UnicodeError:
            found_encoding = False
        if found_encoding:
            # Normalize to UTF-8 for the rest of the script.
            uri = uri.decode("gb18030").encode("utf-8")
        else:
            raise ValueError("### Failed to find encoding for uri '%s'..." % (uri,))
I am not very happy with that.
So, is there a way to know which encoding the <unparsed_uri> is
in? Or is there a better way to determine the encoding?
I noticed the "content_encoding" member of the request, but it is always
set to None...
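For reference, the same try-each-encoding-in-order idea can be written more compactly. This is only a sketch, not a real detection method: the helper name to_utf8 and its candidates parameter are made up for illustration, it is written for Python 3 byte strings rather than the Python 2 strings above, and it assumes the raw bytes of the URI are already at hand. UTF-8 is tried first because it is the stricter format; a UTF-8 byte string may also happen to decode as GB18030, so the order matters.

```python
def to_utf8(raw, candidates=("utf-8", "gb18030")):
    """Return raw re-encoded as UTF-8, trying each candidate in order.

    Hypothetical helper: the name and the default candidate list are
    assumptions for illustration, not part of mod_python.
    """
    for encoding in candidates:
        try:
            # A strict decode raises UnicodeDecodeError on invalid input.
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        return text.encode("utf-8")
    raise ValueError("Failed to find encoding for uri %r" % (raw,))
```

With this, the block above would reduce to something like uri = to_utf8(uri), and supporting a further encoding only means adding it to the tuple.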
Thanks for your attention,
Daniel