Hello everybody.

I am using mod_python, and I am confronted with a problem I don't know how to solve in an elegant way...

The problem is that I don't know what the encoding of the req.unparsed_uri strings is...

My script runs in China, and I receive requests encoded in either "utf-8" or "gb18030"...

The way I handle that is the following:

    uri = req.unparsed_uri

    # First guess: UTF-8. The round trip must reproduce the original bytes.
    try:
        uri_utf8 = uri.decode("utf-8").encode("utf-8")
        found_encoding = (uri_utf8 == uri)
    except UnicodeError:
        found_encoding = False

    if not found_encoding:
        # Second guess: GB18030, checked the same way.
        try:
            uri_gb18030 = uri.decode("gb18030").encode("gb18030")
            found_encoding = (uri_gb18030 == uri)
        except UnicodeError:
            found_encoding = False

        if found_encoding:
            # Normalize to UTF-8 for the rest of the script.
            uri = uri.decode("gb18030").encode("utf-8")
        else:
            raise ValueError("### Failed to find encoding for uri '%s'..." % (uri,))

I am not very pleased by that.
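
The same idea can be written more compactly as a loop over candidate encodings. This is only a rough sketch: the helper name uri_to_utf8 and its candidates parameter are made up for illustration and are not part of mod_python. It assumes the URI arrives as a raw byte string that is not percent-escaped. It also relies on strict decoding raising an error on invalid bytes, so the round-trip comparison is dropped. UTF-8 is tried first because it is the stricter of the two.

```python
def uri_to_utf8(raw_uri, candidates=("utf-8", "gb18030")):
    """Return raw_uri re-encoded as UTF-8, trying each candidate in order.

    Hypothetical helper: the name and the candidate list are assumptions.
    """
    for encoding in candidates:
        try:
            # Strict decoding raises UnicodeDecodeError on invalid bytes.
            text = raw_uri.decode(encoding)
        except UnicodeDecodeError:
            continue  # not this encoding, try the next one
        return text.encode("utf-8")
    raise ValueError("Failed to find encoding for uri %r" % (raw_uri,))
```

In the handler this would be called as uri = uri_to_utf8(req.unparsed_uri). It is still guessing, though, not knowing.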

So, is there a way to know which encoding the unparsed_uri is in? Or is there a better way to determine the encoding? I noticed the "content_encoding" member of the request, but it is always set to None...


Thanks for your attention,
Daniel
--
http://mail.python.org/mailman/listinfo/python-list
