[mod_python] Knowing the encoding of the URI

Daniel Chiaramello Tue, 03 Mar 2009 08:44:50 -0800

Hello everybody.

I am using mod_python, and I am confronted with a problem I don't know how to solve in an elegant way...

The problem is that I don't know the encoding of the <req.unparsed_uri> strings...

My script runs in China, and I receive requests encoded in both "utf-8" and "gb18030"...


The way I handle that is the following:

       uri = req.unparsed_uri

       # First check whether the URI round-trips cleanly through UTF-8.
       try:
           uri_utf8 = uri.decode("utf-8").encode("utf-8")
           found_encoding = (uri_utf8 == uri)
       except UnicodeError:
           found_encoding = False

       if not found_encoding:
           # Otherwise try GB18030 and, on success, convert the URI to UTF-8.
           try:
               uri_gb18030 = uri.decode("gb18030").encode("gb18030")
               found_encoding = (uri_gb18030 == uri)
           except UnicodeError:
               found_encoding = False

           if found_encoding:
               uri = uri.decode("gb18030").encode("utf-8")
           else:
               raise ValueError("### Failed to find encoding for uri '%s'..." % (uri,))

I am not very pleased by that.
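
For comparison, the same try-each-encoding idea can be written more compactly. This is only a sketch under assumptions, not a mod_python feature: it uses Python 3's urllib.parse rather than the Python 2 str methods above, and the names normalize_uri and CANDIDATE_ENCODINGS are made up for illustration. It also undoes any %XX escapes first, because a percent-escaped URI is pure ASCII and would pass the UTF-8 check whatever encoding the client really used.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical list of encodings to try, in order. UTF-8 goes first because
# bytes that are valid UTF-8 can often also be decoded as GB18030 (into the
# wrong characters), so trying GB18030 first could hide UTF-8 requests.
CANDIDATE_ENCODINGS = ("utf-8", "gb18030")


def normalize_uri(raw_uri, encodings=CANDIDATE_ENCODINGS):
    """Return (decoded_text, encoding_name) for a raw URI (bytes or str)."""
    # Turn %XX escapes back into raw bytes so the real encoding is tested.
    raw_bytes = unquote_to_bytes(raw_uri)
    for encoding in encodings:
        try:
            # A strict decode raises on the first invalid byte sequence.
            return raw_bytes.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("Failed to find encoding for uri %r" % (raw_uri,))
```

Calling normalize_uri(req.unparsed_uri) would then give back the decoded text together with the encoding that matched. This is still guessing from the bytes, so it does not answer the question of what the client intended.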

So, is there a way to know in which encoding the <unparsed_uri> is coded? Is there a better way to determine the encoding? I noticed the "content_encoding" member of the request, but it is always set to None...



Thanks for your attention,
Daniel
--
http://mail.python.org/mailman/listinfo/python-list
