Re: [sqlalchemy] Encoding issue with SqlAlchemy + cxOracle

Michael Bayer Thu, 18 Nov 2010 08:44:05 -0800

SQLAlchemy can do the encoding for you as Anthony suggests, in this case if you 
use the Unicode() type with the flag "convert_unicode='force'".   Unfortunately 
the 'force' value isn't accepted at the create_engine() level (thought it was, 
apparently not).    Its likely that WITH_UNICODE is the easier route for you at 
the moment.



On Nov 18, 2010, at 11:31 AM, Guilherme Menezes wrote:

> An update:
> 
> It seems that cx_Oracle always sends data in UTF-16 if WITH_UNICODE is
> unset and unicode() objects are passed.
> 
> http://sourceforge.net/mailarchive/message.php?msg_name=AANLkTinTVdfPCn15HgXL6zo7cvRdSoE1t0_Z4eqXTGmQ%40mail.gmail.com
> 
> Regards,
> Guilherme.
> 
> On Wed, Nov 17, 2010 at 2:16 PM, Michael Bayer <mike...@zzzcomputing.com> 
> wrote:
>> 
>> On Nov 17, 2010, at 10:26 AM, Guilherme Menezes wrote:
>> 
>>> Hi,
>>> 
>>> I am having an encoding issue using Python 2.6.2, SqlAlchemy 0.6.5 and
>>> cxOracle 5.0.4 to access an Oracle 11g database.
>>> 
>>> I am using the NLS_LANG=.AL32UTF8 enviroment variable. My table in
>>> Oracle is correctly configured to accept Unicode.
>>> 
>>> First, I compiled cxOracle without the WITH_UNICODE flag (as
>>> recommended by SqlAlchemy).  I was passing data to SqlAlchemy as
>>> unicode() objects. Everything worked without errors or warnings.
>>> 
>>> However, sometimes Oracle would complain that the string I was trying
>>> to insert into a VARCHAR2 field was too big (> 4000), even when the
>>> string size ( len(the_string.encode('utf-8')) ) was about 2300 bytes
>>> (before SqlAlchemy). I used a sniffer to verify that the Oracle client
>>> was sending two bytes for each character (even the ASCII ones),
>>> instead of sending two bytes only for special characters.
>>> 
>>> I repeated this insertion experiment using cx_Oracle directly and
>>> passing unicode() objects to it. Same thing: no errors or warnings,
>>> and again the sniffer shown that the Oracle client was sending two
>>> bytes per character. It seemed to me that cx_Oracle does not really
>>> supports unicode() objects; it accepts them but it does not encode()
>>> them to the correct encoding (as set in NLS_LANG variable).
>>> 
>>> However, when reading data from Oracle everything worked as expected
>>> (i.e. someone was converting the UTF-16(?) string to UTF-8, probably
>>> Oracle itself).
>>> 
>>> I tried two approaches to make the Oracle client send data in UTF-8
>>> (not UTF-16):
>>> 
>>> 1. Passing str() objects to SqlAlchemy. SqlAlchemy would then complain
>>> in a warning:
>>> * UnicodeWarning: Unicode equal comparison failed to convert both
>>> arguments to Unicode - interpreting them as being unequal *
>>> It worked, but it did not seem right, since I was receiving unicode()
>>> objects and sending str() objetcs to SqlAlchemy. Moreover, I did not
>>> know how things work inside SqlAlchemy, so I tried something else.
>>> 
>>> 2. Compiling cx_Oracle with the WITH_UNICODE flag solved the problem
>>> (utf-8 was sent to Oracle), but another warning was issued:
>>> * SAWarning: cx_Oracle is compiled under Python 2.xx using the
>>> WITH_UNICODE flag.  Consider recompiling cx_Oracle without this flag,
>>> which is in no way necessary for full support of Unicode. Otherwise,
>>> all string-holding bind parameters must be explicitly typed using
>>> SQLAlchemy's String type or one of its subtypes,or otherwise be passed
>>> as Python unicode.  Plain Python strings passed as bind parameters
>>> will be silently corrupted by cx_Oracle. *
>>> 
>>> Right now I am using cx_Oracle compiled with WITH_UNICODE, everything
>>> works, but this warning is worrying me. I tried to pass a str() object
>>> in UTF-8 to SqlAlchemy, and the execution halted with an encoding
>>> error (UnicodeDecodeError). In other words, cx_Oracle did not silently
>>> corrupted the data (at least when special characters were sent).
>>> 
>>> I would like to know if anyone has a better (safer) solution to my
>>> problem. Or am I safe enough by using the WITH_UNICODE flag and
>>> passing only unicode() objects?
>> 
>> 
>> Regarding the passing of utf-8 vs. utf-16 and the differences in 
>> WITH_UNICODE mode, you should really email the cx_oracle list about that as 
>> WITH_UNICODE to my knowledge only has to do with the Python interpretation 
>> of arguments, not its interaction with OCI.   If you email their list, make 
>> sure all code examples are purely derived from cx_oracle, to eliminate any 
>> doubt that each behavior is one of cx_oracle directly.  WITH_UNICODE was not 
>> intended for general use in Python 2.xx.   if your whole application really 
>> works with it, then there's no reason not to use it, its just that you'll 
>> always have to ensure that no non-Python unicode strings ever get sent to 
>> cx_oracle which can be fairly tedious.   It sounds like a bug that cx_oracle 
>> would be expanding into a two-byte-per-character stream like that.
>> 
>> Its possible that calling setinputsizes() with cx_oracle.UNICODE may be the 
>> key to cx_oracle's behavior here.   We call setinputsizes for every Oracle 
>> cursor but we currently exclude the string types from that list as it had 
>> some unwanted side effects.
>> 
>> Also note that VARCHAR2(4000) is measuring itself in bytes, not characters.  
>> For true Unicode support Oracle provides the NVARCHAR2 type, where the 
>> "length" specifies the number of characters, instead of bytes.   Recent 
>> versions of Oracle also support the form VARCHAR2(4000 CHAR) which will 
>> similarly measure the column in terms of characters based on the databases 
>> encoding instead of bytes.    Its worth investigating if using NVARCHAR2 
>> causes cx_oracle, or OCI, to change its behavior.
>> 
>> 
>>> 
>>> Thank you in advance.
>>> 
>>> Guilherme.
>>> 
>>> --
>>> You received this message because you are subscribed to the Google Groups 
>>> "sqlalchemy" group.
>>> To post to this group, send email to sqlalch...@googlegroups.com.
>>> To unsubscribe from this group, send email to 
>>> sqlalchemy+unsubscr...@googlegroups.com.
>>> For more options, visit this group at 
>>> http://groups.google.com/group/sqlalchemy?hl=en.
>>> 
>> 
>> --
>> You received this message because you are subscribed to the Google Groups 
>> "sqlalchemy" group.
>> To post to this group, send email to sqlalch...@googlegroups.com.
>> To unsubscribe from this group, send email to 
>> sqlalchemy+unsubscr...@googlegroups.com.
>> For more options, visit this group at 
>> http://groups.google.com/group/sqlalchemy?hl=en.
>> 
>> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To post to this group, send email to sqlalch...@googlegroups.com.
> To unsubscribe from this group, send email to 
> sqlalchemy+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/sqlalchemy?hl=en.
> 

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Re: [sqlalchemy] Encoding issue with SqlAlchemy + cxOracle

Reply via email to