Thank you.

I will report the problem to the cx_Oracle list and see what they have to say.

Regards,

Guilherme.


On Wed, Nov 17, 2010 at 2:16 PM, Michael Bayer <mike...@zzzcomputing.com> wrote:
>
> On Nov 17, 2010, at 10:26 AM, Guilherme Menezes wrote:
>
>> Hi,
>>
>> I am having an encoding issue using Python 2.6.2, SqlAlchemy 0.6.5 and
>> cxOracle 5.0.4 to access an Oracle 11g database.
>>
>> I am using the NLS_LANG=.AL32UTF8 enviroment variable. My table in
>> Oracle is correctly configured to accept Unicode.
>>
>> First, I compiled cxOracle without the WITH_UNICODE flag (as
>> recommended by SqlAlchemy).  I was passing data to SqlAlchemy as
>> unicode() objects. Everything worked without errors or warnings.
>>
>> However, sometimes Oracle would complain that the string I was trying
>> to insert into a VARCHAR2 field was too big (> 4000), even when the
>> string size ( len(the_string.encode('utf-8')) ) was about 2300 bytes
>> (before SqlAlchemy). I used a sniffer to verify that the Oracle client
>> was sending two bytes for each character (even the ASCII ones),
>> instead of sending two bytes only for special characters.
>>
>> I repeated this insertion experiment using cx_Oracle directly and
>> passing unicode() objects to it. Same thing: no errors or warnings,
>> and again the sniffer shown that the Oracle client was sending two
>> bytes per character. It seemed to me that cx_Oracle does not really
>> supports unicode() objects; it accepts them but it does not encode()
>> them to the correct encoding (as set in NLS_LANG variable).
>>
>> However, when reading data from Oracle everything worked as expected
>> (i.e. someone was converting the UTF-16(?) string to UTF-8, probably
>> Oracle itself).
>>
>> I tried two approaches to make the Oracle client send data in UTF-8
>> (not UTF-16):
>>
>> 1. Passing str() objects to SqlAlchemy. SqlAlchemy would then complain
>> in a warning:
>> * UnicodeWarning: Unicode equal comparison failed to convert both
>> arguments to Unicode - interpreting them as being unequal *
>> It worked, but it did not seem right, since I was receiving unicode()
>> objects and sending str() objetcs to SqlAlchemy. Moreover, I did not
>> know how things work inside SqlAlchemy, so I tried something else.
>>
>> 2. Compiling cx_Oracle with the WITH_UNICODE flag solved the problem
>> (utf-8 was sent to Oracle), but another warning was issued:
>> * SAWarning: cx_Oracle is compiled under Python 2.xx using the
>> WITH_UNICODE flag.  Consider recompiling cx_Oracle without this flag,
>> which is in no way necessary for full support of Unicode. Otherwise,
>> all string-holding bind parameters must be explicitly typed using
>> SQLAlchemy's String type or one of its subtypes,or otherwise be passed
>> as Python unicode.  Plain Python strings passed as bind parameters
>> will be silently corrupted by cx_Oracle. *
>>
>> Right now I am using cx_Oracle compiled with WITH_UNICODE, everything
>> works, but this warning is worrying me. I tried to pass a str() object
>> in UTF-8 to SqlAlchemy, and the execution halted with an encoding
>> error (UnicodeDecodeError). In other words, cx_Oracle did not silently
>> corrupted the data (at least when special characters were sent).
>>
>> I would like to know if anyone has a better (safer) solution to my
>> problem. Or am I safe enough by using the WITH_UNICODE flag and
>> passing only unicode() objects?
>
>
> Regarding the passing of utf-8 vs. utf-16 and the differences in WITH_UNICODE 
> mode, you should really email the cx_oracle list about that as WITH_UNICODE 
> to my knowledge only has to do with the Python interpretation of arguments, 
> not its interaction with OCI.   If you email their list, make sure all code 
> examples are purely derived from cx_oracle, to eliminate any doubt that each 
> behavior is one of cx_oracle directly.  WITH_UNICODE was not intended for 
> general use in Python 2.xx.   if your whole application really works with it, 
> then there's no reason not to use it, its just that you'll always have to 
> ensure that no non-Python unicode strings ever get sent to cx_oracle which 
> can be fairly tedious.   It sounds like a bug that cx_oracle would be 
> expanding into a two-byte-per-character stream like that.
>
> Its possible that calling setinputsizes() with cx_oracle.UNICODE may be the 
> key to cx_oracle's behavior here.   We call setinputsizes for every Oracle 
> cursor but we currently exclude the string types from that list as it had 
> some unwanted side effects.
>
> Also note that VARCHAR2(4000) is measuring itself in bytes, not characters.  
> For true Unicode support Oracle provides the NVARCHAR2 type, where the 
> "length" specifies the number of characters, instead of bytes.   Recent 
> versions of Oracle also support the form VARCHAR2(4000 CHAR) which will 
> similarly measure the column in terms of characters based on the databases 
> encoding instead of bytes.    Its worth investigating if using NVARCHAR2 
> causes cx_oracle, or OCI, to change its behavior.
>
>
>>
>> Thank you in advance.
>>
>> Guilherme.
>>
>> --
>> You received this message because you are subscribed to the Google Groups 
>> "sqlalchemy" group.
>> To post to this group, send email to sqlalch...@googlegroups.com.
>> To unsubscribe from this group, send email to 
>> sqlalchemy+unsubscr...@googlegroups.com.
>> For more options, visit this group at 
>> http://groups.google.com/group/sqlalchemy?hl=en.
>>
>
> --
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To post to this group, send email to sqlalch...@googlegroups.com.
> To unsubscribe from this group, send email to 
> sqlalchemy+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/sqlalchemy?hl=en.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Reply via email to