Hi,

I am having an encoding issue using Python 2.6.2, SqlAlchemy 0.6.5 and
cxOracle 5.0.4 to access an Oracle 11g database.

I am using the NLS_LANG=.AL32UTF8 enviroment variable. My table in
Oracle is correctly configured to accept Unicode.

First, I compiled cxOracle without the WITH_UNICODE flag (as
recommended by SqlAlchemy).  I was passing data to SqlAlchemy as
unicode() objects. Everything worked without errors or warnings.

However, sometimes Oracle would complain that the string I was trying
to insert into a VARCHAR2 field was too big (> 4000), even when the
string size ( len(the_string.encode('utf-8')) ) was about 2300 bytes
(before SqlAlchemy). I used a sniffer to verify that the Oracle client
was sending two bytes for each character (even the ASCII ones),
instead of sending two bytes only for special characters.

I repeated this insertion experiment using cx_Oracle directly and
passing unicode() objects to it. Same thing: no errors or warnings,
and again the sniffer shown that the Oracle client was sending two
bytes per character. It seemed to me that cx_Oracle does not really
supports unicode() objects; it accepts them but it does not encode()
them to the correct encoding (as set in NLS_LANG variable).

However, when reading data from Oracle everything worked as expected
(i.e. someone was converting the UTF-16(?) string to UTF-8, probably
Oracle itself).

I tried two approaches to make the Oracle client send data in UTF-8
(not UTF-16):

1. Passing str() objects to SqlAlchemy. SqlAlchemy would then complain
in a warning:
* UnicodeWarning: Unicode equal comparison failed to convert both
arguments to Unicode - interpreting them as being unequal *
It worked, but it did not seem right, since I was receiving unicode()
objects and sending str() objetcs to SqlAlchemy. Moreover, I did not
know how things work inside SqlAlchemy, so I tried something else.

2. Compiling cx_Oracle with the WITH_UNICODE flag solved the problem
(utf-8 was sent to Oracle), but another warning was issued:
* SAWarning: cx_Oracle is compiled under Python 2.xx using the
WITH_UNICODE flag.  Consider recompiling cx_Oracle without this flag,
which is in no way necessary for full support of Unicode. Otherwise,
all string-holding bind parameters must be explicitly typed using
SQLAlchemy's String type or one of its subtypes,or otherwise be passed
as Python unicode.  Plain Python strings passed as bind parameters
will be silently corrupted by cx_Oracle. *

Right now I am using cx_Oracle compiled with WITH_UNICODE, everything
works, but this warning is worrying me. I tried to pass a str() object
in UTF-8 to SqlAlchemy, and the execution halted with an encoding
error (UnicodeDecodeError). In other words, cx_Oracle did not silently
corrupted the data (at least when special characters were sent).

I would like to know if anyone has a better (safer) solution to my
problem. Or am I safe enough by using the WITH_UNICODE flag and
passing only unicode() objects?

Thank you in advance.

Guilherme.

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Reply via email to