Hi, I am having an encoding issue using Python 2.6.2, SqlAlchemy 0.6.5 and cxOracle 5.0.4 to access an Oracle 11g database.
I am using the NLS_LANG=.AL32UTF8 enviroment variable. My table in Oracle is correctly configured to accept Unicode. First, I compiled cxOracle without the WITH_UNICODE flag (as recommended by SqlAlchemy). I was passing data to SqlAlchemy as unicode() objects. Everything worked without errors or warnings. However, sometimes Oracle would complain that the string I was trying to insert into a VARCHAR2 field was too big (> 4000), even when the string size ( len(the_string.encode('utf-8')) ) was about 2300 bytes (before SqlAlchemy). I used a sniffer to verify that the Oracle client was sending two bytes for each character (even the ASCII ones), instead of sending two bytes only for special characters. I repeated this insertion experiment using cx_Oracle directly and passing unicode() objects to it. Same thing: no errors or warnings, and again the sniffer shown that the Oracle client was sending two bytes per character. It seemed to me that cx_Oracle does not really supports unicode() objects; it accepts them but it does not encode() them to the correct encoding (as set in NLS_LANG variable). However, when reading data from Oracle everything worked as expected (i.e. someone was converting the UTF-16(?) string to UTF-8, probably Oracle itself). I tried two approaches to make the Oracle client send data in UTF-8 (not UTF-16): 1. Passing str() objects to SqlAlchemy. SqlAlchemy would then complain in a warning: * UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal * It worked, but it did not seem right, since I was receiving unicode() objects and sending str() objetcs to SqlAlchemy. Moreover, I did not know how things work inside SqlAlchemy, so I tried something else. 2. Compiling cx_Oracle with the WITH_UNICODE flag solved the problem (utf-8 was sent to Oracle), but another warning was issued: * SAWarning: cx_Oracle is compiled under Python 2.xx using the WITH_UNICODE flag. Consider recompiling cx_Oracle without this flag, which is in no way necessary for full support of Unicode. Otherwise, all string-holding bind parameters must be explicitly typed using SQLAlchemy's String type or one of its subtypes,or otherwise be passed as Python unicode. Plain Python strings passed as bind parameters will be silently corrupted by cx_Oracle. * Right now I am using cx_Oracle compiled with WITH_UNICODE, everything works, but this warning is worrying me. I tried to pass a str() object in UTF-8 to SqlAlchemy, and the execution halted with an encoding error (UnicodeDecodeError). In other words, cx_Oracle did not silently corrupted the data (at least when special characters were sent). I would like to know if anyone has a better (safer) solution to my problem. Or am I safe enough by using the WITH_UNICODE flag and passing only unicode() objects? Thank you in advance. Guilherme. -- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To post to this group, send email to sqlalch...@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.