Thank you. I will report the problem to the cx_Oracle list and see what they have to say.
Regards, Guilherme. On Wed, Nov 17, 2010 at 2:16 PM, Michael Bayer <mike...@zzzcomputing.com> wrote: > > On Nov 17, 2010, at 10:26 AM, Guilherme Menezes wrote: > >> Hi, >> >> I am having an encoding issue using Python 2.6.2, SqlAlchemy 0.6.5 and >> cxOracle 5.0.4 to access an Oracle 11g database. >> >> I am using the NLS_LANG=.AL32UTF8 enviroment variable. My table in >> Oracle is correctly configured to accept Unicode. >> >> First, I compiled cxOracle without the WITH_UNICODE flag (as >> recommended by SqlAlchemy). I was passing data to SqlAlchemy as >> unicode() objects. Everything worked without errors or warnings. >> >> However, sometimes Oracle would complain that the string I was trying >> to insert into a VARCHAR2 field was too big (> 4000), even when the >> string size ( len(the_string.encode('utf-8')) ) was about 2300 bytes >> (before SqlAlchemy). I used a sniffer to verify that the Oracle client >> was sending two bytes for each character (even the ASCII ones), >> instead of sending two bytes only for special characters. >> >> I repeated this insertion experiment using cx_Oracle directly and >> passing unicode() objects to it. Same thing: no errors or warnings, >> and again the sniffer shown that the Oracle client was sending two >> bytes per character. It seemed to me that cx_Oracle does not really >> supports unicode() objects; it accepts them but it does not encode() >> them to the correct encoding (as set in NLS_LANG variable). >> >> However, when reading data from Oracle everything worked as expected >> (i.e. someone was converting the UTF-16(?) string to UTF-8, probably >> Oracle itself). >> >> I tried two approaches to make the Oracle client send data in UTF-8 >> (not UTF-16): >> >> 1. Passing str() objects to SqlAlchemy. SqlAlchemy would then complain >> in a warning: >> * UnicodeWarning: Unicode equal comparison failed to convert both >> arguments to Unicode - interpreting them as being unequal * >> It worked, but it did not seem right, since I was receiving unicode() >> objects and sending str() objetcs to SqlAlchemy. Moreover, I did not >> know how things work inside SqlAlchemy, so I tried something else. >> >> 2. Compiling cx_Oracle with the WITH_UNICODE flag solved the problem >> (utf-8 was sent to Oracle), but another warning was issued: >> * SAWarning: cx_Oracle is compiled under Python 2.xx using the >> WITH_UNICODE flag. Consider recompiling cx_Oracle without this flag, >> which is in no way necessary for full support of Unicode. Otherwise, >> all string-holding bind parameters must be explicitly typed using >> SQLAlchemy's String type or one of its subtypes,or otherwise be passed >> as Python unicode. Plain Python strings passed as bind parameters >> will be silently corrupted by cx_Oracle. * >> >> Right now I am using cx_Oracle compiled with WITH_UNICODE, everything >> works, but this warning is worrying me. I tried to pass a str() object >> in UTF-8 to SqlAlchemy, and the execution halted with an encoding >> error (UnicodeDecodeError). In other words, cx_Oracle did not silently >> corrupted the data (at least when special characters were sent). >> >> I would like to know if anyone has a better (safer) solution to my >> problem. Or am I safe enough by using the WITH_UNICODE flag and >> passing only unicode() objects? > > > Regarding the passing of utf-8 vs. utf-16 and the differences in WITH_UNICODE > mode, you should really email the cx_oracle list about that as WITH_UNICODE > to my knowledge only has to do with the Python interpretation of arguments, > not its interaction with OCI. If you email their list, make sure all code > examples are purely derived from cx_oracle, to eliminate any doubt that each > behavior is one of cx_oracle directly. WITH_UNICODE was not intended for > general use in Python 2.xx. if your whole application really works with it, > then there's no reason not to use it, its just that you'll always have to > ensure that no non-Python unicode strings ever get sent to cx_oracle which > can be fairly tedious. It sounds like a bug that cx_oracle would be > expanding into a two-byte-per-character stream like that. > > Its possible that calling setinputsizes() with cx_oracle.UNICODE may be the > key to cx_oracle's behavior here. We call setinputsizes for every Oracle > cursor but we currently exclude the string types from that list as it had > some unwanted side effects. > > Also note that VARCHAR2(4000) is measuring itself in bytes, not characters. > For true Unicode support Oracle provides the NVARCHAR2 type, where the > "length" specifies the number of characters, instead of bytes. Recent > versions of Oracle also support the form VARCHAR2(4000 CHAR) which will > similarly measure the column in terms of characters based on the databases > encoding instead of bytes. Its worth investigating if using NVARCHAR2 > causes cx_oracle, or OCI, to change its behavior. > > >> >> Thank you in advance. >> >> Guilherme. >> >> -- >> You received this message because you are subscribed to the Google Groups >> "sqlalchemy" group. >> To post to this group, send email to sqlalch...@googlegroups.com. >> To unsubscribe from this group, send email to >> sqlalchemy+unsubscr...@googlegroups.com. >> For more options, visit this group at >> http://groups.google.com/group/sqlalchemy?hl=en. >> > > -- > You received this message because you are subscribed to the Google Groups > "sqlalchemy" group. > To post to this group, send email to sqlalch...@googlegroups.com. > To unsubscribe from this group, send email to > sqlalchemy+unsubscr...@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/sqlalchemy?hl=en. > > -- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To post to this group, send email to sqlalch...@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.