SQLAlchemy can do the encoding for you as Anthony suggests, in this case if you use the Unicode() type with the flag "convert_unicode='force'". Unfortunately the 'force' value isn't accepted at the create_engine() level (thought it was, apparently not). Its likely that WITH_UNICODE is the easier route for you at the moment.
On Nov 18, 2010, at 11:31 AM, Guilherme Menezes wrote: > An update: > > It seems that cx_Oracle always sends data in UTF-16 if WITH_UNICODE is > unset and unicode() objects are passed. > > http://sourceforge.net/mailarchive/message.php?msg_name=AANLkTinTVdfPCn15HgXL6zo7cvRdSoE1t0_Z4eqXTGmQ%40mail.gmail.com > > Regards, > Guilherme. > > On Wed, Nov 17, 2010 at 2:16 PM, Michael Bayer <mike...@zzzcomputing.com> > wrote: >> >> On Nov 17, 2010, at 10:26 AM, Guilherme Menezes wrote: >> >>> Hi, >>> >>> I am having an encoding issue using Python 2.6.2, SqlAlchemy 0.6.5 and >>> cxOracle 5.0.4 to access an Oracle 11g database. >>> >>> I am using the NLS_LANG=.AL32UTF8 enviroment variable. My table in >>> Oracle is correctly configured to accept Unicode. >>> >>> First, I compiled cxOracle without the WITH_UNICODE flag (as >>> recommended by SqlAlchemy). I was passing data to SqlAlchemy as >>> unicode() objects. Everything worked without errors or warnings. >>> >>> However, sometimes Oracle would complain that the string I was trying >>> to insert into a VARCHAR2 field was too big (> 4000), even when the >>> string size ( len(the_string.encode('utf-8')) ) was about 2300 bytes >>> (before SqlAlchemy). I used a sniffer to verify that the Oracle client >>> was sending two bytes for each character (even the ASCII ones), >>> instead of sending two bytes only for special characters. >>> >>> I repeated this insertion experiment using cx_Oracle directly and >>> passing unicode() objects to it. Same thing: no errors or warnings, >>> and again the sniffer shown that the Oracle client was sending two >>> bytes per character. It seemed to me that cx_Oracle does not really >>> supports unicode() objects; it accepts them but it does not encode() >>> them to the correct encoding (as set in NLS_LANG variable). >>> >>> However, when reading data from Oracle everything worked as expected >>> (i.e. someone was converting the UTF-16(?) string to UTF-8, probably >>> Oracle itself). >>> >>> I tried two approaches to make the Oracle client send data in UTF-8 >>> (not UTF-16): >>> >>> 1. Passing str() objects to SqlAlchemy. SqlAlchemy would then complain >>> in a warning: >>> * UnicodeWarning: Unicode equal comparison failed to convert both >>> arguments to Unicode - interpreting them as being unequal * >>> It worked, but it did not seem right, since I was receiving unicode() >>> objects and sending str() objetcs to SqlAlchemy. Moreover, I did not >>> know how things work inside SqlAlchemy, so I tried something else. >>> >>> 2. Compiling cx_Oracle with the WITH_UNICODE flag solved the problem >>> (utf-8 was sent to Oracle), but another warning was issued: >>> * SAWarning: cx_Oracle is compiled under Python 2.xx using the >>> WITH_UNICODE flag. Consider recompiling cx_Oracle without this flag, >>> which is in no way necessary for full support of Unicode. Otherwise, >>> all string-holding bind parameters must be explicitly typed using >>> SQLAlchemy's String type or one of its subtypes,or otherwise be passed >>> as Python unicode. Plain Python strings passed as bind parameters >>> will be silently corrupted by cx_Oracle. * >>> >>> Right now I am using cx_Oracle compiled with WITH_UNICODE, everything >>> works, but this warning is worrying me. I tried to pass a str() object >>> in UTF-8 to SqlAlchemy, and the execution halted with an encoding >>> error (UnicodeDecodeError). In other words, cx_Oracle did not silently >>> corrupted the data (at least when special characters were sent). >>> >>> I would like to know if anyone has a better (safer) solution to my >>> problem. Or am I safe enough by using the WITH_UNICODE flag and >>> passing only unicode() objects? >> >> >> Regarding the passing of utf-8 vs. utf-16 and the differences in >> WITH_UNICODE mode, you should really email the cx_oracle list about that as >> WITH_UNICODE to my knowledge only has to do with the Python interpretation >> of arguments, not its interaction with OCI. If you email their list, make >> sure all code examples are purely derived from cx_oracle, to eliminate any >> doubt that each behavior is one of cx_oracle directly. WITH_UNICODE was not >> intended for general use in Python 2.xx. if your whole application really >> works with it, then there's no reason not to use it, its just that you'll >> always have to ensure that no non-Python unicode strings ever get sent to >> cx_oracle which can be fairly tedious. It sounds like a bug that cx_oracle >> would be expanding into a two-byte-per-character stream like that. >> >> Its possible that calling setinputsizes() with cx_oracle.UNICODE may be the >> key to cx_oracle's behavior here. We call setinputsizes for every Oracle >> cursor but we currently exclude the string types from that list as it had >> some unwanted side effects. >> >> Also note that VARCHAR2(4000) is measuring itself in bytes, not characters. >> For true Unicode support Oracle provides the NVARCHAR2 type, where the >> "length" specifies the number of characters, instead of bytes. Recent >> versions of Oracle also support the form VARCHAR2(4000 CHAR) which will >> similarly measure the column in terms of characters based on the databases >> encoding instead of bytes. Its worth investigating if using NVARCHAR2 >> causes cx_oracle, or OCI, to change its behavior. >> >> >>> >>> Thank you in advance. >>> >>> Guilherme. >>> >>> -- >>> You received this message because you are subscribed to the Google Groups >>> "sqlalchemy" group. >>> To post to this group, send email to sqlalch...@googlegroups.com. >>> To unsubscribe from this group, send email to >>> sqlalchemy+unsubscr...@googlegroups.com. >>> For more options, visit this group at >>> http://groups.google.com/group/sqlalchemy?hl=en. >>> >> >> -- >> You received this message because you are subscribed to the Google Groups >> "sqlalchemy" group. >> To post to this group, send email to sqlalch...@googlegroups.com. >> To unsubscribe from this group, send email to >> sqlalchemy+unsubscr...@googlegroups.com. >> For more options, visit this group at >> http://groups.google.com/group/sqlalchemy?hl=en. >> >> > > -- > You received this message because you are subscribed to the Google Groups > "sqlalchemy" group. > To post to this group, send email to sqlalch...@googlegroups.com. > To unsubscribe from this group, send email to > sqlalchemy+unsubscr...@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/sqlalchemy?hl=en. > -- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To post to this group, send email to sqlalch...@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.