here's "data too large" for the size of 10: sqlalchemy.exc.DatabaseError: (cx_Oracle.DatabaseError) ORA-12899: value too large for column "SCOTT"."UTF8_TEST"."TEXT" (actual: 12, maximum: 10) [SQL: u'INSERT INTO utf8_test (text) VALUES (:text)'] [parameters: {'text': '\\u7535\\u8111'}] (Background on this error at: http://sqlalche.me/e/4xp6)
On Fri, Jan 12, 2018 at 10:33 AM, Mike Bayer <mike...@zzzcomputing.com> wrote: > On Fri, Jan 12, 2018 at 7:14 AM, Stefan Schwarzer > <sschwar...@sschwarzer.net> wrote: >> I have trouble inserting Chinese characters into an Oracle database with >> SQLAlchemy (and to some extent with cx_Oracle). I don't specifically need >> Chinese characters but I used them in a unit test to see if my code >> (hopefully) can handle "any" unicode characters. > > I can't reproduce your result, I'm not using any SQL tool I am instead > just getting the same data back and asserting it matches. Per the > author of cx_Oracle the setinputsizes call is no longer needed in > most cases in the 6.x series, but also in the 1.x series, the > autosetinputsizes call for UNICODE and STRING was omitted because I > still got poorer results. Round-tripping of unicode characters is > widely tested with cx_Oracle / SQLAlchemy. > > In SQLAlchemy 1.1 series and earlier, you can specify > exclude_setinputsizes=() to have STRING be part of the automatic > setinputsizes call. In SQLAlchemy 1.2 these features were all removed > as there was never any reason to pass most datatypes to setinputsizes. > in 1.2 you can still say > engine.dialect._include_setinputsizes.add(cx_Oracle.STRING) to re-add > it, but this should not be necessary (my test works with or without > it). > > Also which version of Python you're using matters, however I get a > round trip under both pythons. > > To round trip it, do this - note I had to expand the size of the > VARCHAR to fit your string, it was giving me a "data too large" error > before, so that might be a clue: > > import os > os.environ["NLS_LANG"] = "German_Germany.UTF8" > > import sqlalchemy as sa > > > metadata = sa.MetaData() > > test_table = sa.Table("utf8_test", metadata, sa.Column("text", > sa.Unicode(20))) > > e = sa.create_engine("oracle+cx_oracle://scott:tiger@oracle1120/xe", > echo='debug') > > metadata.drop_all(e) > metadata.create_all(e) > > data = "\u7535\u8111" > insert = test_table.insert().values(text=data) > with e.connect() as connection: > connection.execute(insert) > > sqla_result = connection.scalar(sa.select([test_table.c.text])) > > assert sqla_result == data > > please try that out and see if you aren't getting a true result. > note also that the table is being dropped and created within the > script itself. > > when I run in Python 3 I even see the correct characters displayed: > > 2018-01-12 10:21:37,658 INFO sqlalchemy.engine.base.Engine INSERT INTO > utf8_test (text) VALUES (:text) > 2018-01-12 10:21:37,658 INFO sqlalchemy.engine.base.Engine {'text': '电脑'} > 2018-01-12 10:21:37,663 INFO sqlalchemy.engine.base.Engine COMMIT > 2018-01-12 10:21:37,668 INFO sqlalchemy.engine.base.Engine SELECT > utf8_test.text > FROM utf8_test > 2018-01-12 10:21:37,668 INFO sqlalchemy.engine.base.Engine {} > 2018-01-12 10:21:37,672 DEBUG sqlalchemy.engine.base.Engine Col ('TEXT',) > 2018-01-12 10:21:37,672 DEBUG sqlalchemy.engine.base.Engine Row ('电脑',) > > The next step would be to raise this on cx_Oracle's tracker > (https://github.com/oracle/python-cx_Oracle/issues) to understand why > your setup needs setinputsizes to be called and mine doesn't (and has > not for years). > > > > > >> >> The setup: >> >> - SQLAlchemy 1.1.15 and 1.2 >> - Oracle database 11.2 >> - cx_Oracle 6.1 >> - environment variable set from the shell ( export >> NLS_LANG="German_Germany.UTF8" ) and in the code >> >> Database table: >> >> CREATE TABLE UTF8_TEST ( >> TEXT NVARCHAR2(10) >> ) >> >> >> 1) SQLAlchemy, version 1.1.15 and 1.2 >> >> import os >> os.environ["NLS_LANG"] = "German_Germany.UTF8" >> >> import sqlalchemy as sa >> >> >> metadata = sa.MetaData() >> >> test_table = sa.Table("utf8_test", metadata, sa.Column("text", >> sa.Unicode(10))) >> >> >> def test(host, port, service_name, user, password): >> connect_string = >> "oracle+cx_oracle://{}:{}@{}:{}/?service_name={}".format(user, password, >> host, port, service_name) >> engine = sa.engine.create_engine(connect_string, encoding="utf8") >> insert = test_table.insert().values(text="\u7535\u8111") >> with engine.connect() as connection: >> connection.execute(insert) >> >> >> if __name__ == "__main__": >> db_config = ... >> test(db_config.host, db_config.port, db_config.service_name, >> db_config.user, db_config.password) >> >> When this code is executed, it inserts something into the database, but it >> shows up as two upside-down question marks when viewed with Oracle SQL >> Developer. This doesn't seem to be an encoding/rendering problem in SQL >> Developer (see below for code snippet 3). >> >> >> 2) cx_Oracle, without `Cursor.setinputsizes` >> >> import os >> os.environ["NLS_LANG"] = "German_Germany.UTF8" >> >> import cx_Oracle >> >> >> def test(host, port, service_name, user, password): >> dsn = cx_Oracle.makedsn(host, port, service_name=service_name) >> conn = cx_Oracle.connect(user, password, dsn=dsn, encoding="utf8", >> nencoding="utf8") >> cur = conn.cursor() >> cur.execute("INSERT INTO UTF8_TEST (TEXT) VALUES (:t)", {"t": >> "\u7535\u8111"}) >> cur.close() >> conn.commit() >> >> >> if __name__ == "__main__": >> db_config = ... >> test(db_config.host, db_config.port, db_config.service_name, >> db_config.user, db_config.password) >> >> This code has the same effect as the SQLAlchemy code above. >> >> >> 3) cx_Oracle with `Cursor.setinputsizes` >> >> import os >> os.environ["NLS_LANG"] = "German_Germany.UTF8" >> >> import cx_Oracle >> >> >> def test(host, port, service_name, user, password): >> dsn = cx_Oracle.makedsn(host, port, service_name=service_name) >> conn = cx_Oracle.connect(user, password, dsn=dsn, encoding="utf8", >> nencoding="utf8") >> cur = conn.cursor() >> # Using `setinputsizes` makes the difference. >> cur.execute("SELECT TEXT FROM UTF8_TEST") >> description = cur.description >> # `t` is name of bind parameter below. >> cur.setinputsizes(t=description[0][1]) >> cur.execute("INSERT INTO UTF8_TEST (TEXT) VALUES (:t)", {"t": >> "\u7535\u8111"}) >> cur.close() >> conn.commit() >> >> >> if __name__ == "__main__": >> db_config = ... >> test(db_config.host, db_config.port, db_config.service_name, >> db_config.user, db_config.password) >> >> With the `setinputsizes` call, the Chinese characters appear in Oracle SQL >> Developer as they should. >> >> >> In case it matters, I tried to use some German special characters (the >> string "äß") and this works for all three variants. These two characters are >> encoded in UTF-8 to one byte each whereas the Chinese characters are encoded >> to several bytes each. >> >> >> Should the above code for SQLAlchemy work as-is and insert the Chinese >> charaters (I expected it)? >> >> Is there a workaround for now to have SQLAlchemy insert the correct >> characters? For example, is there a way to tell SQLAlchemy to use >> `Cursor.setinputsizes` for inserts? (I assume this would apply to updates, >> too.) >> >> Best regards, >> Stefan >> >> -- >> SQLAlchemy - >> The Python SQL Toolkit and Object Relational Mapper >> >> http://www.sqlalchemy.org/ >> >> To post example code, please provide an MCVE: Minimal, Complete, and >> Verifiable Example. See http://stackoverflow.com/help/mcve for a full >> description. >> --- >> You received this message because you are subscribed to the Google Groups >> "sqlalchemy" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to sqlalchemy+unsubscr...@googlegroups.com. >> To post to this group, send email to sqlalchemy@googlegroups.com. >> Visit this group at https://groups.google.com/group/sqlalchemy. >> For more options, visit https://groups.google.com/d/optout. -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at https://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.