On Fri, Jan 12, 2018 at 7:14 AM, Stefan Schwarzer
<sschwar...@sschwarzer.net> wrote:
> I have trouble inserting Chinese characters into an Oracle database with
> SQLAlchemy (and to some extent with cx_Oracle). I don't specifically need
> Chinese characters but I used them in a unit test to see if my code
> (hopefully) can handle "any" unicode characters.

I can't reproduce your result, I'm not using any SQL tool I am instead
just getting the same data back and asserting it matches.  Per the
author of  cx_Oracle the setinputsizes call is no longer needed in
most cases in the 6.x series, but also in the 1.x series, the
autosetinputsizes call for UNICODE and STRING was omitted because I
still got poorer results.   Round-tripping of unicode characters is
widely tested with cx_Oracle / SQLAlchemy.

In SQLAlchemy 1.1 series and earlier, you can specify
exclude_setinputsizes=() to have STRING be part of the automatic
setinputsizes call.  In SQLAlchemy 1.2 these features were all removed
as there was never any reason to pass most datatypes to setinputsizes.
  in 1.2 you can still say
engine.dialect._include_setinputsizes.add(cx_Oracle.STRING) to re-add
it, but this should not be necessary (my test works with or without
it).

Also which version of Python you're using matters, however I get a
round trip under both pythons.

To round trip it, do this - note I had to expand the size of the
VARCHAR to fit your string, it was giving me a "data too large" error
before, so that might be a clue:

import os
os.environ["NLS_LANG"] = "German_Germany.UTF8"

import sqlalchemy as sa


metadata = sa.MetaData()

test_table = sa.Table("utf8_test", metadata, sa.Column("text", sa.Unicode(20)))

e = sa.create_engine("oracle+cx_oracle://scott:tiger@oracle1120/xe",
echo='debug')

metadata.drop_all(e)
metadata.create_all(e)

data = "\u7535\u8111"
insert = test_table.insert().values(text=data)
with e.connect() as connection:
    connection.execute(insert)

    sqla_result = connection.scalar(sa.select([test_table.c.text]))

    assert sqla_result == data

please try that out and see if you aren't getting a true result.
note also that the table is being dropped and created within the
script itself.

when I run in Python 3 I even see the correct characters displayed:

2018-01-12 10:21:37,658 INFO sqlalchemy.engine.base.Engine INSERT INTO
utf8_test (text) VALUES (:text)
2018-01-12 10:21:37,658 INFO sqlalchemy.engine.base.Engine {'text': '电脑'}
2018-01-12 10:21:37,663 INFO sqlalchemy.engine.base.Engine COMMIT
2018-01-12 10:21:37,668 INFO sqlalchemy.engine.base.Engine SELECT utf8_test.text
FROM utf8_test
2018-01-12 10:21:37,668 INFO sqlalchemy.engine.base.Engine {}
2018-01-12 10:21:37,672 DEBUG sqlalchemy.engine.base.Engine Col ('TEXT',)
2018-01-12 10:21:37,672 DEBUG sqlalchemy.engine.base.Engine Row ('电脑',)

The next step would be to raise this on cx_Oracle's tracker
(https://github.com/oracle/python-cx_Oracle/issues) to understand why
your setup needs setinputsizes to be called and mine doesn't (and has
not for years).





>
> The setup:
>
> - SQLAlchemy 1.1.15 and 1.2
> - Oracle database 11.2
> - cx_Oracle 6.1
> - environment variable set from the shell ( export
> NLS_LANG="German_Germany.UTF8" ) and in the code
>
> Database table:
>
> CREATE TABLE UTF8_TEST (
>   TEXT NVARCHAR2(10)
> )
>
>
> 1) SQLAlchemy, version 1.1.15 and 1.2
>
> import os
> os.environ["NLS_LANG"] = "German_Germany.UTF8"
>
> import sqlalchemy as sa
>
>
> metadata = sa.MetaData()
>
> test_table = sa.Table("utf8_test", metadata, sa.Column("text",
> sa.Unicode(10)))
>
>
> def test(host, port, service_name, user, password):
>     connect_string =
> "oracle+cx_oracle://{}:{}@{}:{}/?service_name={}".format(user, password,
> host, port, service_name)
>     engine = sa.engine.create_engine(connect_string, encoding="utf8")
>     insert = test_table.insert().values(text="\u7535\u8111")
>     with engine.connect() as connection:
>         connection.execute(insert)
>
>
> if __name__ == "__main__":
>     db_config = ...
>     test(db_config.host, db_config.port, db_config.service_name,
> db_config.user, db_config.password)
>
> When this code is executed, it inserts something into the database, but it
> shows up as two upside-down question marks when viewed with Oracle SQL
> Developer. This doesn't seem to be an encoding/rendering problem in SQL
> Developer (see below for code snippet 3).
>
>
> 2) cx_Oracle, without `Cursor.setinputsizes`
>
> import os
> os.environ["NLS_LANG"] = "German_Germany.UTF8"
>
> import cx_Oracle
>
>
> def test(host, port, service_name, user, password):
>     dsn = cx_Oracle.makedsn(host, port, service_name=service_name)
>     conn = cx_Oracle.connect(user, password, dsn=dsn, encoding="utf8",
> nencoding="utf8")
>     cur = conn.cursor()
>     cur.execute("INSERT INTO UTF8_TEST (TEXT) VALUES (:t)", {"t":
> "\u7535\u8111"})
>     cur.close()
>     conn.commit()
>
>
> if __name__ == "__main__":
>     db_config = ...
>     test(db_config.host, db_config.port, db_config.service_name,
> db_config.user, db_config.password)
>
> This code has the same effect as the SQLAlchemy code above.
>
>
> 3) cx_Oracle with `Cursor.setinputsizes`
>
> import os
> os.environ["NLS_LANG"] = "German_Germany.UTF8"
>
> import cx_Oracle
>
>
> def test(host, port, service_name, user, password):
>     dsn = cx_Oracle.makedsn(host, port, service_name=service_name)
>     conn = cx_Oracle.connect(user, password, dsn=dsn, encoding="utf8",
> nencoding="utf8")
>     cur = conn.cursor()
>     # Using `setinputsizes` makes the difference.
>     cur.execute("SELECT TEXT FROM UTF8_TEST")
>     description = cur.description
>     # `t` is name of bind parameter below.
>     cur.setinputsizes(t=description[0][1])
>     cur.execute("INSERT INTO UTF8_TEST (TEXT) VALUES (:t)", {"t":
> "\u7535\u8111"})
>     cur.close()
>     conn.commit()
>
>
> if __name__ == "__main__":
>     db_config = ...
>     test(db_config.host, db_config.port, db_config.service_name,
> db_config.user, db_config.password)
>
> With the `setinputsizes` call, the Chinese characters appear in Oracle SQL
> Developer as they should.
>
>
> In case it matters, I tried to use some German special characters (the
> string "äß") and this works for all three variants. These two characters are
> encoded in UTF-8 to one byte each whereas the Chinese characters are encoded
> to several bytes each.
>
>
> Should the above code for SQLAlchemy work as-is and insert the Chinese
> charaters (I expected it)?
>
> Is there a workaround for now to have SQLAlchemy insert the correct
> characters? For example, is there a way to tell SQLAlchemy to use
> `Cursor.setinputsizes` for inserts? (I assume this would apply to updates,
> too.)
>
> Best regards,
> Stefan
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Reply via email to