Re: [sqlalchemy] Re: pandas to MS SQL DataWarehouse (to_sql)

Mike Bayer Wed, 13 Sep 2017 11:44:44 -0700

On Wed, Sep 13, 2017 at 2:41 PM, dirk.biesinger
<dirk.biesin...@gmail.com> wrote:
> I don't get why the table is getting created in the first place. A table
> with this name exists, and the option "if_exists='append'" should append the
> dataframe to the existing table.
> There should not be a dropping of the table (which I have not seen) nor
> creation of the table.
>
> And in case of creating the table, I think it should be possible to define
> the length of the field, so varchar([variable_to_be_submitted]).
> In my case I expect this particular table to grow to several hundred million
> rows, so assigned storage space is a factor.
>
> the existing table was created like this:
>
> CREATE TABLE dbo.DSI
> (
> a datetime
> b varchar(10) null,
> c varchar(100) null,
> d varchar(10) null,
> e varchar(100) null,
> f decimal (8,6) null,
> g decimal (8,6) null
> )
>
> If I read and understand the error stack correct, the cause is the create
> table statement, which I would very strongly hope to cause an error, as the
> table exists.
> Should the create table statement not be omitted if the option
> "if_exists='append'" option is set?


This is all on the Pandas side so you'd need to talk to them.


>
> On Wednesday, September 13, 2017 at 11:15:27 AM UTC-7, Mike Bayer wrote:
>>
>> OK so....don't use VARCHAR(max)?  What datatype would you like?  We
>> have most of them and you can make new ones too.
>>
>> On Wed, Sep 13, 2017 at 1:07 PM, dirk.biesinger
>> <dirk.bi...@gmail.com> wrote:
>> > Mike,
>> >
>> > here's the error stack (I had to mask some details):
>> > The columns (dataformats) in the create table statement are wrong. Also,
>> > this table does not have an index.
>> >
>> > 2017-09-13 15:07:50,200 INFO sqlalchemy.engine.base.Engine SELECT
>> > SERVERPROPERTY('ProductVersion')
>> > 2017-09-13 15:07:50,202 INFO sqlalchemy.engine.base.Engine ()
>> > 2017-09-13 15:07:50,246 INFO sqlalchemy.engine.base.Engine SELECT
>> > schema_name()
>> > 2017-09-13 15:07:50,247 INFO sqlalchemy.engine.base.Engine ()
>> > 2017-09-13 15:07:50,522 INFO sqlalchemy.engine.base.Engine SELECT
>> > CAST('test
>> > plain returns' AS VARCHAR(60)) AS anon_1
>> > 2017-09-13 15:07:50,522 INFO sqlalchemy.engine.base.Engine ()
>> > 2017-09-13 15:07:50,562 INFO sqlalchemy.engine.base.Engine SELECT
>> > CAST('test
>> > unicode returns' AS NVARCHAR(60)) AS anon_1
>> > 2017-09-13 15:07:50,563 INFO sqlalchemy.engine.base.Engine ()
>> > 2017-09-13 15:07:50,604 INFO sqlalchemy.engine.base.Engine SELECT
>> > [INFORMATION_SCHEMA].[COLUMNS].[TABLE_SCHEMA],
>> > [INFORMATION_SCHEMA].[COLUMNS].[TABLE_NAME],
>> > [INFORMATION_SCHEMA].[COLUMNS].[COLUMN_NAME],
>> > [INFORMATION_SCHEMA].[COLUMNS].[IS_NULLABLE],
>> > [INFORMATION_SCHEMA].[COLUMNS].[DATA_TYPE],
>> > [INFORMATION_SCHEMA].[COLUMNS].[ORDINAL_POSITION],
>> > [INFORMATION_SCHEMA].[COLUMNS].[CHARACTER_MAXIMUM_LENGTH],
>> > [INFORMATION_SCHEMA].[COLUMNS].[NUMERIC_PRECISION],
>> > [INFORMATION_SCHEMA].[COLUMNS].[NUMERIC_SCALE],
>> > [INFORMATION_SCHEMA].[COLUMNS].[COLUMN_DEFAULT],
>> > [INFORMATION_SCHEMA].[COLUMNS].[COLLATION_NAME]
>> > FROM [INFORMATION_SCHEMA].[COLUMNS]
>> > WHERE [INFORMATION_SCHEMA].[COLUMNS].[TABLE_NAME] = CAST(? AS
>> > NVARCHAR(max))
>> > AND [INFORMATION_SCHEMA].[COLUMNS].[TABLE_SCHEMA] = CAST(? AS
>> > NVARCHAR(max))
>> > 2017-09-13 15:07:50,605 INFO sqlalchemy.engine.base.Engine
>> > ('dbo.MSODS_DSI',
>> > 'dbo')
>> > 2017-09-13 15:07:52,151 INFO sqlalchemy.engine.base.Engine
>> > CREATE TABLE [dbo.MSODS_DSI] (
>> > [a] DATETIME NULL,
>> > [b] VARCHAR(max) NULL,
>> > [c] VARCHAR(max) NULL,
>> > [d] VARCHAR(max) NULL,
>> > [e] VARCHAR(max) NULL,
>> > [f] FLOAT(53) NULL,
>> > [g] FLOAT(53) NULL
>> > )
>> >
>> >
>> > 2017-09-13 15:07:52,152 INFO sqlalchemy.engine.base.Engine ()
>> > 2017-09-13 15:07:54,374 INFO sqlalchemy.engine.base.Engine ROLLBACK
>> >
>> > ---------------------------------------------------------------------------
>> > ProgrammingError                          Traceback (most recent call
>> > last)
>> >
>> > /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py
>> > in _execute_context(self, dialect, constructor, statement, parameters,
>> > *args)
>> >    1181                         parameters,
>> > -> 1182                         context)
>> >    1183         except BaseException as e:
>> >
>> >
>> > /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/default.py
>> > in do_execute(self, cursor, statement, parameters, context)
>> >     469     def do_execute(self, cursor, statement, parameters,
>> > context=None):
>> > --> 470         cursor.execute(statement, parameters)
>> >     471
>> >
>> > ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Driver 13 for SQL
>> > Server][SQL Server]The statement failed. Column 'b' has a data type that
>> > cannot participate in a columnstore index.\r\nOperation cancelled by
>> > user.
>> > (35343) (SQLExecDirectW)")
>> >
>> > The above exception was the direct cause of the following exception:
>> >
>> > ProgrammingError                          Traceback (most recent call
>> > last)
>> > <ipython-input-16-290e9c1020c9> in <module>()
>> >      17 #cnxn = pyodbc.connect(connection_str)
>> >      18 #engn.connect()
>> > ---> 19 df.to_sql(tbl_server_out, engn, if_exists='append', index=False)
>> >
>> >
>> > /home/saravji/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py
>> > in to_sql(self, name, con, flavor, schema, if_exists, index,
>> > index_label,
>> > chunksize, dtype)
>> >    1343         sql.to_sql(self, name, con, flavor=flavor,
>> > schema=schema,
>> >    1344                    if_exists=if_exists, index=index,
>> > index_label=index_label,
>> > -> 1345                    chunksize=chunksize, dtype=dtype)
>> >    1346
>> >    1347     def to_pickle(self, path, compression='infer'):
>> >
>> > /home/saravji/anaconda3/lib/python3.6/site-packages/pandas/io/sql.py in
>> > to_sql(frame, name, con, flavor, schema, if_exists, index, index_label,
>> > chunksize, dtype)
>> >     469     pandas_sql.to_sql(frame, name, if_exists=if_exists,
>> > index=index,
>> >     470                       index_label=index_label, schema=schema,
>> > --> 471                       chunksize=chunksize, dtype=dtype)
>> >     472
>> >     473
>> >
>> > /home/saravji/anaconda3/lib/python3.6/site-packages/pandas/io/sql.py in
>> > to_sql(self, frame, name, if_exists, index, index_label, schema,
>> > chunksize,
>> > dtype)
>> >    1148                          if_exists=if_exists,
>> > index_label=index_label,
>> >    1149                          schema=schema, dtype=dtype)
>> > -> 1150         table.create()
>> >    1151         table.insert(chunksize)
>> >    1152         if (not name.isdigit() and not name.islower()):
>> >
>> > /home/saravji/anaconda3/lib/python3.6/site-packages/pandas/io/sql.py in
>> > create(self)
>> >     596                     "'{0}' is not valid for
>> > if_exists".format(self.if_exists))
>> >     597         else:
>> > --> 598             self._execute_create()
>> >     599
>> >     600     def insert_statement(self):
>> >
>> > /home/saravji/anaconda3/lib/python3.6/site-packages/pandas/io/sql.py in
>> > _execute_create(self)
>> >     581         # Inserting table into database, add to MetaData object
>> >     582         self.table = self.table.tometadata(self.pd_sql.meta)
>> > --> 583         self.table.create()
>> >     584
>> >     585     def create(self):
>> >
>> >
>> > /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/sql/schema.py
>> > in create(self, bind, checkfirst)
>> >     754         bind._run_visitor(ddl.SchemaGenerator,
>> >     755                           self,
>> > --> 756                           checkfirst=checkfirst)
>> >     757
>> >     758     def drop(self, bind=None, checkfirst=False):
>> >
>> >
>> > /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py
>> > in _run_visitor(self, visitorcallable, element, connection, **kwargs)
>> >    1927                      connection=None, **kwargs):
>> >    1928         with self._optional_conn_ctx_manager(connection) as
>> > conn:
>> > -> 1929             conn._run_visitor(visitorcallable, element,
>> > **kwargs)
>> >    1930
>> >    1931     class _trans_ctx(object):
>> >
>> >
>> > /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py
>> > in _run_visitor(self, visitorcallable, element, **kwargs)
>> >    1536     def _run_visitor(self, visitorcallable, element, **kwargs):
>> >    1537         visitorcallable(self.dialect, self,
>> > -> 1538                         **kwargs).traverse_single(element)
>> >    1539
>> >    1540
>> >
>> >
>> > /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/sql/visitors.py
>> > in traverse_single(self, obj, **kw)
>> >     119             meth = getattr(v, "visit_%s" % obj.__visit_name__,
>> > None)
>> >     120             if meth:
>> > --> 121                 return meth(obj, **kw)
>> >     122
>> >     123     def iterate(self, obj):
>> >
>> >
>> > /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/sql/ddl.py 
>> > in
>> > visit_table(self, table, create_ok, include_foreign_key_constraints,
>> > _is_metadata_operation)
>> >     765             CreateTable(
>> >     766                 table,
>> > --> 767
>> > include_foreign_key_constraints=include_foreign_key_constraints
>> >     768             ))
>> >     769
>> >
>> >
>> > /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py
>> > in execute(self, object, *multiparams, **params)
>> >     943             raise exc.ObjectNotExecutableError(object)
>> >     944         else:
>> > --> 945             return meth(self, multiparams, params)
>> >     946
>> >     947     def _execute_function(self, func, multiparams, params):
>> >
>> >
>> > /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/sql/ddl.py 
>> > in
>> > _execute_on_connection(self, connection, multiparams, params)
>> >      66
>> >      67     def _execute_on_connection(self, connection, multiparams,
>> > params):
>> > ---> 68         return connection._execute_ddl(self, multiparams,
>> > params)
>> >      69
>> >      70     def execute(self, bind=None, target=None):
>> >
>> >
>> > /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py
>> > in _execute_ddl(self, ddl, multiparams, params)
>> >    1000             compiled,
>> >    1001             None,
>> > -> 1002             compiled
>> >    1003         )
>> >    1004         if self._has_events or self.engine._has_events:
>> >
>> >
>> > /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py
>> > in _execute_context(self, dialect, constructor, statement, parameters,
>> > *args)
>> >    1187                 parameters,
>> >    1188                 cursor,
>> > -> 1189                 context)
>> >    1190
>> >    1191         if self._has_events or self.engine._has_events:
>> >
>> >
>> > /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py
>> > in _handle_dbapi_exception(self, e, statement, parameters, cursor,
>> > context)
>> >    1400                 util.raise_from_cause(
>> >    1401                     sqlalchemy_exception,
>> > -> 1402                     exc_info
>> >    1403                 )
>> >    1404             else:
>> >
>> >
>> > /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/util/compat.py
>> > in raise_from_cause(exception, exc_info)
>> >     201     exc_type, exc_value, exc_tb = exc_info
>> >     202     cause = exc_value if exc_value is not exception else None
>> > --> 203     reraise(type(exception), exception, tb=exc_tb, cause=cause)
>> >     204
>> >     205 if py3k:
>> >
>> >
>> > /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/util/compat.py
>> > in reraise(tp, value, tb, cause)
>> >     184             value.__cause__ = cause
>> >     185         if value.__traceback__ is not tb:
>> > --> 186             raise value.with_traceback(tb)
>> >     187         raise value
>> >     188
>> >
>> >
>> > /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py
>> > in _execute_context(self, dialect, constructor, statement, parameters,
>> > *args)
>> >    1180                         statement,
>> >    1181                         parameters,
>> > -> 1182                         context)
>> >    1183         except BaseException as e:
>> >    1184             self._handle_dbapi_exception(
>> >
>> >
>> > /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/default.py
>> > in do_execute(self, cursor, statement, parameters, context)
>> >     468
>> >     469     def do_execute(self, cursor, statement, parameters,
>> > context=None):
>> > --> 470         cursor.execute(statement, parameters)
>> >     471
>> >     472     def do_execute_no_params(self, cursor, statement,
>> > context=None):
>> >
>> > ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000]
>> > [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]The statement
>> > failed.
>> > Column 'b' has a data type that cannot participate in a columnstore
>> > index.\r\nOperation cancelled by user. (35343) (SQLExecDirectW)") [SQL:
>> > '\nCREATE TABLE [dbo.MSODS_DSI] (\n\t[a] DATETIME NULL, \n\t[b]
>> > VARCHAR(max)
>> > NULL, \n\t[c] VARCHAR(max) NULL, \n\t[d] VARCHAR(max) NULL, \n\t[e]
>> > VARCHAR(max) NULL, \n\t[f] FLOAT(53) NULL, \n\t[g] FLOAT(53)
>> > NULL\n)\n\n']
>> >
>> >
>> >
>> > On Monday, September 11, 2017 at 3:34:47 PM UTC-7, dirk.biesinger wrote:
>> >>
>> >> I am encountering errors when trying to use the pd.to_sql function to
>> >> write a dataframe to MS SQL Data Warehouse.
>> >> The connection works when NOT using sqlalchemy engines.
>> >> I can read dataframes as well as row-by-row via select statements when
>> >> I
>> >> use pyodbc connections
>> >> I can write data via insert statements (as well as delete data) when
>> >> using
>> >> pyodbc.
>> >> However, when I try to connect using a sqlalchemy engine I run into a
>> >> string of error messages starting with:
>> >>
>> >> ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000]
>> >> [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Catalog view
>> >> 'dm_exec_sessions' is not supported in this version. (104385)
>> >> (SQLExecDirectW)")
>> >>
>> >>
>> >> I have searched online, and this exact error seems to have been
>> >> reported /
>> >> evaluated in May of this year as issue #3994:
>> >>
>> >>
>> >>
>> >>
>> >> https://bitbucket.org/zzzeek/sqlalchemy/issues/3994/azure-sql-datawarehouse-basic
>> >>
>> >>
>> >> I could not find a solution to this, and I'd really dislike to do a
>> >> line-wise or blob insert statement (I'm working with multiple datasets
>> >> that
>> >> each has a few million rows, so execution time is a consideration,
>> >> although
>> >> the result sets I'm getting are more like in the 100k lines area each.)
>> >>
>> >>
>> >> I get the same error messages even when I replace the pd.to_sql command
>> >> with a simple engine.connect()
>> >>
>> >>
>> >> Enclosed my installed packages (packages.list)
>> >>
>> >> Enclosed the full traceback (traceback.txt)
>> >>
>> >>
>> >> This is the code I'm using:
>> >>
>> >> connection_string =
>> >>
>> >> "mssql+pyodbc://<username>:<password>@<sqlhost>.database.windows.net:<port>/<database>?driver=ODBC+Driver+13+for+SQL+Server"
>> >> engn = sqlalchemy.engine.create_engine(connection_string, echo=True)
>> >> engn.connect()
>> >>
>> >>
>> >> I'm very well aware that MS SQL DataWarehouse behaves a bit different,
>> >> so
>> >> I'm open for some experimenting to get this issue narrowed down.
>> >>
>> >> In case it matters: I'm running an ubuntu 16.04 VM on azure with
>> >> jupyter
>> >> notebook server and python 3.6.1.
>> >>
>> >> Best,
>> >>
>> >> DB
>> >
>> > --
>> > SQLAlchemy -
>> > The Python SQL Toolkit and Object Relational Mapper
>> >
>> > http://www.sqlalchemy.org/
>> >
>> > To post example code, please provide an MCVE: Minimal, Complete, and
>> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full
>> > description.
>> > ---
>> > You received this message because you are subscribed to the Google
>> > Groups
>> > "sqlalchemy" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> > an
>> > email to sqlalchemy+...@googlegroups.com.
>> > To post to this group, send email to sqlal...@googlegroups.com.
>> > Visit this group at https://groups.google.com/group/sqlalchemy.
>> > For more options, visit https://groups.google.com/d/optout.
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Re: pandas to MS SQL DataWarehouse (to_sql)

Reply via email to