Re: [sqlalchemy] Re: pandas to MS SQL DataWarehouse (to_sql)

dirk.biesinger Wed, 13 Sep 2017 12:14:24 -0700

Got ya,

so we could solve the issue on the sqlalchemy end with the alteration of 
the pyodbc.py file.
I assume you'll include this in the next release?
The issue with creating a table when the option "if_exists='append'" is set 
in the df.to_sql() call, is a pandas problem.


Thank you for your help.

Best,
DB

On Wednesday, September 13, 2017 at 11:45:15 AM UTC-7, Mike Bayer wrote:
>
> On Wed, Sep 13, 2017 at 2:41 PM, dirk.biesinger 
> <dirk.bi...@gmail.com <javascript:>> wrote: 
> > I don't get why the table is getting created in the first place. A table 
> > with this name exists, and the option "if_exists='append'" should append 
> the 
> > dataframe to the existing table. 
> > There should not be a dropping of the table (which I have not seen) nor 
> > creation of the table. 
> > 
> > And in case of creating the table, I think it should be possible to 
> define 
> > the length of the field, so varchar([variable_to_be_submitted]). 
> > In my case I expect this particular table to grow to several hundred 
> million 
> > rows, so assigned storage space is a factor. 
> > 
> > the existing table was created like this: 
> > 
> > CREATE TABLE dbo.DSI 
> > ( 
> > a datetime 
> > b varchar(10) null, 
> > c varchar(100) null, 
> > d varchar(10) null, 
> > e varchar(100) null, 
> > f decimal (8,6) null, 
> > g decimal (8,6) null 
> > ) 
> > 
> > If I read and understand the error stack correct, the cause is the 
> create 
> > table statement, which I would very strongly hope to cause an error, as 
> the 
> > table exists. 
> > Should the create table statement not be omitted if the option 
> > "if_exists='append'" option is set? 
>
> This is all on the Pandas side so you'd need to talk to them. 
>
>
> > 
> > On Wednesday, September 13, 2017 at 11:15:27 AM UTC-7, Mike Bayer wrote: 
> >> 
> >> OK so....don't use VARCHAR(max)?  What datatype would you like?  We 
> >> have most of them and you can make new ones too. 
> >> 
> >> On Wed, Sep 13, 2017 at 1:07 PM, dirk.biesinger 
> >> <dirk.bi...@gmail.com> wrote: 
> >> > Mike, 
> >> > 
> >> > here's the error stack (I had to mask some details): 
> >> > The columns (dataformats) in the create table statement are wrong. 
> Also, 
> >> > this table does not have an index. 
> >> > 
> >> > 2017-09-13 15:07:50,200 INFO sqlalchemy.engine.base.Engine SELECT 
> >> > SERVERPROPERTY('ProductVersion') 
> >> > 2017-09-13 15:07:50,202 INFO sqlalchemy.engine.base.Engine () 
> >> > 2017-09-13 15:07:50,246 INFO sqlalchemy.engine.base.Engine SELECT 
> >> > schema_name() 
> >> > 2017-09-13 15:07:50,247 INFO sqlalchemy.engine.base.Engine () 
> >> > 2017-09-13 15:07:50,522 INFO sqlalchemy.engine.base.Engine SELECT 
> >> > CAST('test 
> >> > plain returns' AS VARCHAR(60)) AS anon_1 
> >> > 2017-09-13 15:07:50,522 INFO sqlalchemy.engine.base.Engine () 
> >> > 2017-09-13 15:07:50,562 INFO sqlalchemy.engine.base.Engine SELECT 
> >> > CAST('test 
> >> > unicode returns' AS NVARCHAR(60)) AS anon_1 
> >> > 2017-09-13 15:07:50,563 INFO sqlalchemy.engine.base.Engine () 
> >> > 2017-09-13 15:07:50,604 INFO sqlalchemy.engine.base.Engine SELECT 
> >> > [INFORMATION_SCHEMA].[COLUMNS].[TABLE_SCHEMA], 
> >> > [INFORMATION_SCHEMA].[COLUMNS].[TABLE_NAME], 
> >> > [INFORMATION_SCHEMA].[COLUMNS].[COLUMN_NAME], 
> >> > [INFORMATION_SCHEMA].[COLUMNS].[IS_NULLABLE], 
> >> > [INFORMATION_SCHEMA].[COLUMNS].[DATA_TYPE], 
> >> > [INFORMATION_SCHEMA].[COLUMNS].[ORDINAL_POSITION], 
> >> > [INFORMATION_SCHEMA].[COLUMNS].[CHARACTER_MAXIMUM_LENGTH], 
> >> > [INFORMATION_SCHEMA].[COLUMNS].[NUMERIC_PRECISION], 
> >> > [INFORMATION_SCHEMA].[COLUMNS].[NUMERIC_SCALE], 
> >> > [INFORMATION_SCHEMA].[COLUMNS].[COLUMN_DEFAULT], 
> >> > [INFORMATION_SCHEMA].[COLUMNS].[COLLATION_NAME] 
> >> > FROM [INFORMATION_SCHEMA].[COLUMNS] 
> >> > WHERE [INFORMATION_SCHEMA].[COLUMNS].[TABLE_NAME] = CAST(? AS 
> >> > NVARCHAR(max)) 
> >> > AND [INFORMATION_SCHEMA].[COLUMNS].[TABLE_SCHEMA] = CAST(? AS 
> >> > NVARCHAR(max)) 
> >> > 2017-09-13 15:07:50,605 INFO sqlalchemy.engine.base.Engine 
> >> > ('dbo.MSODS_DSI', 
> >> > 'dbo') 
> >> > 2017-09-13 15:07:52,151 INFO sqlalchemy.engine.base.Engine 
> >> > CREATE TABLE [dbo.MSODS_DSI] ( 
> >> > [a] DATETIME NULL, 
> >> > [b] VARCHAR(max) NULL, 
> >> > [c] VARCHAR(max) NULL, 
> >> > [d] VARCHAR(max) NULL, 
> >> > [e] VARCHAR(max) NULL, 
> >> > [f] FLOAT(53) NULL, 
> >> > [g] FLOAT(53) NULL 
> >> > ) 
> >> > 
> >> > 
> >> > 2017-09-13 15:07:52,152 INFO sqlalchemy.engine.base.Engine () 
> >> > 2017-09-13 15:07:54,374 INFO sqlalchemy.engine.base.Engine ROLLBACK 
> >> > 
> >> > 
> --------------------------------------------------------------------------- 
> >> > ProgrammingError                          Traceback (most recent call 
> >> > last) 
> >> > 
> >> > 
> /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py 
>
> >> > in _execute_context(self, dialect, constructor, statement, 
> parameters, 
> >> > *args) 
> >> >    1181                         parameters, 
> >> > -> 1182                         context) 
> >> >    1183         except BaseException as e: 
> >> > 
> >> > 
> >> > 
> /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/default.py
>  
>
> >> > in do_execute(self, cursor, statement, parameters, context) 
> >> >     469     def do_execute(self, cursor, statement, parameters, 
> >> > context=None): 
> >> > --> 470         cursor.execute(statement, parameters) 
> >> >     471 
> >> > 
> >> > ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Driver 13 for 
> SQL 
> >> > Server][SQL Server]The statement failed. Column 'b' has a data type 
> that 
> >> > cannot participate in a columnstore index.\r\nOperation cancelled by 
> >> > user. 
> >> > (35343) (SQLExecDirectW)") 
> >> > 
> >> > The above exception was the direct cause of the following exception: 
> >> > 
> >> > ProgrammingError                          Traceback (most recent call 
> >> > last) 
> >> > <ipython-input-16-290e9c1020c9> in <module>() 
> >> >      17 #cnxn = pyodbc.connect(connection_str) 
> >> >      18 #engn.connect() 
> >> > ---> 19 df.to_sql(tbl_server_out, engn, if_exists='append', 
> index=False) 
> >> > 
> >> > 
> >> > 
> /home/saravji/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py 
> >> > in to_sql(self, name, con, flavor, schema, if_exists, index, 
> >> > index_label, 
> >> > chunksize, dtype) 
> >> >    1343         sql.to_sql(self, name, con, flavor=flavor, 
> >> > schema=schema, 
> >> >    1344                    if_exists=if_exists, index=index, 
> >> > index_label=index_label, 
> >> > -> 1345                    chunksize=chunksize, dtype=dtype) 
> >> >    1346 
> >> >    1347     def to_pickle(self, path, compression='infer'): 
> >> > 
> >> > /home/saravji/anaconda3/lib/python3.6/site-packages/pandas/io/sql.py 
> in 
> >> > to_sql(frame, name, con, flavor, schema, if_exists, index, 
> index_label, 
> >> > chunksize, dtype) 
> >> >     469     pandas_sql.to_sql(frame, name, if_exists=if_exists, 
> >> > index=index, 
> >> >     470                       index_label=index_label, schema=schema, 
> >> > --> 471                       chunksize=chunksize, dtype=dtype) 
> >> >     472 
> >> >     473 
> >> > 
> >> > /home/saravji/anaconda3/lib/python3.6/site-packages/pandas/io/sql.py 
> in 
> >> > to_sql(self, frame, name, if_exists, index, index_label, schema, 
> >> > chunksize, 
> >> > dtype) 
> >> >    1148                          if_exists=if_exists, 
> >> > index_label=index_label, 
> >> >    1149                          schema=schema, dtype=dtype) 
> >> > -> 1150         table.create() 
> >> >    1151         table.insert(chunksize) 
> >> >    1152         if (not name.isdigit() and not name.islower()): 
> >> > 
> >> > /home/saravji/anaconda3/lib/python3.6/site-packages/pandas/io/sql.py 
> in 
> >> > create(self) 
> >> >     596                     "'{0}' is not valid for 
> >> > if_exists".format(self.if_exists)) 
> >> >     597         else: 
> >> > --> 598             self._execute_create() 
> >> >     599 
> >> >     600     def insert_statement(self): 
> >> > 
> >> > /home/saravji/anaconda3/lib/python3.6/site-packages/pandas/io/sql.py 
> in 
> >> > _execute_create(self) 
> >> >     581         # Inserting table into database, add to MetaData 
> object 
> >> >     582         self.table = self.table.tometadata(self.pd_sql.meta) 
> >> > --> 583         self.table.create() 
> >> >     584 
> >> >     585     def create(self): 
> >> > 
> >> > 
> >> > 
> /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/sql/schema.py 
>
> >> > in create(self, bind, checkfirst) 
> >> >     754         bind._run_visitor(ddl.SchemaGenerator, 
> >> >     755                           self, 
> >> > --> 756                           checkfirst=checkfirst) 
> >> >     757 
> >> >     758     def drop(self, bind=None, checkfirst=False): 
> >> > 
> >> > 
> >> > 
> /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py 
>
> >> > in _run_visitor(self, visitorcallable, element, connection, **kwargs) 
> >> >    1927                      connection=None, **kwargs): 
> >> >    1928         with self._optional_conn_ctx_manager(connection) as 
> >> > conn: 
> >> > -> 1929             conn._run_visitor(visitorcallable, element, 
> >> > **kwargs) 
> >> >    1930 
> >> >    1931     class _trans_ctx(object): 
> >> > 
> >> > 
> >> > 
> /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py 
>
> >> > in _run_visitor(self, visitorcallable, element, **kwargs) 
> >> >    1536     def _run_visitor(self, visitorcallable, element, 
> **kwargs): 
> >> >    1537         visitorcallable(self.dialect, self, 
> >> > -> 1538                         **kwargs).traverse_single(element) 
> >> >    1539 
> >> >    1540 
> >> > 
> >> > 
> >> > 
> /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/sql/visitors.py
>  
>
> >> > in traverse_single(self, obj, **kw) 
> >> >     119             meth = getattr(v, "visit_%s" % 
> obj.__visit_name__, 
> >> > None) 
> >> >     120             if meth: 
> >> > --> 121                 return meth(obj, **kw) 
> >> >     122 
> >> >     123     def iterate(self, obj): 
> >> > 
> >> > 
> >> > 
> /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/sql/ddl.py 
> in 
> >> > visit_table(self, table, create_ok, include_foreign_key_constraints, 
> >> > _is_metadata_operation) 
> >> >     765             CreateTable( 
> >> >     766                 table, 
> >> > --> 767 
> >> > include_foreign_key_constraints=include_foreign_key_constraints 
> >> >     768             )) 
> >> >     769 
> >> > 
> >> > 
> >> > 
> /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py 
>
> >> > in execute(self, object, *multiparams, **params) 
> >> >     943             raise exc.ObjectNotExecutableError(object) 
> >> >     944         else: 
> >> > --> 945             return meth(self, multiparams, params) 
> >> >     946 
> >> >     947     def _execute_function(self, func, multiparams, params): 
> >> > 
> >> > 
> >> > 
> /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/sql/ddl.py 
> in 
> >> > _execute_on_connection(self, connection, multiparams, params) 
> >> >      66 
> >> >      67     def _execute_on_connection(self, connection, multiparams, 
> >> > params): 
> >> > ---> 68         return connection._execute_ddl(self, multiparams, 
> >> > params) 
> >> >      69 
> >> >      70     def execute(self, bind=None, target=None): 
> >> > 
> >> > 
> >> > 
> /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py 
>
> >> > in _execute_ddl(self, ddl, multiparams, params) 
> >> >    1000             compiled, 
> >> >    1001             None, 
> >> > -> 1002             compiled 
> >> >    1003         ) 
> >> >    1004         if self._has_events or self.engine._has_events: 
> >> > 
> >> > 
> >> > 
> /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py 
>
> >> > in _execute_context(self, dialect, constructor, statement, 
> parameters, 
> >> > *args) 
> >> >    1187                 parameters, 
> >> >    1188                 cursor, 
> >> > -> 1189                 context) 
> >> >    1190 
> >> >    1191         if self._has_events or self.engine._has_events: 
> >> > 
> >> > 
> >> > 
> /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py 
>
> >> > in _handle_dbapi_exception(self, e, statement, parameters, cursor, 
> >> > context) 
> >> >    1400                 util.raise_from_cause( 
> >> >    1401                     sqlalchemy_exception, 
> >> > -> 1402                     exc_info 
> >> >    1403                 ) 
> >> >    1404             else: 
> >> > 
> >> > 
> >> > 
> /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/util/compat.py 
>
> >> > in raise_from_cause(exception, exc_info) 
> >> >     201     exc_type, exc_value, exc_tb = exc_info 
> >> >     202     cause = exc_value if exc_value is not exception else None 
> >> > --> 203     reraise(type(exception), exception, tb=exc_tb, 
> cause=cause) 
> >> >     204 
> >> >     205 if py3k: 
> >> > 
> >> > 
> >> > 
> /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/util/compat.py 
>
> >> > in reraise(tp, value, tb, cause) 
> >> >     184             value.__cause__ = cause 
> >> >     185         if value.__traceback__ is not tb: 
> >> > --> 186             raise value.with_traceback(tb) 
> >> >     187         raise value 
> >> >     188 
> >> > 
> >> > 
> >> > 
> /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/base.py 
>
> >> > in _execute_context(self, dialect, constructor, statement, 
> parameters, 
> >> > *args) 
> >> >    1180                         statement, 
> >> >    1181                         parameters, 
> >> > -> 1182                         context) 
> >> >    1183         except BaseException as e: 
> >> >    1184             self._handle_dbapi_exception( 
> >> > 
> >> > 
> >> > 
> /home/saravji/anaconda3/lib/python3.6/site-packages/sqlalchemy/engine/default.py
>  
>
> >> > in do_execute(self, cursor, statement, parameters, context) 
> >> >     468 
> >> >     469     def do_execute(self, cursor, statement, parameters, 
> >> > context=None): 
> >> > --> 470         cursor.execute(statement, parameters) 
> >> >     471 
> >> >     472     def do_execute_no_params(self, cursor, statement, 
> >> > context=None): 
> >> > 
> >> > ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] 
> >> > [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]The statement 
> >> > failed. 
> >> > Column 'b' has a data type that cannot participate in a columnstore 
> >> > index.\r\nOperation cancelled by user. (35343) (SQLExecDirectW)") 
> [SQL: 
> >> > '\nCREATE TABLE [dbo.MSODS_DSI] (\n\t[a] DATETIME NULL, \n\t[b] 
> >> > VARCHAR(max) 
> >> > NULL, \n\t[c] VARCHAR(max) NULL, \n\t[d] VARCHAR(max) NULL, \n\t[e] 
> >> > VARCHAR(max) NULL, \n\t[f] FLOAT(53) NULL, \n\t[g] FLOAT(53) 
> >> > NULL\n)\n\n'] 
> >> > 
> >> > 
> >> > 
> >> > On Monday, September 11, 2017 at 3:34:47 PM UTC-7, dirk.biesinger 
> wrote: 
> >> >> 
> >> >> I am encountering errors when trying to use the pd.to_sql function 
> to 
> >> >> write a dataframe to MS SQL Data Warehouse. 
> >> >> The connection works when NOT using sqlalchemy engines. 
> >> >> I can read dataframes as well as row-by-row via select statements 
> when 
> >> >> I 
> >> >> use pyodbc connections 
> >> >> I can write data via insert statements (as well as delete data) when 
> >> >> using 
> >> >> pyodbc. 
> >> >> However, when I try to connect using a sqlalchemy engine I run into 
> a 
> >> >> string of error messages starting with: 
> >> >> 
> >> >> ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] 
> >> >> [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Catalog view 
> >> >> 'dm_exec_sessions' is not supported in this version. (104385) 
> >> >> (SQLExecDirectW)") 
> >> >> 
> >> >> 
> >> >> I have searched online, and this exact error seems to have been 
> >> >> reported / 
> >> >> evaluated in May of this year as issue #3994: 
> >> >> 
> >> >> 
> >> >> 
> >> >> 
> >> >> 
> https://bitbucket.org/zzzeek/sqlalchemy/issues/3994/azure-sql-datawarehouse-basic
>  
> >> >> 
> >> >> 
> >> >> I could not find a solution to this, and I'd really dislike to do a 
> >> >> line-wise or blob insert statement (I'm working with multiple 
> datasets 
> >> >> that 
> >> >> each has a few million rows, so execution time is a consideration, 
> >> >> although 
> >> >> the result sets I'm getting are more like in the 100k lines area 
> each.) 
> >> >> 
> >> >> 
> >> >> I get the same error messages even when I replace the pd.to_sql 
> command 
> >> >> with a simple engine.connect() 
> >> >> 
> >> >> 
> >> >> Enclosed my installed packages (packages.list) 
> >> >> 
> >> >> Enclosed the full traceback (traceback.txt) 
> >> >> 
> >> >> 
> >> >> This is the code I'm using: 
> >> >> 
> >> >> connection_string = 
> >> >> 
> >> >> "mssql+pyodbc://<username>:<password>@<sqlhost>.database.windows.net:<port>/<database>?driver=ODBC+Driver+13+for+SQL+Server"
> >> >>  
>
> >> >> engn = sqlalchemy.engine.create_engine(connection_string, echo=True) 
> >> >> engn.connect() 
> >> >> 
> >> >> 
> >> >> I'm very well aware that MS SQL DataWarehouse behaves a bit 
> different, 
> >> >> so 
> >> >> I'm open for some experimenting to get this issue narrowed down. 
> >> >> 
> >> >> In case it matters: I'm running an ubuntu 16.04 VM on azure with 
> >> >> jupyter 
> >> >> notebook server and python 3.6.1. 
> >> >> 
> >> >> Best, 
> >> >> 
> >> >> DB 
> >> > 
> >> > -- 
> >> > SQLAlchemy - 
> >> > The Python SQL Toolkit and Object Relational Mapper 
> >> > 
> >> > http://www.sqlalchemy.org/ 
> >> > 
> >> > To post example code, please provide an MCVE: Minimal, Complete, and 
> >> > Verifiable Example. See http://stackoverflow.com/help/mcve for a 
> full 
> >> > description. 
> >> > --- 
> >> > You received this message because you are subscribed to the Google 
> >> > Groups 
> >> > "sqlalchemy" group. 
> >> > To unsubscribe from this group and stop receiving emails from it, 
> send 
> >> > an 
> >> > email to sqlalchemy+...@googlegroups.com. 
> >> > To post to this group, send email to sqlal...@googlegroups.com. 
> >> > Visit this group at https://groups.google.com/group/sqlalchemy. 
> >> > For more options, visit https://groups.google.com/d/optout. 
> > 
> > -- 
> > SQLAlchemy - 
> > The Python SQL Toolkit and Object Relational Mapper 
> > 
> > http://www.sqlalchemy.org/ 
> > 
> > To post example code, please provide an MCVE: Minimal, Complete, and 
> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> > description. 
> > --- 
> > You received this message because you are subscribed to the Google 
> Groups 
> > "sqlalchemy" group. 
> > To unsubscribe from this group and stop receiving emails from it, send 
> an 
> > email to sqlalchemy+...@googlegroups.com <javascript:>. 
> > To post to this group, send email to sqlal...@googlegroups.com 
> <javascript:>. 
> > Visit this group at https://groups.google.com/group/sqlalchemy. 
> > For more options, visit https://groups.google.com/d/optout. 
>

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Re: pandas to MS SQL DataWarehouse (to_sql)

Reply via email to