On Mon, Aug 5, 2019, at 3:19 PM, Abdeali Kothari wrote:
> Hm, creating new tables with a new metadata to do this may be the approach to 
> take... 
> 
> Is there any issue with metadata.remove(table) ?

metadata.remove(table) is not well tested so while it does appear to try to 
repair foreign key constraints that refer to this table, it doesn't cover 
anything else that refers to the table. This means Core SELECT and other 
statements, ORM models, etc which refer to this table will potentially move 
into a corrupted state if the Table is replaced , so you need to make sure 
everything that refers to this Table is discarded once you remove it.


> I was hoping that using this I can just drop the table I'm using and create a 
> new one instead of recreating all the tables. 
> 
> We want to continue using hive, but we are considering using pyhive which has 
> the hive dialect for sqlalchemy. 
> 
> On Mon, Aug 5, 2019, 19:17 Mike Bayer <mike...@zzzcomputing.com> wrote:
>> __
>> 
>> 
>> On Sun, Aug 4, 2019, at 11:32 PM, Abdeali Kothari wrote:
>>> Thanks for the quick revert. I saw sqlalchemy-migration and have used 
>>> alembic a lot. Thanks for maintaining them, they are very useful!
>>> 
>>> Let me try describing my usecase better. 
>>> 
>>> I have a bunch of legacy scripts in pure SQL for hive for ETL tasks.
>>> Recently, this code has become very chaotic and large and I'm trying to 
>>> figure out a better way to implement it. And I've had a good experience 
>>> with sqlalchemy in MySQL and sqlite, so thought I should try using 
>>> sqlalchemy.
>>> 
>>> Primary benefits:
>>>  - Having the ability to change the back-end of the DB was a bonus - I. E. 
>>> If the data is small, use sqlite. If the data is large use Hive 
>>>  - The ability to have and introspect the schema at any point - by plotting 
>>> the ERD
>>>  - Ability to add foreign key relations that hive doesn't have native 
>>> support for, and creating relationships to simplify joins and so on (later) 
>>> 
>>> Now, as this is ETL we have a core set of tables initially and then we make 
>>> a bunch of temp tables, joins, group bys, etc. And there are newer tables 
>>> created on the fly - as most ETL operations go. 
>> 
>> so....you are saying that you are given some existing SQL scripts, that 
>> include CREATE TABLE statements, INSERT staetments, etc., and then as these 
>> scripts go on, the tables are also being altered? I wouldn't call that "ETL" 
>> because the "E" and "T" is already done, this is a "load" script. 
>> 
>> 
>>> 
>>> Currently I'm trying to mimic the Hive code in sqlalchemy tables as a first 
>>> step. 
>>> 
>>> As part of the Hive workloads, we do a bunch of map jobs to create new 
>>> columns and drop columns. 
>>> I'm not trying to get SQLAlchemy to create the SQL (yet) but I do want the 
>>> overall schema to be in sqlalchemy tables do I can move towards that. 
>>> 
>>> So, if I drop a column in hive, what's the best way I can make the 
>>> sqlalchemy tables reflect that change?
>> 
>> You probably want to be creating Table metadata ad-hoc for each step in 
>> which you are not doing DDL, which is not unlike how Alembic migration 
>> scripts work. 
>> 
>> Overall it seems like you're running a database migration of some kind. I'm 
>> not really sure why, if you have a SQL script that does the job, there's a 
>> need to rewrite that into a SQLAlchemy style format. It seems like when you 
>> say, "this code has become very chaotic and large", something is still 
>> producing *new* SQL scripts in this format, which implies there is data in 
>> some other way elsewhere that's not in a schema at all. Usually with ETL, 
>> this is the data you work with, you establish a single schema that will 
>> accommodate the data, then you generate the schema all at once, then you do 
>> the actual E-T-L part with the data.
>> 
>> It looks like Hive is driving this process. Looking at the website it looks 
>> like it's mostly like a SQL database, but I guess the line "Structure can be 
>> projected onto data already in storage." is the crazy part. So I guess it's 
>> a schemaless system that reads a big set of data and as it (magically?) 
>> determines new data fields, adds them into some kind of loose schema.
>> 
>> That's a nice set of features and the point at which I'd use SQLAlchemy with 
>> such a system is after the data has been put into it entirely If I weren't 
>> using Hive, then I wouldn't do things that way.
>> 
>> It's not clear if you're seeking to migrate off of Hive or to build a 
>> complementary process. 
>> 
>> 
>>> 
>>> 
>>> On Mon, Aug 5, 2019, 07:44 Mike Bayer <mike...@zzzcomputing.com> wrote:
>>>> __
>>>> 
>>>> 
>>>> On Sun, Aug 4, 2019, at 1:14 PM, Abdeali Kothari wrote:
>>>>> I am using the sqlalchemy Table class to load up a data schema I am 
>>>>> provided.
>>>>> 
>>>>> Once loaded into sqlalchemy, I want to use some of the existing 
>>>>> extensions like:
>>>>>  - ERD plots - eralchemy
>>>>>  - Auto finding relationships using automap
>>>>> etc.
>>>>> 
>>>>> I am not using any actual DB connection under the table though - i.e. not 
>>>>> really going to be generating SQL statements (as of now)
>>>>> 
>>>>> My schema comes from different operations that are fed into the system - 
>>>>> like create table, add column, rename column, delete column.
>>>>> While I was trying to mimic these operations in sqlalchemy, I realized 
>>>>> that I need to:
>>>>> 1. Create a MetaData
>>>>> 2. Create the table with a name and add columns to the arguments
>>>>> 3. To add a new column - I can use the extend_existing kwarg in 
>>>>> sqlalchemy.Table
>>>>> 
>>>>> But I could not find a way to "*drop*" or "remove" a column I have added.
>>>>> 
>>>>> *To add some code examples*:
>>>>> >>> import sqlalchemy as sqla
>>>>> >>> mytable = sqla.Table("mytable", sqla.MetaData(), sqla.Column('col1', 
>>>>> >>> sqla.String()))
>>>>> >>> mytable
>>>>> Table('mytable',
>>>>>  MetaData(bind=None),
>>>>>  Column('col1', String(), table=<mytable>), schema=None)
>>>>> 
>>>>> >>> # Mutating the table to add a column:
>>>>> >>> sqla.Table("mytable", mytable.metadata, sqla.Column('col2', 
>>>>> >>> sqla.String()), extend_existing=True)
>>>>> Table('mytable',
>>>>>  MetaData(bind=None),
>>>>>  Column('col1', String(), table=<mytable>),
>>>>>  Column('col2', String(), table=<mytable>), schema=None)
>>>>> 
>>>>> >>> # Mutating the table to drop a column:
>>>>> >>> # What do I do here ?
>>>>> Table('mytable',
>>>>>  MetaData(bind=None),
>>>>>  Column('col1', String(), table=<mytable>), schema=None)
>>>> 
>>>> 
>>>> So "mutating" a Table like that would correspond to emitting ALTER 
>>>> statements on the database. There is a tool called sqlalchemy-migrate 
>>>> which works this way; you have a Table object, you say, 
>>>> "table.remove_column(col)", and it emits an "ALTER TABLE" which drops that 
>>>> column. To do this, they altered SQLAlchemy itself by monkeypatching the 
>>>> Table class. This design ultimately proved too rigid for its purpose, as 
>>>> it required the construction of Table models in order to emit simple 
>>>> commands and therefore led to a lot of overly complex and inefficient 
>>>> migration scripts.
>>>> 
>>>> To alleviate this, I wrote a new migrations tool called Alembic that 
>>>> instead allows you to emit commands like "ALTER TABLE" without actually 
>>>> mutating Table objects. Today, Alembic has superseded sqlalchemy-migrate 
>>>> and as luck would have it, I was put in charge of maintaining 
>>>> sqlalchemy-migrate anyway even though it is pretty much a legacy project.
>>>> 
>>>> The point is, I don't think you should be trying to change the structure 
>>>> of a Table object once you create it. The extend_existing() method is only 
>>>> there to suit the purpose of being able to reflect database metadata into 
>>>> an existing set of Table objects that include some existing state already. 
>>>> But otherwise, the use case for Table / MetaData / Column etc. is that 
>>>> they represent a fixed database schema structure that is expressed at 
>>>> once. They allow for additive creation, that is, adding new tables, adding 
>>>> new columns, only as a means to help construct a particular schema from 
>>>> different parts, such as certain ORM mixins that add new columns to a 
>>>> table, or being able to reflect additional columns / tables to complement 
>>>> a set of columns / tables that are explicitly declared in the application. 
>>>> But in all cases this additive construction is intended to represent a 
>>>> schema that is unchanging. Schema changes are considered to be out of 
>>>> scope for the runtime of the application itself.
>>>> 
>>>> If your program is provided with "a data schema" to start with, there 
>>>> should be no need to rename or delete columns. Your actual use case is 
>>>> likely not quite like this so feel free to share more details of why you 
>>>> are seeking these operations.
>>>> 
>>>> 
>>>> 
>>>>> 
>>>>> 
>>>>> *I can think of 2 approaches*:
>>>>> # Recreate the entire table
>>>>> mytable.metadata.remove(mytable)
>>>>> mytable = sqla.Table(
>>>>> mytable.name,
>>>>>  mytable.metadata,
>>>>>  *[i for i in mytable.columns if i.name != 'col2])
>>>>> 
>>>>> # Modify the private _columns collection
>>>>> mytable._columns.remove(mytable.c['col2'])
>>>>> 
>>>>> I was wondering if there is a correct/compliant way to do this.
>>>>> 
>>>>> PS: I cannot avoid doing a drop column - have to do it as that is an 
>>>>> operation that we want to support.
>>>>> 

>>>>> --
>>>>> SQLAlchemy - 
>>>>> The Python SQL Toolkit and Object Relational Mapper
>>>>> 
>>>>> http://www.sqlalchemy.org/
>>>>> 
>>>>> To post example code, please provide an MCVE: Minimal, Complete, and 
>>>>> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
>>>>> description.
>>>>> --- 
>>>>> You received this message because you are subscribed to the Google Groups 
>>>>> "sqlalchemy" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>>>> email to sqlalchemy+unsubscr...@googlegroups.com.
>>>>> To view this discussion on the web visit 
>>>>> https://groups.google.com/d/msgid/sqlalchemy/CA%2BBKPDXffNsB9LaBqz4yiUa_%2BmFAcKbYEnXRbrL6pdohr5DZxg%40mail.gmail.com
>>>>>  
>>>>> <https://groups.google.com/d/msgid/sqlalchemy/CA%2BBKPDXffNsB9LaBqz4yiUa_%2BmFAcKbYEnXRbrL6pdohr5DZxg%40mail.gmail.com?utm_medium=email&utm_source=footer>.
>>>> 
>>>> 

>>>> --
>>>> SQLAlchemy - 
>>>> The Python SQL Toolkit and Object Relational Mapper
>>>> 
>>>> http://www.sqlalchemy.org/
>>>> 
>>>> To post example code, please provide an MCVE: Minimal, Complete, and 
>>>> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
>>>> description.
>>>> --- 
>>>> You received this message because you are subscribed to the Google Groups 
>>>> "sqlalchemy" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>>> email to sqlalchemy+unsubscr...@googlegroups.com.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/sqlalchemy/e8f6621d-b4af-4c53-9415-39b9a65282f0%40www.fastmail.com
>>>>  
>>>> <https://groups.google.com/d/msgid/sqlalchemy/e8f6621d-b4af-4c53-9415-39b9a65282f0%40www.fastmail.com?utm_medium=email&utm_source=footer>.
>>> 

>>> --
>>> SQLAlchemy - 
>>> The Python SQL Toolkit and Object Relational Mapper
>>> 
>>> http://www.sqlalchemy.org/
>>> 
>>> To post example code, please provide an MCVE: Minimal, Complete, and 
>>> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
>>> description.
>>> --- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "sqlalchemy" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to sqlalchemy+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/sqlalchemy/CA%2BBKPDUXrUT92WM6hRWaiRwzOwhB1Rrki_xnUx3KWr8PAUEWdg%40mail.gmail.com
>>>  
>>> <https://groups.google.com/d/msgid/sqlalchemy/CA%2BBKPDUXrUT92WM6hRWaiRwzOwhB1Rrki_xnUx3KWr8PAUEWdg%40mail.gmail.com?utm_medium=email&utm_source=footer>.
>> 
>> 

>> --
>>  SQLAlchemy - 
>>  The Python SQL Toolkit and Object Relational Mapper
>> 
>> http://www.sqlalchemy.org/
>> 
>>  To post example code, please provide an MCVE: Minimal, Complete, and 
>> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
>> description.
>>  --- 
>>  You received this message because you are subscribed to the Google Groups 
>> "sqlalchemy" group.
>>  To unsubscribe from this group and stop receiving emails from it, send an 
>> email to sqlalchemy+unsubscr...@googlegroups.com.
>>  To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/sqlalchemy/89af6d33-42ed-4463-a967-ee106dba36f2%40www.fastmail.com
>>  
>> <https://groups.google.com/d/msgid/sqlalchemy/89af6d33-42ed-4463-a967-ee106dba36f2%40www.fastmail.com?utm_medium=email&utm_source=footer>.
> 

> --
>  SQLAlchemy - 
>  The Python SQL Toolkit and Object Relational Mapper
> 
> http://www.sqlalchemy.org/
> 
>  To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
>  --- 
>  You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
>  To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+unsubscr...@googlegroups.com.
>  To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sqlalchemy/CA%2BBKPDWWxwgiVKHcEzR3w_X-x6VckH-DTnxLtE%3D2oUxmdNeVvQ%40mail.gmail.com
>  
> <https://groups.google.com/d/msgid/sqlalchemy/CA%2BBKPDWWxwgiVKHcEzR3w_X-x6VckH-DTnxLtE%3D2oUxmdNeVvQ%40mail.gmail.com?utm_medium=email&utm_source=footer>.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/ab586e77-4f87-420e-8330-bfb652318506%40www.fastmail.com.

Reply via email to