Re: Re: Re: Integrate PyHive and Alembic

Mike Bayer Wed, 26 Feb 2020 08:07:48 -0800


On Tue, Feb 25, 2020, at 11:06 AM, Ke Zhu - k...@us.ibm.com wrote:
> According to Apache Hive, Update/Delete can only be performed on tables that 
> supports ACID. see  
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Update
> 
> > this is a database that can change schema structures but not 
> > insert/update/delete rows? is there some kind of SQL layer that has to be 
> > in use that isn't there when you create structures?
> 
> Yes! Basically it separate data store and data model so that you can update 
> table schema as metadata while the actual data can be loaded/streamed into 
> external data store w/o SQL interface.
> 
> > that wouldn't be very easy but also it would be way better to store the 
> > version info in the target DB itself.
> 
> totally understand this. I see this design as advantage as well just like 
> other schema management tool like sqitch. I'm just looking for possiblilities 
> since I've seen it uses the same connection to execute schema changes and 
> versioning change in a migration context.
> 
> > I don't know anything about Hive/Presto, but they *are* databases so I'd 
> > assume you can put data in them
> 
> Yes to Hive, when using PyHive, it just needs to fix  
> https://github.com/dropbox/PyHive/issues/315


oh, that issue is talking about rowcount. Alembic does not need rowcount to 
function correctly. I see that Alembic is doing this now, however there is a 
dialect-level flag called "supports_sane_rowcount", if this were False, Alembic 
should be checking this and skipping that particular check.

is the "rowcount" check the only issue here? this part can be changed on the 
Alembic side.


> 
> While, the answer will be no to PrestoSQL which is just a SQL query engine 
> that delegates data model and data store to query targets 
> (MySQL/Postgres/Kafka/Elasticsearch etc) via connectors.
> 
> On Mon, 2020-02-24 at 18:28 -0500, Mike Bayer wrote:
>> 
>> 
>> On Mon, Feb 24, 2020, at 3:44 PM, Ke Zhu - k...@us.ibm.com wrote:
>>> Mike,
>>> 
>>> Thanks for the pointers. I've figured out the programming part and 
>>> discovers more things when integrating PyHive:
>>> 
>>> 1. It requires the table alembic_version to be transactional so that it can 
>>> do Update/Delete when upgrade/downgrade. which is challege for Hive3 which 
>>> has limited ACID support. Since I could not figure out a way to program a 
>>> transactional table via SQLAlechemy API (see 
>>> https://github.com/dropbox/PyHive/issues/314), it ended up a decoration to 
>>> patch sqlalchemy.schema.CreateTable by appending table properties.
>> 
>> it's not a hard requirement that there's a DB transaction in use, you could 
>> run alembic in an "autocommit" mode and that shouldn't cause any problem. 
>> you just won't be able to roll back if something fails. im not completely 
>> sure what you mean by "the table to be transactional so that it can do 
>> update/delete" but from Alembic's point of view it just needs to run 
>> INSERT/UPDATE/DELETE but there doesn't have to be any ACID guarantees.
>> 
>> 
>>> 2. PyHive doesn't fully support Update/Delete for Hive/Presto yet. it's 
>>> easy to patch PyHive but the key problem is it doesn't support 
>>> transactional DDL like rollback in Hive3.
>>> 
>>> Although I've managed to get `alembic upgrade` and `alembic downgrade` to 
>>> work on Hive 3, it's still not a fully transactional experience (e.g., 
>>> changed schema sccessfully but failed to update table alembic_version).
>> 
>> so..this is a database that can change schema structures but not 
>> insert/update/delete rows? im not following. is there some kind of SQL layer 
>> that has to be in use that isn't there when you create structures?
>> 
>> 
>>> 
>>> I wonder if there's any design direction in Alembic to allowing storing the 
>>> version table `alembic_version` in another db when dealing with non-RDBMS 
>>> SQL engine (Hive/Presto). e.g., supporting a postgres db to store table 
>>> alembic while delivering the actual changes to Hive. I had a PoC to using 
>>> multi-db template to manage the table `alembic_version` in a RDBMS while 
>>> denying any operation on table `alembic_version` in Hive/Presto. it works 
>>> now but does it sound right?
>> 
>> that wouldn't be very easy but also it would be way better to store the 
>> version info in the target DB itself. I don't know anything about 
>> Hive/Presto, but they *are* databases so I'd assume you can put data in them.
>> 
>> 
>>> 
>>> On Sat, 2020-01-25 at 18:19 -0500, Mike Bayer wrote:
>>>> 
>>>> 
>>>> On Fri, Jan 24, 2020, at 1:56 PM, Ke Zhu wrote:
>>>>> Just discovered this post when trying to do exact same thing (besides 
>>>>> planning to support one more dialect).
>>>>> 
>>>>> > Anywhere in your hive dialect, simply put the above code that you have 
>>>>> > (using the correct imports of course). 
>>>>> 
>>>>> Does it mean it must introduce dependency to alembic (since it uses 
>>>>> alembic.ddl.impl.DefaultImpl) in a package (.e.g, pyHive) that supports 
>>>>> sqlalchemy interfaces?
>>>> 
>>>> well you have to put it in a try/except ImportError block so that if 
>>>> alembic isn't installed, it silently passes. there's a github issue to add 
>>>> support for real entrypoints but it hasn't been that critical.
>>>> 
>>>>> 
>>>>> If not, is there any guidance to support this at alembic level in a 
>>>>> plug-gable way? E.g., declare a HiveImpl class in `env.py` of a project 
>>>>> uses alembic?
>>>> 
>>>> you could put one in your env.py also but if you are the person working on 
>>>> the dialect you can have this built in, see the example in 
>>>> sqlalchemy-redshift:  
>>>> https://github.com/sqlalchemy-redshift/sqlalchemy-redshift/blob/master/sqlalchemy_redshift/dialect.py#L27
>>>> 
>>>> 
>>>> 
>>>>> 
>>>>> PS: I raised this question 
>>>>> <https://stackoverflow.com/questions/59887588/how-to-add-new-dialect-to-alembic-besides-built-in-dialects>
>>>>>  in stackoverflow but raised this group is a better place to get help.
>>>>> 
>>>>> 
>>>>> On Friday, February 10, 2017 at 9:45:38 AM UTC-5, mike bayer wrote:
>>>>>> 
>>>>>> 
>>>>>> On 02/10/2017 07:41 AM, Alexander Peletz wrote: 
>>>>>> > Hello, 
>>>>>> > 
>>>>>> > I would like to use Alembic to manage my Hive Metastore. I have 
>>>>>> > installed, PyHive, SqlAlchemy, and Alembic. I am able to create a 
>>>>>> > functional engine object using the 'hive' dialect in sqlalchemy, 
>>>>>> > however 
>>>>>> > I cannot get Alembic to recognize this dialect. The problem appears to 
>>>>>> > be a lack of a HiveImpl class in the Alembic package. I attempted to 
>>>>>> > resolve this by creating an alembic/ddl/hive.py module and pasting the 
>>>>>> > following code into that module: 
>>>>>> > 
>>>>>> > 
>>>>>> > from .impl import DefaultImpl 
>>>>>> > 
>>>>>> > class HiveImpl(DefaultImpl): 
>>>>>> > __dialect__ = 'hive' 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> you don't actually have to create a "hive.py" file. Anywhere in your 
>>>>>> hive dialect, simply put the above code that you have (using the correct 
>>>>>> imports of course). The DefaultImpl uses a metaclass that will allow 
>>>>>> the "hive" name to be available to alembic as a result of this class 
>>>>>> being created. 
>>>>>> 
>>>>>> 
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> > I simply want to be able to execute raw SQL against a Hive instance 
>>>>>> > (no 
>>>>>> > ORM implementation needed) and I was hoping to use Alembic to manage 
>>>>>> > the 
>>>>>> > minimum upgrade/downgrade functionality. Are there any simple edits I 
>>>>>> > can make to the Alembic source code to allow me to achieve this goal? 
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> > Thanks, 
>>>>>> > 
>>>>>> > Alexander 
>>>>>> > 
>>>>>> > -- 
>>>>> 

>>>>> --
>>>>> You received this message because you are subscribed to the Google Groups 
>>>>> "sqlalchemy-alembic" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>>>> email to sqlalchemy-alembic+unsubscr...@googlegroups.com.
>>>>> To view this discussion on the web visit  
>>>>> https://groups.google.com/d/msgid/sqlalchemy-alembic/59b308d9-7a9f-4038-bb52-f578c2c9cb69%40googlegroups.com
>>>>>  
>>>>> <https://groups.google.com/d/msgid/sqlalchemy-alembic/59b308d9-7a9f-4038-bb52-f578c2c9cb69%40googlegroups.com?utm_medium=email&utm_source=footer>.
>>>> 
>>>> 

>>> 

>>> --
>>> You received this message because you are subscribed to the Google Groups 
>>> "sqlalchemy-alembic" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to sqlalchemy-alembic+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit  
>>> https://groups.google.com/d/msgid/sqlalchemy-alembic/d0623681577b433554974720744cb5dd835dd126.camel%40us.ibm.com
>>>  
>>> <https://groups.google.com/d/msgid/sqlalchemy-alembic/d0623681577b433554974720744cb5dd835dd126.camel%40us.ibm.com?utm_medium=email&utm_source=footer>.
>>> 
>>> 
>>> *Attachments:*
>>>  * smime.p7s
>> 
> 

> --
>  You received this message because you are subscribed to the Google Groups 
> "sqlalchemy-alembic" group.
>  To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy-alembic+unsubscr...@googlegroups.com.
>  To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sqlalchemy-alembic/8414ea00ba7a587499b71457649aa7717661d82c.camel%40us.ibm.com
>  
> <https://groups.google.com/d/msgid/sqlalchemy-alembic/8414ea00ba7a587499b71457649aa7717661d82c.camel%40us.ibm.com?utm_medium=email&utm_source=footer>.

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy-alembic" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy-alembic+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy-alembic/584f1f4e-7fa6-45aa-9e32-e2a476e18b47%40www.fastmail.com.

Re: Re: Re: Integrate PyHive and Alembic

Reply via email to