Re: [DISCUSS] Implementation strategies for supporting Iceberg tables in Hive

2019-08-07 Thread Owen O'Malley
> On Jul 24, 2019, at 22:52, Adrien Guillo wrote:
>
> Hi Iceberg folks,
>
> In the last few months, we (the data infrastructure team at Airbnb) have been closely following the project. We are currently evaluating potential strategies to migrate our data warehouse to Iceberg.

Re: Iceberg in Spark 3.0.0

2019-08-07 Thread Saisai Shao
I agree that we should have a branch to track the changes for Spark 3.0.0. Spark 3.0.0 has several changes to DataSource V2, so it would be better to evaluate them and do the design with the 3.0 changes in mind. My two cents :) Best regards, Saisai Edgar Rodriguez

Re: Two newbie question about Iceberg

2019-08-07 Thread Saisai Shao
Thanks for your replies, guys. I didn't do anything special; I don't even have Hive configured. I simply put the Iceberg (assembly) jar into Spark and started a local Spark process. I think the built-in Hive version of Spark is 1.2.1-spark (has a slight pom change), and all the configurations

Re: Any plan to support update, delete and others

2019-08-07 Thread Saisai Shao
Thanks a lot, Ryan, that would be very helpful! Delta Lake recently added support for such operations at the API level (https://github.com/delta-io/delta/blob/master/src/main/scala/io/delta/tables/DeltaTable.scala). I was thinking that at the API level the goal of Iceberg is similar, maybe we could

Iceberg in Spark 3.0.0

2019-08-07 Thread Edgar Rodriguez
Hi everyone, I was wondering if there's a branch tracking the changes happening in Spark 3.0.0 for Iceberg. The DataSource V2 API has changed substantially from the one implemented in the Iceberg master branch, and since Spark 3.0.0 would allow us to introduce Spark SQL support then it seems

Re: Any plan to support update, delete and others

2019-08-07 Thread Ryan Blue
Hi Saisai, We are working on adding row-level delete support to Iceberg, where the deletes are applied when data is read. We’ve had a few good design discussions and have come up with a good way to integrate these into the format. Erik has written a good document on it:

Re: Row-level delete sync notes - July 2019

2019-08-07 Thread Ryan Blue
Thanks for the update Erik! In addition, Anton has opened PR #351 to update the API so that we can implement eager row-level overwrites. I think that's the only part that needs to be done for the eager overwrite case because the rest of the

Re: Two newbie question about Iceberg

2019-08-07 Thread Anton Okolnychyi
I think it works in tests because we create all tables (including HIVE_LOCKS) using a script. I am not sure lock tables are always created in embedded mode.

> On 7 Aug 2019, at 16:49, Ryan Blue wrote:
>
> This is the right list. Iceberg is fairly low in the stack, so most

Re: Two newbie question about Iceberg

2019-08-07 Thread Ryan Blue
This is the right list. Iceberg is fairly low in the stack, so most questions are probably dev questions. I'm surprised that this doesn't work with an embedded metastore because we use an embedded metastore in tests:

Any plan to support update, delete and others

2019-08-07 Thread Saisai Shao
Hi team, the Delta Lake project recently announced version 0.3.0, which added several new features at the API level, such as update, delete, merge, and vacuum. Is there any plan to add such features to Iceberg? Thanks, Saisai