Re: Iceberg and Hive

2019-01-07 Thread Arvind Pruthi
Vladi, we have a situation similar to what you describe. While I agree that the implementation of Hive's RawStore API that Owen mentioned would be really useful, I don't believe it fully answers Vladi's question. I think the main concern here is the smooth migration of existing clients to Iceberg tables.

Re: Iceberg and Hive

2019-01-07 Thread Ryan Blue
Vladi, I'll add a little to Owen's answer for context. Owen was right that using an Iceberg table in Hive will require some work implementing the RawStore API. But the `iceberg-hive` module currently uses the Hive Metastore to keep track of Iceberg metadata. An Iceberg table isn't a Hive table
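
For context, here is a minimal sketch of what that looks like from the Java side: the Hive Metastore stores a pointer to the current Iceberg metadata file, and a Hive-backed catalog resolves that pointer into a `Table`. The catalog API has evolved since this thread, so the class and property names below reflect a later `iceberg-hive` layout, and the metastore URI and table identifier are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hive.HiveCatalog;

public class LoadIcebergViaMetastore {
  public static void main(String[] args) {
    // The metastore holds only a pointer to the current metadata file;
    // the table's data and metadata themselves live in the file system.
    HiveCatalog catalog = new HiveCatalog();
    catalog.setConf(new Configuration()); // picks up hive-site.xml if present

    Map<String, String> props = new HashMap<>();
    props.put("uri", "thrift://metastore-host:9083"); // placeholder URI
    catalog.initialize("hive", props);

    // "db.events" is an illustrative identifier.
    Table table = catalog.loadTable(TableIdentifier.of("db", "events"));
    System.out.println(table.schema());
  }
}
```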

Re: Iceberg and Hive

2019-01-07 Thread Owen O'Malley
The group has moved to the Apache infrastructure, so we should use dev@iceberg.apache.org. What is required, but not started, is for someone to implement Hive's RawStore API with an Iceberg backend. That would let you use Hive SQL commands to manipulate the Iceberg tables. .. Owen
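
To make the RawStore idea concrete: a bridge implementation would, among many other duties, answer Hive's `getTable` calls by translating Iceberg metadata into the Thrift `Table` objects Hive expects. `RawStore` is a large interface whose signatures vary across Hive releases, so the sketch below shows only that one translation step, with a deliberately simplified type mapping; it illustrates the approach and is not part of any existing module.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
import org.apache.hadoop.hive.metastore.api.Table;
import org.apache.iceberg.types.Types;

public class IcebergToHiveTable {
  // The kind of translation a RawStore-backed bridge would perform inside
  // getTable(): project an Iceberg table's schema and location into the
  // Thrift Table object that Hive expects.
  public static Table toHiveTable(org.apache.iceberg.Table iceberg,
                                  String dbName, String tableName) {
    List<FieldSchema> cols = new ArrayList<>();
    for (Types.NestedField field : iceberg.schema().columns()) {
      // A real bridge needs a full Iceberg-to-Hive type mapping; the
      // Iceberg type's string form is passed through here for illustration.
      cols.add(new FieldSchema(field.name(), field.type().toString(), null));
    }

    StorageDescriptor sd = new StorageDescriptor();
    sd.setCols(cols);
    sd.setLocation(iceberg.location());

    Table hiveTable = new Table();
    hiveTable.setDbName(dbName);
    hiveTable.setTableName(tableName);
    hiveTable.setSd(sd);
    hiveTable.setParameters(new HashMap<>());
    return hiveTable;
  }
}
```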

Re: [DISCUSS] Draft report for January 2019

2019-01-07 Thread Owen O'Malley
+1

[DISCUSS] Draft report for January 2019

2019-01-07 Thread Ryan Blue
Dev list, We missed the initial report deadline, but I went ahead and drafted this anyway. Mentors can still sign off on this until end of day tomorrow, here: https://wiki.apache.org/incubator/January2019. Please have a look. rb Iceberg is a table format for large, slow-moving tabular data. I

Re: Python support for Tables creation

2019-01-07 Thread Dave Sugden
On 2019/01/05 16:55:49, Dave Sugden wrote: > Are there plans to provide Python support (JavaGateway etc.) for the creation of the Tables (e.g. HadoopTables), Schema and PartitionSpec? As far as I can tell, that would be sufficient, as pyspark.sql provides DataFrame support. Ah! just
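
For reference, this is the Java-side sequence a JavaGateway/py4j bridge would need to surface to Python: build a `Schema` and a `PartitionSpec`, then call `HadoopTables.create`. The field names, partition transform, and warehouse path below are illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.hadoop.HadoopTables;
import org.apache.iceberg.types.Types;

public class CreateTableExample {
  public static void main(String[] args) {
    // Schema and PartitionSpec are built in Java; a py4j gateway would
    // need to expose exactly these calls to Python.
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "ts", Types.TimestampType.withZone()));

    PartitionSpec spec = PartitionSpec.builderFor(schema)
        .day("ts")
        .build();

    // HadoopTables keeps table metadata in the file system, so no
    // metastore is involved; the path is a placeholder.
    HadoopTables tables = new HadoopTables(new Configuration());
    Table table = tables.create(schema, spec, "hdfs://nn/warehouse/db/events");
    System.out.println(table.spec());
  }
}
```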