Re: Iceberg - Hive schema synchronization

2020-12-04 Thread Ryan Blue
A few replies inline.
On Thu, Nov 26, 2020 at 3:49 AM Peter Vary wrote:
> I think the column mapping should also be 1-to-1. Hive would have trouble writing to a table if it didn't include all required columns. I think that the right thing is for all engines to provide uniform access to all

Re: Iceberg - Hive schema synchronization

2020-11-26 Thread Peter Vary
Thanks for all the responses. Added my comments below:
> On Nov 25, 2020, at 23:45, Ryan Blue wrote:
>
> I agree that a 1-to-1 type mapping is the right option. Some additional mappings should be supported; I think it should be fine to use VARCHAR in DDL to produce a string column in

Re: Iceberg - Hive schema synchronization

2020-11-25 Thread Ryan Blue
I agree that a 1-to-1 type mapping is the right option. Some additional mappings should be supported; I think it should be fine to use VARCHAR in DDL to produce a string column in Iceberg. Iceberg is also strict about type promotion, and I don't think that we should confuse type promotion with

Re: Iceberg - Hive schema synchronization

2020-11-25 Thread Zoltán Borók-Nagy
Hi Everyone, In Impala we face the same challenges. I think a strict 1-to-1 type mapping would be beneficial because that way we could derive the Iceberg schema from the Hive schema, not just the other way around. So we could just naturally create Iceberg tables via DDL. We should use the same
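To make the strict 1-to-1 idea concrete, here is a minimal sketch of how a Hive column type could be translated to an Iceberg type. Everything in it is an assumption for illustration: the function name `hive_to_iceberg`, the `lenient` flag, and the deliberately incomplete mapping table are hypothetical and are not taken from the Iceberg or Hive code bases. The VARCHAR-to-string entry reflects the extra DDL mapping suggested elsewhere in this thread, and `uniontype` is rejected because the thread notes it has no natural Iceberg counterpart.

```python
# Hypothetical, deliberately incomplete table: each Hive type maps to exactly
# one Iceberg type, so the mapping can be inverted to derive a Hive schema.
STRICT_HIVE_TO_ICEBERG = {
    "boolean": "boolean",
    "int": "int",
    "bigint": "long",
    "float": "float",
    "double": "double",
    "date": "date",
    "string": "string",
    "binary": "binary",
}

# Extra one-way mappings that could be accepted in DDL only (assumption).
# They are not invertible, so they are kept out of the strict table.
LENIENT_EXTRA = {"varchar": "string"}


def hive_to_iceberg(hive_type, lenient=False):
    """Return the Iceberg type name for a Hive type name, or raise ValueError."""
    # Drop any length/precision suffix such as "(20)" and normalize case.
    base = hive_type.strip().lower().split("(", 1)[0]
    if base in STRICT_HIVE_TO_ICEBERG:
        return STRICT_HIVE_TO_ICEBERG[base]
    if lenient and base in LENIENT_EXTRA:
        return LENIENT_EXTRA[base]
    raise ValueError(f"no Iceberg mapping for Hive type {hive_type!r}")
```

Keeping the lenient entries in a separate table is one way to preserve the invertibility of the strict mapping while still accepting convenience types in DDL.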

Re: Iceberg - Hive schema synchronization

2020-11-24 Thread Vivekanand Vellanki
Some of the conversions we are seeing are:
- Decimal to Decimal; not just limited to increasing precision as with Iceberg
- varchar to string
- numeric type to numeric type (float to Decimal, double to Decimal, Decimal to double, etc.)
- numeric type to string
On Tue, Nov
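For contrast with the conversions listed above, the following sketch checks whether a type change falls within the promotions that the Iceberg table spec allows, as far as I understand it: int to long, float to double, and decimal to a decimal of higher precision with the same scale. The function name `is_iceberg_promotion` and the string encoding of types are assumptions made for this example, not an Iceberg API.

```python
import re

# Matches strings such as "decimal(10,2)" or "decimal(10, 2)".
_DECIMAL = re.compile(r"decimal\((\d+),\s*(\d+)\)")

# Primitive promotions permitted by the Iceberg spec (besides decimal widening).
_PRIMITIVE_PROMOTIONS = {("int", "long"), ("float", "double")}


def is_iceberg_promotion(src, dst):
    """Return True if changing a column from src to dst is an allowed promotion."""
    if (src, dst) in _PRIMITIVE_PROMOTIONS:
        return True
    m_src, m_dst = _DECIMAL.fullmatch(src), _DECIMAL.fullmatch(dst)
    if m_src and m_dst:
        p_src, s_src = map(int, m_src.groups())
        p_dst, s_dst = map(int, m_dst.groups())
        # Only precision may grow; the scale has to stay the same.
        return s_src == s_dst and p_dst > p_src
    return False
```

Under this check, the Hive-side conversions from the list above, such as float to Decimal or a numeric type to string, all return False, which is the gap in schema evolution flexibility being discussed.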

Re: Iceberg - Hive schema synchronization

2020-11-24 Thread Owen O'Malley
You left the complex types off of your list (struct, map, array, uniontype). All of them have natural mappings in Iceberg, except for uniontype. Interval is supported on output, but not as a column type. Unfortunately, we have some tables with uniontype, so we'll need a solution for how to deal

Re: Iceberg - Hive schema synchronization

2020-11-24 Thread Vivekanand Vellanki
One of the challenges we've had is that Hive is more flexible with schema evolution compared to Iceberg. Are you guys also looking at this aspect?
On Tue, Nov 24, 2020 at 8:21 PM Peter Vary wrote:
> Hi Team,
>
> With Shardul we had a longer discussion yesterday about the schema

Iceberg - Hive schema synchronization

2020-11-24 Thread Peter Vary
Hi Team,
With Shardul we had a longer discussion yesterday about the schema synchronization between Iceberg and Hive, and we thought that it would be good to ask the opinion of the greater community too. We can have 2 sources for the schemas:
- Hive table definition / schema
- Iceberg schema
If