Possible Data Loss with RemoveOrphanFilesAction

2020-09-11 Thread Russell Spitzer
Because the RemoveOrphanFilesAction uses Filesystem.list, the paths of files found in the file system can have an authority included in them based on the core-site.xml. This is determined when listing the files so the entries stored in the metadata tables do not necessarily have to match. URIs will
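
The mismatch described above can be illustrated with a minimal sketch. This is not Iceberg's code and not the fix discussed in the thread: the function names, the sample paths, and the `namenode:8020` authority are all hypothetical, and comparing on the path component alone is just one possible normalization (it would be wrong if two different clusters used the same path).

```python
from urllib.parse import urlparse

def find_orphans_naive(listed, referenced):
    """Treat any listed file whose exact string is not referenced as an orphan."""
    referenced_set = set(referenced)
    return [p for p in listed if p not in referenced_set]

def find_orphans_by_path(listed, referenced):
    """Compare only the path component, ignoring scheme and authority.

    Assumption: all locations live on the same file system, so the
    path alone identifies a file.
    """
    referenced_paths = {urlparse(p).path for p in referenced}
    return [p for p in listed if urlparse(p).path not in referenced_paths]

# Hypothetical listing result: the authority comes from the file system config.
listed = [
    "hdfs://namenode:8020/warehouse/db/t/data/a.parquet",
    "hdfs://namenode:8020/warehouse/db/t/data/b.parquet",
]
# Hypothetical metadata entry: same file as a.parquet, written without an authority.
referenced = ["hdfs:///warehouse/db/t/data/a.parquet"]

naive = find_orphans_naive(listed, referenced)
by_path = find_orphans_by_path(listed, referenced)
```

With these inputs the naive string comparison flags both files, including the live `a.parquet`, while the path-based comparison flags only `b.parquet`.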

Re: Upgrade components to Hive 3 and Hadoop 3

2020-09-11 Thread Ryan Blue
Hi Marton, could you share a link to your branch with the changes? It would be great to see what needs to be done. A quick summary would help as well. Knowing what changes between Hive 2 and 3 in our iceberg-hive-metastore project is important because we would ideally use whatever is available at

Re: Iceberg's type system

2020-09-11 Thread Chen Song
Thanks for the reply. We have this requirement as we may want to include some very specific types that are not generic at all. Say if we use Iceberg's data APIs to read and write data files directly, all I think is needed is to be able to specify the new types in metadata. Annotate an existing typ

Re: Iceberg's type system

2020-09-11 Thread Ryan Blue
Sorry I missed this email! Right now, we don't support extending Iceberg's type system. We are currently targeting a small set of types so that engines can easily implement them. And we've already hit some issues with the types that Iceberg supports: Spark, for example, doesn't support a time type

Re: Iceberg's type system

2020-09-11 Thread Chen Song
Any thoughts or suggestions on this? On Tue, Sep 8, 2020 at 3:01 PM Chen Song wrote: > Hi, I have a general question on Iceberg's data type system. Iceberg has a well defined type spec which can be mapped to types in Avro, Parqu

Re: Timestamp Based Incremental Reading in Iceberg ...

2020-09-11 Thread Ryan Blue
> you never know if that's the same order in which the writers will commit (when you have multiple writers in the system) That's exactly the problem with trying to rely on timestamps. Even if you can coordinate the timestamps themselves, commits could still be out of order because of differences i
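
A toy simulation, under stated assumptions, of why a timestamp watermark can miss data when commits land out of order. Nothing here is an Iceberg API: `Commit`, `assigned_ts`, and both reader functions are hypothetical names used only for illustration. Writer A picks the earlier timestamp but commits after writer B.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Commit:
    writer: str
    assigned_ts: int  # timestamp chosen by the writer, not the commit order

# The log is in the order the commits actually landed: B first, then A.
log = [Commit("B", 2), Commit("A", 1)]

def read_by_timestamp(visible, watermark):
    """Return commits newer than the watermark, plus the advanced watermark."""
    new = [c for c in visible if c.assigned_ts > watermark]
    new_watermark = max([watermark] + [c.assigned_ts for c in new])
    return new, new_watermark

def read_by_position(visible, position):
    """Return commits after the last consumed log position, plus the new position."""
    return visible[position:], len(visible)

# First poll sees only B's commit; the second poll sees both.
ts_first, wm = read_by_timestamp(log[:1], 0)
ts_second, wm = read_by_timestamp(log[:2], wm)

pos_first, pos = read_by_position(log[:1], 0)
pos_second, pos = read_by_position(log[:2], pos)
```

In this sketch the timestamp reader never returns A's commit, because its timestamp is already below the watermark when it becomes visible. The reader that tracks its position in commit order returns it on the second poll.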

INSERT to Iceberg tables from Impala

2020-09-11 Thread Zoltán Borók-Nagy
Hi, I'd like to add INSERT support for Iceberg tables in Impala. To start, I created the following design doc: https://docs.google.com/document/d/1_KL0YptDKwhiXvJyx4Vb-yZjggrPQAW2yjeGV4C0vMU/edit?usp=sharing All comments are welcome. Thanks, Zoltan

Upgrade components to Hive 3 and Hadoop 3

2020-09-11 Thread Marton Bod
Hi Team, We would like to start a discussion on upgrading Iceberg components to use Hive 3 and Hadoop 3. We have a fork where we have bumped up the Hive and Hadoop dependency versions and made the necessary changes to get all tests to pass. As some components cannot (e.g. spark2) or might not wan