Re: Incompatible Writes due to OutOfOrder Fields

Gautam Thu, 26 Sep 2019 01:24:14 -0700

Shone and I synced offline, but I wanted to circle back here so that others
can benefit, and so that people with more experience can correct me if
there's a better way to achieve this.

*Problem*:
  The use case is that incoming data has fields out of order w.r.t. data
already ingested in Iceberg. The same scenario applies to nested columns
(e.g. a sub-struct whose fields are out of order). Incoming data might also
have added fields. If the data is ingested as is, Iceberg will complain
through its compatibility checks, as it should.

*Solution*:
  Iceberg doesn't depend on field names or on the natural order of fields. It
uses Ids to keep track of schema fields. So to enforce evolution rules
correctly, one should first go back to the underlying Iceberg schema, apply
the schema changes using the Iceberg UpdateSchema API, and commit them to the
underlying table. Once this is done, Iceberg will have created a new version
of the schema with new Ids allotted to the added fields. It also accounts for
a different field order in the incoming data, as it keeps the id-name mapping
for all columns.
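
To make the id-based idea concrete, here is a minimal Python sketch. It is a
toy model and not the Iceberg API: the dict-based schema layout and the helper
names add_missing_fields and to_ids are hypothetical, made up for
illustration. It only shows why keeping an id per field name makes arrival
order irrelevant, and how added fields get fresh Ids while existing ones keep
theirs.

```python
from itertools import count

# Toy model only (NOT the Iceberg API). A schema is a dict:
#   {field_name: (field_id, children)}
# where children is a nested schema dict for structs, or None for primitives.


def add_missing_fields(schema, incoming, ids):
    """Return a copy of `schema` with fields found only in `incoming` added.

    `incoming` describes the shape of the new data: {name: nested_shape_or_None}.
    `ids` is an iterator that yields unused field ids.
    Existing fields keep their ids; only new fields get new ones.
    """
    evolved = dict(schema)
    for name, shape in incoming.items():
        if name in evolved:
            field_id, children = evolved[name]
            if children is not None and shape is not None:
                # Recurse so that additions inside sub-structs are picked up.
                evolved[name] = (field_id, add_missing_fields(children, shape, ids))
        else:
            field_id = next(ids)
            children = add_missing_fields({}, shape, ids) if shape is not None else None
            evolved[name] = (field_id, children)
    return evolved


def to_ids(schema, record):
    """Key a record by field id, ignoring the order in which names arrive."""
    out = {}
    for name, value in record.items():
        field_id, children = schema[name]
        out[field_id] = to_ids(children, value) if children is not None else value
    return out


# Table schema as ingested so far: struct(c, b, a), with a = struct(y, x).
table_schema = {
    "c": (1, None),
    "b": (2, None),
    "a": (3, {"y": (4, None), "x": (5, None)}),
}

# Incoming data: different order at both levels, plus added fields z and d.
incoming_shape = {"a": {"x": None, "y": None, "z": None}, "b": None, "c": None, "d": None}

evolved = add_missing_fields(table_schema, incoming_shape, count(6))

row = {"a": {"x": 10, "y": 20, "z": 30}, "b": "B", "c": "C", "d": "D"}
resolved = to_ids(evolved, row)
```

In Iceberg itself the equivalent of add_missing_fields is done through the
UpdateSchema API followed by a commit, as described above; the toy version
just keeps the bookkeeping visible.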

Here is a gist that captures the scenarios described above with sample
data: https://gist.github.com/prodeezy/b2cc35b87fca7d43ae681d45b3d7cab3

Cheers,
-Gautam.

On Wed, Sep 25, 2019 at 5:29 AM Ryan Blue <rb...@netflix.com.invalid> wrote:

> Hi Shone,
>
> Iceberg should be able to handle out of order data columns in nested
> structures. We probably just need to relax that compatibility check to
> allow it. Can you post the error message that you're getting?
>
> On Sun, Sep 22, 2019 at 4:49 AM Shone Sadler <ssad...@adobe.com.invalid>
> wrote:
>
>> Hello everyone,
>>
>> This question is related to schema evolution support in Iceberg.
>>
>> We have data coming in with fields out of order w.r.t. the schema in
>> Iceberg (e.g. inbound struct(a,b,c) vs. iceberg struct(c,b,a)).
>>
>> As a result, we are hitting the following error in Iceberg when saving the
>> data -> "Cannot write incompatible dataset to table with schema",
>> generated within IcebergSource ->
>> https://github.com/apache/incubator-iceberg/blob/d1f0b540f5f14f002be86133ef9f66445f7e0926/spark/src/main/java/org/apache/iceberg/spark/source/IcebergSource.java#L157
>>
>> I also noted in the documentation that re-ordering was allowed ->
>> https://iceberg.apache.org/evolution/ , which led me to believe that we
>> could update the schema prior to writing the data. However, I see no means
>> of re-ordering fields on the current UpdateSchema API.
>>
>> How are people handling out-of-order fields today?
>>
>> Our data is deeply nested, so I am hoping not to have to
>> transform/prep on ingest and am looking for alternatives.
>>
>> Any thoughts appreciated!
>>
>> Regards,
>> Shone Sadler
>>
>>
>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
