Shone and I synced offline, but I wanted to circle back here so that others can benefit, and so that anyone with more experience can correct me if there's a better way to achieve this.

*Problem*: Incoming data has fields out of order with respect to data already ingested in Iceberg. The same applies to nested columns (e.g. the fields of a sub-struct arrive in a different order). Incoming data might also contain added fields. If the data is ingested as is, Iceberg will reject it in its compatibility checks, as it should.

*Solution*: Iceberg doesn't depend on field names or on the natural order of fields. It uses IDs to keep track of schema fields. So to enforce the evolution rules correctly, one should first go back to the underlying Iceberg schema, apply the schema changes using Iceberg's schema update API (UpdateSchema), and commit them to the underlying table. Once this is done, Iceberg will have created a new version of the schema with new IDs allotted to the added fields. It also accounts for a different field order in the incoming data, since it keeps the ID-to-name mapping for all columns.

Here is a gist that captures the scenarios described above with sample data:
https://gist.github.com/prodeezy/b2cc35b87fca7d43ae681d45b3d7cab3

Cheers,
-Gautam.

On Wed, Sep 25, 2019 at 5:29 AM Ryan Blue <rb...@netflix.com.invalid> wrote:

> Hi Shone,
>
> Iceberg should be able to handle out of order data columns in nested
> structures. We probably just need to relax that compatibility check to
> allow it. Can you post the error message that you're getting?
>
> On Sun, Sep 22, 2019 at 4:49 AM Shone Sadler <ssad...@adobe.com.invalid>
> wrote:
>
>> Hello everyone,
>>
>> This question is related to schema evolution support in Iceberg.
>>
>> We have data coming in with fields out-of-order wrt to the schema in
>> Iceberg (e.g. inbound struct(a,b,c) vs. iceberg struct(c,b,a))
>>
>> As a result we are hitting the following error in Iceberg when saving the
>> data -> "Cannot write incompatible dataset to table with schema",
>> generated within the IcebergeSource ->
>> https://github.com/apache/incubator-iceberg/blob/d1f0b540f5f14f002be86133ef9f66445f7e0926/spark/src/main/java/org/apache/iceberg/spark/source/IcebergSource.java#L157
>>
>> I also noted in the documentation that re-ordering was allowed ->
>> https://iceberg.apache.org/evolution/ , which led me to believe that we
>> could update the schema prior to writing the data, However, I see no means
>> of re-ordering fields on the current UpdateSchema API.
>>
>> How are people handling out-of-order fields today?
>>
>> Our data is deeply nested, as a result I am hoping not to have to
>> transform/prep on ingest and looking for alternatives.
>>
>> Any thoughts appreciated!
>>
>> Regards,
>> Shone Sadler
>
> --
> Ryan Blue
> Software Engineer
> Netflix
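
To illustrate the ID-based idea from the Solution above, here is a minimal sketch in plain Python. It is a toy model only and does not use the Iceberg API: the schema representation and the helper names `evolve` and `align` are hypothetical, made up for this example. It assumes records are nested dicts and shows two things: existing fields keep their IDs regardless of the order they arrive in, and added fields (top-level or nested) are allotted fresh IDs.

```python
from itertools import count


def evolve(schema, record, next_id):
    """Add fields present in `record` but missing from `schema`, with fresh ids.

    Existing fields keep their ids, whatever order the record lists them in.
    `schema` maps field name -> {"id": int, "children": dict or None}.
    """
    for name, value in record.items():
        if name not in schema:
            schema[name] = {
                "id": next(next_id),
                "children": {} if isinstance(value, dict) else None,
            }
        children = schema[name]["children"]
        if isinstance(value, dict):
            if children is None:
                # This toy model does not handle primitive -> struct changes.
                raise TypeError(f"field {name!r} is not a struct in the schema")
            evolve(children, value, next_id)
    return schema


def align(schema, record):
    """Return `record` in the schema's field order; absent fields become None."""
    out = {}
    for name, field in schema.items():
        value = record.get(name)
        if field["children"] is not None and value is not None:
            value = align(field["children"], value)
        out[name] = value
    return out


next_id = count(1)

# Schema as first ingested: struct(c, b, a) where a is struct(y, x).
table_schema = evolve({}, {"c": 1, "b": 2, "a": {"y": 3, "x": 4}}, next_id)

# Incoming data: fields out of order at both levels, plus added fields z and d.
incoming = {"a": {"x": 10, "y": 20, "z": 30}, "b": 40, "c": 50, "d": 60}

# Step 1: update the schema first, so the added fields get new ids.
evolve(table_schema, incoming, next_id)

# Step 2: resolve the incoming record against the updated schema.
aligned = align(table_schema, incoming)
print(aligned)
```

In Iceberg itself the first step would go through the table's schema update and commit, as in the gist; the sketch only mirrors the order of operations (evolve the schema, then write).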