need help with Hudi Delete

2022-07-14 Thread aakash aakash
Hi, we have a use case that requires soft-deleting some record keys: we nullify the non-key fields and ignore any later updates to those records. We thought of using a Hudi-style meta field, "_hoodie_is_soft_deleted", just as Hudi's hard delete uses "_hoodie_is_deleted", to make it simple to identify if the p
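The soft-delete behaviour described in this message (null out the non-key fields, then ignore later updates for that key) can be sketched in plain Python as below. This is only an illustration of the intended merge semantics, not Hudi's API; how it maps onto Hudi's write path is outside this sketch. The key column name "id" and the helpers `soft_delete` and `merge` are hypothetical; only the marker name comes from the thread.

```python
from typing import Optional

# Hypothetical column names for illustration: "id" stands in for the record
# key; the marker name follows the thread.
KEY_FIELD = "id"
SOFT_DELETE_FIELD = "_hoodie_is_soft_deleted"


def soft_delete(record: dict) -> dict:
    """Return a tombstone: every non-key field nulled, marker set to "true"."""
    tombstone = {name: None for name in record}
    tombstone[KEY_FIELD] = record[KEY_FIELD]
    # The schema pasted in the thread types the marker as a string.
    tombstone[SOFT_DELETE_FIELD] = "true"
    return tombstone


def merge(stored: Optional[dict], incoming: dict) -> dict:
    """Pick the record to keep when an upsert arrives for an existing key.

    A soft-deleted record is sticky: later updates for that key are ignored.
    """
    if stored is not None and stored.get(SOFT_DELETE_FIELD) == "true":
        return stored
    return incoming


row = {"id": "k1", "name": "alice", SOFT_DELETE_FIELD: None}
tombstone = soft_delete(row)
print(tombstone)
print(merge(tombstone, {"id": "k1", "name": "bob", SOFT_DELETE_FIELD: None}))
```

Keeping the marker on every record (rather than dropping the row) is what lets the merge step recognise a previously deleted key.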

Re: need help with Hudi Delete

2022-07-15 Thread Pratyaksh Sharma
Hi, Hudi is complaining because '_hoodie_is_soft_deleted' is present in the parquet file's schema but not in your incoming schema. From my experience, it is standard practice to add an extra field that acts as a soft-deletion marker and needs to be persisted with ev

Re: need help with Hudi Delete

2022-07-15 Thread aakash aakash
Thanks for the response, Pratyaksh! We add this column to the Spark DataFrame before calling the Hudi upsert and delete. It should behave like an extra nullable column in the schema, but it doesn't, so we are wondering whether the Hudi code removes any column with the prefix '_hoodie'.

Re: need help with Hudi Delete

2022-07-15 Thread Pratyaksh Sharma
Hi Aakash, for the field to behave as a nullable extra field, you need to set its default value to null in the schema and make "null" the first type in the union for `_hoodie_is_soft_deleted`. Hope that helps.
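A minimal sketch of the fix described in this reply, applied to an Avro field definition held as a Python dict and serialized with the standard `json` module. The helper `make_nullable_with_default` is hypothetical; in practice you would edit whatever schema you generate directly. The Avro specification requires a union field's default to match the first branch of the union, which is why "null" must come first for a null default to be valid.

```python
import json

# A field shaped like the one in the thread: no default, "string" listed first.
original_field = {"name": "_hoodie_is_soft_deleted", "type": ["string", "null"]}


def make_nullable_with_default(field: dict) -> dict:
    """Return a copy of an Avro field whose type is a union with "null" first
    and whose default is null (Python None, JSON null)."""
    types = field["type"] if isinstance(field["type"], list) else [field["type"]]
    others = [t for t in types if t != "null"]
    fixed = dict(field)  # shallow copy; the original dict is left untouched
    fixed["type"] = ["null"] + others
    fixed["default"] = None
    return fixed


fixed_field = make_nullable_with_default(original_field)
print(json.dumps(fixed_field))
# {"name": "_hoodie_is_soft_deleted", "type": ["null", "string"], "default": null}
```

The printed JSON is the field definition to put back into the record's "fields" list.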

Re: need help with Hudi Delete

2022-07-18 Thread Sivabalan
Yes, the pasted schema has no default set for the newly added column:

    { "name" : "_hoodie_is_soft_deleted", "type" : [ "string", "null" ] }

If you fix that and give it a try, it should work.
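The problem spotted here can be caught before writing with a rough check over the Avro record schema, assuming it is available as JSON. This is a hypothetical helper, not a Hudi utility: it flags union fields that contain "null" but either lack a default of null or do not list "null" first.

```python
from typing import Iterator


def fields_missing_null_default(record_schema: dict) -> Iterator[str]:
    """Yield names of nullable union fields that cannot default to null.

    A field is flagged when its type is a union containing "null" but either
    its default is absent or not null, or "null" is not the first branch
    (Avro requires a union default to match the first branch).
    """
    for field in record_schema.get("fields", []):
        ftype = field.get("type")
        if not isinstance(ftype, list) or "null" not in ftype:
            continue
        has_null_default = "default" in field and field["default"] is None
        if not has_null_default or ftype[0] != "null":
            yield field["name"]


pasted = {
    "type": "record",
    "name": "example_record",  # hypothetical record name
    "fields": [
        {"name": "id", "type": "string"},  # hypothetical key field
        {"name": "_hoodie_is_soft_deleted", "type": ["string", "null"]},
    ],
}
print(list(fields_missing_null_default(pasted)))  # ['_hoodie_is_soft_deleted']
```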