[GitHub] [hudi] nmukerje commented on issue #3321: [SUPPORT] Setting _hoodie_is_deleted column is not deleting records when using Spark DataSource.

2021-08-15 Thread GitBox
nmukerje commented on issue #3321: URL: https://github.com/apache/hudi/issues/3321#issuecomment-899117318 @umehrot2 Yes, that was exactly what was happening. Here is what I had to do to trick Spark into making the column nullable. ``` from pyspark.sql.functions import
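The archived comment cuts off right after the import line, so the exact code is lost. As a minimal sketch (not necessarily the author's code), one common trick for forcing a Spark column's schema to nullable, assuming the same df1 and _hoodie_is_deleted names from this thread, is:
```
from pyspark.sql.functions import col, lit, when

# when() without an otherwise() branch can produce null, so Spark marks the
# resulting column nullable even though the values themselves are unchanged.
df1 = df1.withColumn("_hoodie_is_deleted", when(lit(True), col("_hoodie_is_deleted")))
```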

[GitHub] [hudi] nmukerje commented on issue #3321: [SUPPORT] Setting _hoodie_is_deleted column is not deleting records when using Spark DataSource.

2021-08-14 Thread GitBox
nmukerje commented on issue #3321: URL: https://github.com/apache/hudi/issues/3321#issuecomment-898979181 @nsivabalan @codope Worked fine after I cast _hoodie_is_deleted to boolean.
```
from pyspark.sql.types import BooleanType  # needed for the cast below

df1 = df1.withColumn("_hoodie_is_deleted", df1["_hoodie_is_deleted"].cast(BooleanType()))
```
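For context, a hedged sketch of the delete path this thread is about: once _hoodie_is_deleted is a true boolean, records carrying True are dropped when the frame is upserted through the Hudi Spark DataSource. The table name, key fields, and path below are illustrative placeholders, not values from the issue.
```
from pyspark.sql.functions import lit
from pyspark.sql.types import BooleanType

# Flag every record in this frame for deletion; on upsert, Hudi removes
# records whose _hoodie_is_deleted column is boolean True.
deletes = df1.withColumn("_hoodie_is_deleted", lit(True).cast(BooleanType()))

(deletes.write.format("hudi")
    .option("hoodie.table.name", "my_table")                   # placeholder
    .option("hoodie.datasource.write.recordkey.field", "id")   # placeholder
    .option("hoodie.datasource.write.precombine.field", "ts")  # placeholder
    .option("hoodie.datasource.write.operation", "upsert")
    .mode("append")
    .save("s3://bucket/path/my_table"))                        # placeholder
```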

[GitHub] [hudi] nmukerje commented on issue #3321: [SUPPORT] Setting _hoodie_is_deleted column is not deleting records when using Spark DataSource.

2021-07-28 Thread GitBox
nmukerje commented on issue #3321: URL: https://github.com/apache/hudi/issues/3321#issuecomment-888485324 Interesting! Trying it...

[GitHub] [hudi] nmukerje commented on issue #3321: [SUPPORT] Setting _hoodie_is_deleted column is not deleting records when using Spark DataSource.

2021-07-24 Thread GitBox
nmukerje commented on issue #3321: URL: https://github.com/apache/hudi/issues/3321#issuecomment-886012008 I am not using bulk insert for the upsert/delete; I am only using bulk insert in Step 1 to stage some records. The notebook is public, so you can run the cells/code. The schema is
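A hedged sketch of the staging step described above, assuming df0 holds the records to stage: bulk_insert is used only for the initial Step 1 load, while the later upsert/delete pass uses the upsert operation shown earlier. Table name, fields, and path are again illustrative placeholders.
```
# Step 1 (staging): one-time bulk_insert of the initial records.
(df0.write.format("hudi")
    .option("hoodie.table.name", "my_table")                   # placeholder
    .option("hoodie.datasource.write.recordkey.field", "id")   # placeholder
    .option("hoodie.datasource.write.precombine.field", "ts")  # placeholder
    .option("hoodie.datasource.write.operation", "bulk_insert")
    .mode("overwrite")
    .save("s3://bucket/path/my_table"))                        # placeholder
```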