[GitHub] [hudi] bvaradar commented on issue #1910: [SUPPORT] Upsert operation duplicating records in a partition

2020-09-02 Thread GitBox
bvaradar commented on issue #1910: URL: https://github.com/apache/hudi/issues/1910#issuecomment-685860529 Closing this due to inactivity.

[GitHub] [hudi] bvaradar commented on issue #1910: [SUPPORT] Upsert operation duplicating records in a partition

2020-08-18 Thread GitBox
bvaradar commented on issue #1910: URL: https://github.com/apache/hudi/issues/1910#issuecomment-675441022 @mingujotemp : Any updates?

[GitHub] [hudi] bvaradar commented on issue #1910: [SUPPORT] Upsert operation duplicating records in a partition

2020-08-06 Thread GitBox
bvaradar commented on issue #1910: URL: https://github.com/apache/hudi/issues/1910#issuecomment-670341009 @mingujotemp : I just noticed you are using Hive 3.x. I have not seen similar issues with Hive 2.x. Can you enable debug logging to see if your Spark SQL query triggers
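
For context, a minimal sketch of one way to raise the Spark log level so the query planning can be inspected; the setLogLevel approach, table name, and partition filter below are assumptions and placeholders, not something specified in the comment.

    // Assumption: an active SparkSession named `spark`, e.g. from spark-shell.
    spark.sparkContext.setLogLevel("DEBUG")

    // Re-run the query that returned duplicates and inspect the driver logs to see
    // how Spark SQL plans the read of the Hive-registered table.
    spark.sql("SELECT * FROM my_hudi_table WHERE dt = '2020-08-01'").show()  // hypothetical table and partition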

[GitHub] [hudi] bvaradar commented on issue #1910: [SUPPORT] Upsert operation duplicating records in a partition

2020-08-05 Thread GitBox
bvaradar commented on issue #1910: URL: https://github.com/apache/hudi/issues/1910#issuecomment-669681168 For Spark SQL, "--conf spark.sql.hive.convertMetastoreParquet=false" needs to be passed when starting up Spark. Can you check if this is being set?
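
As a reference, a hedged sketch of setting the same property when building the session in application code; the comment itself asks for --conf at Spark startup, and the application name here is a hypothetical placeholder.

    import org.apache.spark.sql.SparkSession

    // Assumption: the session is built in code rather than via spark-submit --conf,
    // which is what the comment asks for; the property and value are the same.
    val spark = SparkSession.builder()
      .appName("hudi-upsert-check")                              // hypothetical app name
      .enableHiveSupport()
      // Keep Spark SQL from reading the Hive-registered table as plain parquet,
      // which would bypass Hudi's input format and surface duplicate file versions.
      .config("spark.sql.hive.convertMetastoreParquet", "false")
      .getOrCreate()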

[GitHub] [hudi] bvaradar commented on issue #1910: [SUPPORT] Upsert operation duplicating records in a partition

2020-08-04 Thread GitBox
bvaradar commented on issue #1910: URL: https://github.com/apache/hudi/issues/1910#issuecomment-668651417 This looks like you are not using the Hudi format to read the table. Did you try spark.read.format("hudi")?
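
For illustration, a minimal sketch of a snapshot read through the Hudi datasource as suggested above; the base path is a hypothetical placeholder for the actual table location.

    // Assumption: an active SparkSession named `spark`; the path below is a placeholder.
    val df = spark.read
      .format("hudi")
      .load("s3://my-bucket/path/to/hudi_table")

    // Query the snapshot view, which resolves each record key to its latest version
    // instead of listing every parquet file version as a separate row.
    df.createOrReplaceTempView("hudi_snapshot")
    spark.sql("SELECT count(*) FROM hudi_snapshot").show()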