tandonraghavs opened a new issue #2131: URL: https://github.com/apache/hudi/issues/2131
We are planning to use Hudi for our data warehouse, and we want to dump Mongo data to S3. For Mongo we rely on oplogs (via Debezium). However, in my experiments Hudi does not work when I have partial data (oplog-style events that carry only the changed fields rather than the full document). Is it recommended, or required, to write the entire row to Hudi?

- I also tried overriding **combineAndGetUpdateValue** in a custom **HoodieRecordPayload**, but that does not fulfil the use case either: compaction runs in the background and only compares the most recent record with the stored record, so some of the oplog updates are lost.
- Another issue with oplogs is that there can be multiple records with the same document ID. In that case I want to apply the updates from all of those records, but because of preCombine only the update from the latest record is kept.
- Also, there is no option to set **hoodie.combine.before.upsert=false** when using the DataSource.
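To make the expected behaviour concrete, here is a minimal, self-contained sketch of the two merge semantics described above. It is plain Java and does **not** use Hudi's API: `PartialUpdate`, `keepLatestThenMerge` and `applyAll` are hypothetical names for illustration only. It assumes each oplog event carries a document ID, an ordering timestamp, and only the fields that changed. `keepLatestThenMerge` mimics "preCombine keeps the latest record, which is then merged with the stored row"; `applyAll` is what I would like to happen, folding every partial update in timestamp order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Hudi's API: compares "keep latest" vs "apply all" merging
// of partial (oplog-style) updates for a single document.
public class PartialMergeSketch {

    // Hypothetical event type: document id, ordering timestamp, and only the changed fields.
    public static final class PartialUpdate {
        private final String id;
        private final long ts;
        private final Map<String, Object> fields;

        public PartialUpdate(String id, long ts, Map<String, Object> fields) {
            this.id = id;
            this.ts = ts;
            this.fields = fields;
        }

        public String id() { return id; }
        public long ts() { return ts; }
        public Map<String, Object> fields() { return fields; }
    }

    // Roughly what happens today: only the latest event survives pre-combining,
    // and just its fields are merged onto the stored row, so earlier changes are lost.
    public static Map<String, Object> keepLatestThenMerge(Map<String, Object> stored,
                                                          List<PartialUpdate> updates) {
        Map<String, Object> merged = new HashMap<>(stored);
        updates.stream()
                .max(Comparator.comparingLong(PartialUpdate::ts))
                .ifPresent(latest -> merged.putAll(latest.fields()));
        return merged;
    }

    // Desired behaviour: apply every partial update in timestamp order on top of the stored row.
    public static Map<String, Object> applyAll(Map<String, Object> stored,
                                               List<PartialUpdate> updates) {
        List<PartialUpdate> sorted = new ArrayList<>(updates);
        sorted.sort(Comparator.comparingLong(PartialUpdate::ts));
        Map<String, Object> merged = new HashMap<>(stored);
        for (PartialUpdate u : sorted) {
            merged.putAll(u.fields()); // each event overwrites only the fields it carries
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = Map.of("_id", "d1", "name", "a", "city", "x");
        List<PartialUpdate> updates = List.of(
                new PartialUpdate("d1", 2L, Map.of("city", "y")),
                new PartialUpdate("d1", 1L, Map.of("name", "b")));
        System.out.println("keep latest: " + keepLatestThenMerge(stored, updates));
        System.out.println("apply all:   " + applyAll(stored, updates));
    }
}
```

With these two events, "keep latest" loses the `name` change from the earlier event, while "apply all" ends up with both `name=b` and `city=y`.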