[GitHub] [hudi] nsivabalan commented on issue #6531: [SUPPORT] Insert results different than bulk_insert

2022-09-03 Thread GitBox
nsivabalan commented on issue #6531: URL: https://github.com/apache/hudi/issues/6531#issuecomment-1236197809 Yeah. If not for small file handling, you might as well go with bulk_insert; that's why. I am going ahead and closing out the GitHub issue. Feel free to open a new one if you have any

[GitHub] [hudi] nsivabalan commented on issue #6531: [SUPPORT] Insert results different than bulk_insert

2022-08-31 Thread GitBox
nsivabalan commented on issue #6531: URL: https://github.com/apache/hudi/issues/6531#issuecomment-1233550376 Dedup with insert could happen by chance if the new batch is routed to the same file group due to small file handling, so that's just a side effect of small file handling.
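
To make the "dedup by chance" point above concrete, here is a toy model in Python. It is not Hudi code and does not use any Hudi API; `write_batch`, `count_rows`, and `small_file_limit` are hypothetical names made up for this sketch. It assumes that a batch routed into an existing small file group is merged with that group by record key, while a batch written to a new file group is stored as-is.

```python
# Toy model only: NOT Hudi code. It illustrates why `insert` can appear to
# deduplicate when small file handling routes a batch to an existing file group.

def write_batch(file_groups, batch, operation, small_file_limit=3):
    """Write a batch of (key, value) records into file_groups.

    file_groups is a list of lists; each inner list stands in for one file group.
    small_file_limit is a hypothetical row-count threshold for a "small" file.
    """
    small = None
    if operation == "insert":
        # Small file handling (assumed): reuse the first under-sized file group.
        small = next((fg for fg in file_groups if len(fg) < small_file_limit), None)

    if small is None:
        # bulk_insert, or insert with no small file available: new file group,
        # records stored as-is with no key lookup.
        file_groups.append(list(batch))
    else:
        # Rewriting an existing file group merges by key, so a record whose key
        # is already present replaces the old one instead of adding a row.
        merged = dict(small)
        merged.update(batch)
        small[:] = merged.items()
    return file_groups


def count_rows(file_groups):
    return sum(len(fg) for fg in file_groups)


first = [("id1", "a")]
second = [("id1", "b")]  # same key as the first batch

bulk = write_batch(write_batch([], first, "bulk_insert"), second, "bulk_insert")
ins = write_batch(write_batch([], first, "insert"), second, "insert")
ins_no_small = write_batch(
    write_batch([], first, "insert", small_file_limit=0),
    second, "insert", small_file_limit=0,
)

print(count_rows(bulk))          # both copies of id1 are kept
print(count_rows(ins))           # id1 merged into the small file group
print(count_rows(ins_no_small))  # no small file reuse, both copies kept
```

In this model the only difference between the two operations is whether an existing file group is reused, which is why the duplicate disappears in one run and not in the other.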

[GitHub] [hudi] nsivabalan commented on issue #6531: [SUPPORT] Insert results different than bulk_insert

2022-08-29 Thread GitBox
nsivabalan commented on issue #6531: URL: https://github.com/apache/hudi/issues/6531#issuecomment-1230640919 @bhasudha: can we enhance the docs on this end? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [hudi] nsivabalan commented on issue #6531: [SUPPORT] Insert results different than bulk_insert

2022-08-29 Thread GitBox
nsivabalan commented on issue #6531: URL: https://github.com/apache/hudi/issues/6531#issuecomment-1230640076 Yes, by default bulk_insert will not dedup, and that's by design. We just wanted to give users a way to bulk import without any index lookup.
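
For readers who want bulk_insert but also want duplicates within the incoming batch collapsed, a minimal sketch of the writer options follows. The helper `hudi_write_options` and the column names `id` and `ts` are hypothetical. The option keys are the ones I understand the Hudi Spark datasource to use, and `hoodie.combine.before.insert` is, as far as I know, off by default and only combines records inside the batch being written, not against data already in the table. Please check the configuration reference for your Hudi version before relying on this.

```python
def hudi_write_options(table_name, operation, dedup_incoming=False):
    """Build a dict of Hudi datasource write options (sketch, not exhaustive)."""
    options = {
        "hoodie.table.name": table_name,
        # "insert", "bulk_insert" or "upsert"
        "hoodie.datasource.write.operation": operation,
        # Hypothetical column names; replace with your own schema.
        "hoodie.datasource.write.recordkey.field": "id",
        "hoodie.datasource.write.precombine.field": "ts",
    }
    if dedup_incoming:
        # Assumption: combines records sharing a key within the incoming batch
        # before writing. It does not look up keys already in the table.
        options["hoodie.combine.before.insert"] = "true"
    return options


opts = hudi_write_options("my_table", "bulk_insert", dedup_incoming=True)

# With PySpark and the Hudi bundle on the classpath, usage would look like
# (df and base_path are hypothetical):
# df.write.format("hudi").options(**opts).mode("append").save(base_path)
```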