nsivabalan commented on issue #6531:
URL: https://github.com/apache/hudi/issues/6531#issuecomment-1236197809
Yeah. If not for small file handling, you might as well go with bulk_insert.
That's why. I am going ahead and closing out the GitHub issue. Feel free to open
a new one if you have any
nsivabalan commented on issue #6531:
URL: https://github.com/apache/hudi/issues/6531#issuecomment-1233550376
Dedup with insert can happen by chance if the new batch is routed to the
same file group due to small file handling,
so that's just a side effect of small file handling.
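
To illustrate the point above, here is a toy model in plain Python. It is not Hudi's implementation: the function names, the dict-per-file-group layout, and the `small_file_limit` parameter are all hypothetical. It only assumes that records merged into an existing file group are combined by key, while records written to a new file group are not compared against existing data.

```python
# Toy model only: NOT Hudi's code. Each file group is a dict of key -> value.

def toy_insert(file_groups, batch, small_file_limit):
    """Route each record to the first file group below the size limit,
    or open a new file group when none qualifies."""
    for key, value in batch:
        target = next(
            (fg for fg in file_groups if len(fg) < small_file_limit), None
        )
        if target is None:
            target = {}
            file_groups.append(target)
        # Writing into an existing group overwrites a record with the same key,
        # which is the accidental "dedup" described in the comment.
        target[key] = value
    return file_groups


def total_rows(file_groups):
    """Count rows across all file groups, duplicates included."""
    return sum(len(fg) for fg in file_groups)


# The existing file group is "small", so the new record lands in it and is merged.
merged = toy_insert([{"k1": "v1"}], [("k1", "v2")], small_file_limit=3)

# A limit of 0 means no group counts as small, so the record goes to a new group.
separate = toy_insert([{"k1": "v1"}], [("k1", "v2")], small_file_limit=0)
```

In this sketch `merged` ends up with one row for `k1` and `separate` ends up with two, so whether a duplicate survives depends only on routing, not on any explicit dedup step.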
--
nsivabalan commented on issue #6531:
URL: https://github.com/apache/hudi/issues/6531#issuecomment-1230640919
@bhasudha: can we enhance the docs on this end?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL
nsivabalan commented on issue #6531:
URL: https://github.com/apache/hudi/issues/6531#issuecomment-1230640076
Yes, by default bulk_insert will not dedup, and that's by design. We just
wanted to give users a way to bulk import without any index lookup.
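
A minimal sketch of what that could look like as write options, assuming a Spark DataFrame writer. The helper `hudi_write_options` and its parameters are hypothetical. The `hoodie.*` keys are Hudi config names as I understand them, and `hoodie.combine.before.insert` is assumed to dedup only within the incoming batch, so check the configuration reference for your Hudi version.

```python
def hudi_write_options(table_name, record_key, precombine_field,
                       operation="bulk_insert", dedup_before_insert=False):
    """Build a dict of Hudi write options (hypothetical helper, not a Hudi API)."""
    opts = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # bulk_insert skips the index lookup, so no dedup happens by default.
        "hoodie.datasource.write.operation": operation,
    }
    if dedup_before_insert:
        # Assumption: combines records sharing a key within the incoming batch.
        opts["hoodie.combine.before.insert"] = "true"
    return opts


# Hypothetical table and column names, for illustration only.
default_opts = hudi_write_options("my_table", "id", "ts")
dedup_opts = hudi_write_options("my_table", "id", "ts", dedup_before_insert=True)
```

With a Spark session and the Hudi bundle on the classpath, the dict would typically be passed along the lines of `df.write.format("hudi").options(**default_opts).mode("append").save(path)`, where `df` and `path` are placeholders.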