adnanhb opened a new issue #2207: URL: https://github.com/apache/hudi/issues/2207
Hello, this might be a basic question, but I am not able to find guidance anywhere. We are writing approximately 8 million records (55 columns per record) to a Hudi dataset stored on S3, using the copy-on-write table type. The entire process takes about 4 hours. I am fairly sure the overall time can be optimized, but I am not sure how to go about it.

My biggest confusion is whether running the Spark application on multiple executors will speed up the write. From what I have gleaned from several posts, Apache Hudi does not support concurrent writes. Does that mean that having multiple executors manipulating the Hudi dataset will not work?

Thanks
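For context, a minimal sketch of the kind of write being described, assuming a PySpark job. The table name, S3 path, and key/precombine fields below are hypothetical placeholders; the shuffle-parallelism options are the knobs Hudi exposes for controlling how many Spark tasks its write stages use:

```python
# Hypothetical sketch of a Hudi copy-on-write upsert from PySpark.
# Table name, S3 path, and field names are placeholders, not from the issue.
hudi_options = {
    "hoodie.table.name": "my_table",                       # hypothetical
    "hoodie.datasource.write.recordkey.field": "id",       # hypothetical
    "hoodie.datasource.write.precombine.field": "ts",      # hypothetical
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.operation": "upsert",
    # These control the number of tasks in Hudi's shuffle stages; raising
    # them (together with more executors) spreads the write across the cluster.
    "hoodie.upsert.shuffle.parallelism": "200",
    "hoodie.insert.shuffle.parallelism": "200",
}

(df.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("s3://my-bucket/my_table"))                      # hypothetical path
```

Note this is a configuration fragment, not a runnable script: `df` and a running Spark session with the Hudi bundle on the classpath are assumed.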