adnanhb opened a new issue #2207: URL: https://github.com/apache/hudi/issues/2207
Hello, this might be a basic question, but I am not able to find guidance anywhere. We are writing approximately 8 million records (55 columns per record) to a Hudi dataset stored on S3, using the copy-on-write table type. The entire process takes about 4 hours. I am fairly sure the overall time can be optimized, but I am not sure how to go about it.

My biggest confusion is whether running the Spark application on multiple executors will speed up the write. From what I have gleaned from several posts, Apache Hudi does not support concurrent writes. Does that mean that having multiple executors manipulating the Hudi dataset will not work?

Thanks
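For context, a minimal sketch of the kind of write being described, assuming a PySpark job. The table name, S3 path, and key/precombine fields below are hypothetical placeholders; the shuffle-parallelism options are the knobs Hudi exposes for controlling how many Spark tasks its write stages use:

```python
# Hypothetical sketch of a Hudi copy-on-write upsert from PySpark.
# Table name, S3 path, and field names are placeholders, not from the issue.
hudi_options = {
    "hoodie.table.name": "my_table",                       # hypothetical
    "hoodie.datasource.write.recordkey.field": "id",       # hypothetical
    "hoodie.datasource.write.precombine.field": "ts",      # hypothetical
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.operation": "upsert",
    # These control the number of tasks in Hudi's shuffle stages; raising
    # them (together with more executors) spreads the write across the cluster.
    "hoodie.upsert.shuffle.parallelism": "200",
    "hoodie.insert.shuffle.parallelism": "200",
}

(df.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("s3://my-bucket/my_table"))                      # hypothetical path
```

Note this is a configuration fragment, not a runnable script: `df` and a running Spark session with the Hudi bundle on the classpath are assumed.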