Re: Question on implementing Celeborn client,

2023-07-13 Thread Keyong Zhou
Hi, If you call endpoint.ask[CommitFilesResponse](message), you should wait for the response. If the response is successful, you can be sure that CommitFiles succeeded. Please refer to CommitHandler.requestCommitFilesWithRetry. Thanks, Keyong Zhou wrote on Thu, Jul 13, 2023 at 15:54: > > Following are the main steps fo
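The ask-then-wait pattern described in this reply can be sketched in plain Java as follows. This is only an illustration under stated assumptions: CommitReply, CommitWaiter, and commitWithRetry are hypothetical names invented here, and the Supplier stands in for whatever actually sends the request. None of this is Celeborn's API; for the real logic, see CommitHandler.requestCommitFilesWithRetry as suggested above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical stand-in for a commit response; not Celeborn's CommitFilesResponse. */
final class CommitReply {
    final boolean success;

    CommitReply(boolean success) {
        this.success = success;
    }
}

/** Hypothetical helper that asks, blocks for the reply, and retries on failure. */
final class CommitWaiter {
    /**
     * Invokes `ask` to send one request, waits up to timeoutMillis for its reply,
     * and repeats up to maxAttempts times. Returns true only if some reply
     * arrived and reported success.
     */
    static boolean commitWithRetry(Supplier<CompletableFuture<CommitReply>> ask,
                                   int maxAttempts,
                                   long timeoutMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // Block until the reply arrives or the timeout expires.
                CommitReply reply = ask.get().get(timeoutMillis, TimeUnit.MILLISECONDS);
                if (reply != null && reply.success) {
                    return true; // the commit is confirmed by a successful reply
                }
                // An unsuccessful reply falls through to the next attempt.
            } catch (TimeoutException | ExecutionException e) {
                // No reply in time, or the request failed: try again.
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop retrying.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false; // no attempt produced a successful reply
    }
}
```

The point of the sketch is that the caller treats the commit as done only after a successful reply has been received, never merely after the request has been sent.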

Re: Question on implementing Celeborn client,

2023-07-13 Thread orpl
Is there some way to use the Celeborn API to check whether CommitFiles succeeds in step 6? Currently we are testing with TPC-DS 10TB data, and a heavy query (query 24) occasionally fails with: Caused by: java.io.IOException: Premature EOF from inputStream. We are speculating that this error occurs b

Re: Question on implementing Celeborn client,

2023-07-13 Thread orpl
Following are the main steps for a shuffle stage: 1. LifecycleManager sends RequestSlots to Master to request slots for the current shuffle; 2. Master allocates slots among workers for the shuffle and returns RequestSlotsResponse; 3. LifecycleManager sends ReserveSlots to workers; workers do initi

Re: Question on implementing Celeborn client,

2023-07-12 Thread Keyong Zhou
Hi Sungwoo, Glad to know about your progress! For your questions, 1. In Celeborn's default implementation, ShuffleClient is a singleton in the Executor and Driver processes, and I suggest following this practice. It's recommended to call ShuffleClient.cleanup(int shuffleId, int mapId, int attemptId

Re: Question on implementing Celeborn client,

2023-07-12 Thread orpl
Hi Keyong, Thanks for your quick reply. We found the Celeborn API clean and very intuitive, and have not yet encountered serious problems in getting our system up and running. We are unsure about just a few points that are not immediately obvious from the Celeborn API (e.g., whether or n

Re: Question on implementing Celeborn client,

2023-07-12 Thread Keyong Zhou
Hi Sungwoo, Thanks for your effort in integrating Celeborn into MR3! For your question, currently a reducer does wait until the completion of all mappers before starting to fetch shuffle data. Briefly speaking, the Celeborn client contains two modules: 1. ShuffleClient for push/fetch data, mainly us

Question on implementing Celeborn client,

2023-07-11 Thread orpl
Hi Team, We are currently implementing a Celeborn client for our application (called MR3, which is similar to Tez), and have a question on the internals of Celeborn. The question is whether a reducer should wait until the completion of all mappers before starting to fetch mapper output. From