attilapiros commented on pull request #28618:
URL: https://github.com/apache/spark/pull/28618#issuecomment-646071843


   I was thinking a lot on `ShuffleDriverComponents` and I have an idea how to 
improve it. 
   
   The problem I believe this class tries to fulfil two very separate roles: be 
a **builder** and the **result of the building** in the same time. 
   
   This is why we need this kind of check:
   
https://github.com/apache/spark/blob/289050e4309b024146867b7b32974ab4746eb570/core/src/main/java/org/apache/spark/shuffle/sort/io/LocalDiskShuffleDriverComponents.java#L47-L49
 
   
   If the building is cleanly separated from the result of the building then we 
can be sure the prerequisites are fulfilled before. 
   
   I would change it by transforming it to be the result of the building in the 
following way:
   - the `initializeApplication` (I mean the process of the initialisation and 
not the returned Map) should be part of the building. The documentation of the 
`ShuffleDataIO#driver` method can be extended by mentioning this is the right 
place to initialize.
   - the `ShuffleDriverComponents` could have a new method which gives back the 
"additional SparkConf settings necessary for initializing the executor 
components" we can call it like `additonalExecutorConfigs`. This new method 
would replace the old `initializeApplication`
   
   One more idea / question:
   - I do not see why the `ShuffleOutputTracker` is optional. Either we or the 
API user can provide an implementation where the methods are empty this way the 
API a bit simpler.
   
   @mccheah what do you think?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to