[ https://issues.apache.org/jira/browse/YARN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503371#comment-13503371 ]
Bikas Saha commented on YARN-230:
---------------------------------

Ignoring orphaned attempts was explicitly added because the current implementation deletes the app info first, so that it acts as a marker of app completion even if the attempts fail to get deleted afterwards. The code should have discarded orphaned attempts from the store, but I forgot about that. Good catch! Will fix.

The store and remove methods have been made mirrors because that maintains a symmetry of operations that is logically clear. An actual implementation could still choose to remove the entire app data, including attempts, in removeApplication(), making removeApplicationAttempt() a no-op. So that alternative is not precluded by the current interface, which retains flexibility.

A directory-based implementation is also still possible: removeApplication() could call FS.delete(Path_to_dir), and removeApplicationAttempt() would remove the attempt file under the app directory, or return success if the app directory has already been deleted. I chose not to use directories for the FileSystem store because one could put a key-value store behind a FileSystem interface, and I am not sure how directories would work there. Also, rmdir is atomic on HDFS but may not be atomic on every file system. For HDFS one could certainly write a directory-based file structure for the store, in which each app would have its own directory. But IMO the best implementation might be a transaction-log style implementation, similar to what I believe HBase uses. It might also have better HA characteristics, because HDFS guarantees a single writer to a file. That, however, requires a considerable investment of time.

It is really hard to guarantee atomicity of removal when we don't know how the file system is implemented. We could use log-structured implementations, or for HDFS we could use an atomic rmdir. Also, let's look at the following scenario.
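The directory-based alternative described above could be sketched roughly as follows. This is a minimal illustration only: it uses java.nio.file instead of Hadoop's FileSystem API, and the class name and directory layout (one directory per app, one file per attempt) are assumptions, not the actual YARN store code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical directory-per-application layout:
//   <root>/<appId>/            -- application data directory
//   <root>/<appId>/<attemptId> -- one file per attempt
public class DirBasedStoreSketch {
    private final Path root;

    public DirBasedStoreSketch(Path root) {
        this.root = root;
    }

    // removeApplication() deletes the whole app directory, attempts included,
    // so a later removeApplicationAttempt() for the same app becomes a no-op.
    public void removeApplication(String appId) throws IOException {
        Path appDir = root.resolve(appId);
        if (!Files.exists(appDir)) {
            return; // already removed: treat as success
        }
        try (Stream<Path> paths = Files.walk(appDir)) {
            // delete children before the directory itself
            for (Path p : paths.sorted(Comparator.reverseOrder()).toArray(Path[]::new)) {
                Files.delete(p);
            }
        }
    }

    // removeApplicationAttempt() deletes the attempt file, or returns
    // silently if the app directory has already been deleted.
    public void removeApplicationAttempt(String appId, String attemptId) throws IOException {
        Path appDir = root.resolve(appId);
        if (!Files.exists(appDir)) {
            return; // app dir already gone: success
        }
        Files.deleteIfExists(appDir.resolve(attemptId));
    }
}
```

Note that the recursive delete here is not atomic, which matches the concern raised above: only a file system with an atomic rmdir (such as HDFS) can make removeApplication() all-or-nothing.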
We cannot call removeApplication() until we know that the AM has exited and the job is really done. Just after the RM learns that the job is done, it could die before updating the state. So upon restart we can never guarantee that a completed application was recorded as completed. This is one reason why I chose not to make the state machine wait for removeApplication() to complete.

One improvement would be to update the store with an attempt's final state (failed/killed/succeeded) and wait for it to be recorded before completing the state machine. This would allow us to not count killed attempts as failed, and also to complete the application state machine if the last attempt had succeeded. That would implement the preferable solution in your second point above. It would still be an optimization, since the RM could fail before storing the attempt state (as above), and we are back to square one. I would like to make this change after YARN-218 is done, so that all related changes can be made together.

I consciously chose not to provide defaults for the store, because I think it's important that users understand and think about it when they enable a store. Changing the config helps trigger important questions like which store works for me, what permissions are needed, etc. If you still feel strongly about it, then I could add defaults like you suggest. Could you please help by providing a good system path? I am not quite familiar with the typical rules used to determine them.

I have explained the temporary choice of Exception on YARN-231. I will address the remaining comments in the next patch.

Thanks for all the feedback. This is a good discussion. I am sure that there are improvements to be made. Unless there are big issues with the current state of the work, it would be great if we can commit it and address improvements in subsequent sub-tasks. This would help keep the changes smaller and easier to manage. The current code refactors and puts the basic interface/infrastructure in place.
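The improvement of recording an attempt's final state before completing the state machine could look roughly like this. This is a hedged sketch: the enum, class, and method names are invented for illustration and are not the actual YARN API, and the in-memory map stands in for a durable store.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: persist an attempt's final state (FAILED/KILLED/SUCCEEDED)
// and only advance the state machine once the write is known to have completed.
public class AttemptStateSketch {
    public enum FinalState { FAILED, KILLED, SUCCEEDED }

    // Stand-in for a durable store; a real implementation would write to
    // HDFS, ZooKeeper, or another persistent backend.
    private final Map<String, FinalState> store = new ConcurrentHashMap<>();

    // Returns only after the final state is recorded, so the caller (the
    // attempt state machine) can wait for durability before completing.
    public boolean recordFinalState(String attemptId, FinalState state) {
        store.put(attemptId, state); // assumed to be a synchronous durable write
        return store.get(attemptId) == state;
    }

    // On recovery: a KILLED attempt is not counted against failure limits,
    // and a SUCCEEDED last attempt lets the application complete directly
    // instead of being restarted.
    public boolean countsAsFailure(String attemptId) {
        return store.get(attemptId) == FinalState.FAILED;
    }
}
```

As noted above, this remains an optimization rather than a guarantee: if the RM dies between the attempt finishing and recordFinalState() returning, recovery still cannot tell how the attempt ended.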
What do you think?

> Make changes for RM restart phase 1
> -----------------------------------
>
>                 Key: YARN-230
>                 URL: https://issues.apache.org/jira/browse/YARN-230
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: PB-impl.patch, Recovery.patch, Store.patch, Test.patch, YARN-230.1.patch
>
> As described in YARN-128, phase 1 of RM restart puts in place mechanisms to save application state and read it back after restart. Upon restart, the NMs are asked to reboot and the previously running AMs are restarted. After this is done, RM HA and work-preserving restart can continue in parallel. For more details please refer to the design document in YARN-128.