[ https://issues.apache.org/jira/browse/YARN-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503371#comment-13503371 ]
Bikas Saha commented on YARN-230:
---------------------------------

Ignoring orphaned attempts was explicitly added because the current implementation deletes the app info first, so that it acts as a marker of app completion even if the attempts fail to get deleted afterwards. The code should have discarded orphaned attempts from the store, but I forgot about that. Good catch! Will fix.

The store and remove methods have been made mirrors because that maintains a symmetry of operations that is logically clear. An actual implementation could still choose to remove the entire app data, including attempts, in removeApplication(), making removeApplicationAttempt() a no-op. So that alternative is not precluded by the current interface, which retains flexibility.

A directory-based implementation is also still possible: removeApplication() could call FS.delete(Path_to_dir), and removeApplicationAttempt() would remove the attempt file under the app directory, or return success if the app directory has already been deleted. I chose not to use directories for the FileSystem store because one could put a key-value store behind a FileSystem interface, and I am not sure how directories would work there. Also, rmdir is atomic on HDFS but may not be atomic on every file system. For HDFS one could certainly write a directory-based file structure for the store, in which each app would have its own directory. But IMO the best implementation might be a transaction-log style implementation, similar to what I believe HBase uses. It might also have better HA characteristics, because HDFS guarantees a single writer to a file. That, however, requires a considerable investment of time.

It is really hard to guarantee atomicity of removal when we don't know how the file system is implemented. We could use log-structured implementations, or for HDFS we could use an atomic rmdir. Also, let's look at the following scenario.
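The directory-based alternative described above could be sketched roughly as follows. This is a minimal illustration only: it uses java.nio.file instead of Hadoop's FileSystem API, and the class name and directory layout (one directory per app, one file per attempt) are assumptions, not the actual YARN store code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical directory-per-application layout:
//   <root>/<appId>/            -- application data directory
//   <root>/<appId>/<attemptId> -- one file per attempt
public class DirBasedStoreSketch {
    private final Path root;

    public DirBasedStoreSketch(Path root) {
        this.root = root;
    }

    // removeApplication() deletes the whole app directory, attempts included,
    // so a later removeApplicationAttempt() for the same app becomes a no-op.
    public void removeApplication(String appId) throws IOException {
        Path appDir = root.resolve(appId);
        if (!Files.exists(appDir)) {
            return; // already removed: treat as success
        }
        try (Stream<Path> paths = Files.walk(appDir)) {
            // delete children before the directory itself
            for (Path p : paths.sorted(Comparator.reverseOrder()).toArray(Path[]::new)) {
                Files.delete(p);
            }
        }
    }

    // removeApplicationAttempt() deletes the attempt file, or returns
    // silently if the app directory has already been deleted.
    public void removeApplicationAttempt(String appId, String attemptId) throws IOException {
        Path appDir = root.resolve(appId);
        if (!Files.exists(appDir)) {
            return; // app dir already gone: success
        }
        Files.deleteIfExists(appDir.resolve(attemptId));
    }
}
```

Note that the recursive delete here is not atomic, which matches the concern raised above: only a file system with an atomic rmdir (such as HDFS) can make removeApplication() all-or-nothing.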
We cannot call removeApplication() until we know that the AM has exited and the job is really done. Just after the RM learns that the job is done, it could die before updating the state. So upon restart we can never guarantee that a completed application was recorded as completed. This is one reason why I chose not to make the state machine wait for removeApplication() to complete.

One improvement would be to update the store with an attempt's final state (failed/killed/succeeded) and wait for it to be recorded before completing the state machine. This would allow us to not count killed attempts as failed, and also to complete the application state machine if the last attempt had succeeded. That would implement the preferable solution in your second point above. It would still be an optimization, since the RM could fail before storing the attempt state (as above), and we are back to square one. I would like to make this change after YARN-218 is done, so that all related changes can be made together.

I consciously chose not to provide defaults for the store, because I think it's important that users understand and think about it when they enable a store. Changing the config helps trigger important questions like which store works for me, what permissions are needed, etc. If you still feel strongly about it, then I could add defaults like you suggest. Could you please help by providing a good system path? I am not quite familiar with the typical rules used to determine them.

I have explained the temporary choice of Exception on YARN-231. I will address the remaining comments in the next patch.

Thanks for all the feedback. This is a good discussion. I am sure that there are improvements to be made. Unless there are big issues with the current state of the work, it would be great if we can commit it and address improvements in subsequent sub-tasks. This would help keep the changes smaller and easier to manage. The current code refactors and puts the basic interface/infrastructure in place.
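The improvement of recording an attempt's final state before completing the state machine could look roughly like this. This is a hedged sketch: the enum, class, and method names are invented for illustration and are not the actual YARN API, and the in-memory map stands in for a durable store.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: persist an attempt's final state (FAILED/KILLED/SUCCEEDED)
// and only advance the state machine once the write is known to have completed.
public class AttemptStateSketch {
    public enum FinalState { FAILED, KILLED, SUCCEEDED }

    // Stand-in for a durable store; a real implementation would write to
    // HDFS, ZooKeeper, or another persistent backend.
    private final Map<String, FinalState> store = new ConcurrentHashMap<>();

    // Returns only after the final state is recorded, so the caller (the
    // attempt state machine) can wait for durability before completing.
    public boolean recordFinalState(String attemptId, FinalState state) {
        store.put(attemptId, state); // assumed to be a synchronous durable write
        return store.get(attemptId) == state;
    }

    // On recovery: a KILLED attempt is not counted against failure limits,
    // and a SUCCEEDED last attempt lets the application complete directly
    // instead of being restarted.
    public boolean countsAsFailure(String attemptId) {
        return store.get(attemptId) == FinalState.FAILED;
    }
}
```

As noted above, this remains an optimization rather than a guarantee: if the RM dies between the attempt finishing and recordFinalState() returning, recovery still cannot tell how the attempt ended.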
What do you think?

> Make changes for RM restart phase 1
> -----------------------------------
>
>                 Key: YARN-230
>                 URL: https://issues.apache.org/jira/browse/YARN-230
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: PB-impl.patch, Recovery.patch, Store.patch, Test.patch, YARN-230.1.patch
>
> As described in YARN-128, phase 1 of RM restart puts in place mechanisms to save application state and read it back after restart. Upon restart, the NMs are asked to reboot and the previously running AMs are restarted. After this is done, RM HA and work-preserving restart can continue in parallel. For more details please refer to the design document in YARN-128.