[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910788#comment-13910788 ]
Bikas Saha commented on YARN-1410:
----------------------------------

Yes. I would like to understand why we are proposing a custom solution that only works for application submission instead of laying down a common pattern (using Retry Cache) that can subsequently be used in a uniform manner for all the remaining non-idempotent operations. Given that HDFS already uses that layer, it would be good to depend on a common framework that has already been debugged and proven to work in HDFS. Given that YARN and HDFS will commonly be deployed together, sharing these basic pieces will go a long way toward making them easier to build, deploy, and operate. Given so many pros for this approach, why should we not invest in adopting it?

> Handle client failover during 2 step client API's like app submission
> ---------------------------------------------------------------------
>
>                 Key: YARN-1410
>                 URL: https://issues.apache.org/jira/browse/YARN-1410
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>         Attachments: YARN-1410-outline.patch, YARN-1410.1.patch,
> YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch,
> YARN-1410.5.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> App submission involves
> 1) creating appId
> 2) using that appId to submit an ApplicationSubmissionContext to the user.
> The client may have obtained an appId from an RM, the RM may have failed
> over, and the client may submit the app to the new RM.
> Since the new RM has a different notion of cluster timestamp (used to create
> app id) the new RM may reject the app submission resulting in unexpected
> failure on the client side.
> The same may happen for other 2 step client API operations.
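
For readers unfamiliar with the retry-cache idea raised in the comment, here is a minimal sketch of the general pattern. It is not Hadoop's RetryCache API. The class name, the method signature, and the shared dict standing in for state that survives a failover are all hypothetical, chosen only for illustration. The idea is that a non-idempotent call is keyed by (client id, call id), so a retry that reaches a newly active RM gets the recorded response back instead of being executed a second time.

```python
class ResourceManagerSketch:
    """Hypothetical stand-in for an RM; not a real YARN class."""

    def __init__(self, store):
        # `store` is an assumption: a plain dict standing in for whatever
        # state is persisted and recovered by the RM that becomes active.
        self.retry_cache = store.setdefault("retry_cache", {})
        self.apps = store.setdefault("apps", [])

    def submit_application(self, client_id, call_id, app_name):
        key = (client_id, call_id)
        # A retried call is answered from the cache and not executed again.
        if key in self.retry_cache:
            return self.retry_cache[key]
        # First time this call is seen: perform the non-idempotent work
        # and record the response under the call's key.
        self.apps.append(app_name)
        response = "accepted:" + app_name
        self.retry_cache[key] = response
        return response


store = {}
active = ResourceManagerSketch(store)
first = active.submit_application("client-1", 7, "wordcount")

# Simulated failover: a new RM instance recovers the same state,
# and the client retries the identical call against it.
new_active = ResourceManagerSketch(store)
retry = new_active.submit_application("client-1", 7, "wordcount")
```

In this sketch the retry returns the same response as the first call and the application is recorded only once. Any other non-idempotent operation could be routed through the same lookup, which is the uniformity the comment argues for.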