[jira] [Updated] (YARN-4497) RM might fail to restart when recovering apps whose attempts are missing

2016-01-22 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-4497:
---
Attachment: YARN-4497.04.patch

> RM might fail to restart when recovering apps whose attempts are missing
> 
>
> Key: YARN-4497
> URL: https://issues.apache.org/jira/browse/YARN-4497
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
>Priority: Critical
> Attachments: YARN-4497.01.patch, YARN-4497.02.patch, 
> YARN-4497.03.patch, YARN-4497.04.patch
>
>
> Find following problem when discussing in YARN-3480.
> If RM fails to store some attempts in RMStateStore, there will be missing 
> attempts in RMStateStore, for the case storing attempt1, attempt2 and 
> attempt3, RM successfully stored attempt1 and attempt3, but failed to store 
> attempt2. When RM restarts, in *RMAppImpl#recover*, we recover attempts one 
> by one, for this case, we will recover attmept1, then attempt2. When 
> recovering attempt2, we call  
> *((RMAppAttemptImpl)this.currentAttempt).recover(state)*, it will first find 
> its ApplicationAttemptStateData, but it could not find it, an error will come 
> at *assert attemptState != null*(*RMAppAttemptImpl#recover*, line 880).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4497) RM might fail to restart when recovering apps whose attempts are missing

2016-01-21 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-4497:
---
Attachment: YARN-4497.03.patch

> RM might fail to restart when recovering apps whose attempts are missing
> 
>
> Key: YARN-4497
> URL: https://issues.apache.org/jira/browse/YARN-4497
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
>Priority: Critical
> Attachments: YARN-4497.01.patch, YARN-4497.02.patch, 
> YARN-4497.03.patch
>
>
> Find following problem when discussing in YARN-3480.
> If RM fails to store some attempts in RMStateStore, there will be missing 
> attempts in RMStateStore, for the case storing attempt1, attempt2 and 
> attempt3, RM successfully stored attempt1 and attempt3, but failed to store 
> attempt2. When RM restarts, in *RMAppImpl#recover*, we recover attempts one 
> by one, for this case, we will recover attmept1, then attempt2. When 
> recovering attempt2, we call  
> *((RMAppAttemptImpl)this.currentAttempt).recover(state)*, it will first find 
> its ApplicationAttemptStateData, but it could not find it, an error will come 
> at *assert attemptState != null*(*RMAppAttemptImpl#recover*, line 880).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4497) RM might fail to restart when recovering apps whose attempts are missing

2016-01-13 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-4497:
---
Attachment: YARN-4497.02.patch

> RM might fail to restart when recovering apps whose attempts are missing
> 
>
> Key: YARN-4497
> URL: https://issues.apache.org/jira/browse/YARN-4497
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
>Priority: Critical
> Attachments: YARN-4497.01.patch, YARN-4497.02.patch
>
>
> Find following problem when discussing in YARN-3480.
> If RM fails to store some attempts in RMStateStore, there will be missing 
> attempts in RMStateStore, for the case storing attempt1, attempt2 and 
> attempt3, RM successfully stored attempt1 and attempt3, but failed to store 
> attempt2. When RM restarts, in *RMAppImpl#recover*, we recover attempts one 
> by one, for this case, we will recover attmept1, then attempt2. When 
> recovering attempt2, we call  
> *((RMAppAttemptImpl)this.currentAttempt).recover(state)*, it will first find 
> its ApplicationAttemptStateData, but it could not find it, an error will come 
> at *assert attemptState != null*(*RMAppAttemptImpl#recover*, line 880).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4497) RM might fail to restart when recovering apps whose attempts are missing

2016-01-12 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4497:
---
Priority: Critical  (was: Major)

> RM might fail to restart when recovering apps whose attempts are missing
> 
>
> Key: YARN-4497
> URL: https://issues.apache.org/jira/browse/YARN-4497
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
>Priority: Critical
> Attachments: YARN-4497.01.patch
>
>
> Find following problem when discussing in YARN-3480.
> If RM fails to store some attempts in RMStateStore, there will be missing 
> attempts in RMStateStore, for the case storing attempt1, attempt2 and 
> attempt3, RM successfully stored attempt1 and attempt3, but failed to store 
> attempt2. When RM restarts, in *RMAppImpl#recover*, we recover attempts one 
> by one, for this case, we will recover attmept1, then attempt2. When 
> recovering attempt2, we call  
> *((RMAppAttemptImpl)this.currentAttempt).recover(state)*, it will first find 
> its ApplicationAttemptStateData, but it could not find it, an error will come 
> at *assert attemptState != null*(*RMAppAttemptImpl#recover*, line 880).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4497) RM might fail to restart when recovering apps whose attempts are missing

2015-12-30 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-4497:
---
Attachment: YARN-4497.01.patch

> RM might fail to restart when recovering apps whose attempts are missing
> 
>
> Key: YARN-4497
> URL: https://issues.apache.org/jira/browse/YARN-4497
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-4497.01.patch
>
>
> Find following problem when discussing in YARN-3480.
> If RM fails to store some attempts in RMStateStore, there will be missing 
> attempts in RMStateStore, for the case storing attempt1, attempt2 and 
> attempt3, RM successfully stored attempt1 and attempt3, but failed to store 
> attempt2. When RM restarts, in *RMAppImpl#recover*, we recover attempts one 
> by one, for this case, we will recover attmept1, then attempt2. When 
> recovering attempt2, we call  
> *((RMAppAttemptImpl)this.currentAttempt).recover(state)*, it will first find 
> its ApplicationAttemptStateData, but it could not find it, an error will come 
> at *assert attemptState != null*(*RMAppAttemptImpl#recover*, line 880).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4497) RM might fail to restart when recovering apps whose attempts are missing

2015-12-22 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-4497:
---
Description: 
Find following problem when discussing in YARN-3480.

If RM fails to store some attempts in RMStateStore, there will be missing 
attempts in RMStateStore, for the case storing attempt1, attempt2 and attempt3, 
RM successfully stored attempt1 and attempt3, but failed to store attempt2. 
When RM restarts, in *RMAppImpl#recover*, we recover attempts one by one, for 
this case, we will recover attmept1, then attempt2. When recovering attempt2, 
we call  *((RMAppAttemptImpl)this.currentAttempt).recover(state)*, it will 
first find its ApplicationAttemptStateData, but it could not find it, an error 
will come at *assert attemptState != null*(*RMAppAttemptImpl#recover*, line 
880).


  was:
Find following problem when discussing in YARN-3480.

If RM fails to store some attempts in RMStateStore, there will be missing 
attempts in RMStateStore, for the case storing attempt1, attempt2 and attempt3, 
RM successfully stored attempt1 and attempt3, but failed to store attempt2. 
When RM restarts, in *RMAppImpl#recover*, we recover attempts one by one, for 
this case, we will recover attmept1, then attempt2. When recovering attempt2, 
we call  *((RMAppAttemptImpl)this.currentAttempt).recover(state)*,
 it will first find its ApplicationAttemptStateData, but it could not find it, 
an error will come at *assert attemptState != null*(*RMAppAttemptImpl#recover*, 
line 880).



> RM might fail to restart when recovering apps whose attempts are missing
> 
>
> Key: YARN-4497
> URL: https://issues.apache.org/jira/browse/YARN-4497
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
>
> Find following problem when discussing in YARN-3480.
> If RM fails to store some attempts in RMStateStore, there will be missing 
> attempts in RMStateStore, for the case storing attempt1, attempt2 and 
> attempt3, RM successfully stored attempt1 and attempt3, but failed to store 
> attempt2. When RM restarts, in *RMAppImpl#recover*, we recover attempts one 
> by one, for this case, we will recover attmept1, then attempt2. When 
> recovering attempt2, we call  
> *((RMAppAttemptImpl)this.currentAttempt).recover(state)*, it will first find 
> its ApplicationAttemptStateData, but it could not find it, an error will come 
> at *assert attemptState != null*(*RMAppAttemptImpl#recover*, line 880).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)