[jira] [Commented] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery

2013-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800137#comment-13800137
 ] 

Hudson commented on YARN-1185:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1584 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1584/])
YARN-1185. Fixed FileSystemRMStateStore to not leave partial files that prevent 
subsequent ResourceManager recovery. Contributed by Omkar Vinit Joshi. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1533803)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java


> FileSystemRMStateStore can leave partial files that prevent subsequent 
> recovery
> ---
>
> Key: YARN-1185
> URL: https://issues.apache.org/jira/browse/YARN-1185
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Omkar Vinit Joshi
> Fix For: 2.3.0
>
> Attachments: YARN-1185.1.patch, YARN-1185.2.patch, YARN-1185.3.patch
>
>
> FileSystemRMStateStore writes directly to the destination file when storing 
> state. However if the RM were to crash in the middle of the write, the 
> recovery method could encounter a partially-written file and either outright 
> crash during recovery or silently load incomplete state.
> To avoid this, the data should be written to a temporary file and renamed to 
> the destination file afterwards.
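A minimal sketch of the write-to-a-temporary-file-then-rename approach proposed above. It uses `java.nio.file` on a local filesystem as an assumed stand-in for the Hadoop `FileSystem` API that FileSystemRMStateStore actually uses. The Hadoop `FileSystem.rename` contract differs in details such as overwrite behavior, so treat this only as an illustration of the ordering. The class name `AtomicStateWriter`, the method `writeAtomically`, and the `.tmp` suffix are hypothetical. This is not the committed patch.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper showing write-to-temp-then-rename on a local filesystem.
// It is NOT the FileSystemRMStateStore code from YARN-1185.
final class AtomicStateWriter {
    // Assumed suffix for in-progress writes, mirroring the ".tmp" idea in this issue.
    static final String TMP_SUFFIX = ".tmp";

    private AtomicStateWriter() {}

    /**
     * Replaces dest with data so that a reader sees either the previous
     * contents or the complete new contents, never a half-written dest.
     */
    static void writeAtomically(Path dest, byte[] data) throws IOException {
        Path tmp = dest.resolveSibling(dest.getFileName() + TMP_SUFFIX);
        try {
            // Step 1: write the whole payload to the temporary file.
            try (FileChannel ch = FileChannel.open(tmp,
                    StandardOpenOption.CREATE,
                    StandardOpenOption.TRUNCATE_EXISTING,
                    StandardOpenOption.WRITE)) {
                ByteBuffer buf = ByteBuffer.wrap(data);
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                // Flush the file contents to stable storage before publishing them.
                ch.force(true);
            }
            // Step 2: publish with a single atomic rename. On POSIX systems this
            // maps to rename(2), which replaces an existing dest; other platforms
            // may throw instead of replacing.
            Files.move(tmp, dest, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            // Best-effort cleanup. A crash can still leave tmp behind, which is
            // why recovery must also ignore *.tmp files.
            try {
                Files.deleteIfExists(tmp);
            } catch (IOException suppressed) {
                e.addSuppressed(suppressed);
            }
            throw e;
        }
    }
}
```

The temporary file is created next to the destination so that the rename stays within one filesystem, which is what allows it to be atomic. If the process crashes before the rename, only a stray `.tmp` file is left behind, never a truncated destination.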



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery

2013-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800126#comment-13800126
 ] 

Hudson commented on YARN-1185:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1558 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1558/])
YARN-1185. Fixed FileSystemRMStateStore to not leave partial files that prevent 
subsequent ResourceManager recovery. Contributed by Omkar Vinit Joshi. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1533803)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java




[jira] [Commented] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery

2013-10-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800121#comment-13800121
 ] 

Hudson commented on YARN-1185:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #368 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/368/])
YARN-1185. Fixed FileSystemRMStateStore to not leave partial files that prevent 
subsequent ResourceManager recovery. Contributed by Omkar Vinit Joshi. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1533803)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java




[jira] [Commented] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery

2013-10-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799966#comment-13799966
 ] 

Hudson commented on YARN-1185:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4633 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4633/])
YARN-1185. Fixed FileSystemRMStateStore to not leave partial files that prevent 
subsequent ResourceManager recovery. Contributed by Omkar Vinit Joshi. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1533803)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java


> FileSystemRMStateStore can leave partial files that prevent subsequent 
> recovery
> ---
>
> Key: YARN-1185
> URL: https://issues.apache.org/jira/browse/YARN-1185
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-1185.1.patch, YARN-1185.2.patch, YARN-1185.3.patch
>
>
> FileSystemRMStateStore writes directly to the destination file when storing 
> state. However if the RM were to crash in the middle of the write, the 
> recovery method could encounter a partially-written file and either outright 
> crash during recovery or silently load incomplete state.
> To avoid this, the data should be written to a temporary file and renamed to 
> the destination file afterwards.





[jira] [Commented] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery

2013-10-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799721#comment-13799721
 ] 

Hadoop QA commented on YARN-1185:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12609245/YARN-1185.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2226//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2226//console

This message is automatically generated.



[jira] [Commented] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery

2013-10-18 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799707#comment-13799707
 ] 

Vinod Kumar Vavilapalli commented on YARN-1185:
---

Patch looks good to me. Can you address the test-issue?

> FileSystemRMStateStore can leave partial files that prevent subsequent 
> recovery
> ---
>
> Key: YARN-1185
> URL: https://issues.apache.org/jira/browse/YARN-1185
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-1185.1.patch, YARN-1185.2.patch
>
>
> FileSystemRMStateStore writes directly to the destination file when storing 
> state. However if the RM were to crash in the middle of the write, the 
> recovery method could encounter a partially-written file and either outright 
> crash during recovery or silently load incomplete state.
> To avoid this, the data should be written to a temporary file and renamed to 
> the destination file afterwards.





[jira] [Commented] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery

2013-10-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798727#comment-13798727
 ] 

Hadoop QA commented on YARN-1185:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12609080/YARN-1185.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestRMStateStore

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2216//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2216//console

This message is automatically generated.



[jira] [Commented] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery

2013-10-17 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798711#comment-13798711
 ] 

Omkar Vinit Joshi commented on YARN-1185:
-

Thanks [~vinodkv] and [~jianhe].

bq. Can you please rip apart TestRMStateStore into two tests (files) - 
TestFileSystemRMStateStore and TestZKRMStateStore but use common code?
Done.
bq. Also, to indicate corruption, instead of .tmp file, we can try to a 
state-store write with a partial record and try to recover from that.
I am already doing this.
bq. The test case may also better to assert in the end that the corrupted 
application/attempt is not loaded back in RMState and doesn't exist in 
FileSystem
Done.

Attaching a new patch.
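The test requested in the review can be sketched without a test framework. The review asked for a test that simulates a crash part-way through a write and then asserts that the corrupted application is neither recovered nor left on disk. This sketch assumes a local directory accessed through `java.nio.file`, with one file per application and a `.tmp` suffix for in-progress writes. The class, method, and application names are hypothetical. The committed tests live in files such as `RMStateStoreTestBase.java` and `TestFSRMStateStore.java`, listed in the commit notices, and are not reproduced here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

// Hypothetical, framework-free version of the requested test; it is not the
// RMStateStoreTestBase / TestFSRMStateStore code from the patch.
final class PartialWriteRecoveryCheck {
    private PartialWriteRecoveryCheck() {}

    /** Simulates an RM crash mid-write: only a truncated .tmp file survives. */
    static void simulateCrashedWrite(Path dir, String appId, byte[] record) throws IOException {
        byte[] partial = Arrays.copyOf(record, record.length / 2);
        Files.write(dir.resolve(appId + ".tmp"), partial);
    }

    /** Assumed recovery rule from this issue: load complete files, drop *.tmp. */
    static Set<String> recover(Path dir) throws IOException {
        Set<String> loaded = new TreeSet<>();
        List<Path> leftovers = new ArrayList<>();
        try (Stream<Path> files = Files.list(dir)) {
            files.forEach(p -> {
                String name = p.getFileName().toString();
                if (name.endsWith(".tmp")) {
                    leftovers.add(p);
                } else {
                    loaded.add(name);
                }
            });
        }
        // Remove partial writes so they do not linger across restarts.
        for (Path p : leftovers) {
            Files.delete(p);
        }
        return loaded;
    }

    /** True if the corrupted app is neither recovered nor left in the directory. */
    static boolean corruptedAppIsDropped(Path dir) throws IOException {
        Files.write(dir.resolve("application_1"), new byte[] {1, 2, 3, 4});
        simulateCrashedWrite(dir, "application_2", new byte[] {5, 6, 7, 8});
        Set<String> loaded = recover(dir);
        return loaded.contains("application_1")
                && !loaded.contains("application_2")
                && !Files.exists(dir.resolve("application_2"))
                && !Files.exists(dir.resolve("application_2.tmp"));
    }
}
```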



[jira] [Commented] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery

2013-10-16 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797558#comment-13797558
 ] 

Jian He commented on YARN-1185:
---

It may also be better for the test case to assert, at the end, that the corrupted
application/attempt is not loaded back into RMState and no longer exists in the
FileSystem.

> FileSystemRMStateStore can leave partial files that prevent subsequent 
> recovery
> ---
>
> Key: YARN-1185
> URL: https://issues.apache.org/jira/browse/YARN-1185
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>Assignee: Omkar Vinit Joshi
> Attachments: YARN-1185.1.patch
>
>
> FileSystemRMStateStore writes directly to the destination file when storing 
> state. However if the RM were to crash in the middle of the write, the 
> recovery method could encounter a partially-written file and either outright 
> crash during recovery or silently load incomplete state.
> To avoid this, the data should be written to a temporary file and renamed to 
> the destination file afterwards.





[jira] [Commented] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery

2013-10-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795511#comment-13795511
 ] 

Hadoop QA commented on YARN-1185:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12608545/YARN-1185.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2178//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2178//console

This message is automatically generated.



[jira] [Commented] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery

2013-10-15 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795461#comment-13795461
 ] 

Omkar Vinit Joshi commented on YARN-1185:
-

I think it is fair to assume that the rename operation is atomic, so we can 
split the existing writeFile operation into two calls:
* First, write the data to a .tmp file.
* Then rename it to the actual file.

Similarly, when loading the state, any file with a ".tmp" extension is 
discarded. Attaching a patch that does this. Let me know your thoughts.
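The load side of this proposal can be sketched as follows. When scanning the state directory, any file ending in `.tmp` is the remnant of a write that never reached its rename, so it is skipped and removed. The sketch assumes a local directory accessed through `java.nio.file` in place of the Hadoop `FileSystem` listing that FileSystemRMStateStore uses, with one file per record. `StateLoader` and `loadState` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical loader for a one-file-per-record state directory.
// It illustrates the ".tmp" rule only; it is not FileSystemRMStateStore.
final class StateLoader {
    // Assumed suffix marking writes that never reached their rename.
    static final String TMP_SUFFIX = ".tmp";

    private StateLoader() {}

    /** Returns complete records keyed by file name; *.tmp leftovers are deleted. */
    static Map<String, byte[]> loadState(Path stateDir) throws IOException {
        Map<String, byte[]> records = new TreeMap<>();
        List<Path> leftovers = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(stateDir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (name.endsWith(TMP_SUFFIX)) {
                    // Partial write from a crash: never parse it.
                    leftovers.add(entry);
                } else if (Files.isRegularFile(entry)) {
                    records.put(name, Files.readAllBytes(entry));
                }
            }
        }
        // Delete after iterating so the directory is not modified mid-scan.
        for (Path tmp : leftovers) {
            Files.deleteIfExists(tmp);
        }
        return records;
    }
}
```

Skipping the `.tmp` files is what keeps recovery correct. Deleting them as well simply stops stray leftovers from accumulating across restarts.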



[jira] [Commented] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery

2013-09-30 Thread Arpit Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782363#comment-13782363
 ] 

Arpit Gupta commented on YARN-1185:
---

Here is the stack trace from the RM when it tries to recover partially written 
data

{code}
2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler (CapacityScheduler.java:parseQueue(408)) - Initialized queue: default: capacity=1.0, absoluteCapacity=1.0, usedResources=usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0
2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler (CapacityScheduler.java:parseQueue(408)) - Initialized queue: root: numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, usedResources=usedCapacity=0.0, numApps=0, numContainers=0
2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler (CapacityScheduler.java:initializeQueues(306)) - Initialized root queue root: numChildQueue= 1, capacity=1.0, absoluteCapacity=1.0, usedResources=usedCapacity=0.0, numApps=0, numContainers=0
2013-09-30 09:12:09,206 INFO  capacity.CapacityScheduler (CapacityScheduler.java:reinitialize(270)) - Initialized CapacityScheduler with calculator=class org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, minimumAllocation=<>, maximumAllocation=<>
2013-09-30 09:12:09,240 INFO  event.AsyncDispatcher (AsyncDispatcher.java:register(157)) - Registering class org.apache.hadoop.yarn.server.resourcemanager.RMAppManagerEventType for class org.apache.hadoop.yarn.server.resourcemanager.RMAppManager
2013-09-30 09:12:09,250 INFO  event.AsyncDispatcher (AsyncDispatcher.java:register(157)) - Registering class org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncherEventType for class org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher
2013-09-30 09:12:09,252 INFO  resourcemanager.RMNMInfo (RMNMInfo.java:(63)) - Registered RMNMInfo MBean
2013-09-30 09:12:09,253 INFO  util.HostsFileReader (HostsFileReader.java:refresh(84)) - Refreshing hosts (include/exclude) list
2013-09-30 09:12:09,278 INFO  security.UserGroupInformation (UserGroupInformation.java:loginUserFromKeytab(843)) - Login successful for user rm/hostname@realm using keytab file /etc/security/keytabs/rm.service.keytab
2013-09-30 09:12:09,278 INFO  security.RMContainerTokenSecretManager (RMContainerTokenSecretManager.java:rollMasterKey(103)) - Rolling master-key for container-tokens
2013-09-30 09:12:09,279 INFO  security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:rollMasterKey(107)) - Rolling master-key for amrm-tokens
2013-09-30 09:12:09,281 INFO  security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:rollMasterKey(97)) - Rolling master-key for nm-tokens
2013-09-30 09:12:10,196 INFO  recovery.FileSystemRMStateStore (FileSystemRMStateStore.java:loadRMAppState(131)) - Loading application from node: application_1380531989689_0002
2013-09-30 09:12:10,217 INFO  recovery.FileSystemRMStateStore (FileSystemRMStateStore.java:loadRMAppState(131)) - Loading application from node: application_1380531989689_0003
2013-09-30 09:12:10,232 INFO  security.RMDelegationTokenSecretManager (RMDelegationTokenSecretManager.java:recover(181)) - recovering RMDelegationTokenSecretManager.
2013-09-30 09:12:10,234 INFO  resourcemanager.RMAppManager (RMAppManager.java:recover(329)) - Recovering 2 applications
2013-09-30 09:12:10,234 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(640)) - Failed to load/recover state
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:332)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:842)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:636)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:855)
2013-09-30 09:12:10,236 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2013-09-30 09:17:20,144 INFO  resourcemanager.ResourceManager (StringUtils.java:startupShutdownMessage(601)) - STARTUP_MSG:
{code}

> FileSystemRMStateStore can leave partial files that prevent subsequent 
> recovery
> ---
>
> Key: YARN-1185
> URL: https://issues.apache.org/jira/browse/YARN-1185
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Jason Lowe
>
> FileSystemRMStateStore writes directly to the destination file when storing 
> state. However if the RM were to crash in the middle of the write, the 
> recovery method could encounter a partial