[jira] [Commented] (YARN-1185) FileSystemRMStateStore doesn't use temporary files when writing data

2013-09-12 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13765649#comment-13765649
 ] 

Bikas Saha commented on YARN-1185:
--

Yes, since the FileSystem interface does not provide any atomic operations.
The RM will not start if there is anything wrong with the stored state. So if 
some write is partial/empty it will not start. At that point we can judge if 
the missing piece is important or not and purge that piece and continue. This 
should be ok for job related data since we only lose a job. However, for global 
data like secret keys we may have to be more careful. In one case we encode the 
info in the file name. In other cases, where the data cannot be encoded in the 
file name, we may have to ensure that the store operation is not partial/empty. 
For HDFS we may assume atomic rename but will that be true for all filesystems?

So we could do the following. 
Storing app data may continue to be optimistic, and since that's the main 
workload we continue to do what we do today.
Storing global data (mainly the security stuff) can change to be more atomic.

We can make all store operations more atomic if we feel that we will not slow 
down the RM because of multiple roundtrips to the store.
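
The recovery judgment described above (purge a damaged job entry and continue, but refuse to start on damaged global data) could be sketched roughly as follows. This is only an illustration: the class, the "application_" naming convention, and the use of an in-memory map in place of the real store are all assumptions, and "damaged" is simplified to "empty".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the FileSystemRMStateStore recovery code.
final class RecoveryPolicySketch {

    // Assumed convention: entries named "application_*" hold per-job data;
    // everything else is treated as global data (e.g. secret keys).
    static boolean isAppData(String name) {
        return name.startsWith("application_");
    }

    // Removes empty per-job entries and returns their names.
    // Throws if a global entry is empty, since losing it is not acceptable.
    static List<String> recover(Map<String, byte[]> stored) {
        List<String> purged = new ArrayList<>();
        for (Map.Entry<String, byte[]> entry : stored.entrySet()) {
            boolean damaged = entry.getValue().length == 0; // simplified check
            if (!damaged) {
                continue;
            }
            if (isAppData(entry.getKey())) {
                purged.add(entry.getKey()); // we only lose one job
            } else {
                throw new IllegalStateException(
                    "damaged global state: " + entry.getKey());
            }
        }
        stored.keySet().removeAll(purged);
        return purged;
    }
}
```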


 FileSystemRMStateStore doesn't use temporary files when writing data
 

 Key: YARN-1185
 URL: https://issues.apache.org/jira/browse/YARN-1185
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe

 FileSystemRMStateStore writes directly to the destination file when storing 
 state. However if the RM were to crash in the middle of the write, the 
 recovery method could encounter a partially-written file and either outright 
 crash during recovery or silently load incomplete state.
 To avoid this, the data should be written to a temporary file and renamed to 
 the destination file afterwards.
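
 A minimal sketch of the write-to-temporary-then-rename idea, assuming a local disk and java.nio.file as a stand-in for the Hadoop FileSystem API. The class name, method name, and ".tmp" suffix are hypothetical, and whether rename is atomic depends on the underlying file system.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch only: java.nio.file on a local disk, not the Hadoop FileSystem API.
final class TempFileStoreSketch {

    // Write the data to "<name>.tmp" next to the destination, then rename it
    // into place, so the destination name is only given to a fully written file.
    static void storeAtomically(Path dest, byte[] data) throws IOException {
        Path tmp = dest.resolveSibling(dest.getFileName() + ".tmp");
        Files.write(tmp, data); // a crash here leaves only the .tmp file behind
        // ATOMIC_MOVE throws AtomicMoveNotSupportedException when the
        // file system cannot perform the rename atomically.
        Files.move(tmp, dest, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rmstate");
        Path dest = dir.resolve("application_1");
        storeAtomically(dest, "app state".getBytes(StandardCharsets.UTF_8));
        System.out.println(
            new String(Files.readAllBytes(dest), StandardCharsets.UTF_8));
    }
}
```

 With this shape, recovery can ignore or delete any leftover ".tmp" files, since they never carried the destination name.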

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1185) FileSystemRMStateStore doesn't use temporary files when writing data

2013-09-12 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13765622#comment-13765622
 ] 

Jason Lowe commented on YARN-1185:
--

Also, couldn't it be left with zero-length files if it dies after create but 
before write can occur?



[jira] [Commented] (YARN-1185) FileSystemRMStateStore doesn't use temporary files when writing data

2013-09-12 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13765620#comment-13765620
 ] 

Jason Lowe commented on YARN-1185:
--

Ah I see.  That's an assumption based on a specific FileSystem implementation 
-- does it only work with HDFS?



[jira] [Commented] (YARN-1185) FileSystemRMStateStore doesn't use temporary files when writing data

2013-09-12 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13765607#comment-13765607
 ] 

Bikas Saha commented on YARN-1185:
--

The expectation was that the data being written to each file is so small that 
it gets sent over completely in the first transfer. So partial writes would not 
occur. That's why the code accumulates the write buffer locally and then issues 
a single write to the FileSystem.
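
The accumulate-then-write pattern described here might look roughly like the sketch below. The record fields (appId, submitTime) and all names are hypothetical, and a plain OutputStream stands in for the stream returned by the FileSystem. A single write call keeps the record small and contiguous, but by itself it does not guarantee that the file system applies it atomically.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch, not the FileSystemRMStateStore code.
final class SingleWriteSketch {

    // Serialize the whole record into an in-memory buffer first.
    static byte[] serialize(String appId, long submitTime) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream data = new DataOutputStream(buffer)) {
            data.writeUTF(appId);
            data.writeLong(submitTime);
        }
        return buffer.toByteArray();
    }

    // Hand the complete record to the file stream in one write call.
    static void store(OutputStream fileStream, String appId, long submitTime)
            throws IOException {
        byte[] record = serialize(appId, submitTime);
        fileStream.write(record);
        fileStream.flush();
    }
}
```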
