[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
    [ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708266#comment-13708266 ]

Bikas Saha commented on YARN-521:
---------------------------------

Same error, and it does not look like a timeout. The assertion fails.

> Augment AM - RM client module to be able to request containers only at specific locations
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-521
>                 URL: https://issues.apache.org/jira/browse/YARN-521
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521.patch
>
> When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
    [ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708267#comment-13708267 ]

Bikas Saha commented on YARN-513:
---------------------------------

Why does YarnClient need to accept the rmaddress as a parameter? The whole point of this jira is to not require anyone to supply the RM connection parameters. The RM proxy is supposed to figure it out by itself from conf.

> Create common proxy client for communicating with RM
> -----------------------------------------------------
>
>                 Key: YARN-513
>                 URL: https://issues.apache.org/jira/browse/YARN-513
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Jian He
>         Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch
>
> When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up.
[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
    [ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708278#comment-13708278 ]

Sandy Ryza commented on YARN-521:
---------------------------------

Yeah, I noticed. I'm pretty puzzled because I still can't reproduce it locally, but I'll look further into it tomorrow.

> Augment AM - RM client module to be able to request containers only at specific locations
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-521
>                 URL: https://issues.apache.org/jira/browse/YARN-521
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521.patch
>
> When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality
[jira] [Commented] (YARN-917) Job can fail when RM restarts after staging dir is cleaned but before MR successfully unregister with RM
    [ https://issues.apache.org/jira/browse/YARN-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708537#comment-13708537 ]

Jason Lowe commented on YARN-917:
---------------------------------

I think one way to solve this is to move the removal of the staging directory to *after* we unregister from the RM. Now that there's a FINISHING state that gives the app a grace period to finish cleanly, we can leverage it to remove the staging directory after unregistering. This should also solve some other races related to removal of the staging directory and unregistering (e.g.: the AM crashes after removing the staging directory but before unregistering).

> Job can fail when RM restarts after staging dir is cleaned but before MR successfully unregister with RM
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-917
>                 URL: https://issues.apache.org/jira/browse/YARN-917
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
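[Editorial note: a minimal sketch of the ordering Jason proposes above. The method names here (unregisterFromRM, deleteStagingDir) are illustrative placeholders, not the actual MRAppMaster code.]

{code:java}
/** Sketch of the proposed shutdown ordering; names are placeholders. */
abstract class GracefulShutdownSketch {
  abstract void unregisterFromRM() throws Exception;  // FinishApplicationMaster RPC
  abstract void deleteStagingDir() throws Exception;  // e.g. FileSystem.delete(stagingDir, true)

  void shutDownJob() throws Exception {
    // Unregister first: the RM records the app as finished, and the FINISHING
    // state gives the AM a grace period to complete its cleanup.
    unregisterFromRM();
    // Delete the staging directory only after unregistering. If the RM restarts
    // (or the AM crashes) between the two steps, the job is already marked
    // finished, so nothing tries to rerun it from a now-missing staging dir.
    deleteStagingDir();
  }
}
{code}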
[jira] [Updated] (YARN-909) Disable TestLinuxContainerExecutorWithMocks on Windows
    [ https://issues.apache.org/jira/browse/YARN-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chuan Liu updated YARN-909:
---------------------------

    Attachment: YARN-909.2.patch

Thanks for the suggestion, Chris! Attaching a new patch.

> Disable TestLinuxContainerExecutorWithMocks on Windows
> -------------------------------------------------------
>
>                 Key: YARN-909
>                 URL: https://issues.apache.org/jira/browse/YARN-909
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>            Priority: Minor
>         Attachments: YARN-909.2.patch, YARN-909.patch
>
> This unit test tests a Linux specific feature. We should skip this unit test on Windows. A similar unit test 'TestLinuxContainerExecutor' was already skipped when running on Windows.
[jira] [Updated] (YARN-909) Disable TestLinuxContainerExecutorWithMocks on Windows
    [ https://issues.apache.org/jira/browse/YARN-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated YARN-909:
-------------------------------

    Attachment: YARN-909.3.patch

Actually, I was thinking we could put the {{assumeTrue}} call inside the {{Before}} method. This way, it's a very small patch. It also helps ease maintenance, because when people add new tests to this suite, they won't need to remember to call {{assumeTrue}}. I apologize if I'm not communicating this clearly. I'm attaching a patch that shows it. Sometimes code is easier. :-) Does this look good to you?

> Disable TestLinuxContainerExecutorWithMocks on Windows
> -------------------------------------------------------
>
>                 Key: YARN-909
>                 URL: https://issues.apache.org/jira/browse/YARN-909
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>            Priority: Minor
>         Attachments: YARN-909.2.patch, YARN-909.3.patch, YARN-909.patch
>
> This unit test tests a Linux specific feature. We should skip this unit test on Windows. A similar unit test 'TestLinuxContainerExecutor' was already skipped when running on Windows.
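[Editorial note: to illustrate the pattern Chris describes (not the actual YARN-909 patch), a minimal JUnit 4 sketch that skips every test in a suite on Windows by calling {{assumeTrue}} once in the {{Before}} method. {{Shell.WINDOWS}} is the existing Hadoop platform flag; the test class and test body are placeholders.]

{code:java}
import static org.junit.Assume.assumeTrue;

import org.apache.hadoop.util.Shell;
import org.junit.Before;
import org.junit.Test;

public class TestLinuxOnlyFeatureSketch {

  @Before
  public void setup() {
    // A failed assumption marks a test as skipped rather than failed, so
    // every test in this suite is silently skipped when running on Windows.
    assumeTrue(!Shell.WINDOWS);
  }

  @Test
  public void testSomethingLinuxSpecific() {
    // Linux-specific assertions would go here; never reached on Windows.
  }
}
{code}

The advantage over per-test guards is exactly the maintenance point above: new tests added to the suite inherit the skip automatically.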
[jira] [Commented] (YARN-149) ResourceManager (RM) High-Availability (HA)
    [ https://issues.apache.org/jira/browse/YARN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708636#comment-13708636 ]

Bikas Saha commented on YARN-149:
---------------------------------

Thanks. I looked at the draft. Will incorporate stuff from it or use it as the base directly. In general, it slightly mixes fail-over with HA. The way RM restart has been envisioned, with a good implementation, downtime due to restart should not be visible to users even with what is termed a cold restart.

Finally, I differ on the wrapper implementation because of:
1) an extra daemon to manage, since in fail-over scenarios each extra actor increases the combinatorics
2) the wrapper functionality seems to overlap the ZKFC and RM
3) the RM will need to be changed to interact with the wrapper, and the changes IMO should not be much different from those needed for direct ZKFC interaction
4) we will not be similar to HDFS patterns, and that makes the system harder to maintain and manage

In fact, what is being called a wrapper is something that probably does wrap around core RM functionality but remains inside the RM. From what I see, it will be an impl of the HAProtocol interface around the core RM startup functionality.

> ResourceManager (RM) High-Availability (HA)
> --------------------------------------------
>
>                 Key: YARN-149
>                 URL: https://issues.apache.org/jira/browse/YARN-149
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Harsh J
>            Assignee: Bikas Saha
>         Attachments: rm-ha-phase1-approach-draft1.pdf, rm-ha-phase1-draft2.pdf
>
> This jira tracks work needed to be done to support one RM instance failing over to another RM instance so that we can have RM HA. Work includes leader election, transfer of control to leader and client re-direction to new leader.
[jira] [Updated] (YARN-909) Disable TestLinuxContainerExecutorWithMocks on Windows
    [ https://issues.apache.org/jira/browse/YARN-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated YARN-909:
-------------------------------

          Component/s: nodemanager
     Target Version/s: 3.0.0, 2.1.0-beta
    Affects Version/s: 2.1.0-beta
                       3.0.0
         Hadoop Flags: Reviewed

Thanks, Chuan! I'll commit this after Jenkins gives +1 on the latest patch.

> Disable TestLinuxContainerExecutorWithMocks on Windows
> -------------------------------------------------------
>
>                 Key: YARN-909
>                 URL: https://issues.apache.org/jira/browse/YARN-909
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>            Priority: Minor
>         Attachments: YARN-909.2.patch, YARN-909.3.patch, YARN-909.patch
>
> This unit test tests a Linux specific feature. We should skip this unit test on Windows. A similar unit test 'TestLinuxContainerExecutor' was already skipped when running on Windows.
[jira] [Commented] (YARN-909) Disable TestLinuxContainerExecutorWithMocks on Windows
    [ https://issues.apache.org/jira/browse/YARN-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708646#comment-13708646 ]

Hadoop QA commented on YARN-909:
--------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12592350/YARN-909.3.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1478//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1478//console

This message is automatically generated.

> Disable TestLinuxContainerExecutorWithMocks on Windows
> -------------------------------------------------------
>
>                 Key: YARN-909
>                 URL: https://issues.apache.org/jira/browse/YARN-909
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>            Priority: Minor
>         Attachments: YARN-909.2.patch, YARN-909.3.patch, YARN-909.patch
>
> This unit test tests a Linux specific feature. We should skip this unit test on Windows. A similar unit test 'TestLinuxContainerExecutor' was already skipped when running on Windows.
[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
    [ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708652#comment-13708652 ]

Sandy Ryza commented on YARN-521:
---------------------------------

Looks like Eclipse wasn't failing on the non-JUnit asserts. Uploading a new patch that fixes the issue.

> Augment AM - RM client module to be able to request containers only at specific locations
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-521
>                 URL: https://issues.apache.org/jira/browse/YARN-521
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521.patch
>
> When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality
[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
    [ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708663#comment-13708663 ]

Bikas Saha commented on YARN-521:
---------------------------------

The diff is empty for the last 2 patches. Can you please check? Also, it probably needs a rebase after YARN-654, which just went in.

> Augment AM - RM client module to be able to request containers only at specific locations
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-521
>                 URL: https://issues.apache.org/jira/browse/YARN-521
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521.patch
>
> When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality
[jira] [Created] (YARN-924) TestNMClient.testNMClientNoCleanupOnStop frequently failing due to timeout
Bikas Saha created YARN-924:
-------------------------------

             Summary: TestNMClient.testNMClientNoCleanupOnStop frequently failing due to timeout
                 Key: YARN-924
                 URL: https://issues.apache.org/jira/browse/YARN-924
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 2.1.0-beta
            Reporter: Bikas Saha
            Assignee: Zhijie Shen

Error Message:

test timed out after 18 milliseconds

Stacktrace:

java.lang.Exception: test timed out after 18 milliseconds
	at org.apache.maven.surefire.report.ConsoleOutputCapture$ForwardingPrintStream.println(ConsoleOutputCapture.java:87)
	at java.lang.Throwable.printStackTrace(Throwable.java:464)
	at java.lang.Throwable.printStackTrace(Throwable.java:451)
	at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:349)
	at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:317)
	at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClientNoCleanupOnStop(TestNMClient.java:182)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
    [ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708676#comment-13708676 ]

Hadoop QA commented on YARN-521:
--------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12592356/YARN-521-5.patch
  against trunk revision .

    {color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1479//console

This message is automatically generated.

> Augment AM - RM client module to be able to request containers only at specific locations
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-521
>                 URL: https://issues.apache.org/jira/browse/YARN-521
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521-5.patch, YARN-521.patch
>
> When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality
[jira] [Commented] (YARN-654) AMRMClient: Perform sanity checks for parameters of public methods
    [ https://issues.apache.org/jira/browse/YARN-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708680#comment-13708680 ]

Hudson commented on YARN-654:
-----------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #4082 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4082/])
YARN-654. AMRMClient: Perform sanity checks for parameters of public methods (Xuan Gong via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1503353)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AMRMClient.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java

> AMRMClient: Perform sanity checks for parameters of public methods
> -------------------------------------------------------------------
>
>                 Key: YARN-654
>                 URL: https://issues.apache.org/jira/browse/YARN-654
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>             Fix For: 2.1.0-beta
>         Attachments: YARN-654.1.patch, YARN-654.2.patch
[jira] [Updated] (YARN-661) NM fails to cleanup local directories for users
    [ https://issues.apache.org/jira/browse/YARN-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Omkar Vinit Joshi updated YARN-661:
-----------------------------------

    Attachment: YARN-661-20130715.1.patch

> NM fails to cleanup local directories for users
> -------------------------------------------------
>
>                 Key: YARN-661
>                 URL: https://issues.apache.org/jira/browse/YARN-661
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.1.0-beta, 0.23.8
>            Reporter: Jason Lowe
>            Assignee: Omkar Vinit Joshi
>         Attachments: YARN-661-20130701.patch, YARN-661-20130708.patch, YARN-661-20130710.1.patch, YARN-661-20130711.1.patch, YARN-661-20130712.1.patch, YARN-661-20130715.1.patch
>
> YARN-71 added deletion of local directories on startup, but in practice it fails to delete the directories because of permission problems. The top-level usercache directory is owned by the user but is in a directory that is not writable by the user. Therefore the deletion of the user's usercache directory, as the user, fails due to lack of permissions.
[jira] [Commented] (YARN-909) Disable TestLinuxContainerExecutorWithMocks on Windows
    [ https://issues.apache.org/jira/browse/YARN-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708689#comment-13708689 ]

Hudson commented on YARN-909:
-----------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #4083 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4083/])
YARN-909. Disable TestLinuxContainerExecutorWithMocks on Windows. Contributed by Chuan Liu. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1503357)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java

> Disable TestLinuxContainerExecutorWithMocks on Windows
> -------------------------------------------------------
>
>                 Key: YARN-909
>                 URL: https://issues.apache.org/jira/browse/YARN-909
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>            Priority: Minor
>         Attachments: YARN-909.2.patch, YARN-909.3.patch, YARN-909.patch
>
> This unit test tests a Linux specific feature. We should skip this unit test on Windows. A similar unit test 'TestLinuxContainerExecutor' was already skipped when running on Windows.
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
    [ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708696#comment-13708696 ]

Jian He commented on YARN-513:
------------------------------

bq. Why does YarnClient need to accept the rmaddress as a parameter? The whole point of this jira is to not require anyone to supply the RM connection parameters. The RM proxy is supposed to figure it out by itself from conf.

The original code accepts rmaddress as a parameter; should we remove that?

> Create common proxy client for communicating with RM
> -----------------------------------------------------
>
>                 Key: YARN-513
>                 URL: https://issues.apache.org/jira/browse/YARN-513
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Jian He
>         Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch
>
> When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up.
[jira] [Updated] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
    [ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-521:
----------------------------

    Attachment: YARN-521-6.patch

> Augment AM - RM client module to be able to request containers only at specific locations
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-521
>                 URL: https://issues.apache.org/jira/browse/YARN-521
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521-5.patch, YARN-521-6.patch, YARN-521.patch
>
> When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality
[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
    [ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708707#comment-13708707 ]

Sandy Ryza commented on YARN-521:
---------------------------------

(The diff you were comparing was from before I uploaded the new patch. The latest diff includes the test fix and rebase.)

> Augment AM - RM client module to be able to request containers only at specific locations
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-521
>                 URL: https://issues.apache.org/jira/browse/YARN-521
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521-5.patch, YARN-521-6.patch, YARN-521.patch
>
> When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality
[jira] [Commented] (YARN-149) ResourceManager (RM) High-Availability (HA)
    [ https://issues.apache.org/jira/browse/YARN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708732#comment-13708732 ]

Karthik Kambatla commented on YARN-149:
---------------------------------------

Thanks Bikas.

bq. 1) an extra daemon to manage, since in fail-over scenarios each extra actor increases the combinatorics

The wrapper is not an extra daemon. There will be a single daemon for the wrapper/RM. In the cold-standby case, the wrapper starts the RM instance when it becomes active.

bq. 2) the wrapper functionality seems to overlap the ZKFC and RM

The wrapper *interacts* with the ZKFC and RM.

bq. 3) the RM will need to be changed to interact with the wrapper, and the changes IMO should not be much different from those needed for direct ZKFC interaction

Mostly agree with you here. I believe it boils down to the following: which state machine to incorporate the HA logic into. The wrapper approach essentially proposes two state machines, one for the core RM and one for the HA logic. Integrating the HA logic into the current RM means adding more states to it. There are (dis)advantages to both: the wrapper approach shouldn't affect non-HA instances, and might help with earlier adoption by major YARN users like Yahoo!

bq. In fact, what is being called a wrapper is something that probably does wrap around core RM functionality but remains inside the RM. From what I see, it will be an impl of the HAProtocol interface around the core RM startup functionality.

Looks like a promising approach. Let me take a closer look at the code and comment.

> ResourceManager (RM) High-Availability (HA)
> --------------------------------------------
>
>                 Key: YARN-149
>                 URL: https://issues.apache.org/jira/browse/YARN-149
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Harsh J
>            Assignee: Bikas Saha
>         Attachments: rm-ha-phase1-approach-draft1.pdf, rm-ha-phase1-draft2.pdf
>
> This jira tracks work needed to be done to support one RM instance failing over to another RM instance so that we can have RM HA. Work includes leader election, transfer of control to leader and client re-direction to new leader.
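[Editorial note: a sketch of what the last quote above suggests, an in-process "wrapper" implemented against Hadoop's standard {{org.apache.hadoop.ha.HAServiceProtocol}} interface (the one ZKFC drives for HDFS). The RM internals here are placeholders; this is illustrative only, not the eventual YARN-149 design.]

{code:java}
import java.io.IOException;

import org.apache.hadoop.ha.HAServiceProtocol;
import org.apache.hadoop.ha.HAServiceStatus;

/** Illustrative only: the HA "wrapper" as an in-process HAServiceProtocol impl. */
class RMHAServiceSketch implements HAServiceProtocol {
  private final Object coreRM;  // placeholder for the core RM services
  private HAServiceState state = HAServiceState.STANDBY;

  RMHAServiceSketch(Object coreRM) { this.coreRM = coreRM; }

  @Override
  public synchronized void transitionToActive(StateChangeRequestInfo req)
      throws IOException {
    // Start the core RM services (scheduler, ApplicationMasterService, ...)
    // and recover state from the RMStateStore before serving requests.
    state = HAServiceState.ACTIVE;
  }

  @Override
  public synchronized void transitionToStandby(StateChangeRequestInfo req)
      throws IOException {
    // Stop the core RM services; a cold standby keeps them stopped.
    state = HAServiceState.STANDBY;
  }

  @Override
  public synchronized void monitorHealth() throws IOException {
    // Throw HealthCheckFailedException if the core RM is unhealthy,
    // so the ZKFC can trigger a fail-over.
  }

  @Override
  public synchronized HAServiceStatus getServiceStatus() throws IOException {
    return new HAServiceStatus(state);
  }
}
{code}

This mirrors the HDFS pattern (addressing Bikas's point 4) while keeping everything in a single RM daemon (addressing point 1).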
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
    [ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708734#comment-13708734 ]

Jian He commented on YARN-353:
------------------------------

Thanks for the review, Karthik.

bq. YarnConfiguration: how about creating a common prefix for all of the zk-state-store related parameters?
bq. Make the ZKRMStateStore#NUM_RETRIES configurable with default set to 3.
bq. ZKRMStateStore#getNewZooKeeper need not be synchronized

Fixed.

bq. Might be cleaner to move zkDoWithRetries to ZkAction

We can implement no-retry functionality with ZkAction if we separate zkDoWithRetries out of ZkAction. Same reason for point 6.

The new patch also adds a test case for the ZK client disconnect and reconnect logic.

> Add Zookeeper-based store implementation for RMStateStore
> ----------------------------------------------------------
>
>                 Key: YARN-353
>                 URL: https://issues.apache.org/jira/browse/YARN-353
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Hitesh Shah
>            Assignee: Bikas Saha
>         Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch
>
> Add a store that writes RM state data to ZK
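[Editorial note: a minimal sketch of the retry pattern under discussion, keeping zkDoWithRetries separate from ZkAction as Jian describes; the names mirror the discussion, not the actual YARN-353 patch, and the retry count would come from configuration.]

{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

/** Illustrative retry wrapper for ZooKeeper operations. */
class ZkRetrySketch {
  private final ZooKeeper zk;
  private final int numRetries;  // proposed to be configurable, default 3

  ZkRetrySketch(ZooKeeper zk, int numRetries) {
    this.zk = zk;
    this.numRetries = numRetries;
  }

  /** One ZooKeeper operation; callers can also run these without retries. */
  interface ZkAction<T> {
    T run() throws KeeperException, InterruptedException;
  }

  /** Retry only on connection loss; other KeeperExceptions propagate. */
  <T> T doWithRetries(ZkAction<T> action)
      throws KeeperException, InterruptedException {
    for (int retries = numRetries; ; retries--) {
      try {
        return action.run();
      } catch (KeeperException.ConnectionLossException e) {
        if (retries <= 0) {
          throw e;
        }
      }
    }
  }

  byte[] getDataWithRetries(final String path) throws Exception {
    return doWithRetries(new ZkAction<byte[]>() {
      @Override
      public byte[] run() throws KeeperException, InterruptedException {
        return zk.getData(path, false, null);
      }
    });
  }
}
{code}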
[jira] [Updated] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
    [ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-353:
-------------------------

    Attachment: YARN-353.6.patch

> Add Zookeeper-based store implementation for RMStateStore
> ----------------------------------------------------------
>
>                 Key: YARN-353
>                 URL: https://issues.apache.org/jira/browse/YARN-353
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Hitesh Shah
>            Assignee: Bikas Saha
>         Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch
>
> Add a store that writes RM state data to ZK
[jira] [Commented] (YARN-661) NM fails to cleanup local directories for users
    [ https://issues.apache.org/jira/browse/YARN-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708735#comment-13708735 ]

Hadoop QA commented on YARN-661:
--------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12592359/YARN-661-20130715.1.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1480//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1480//console

This message is automatically generated.

> NM fails to cleanup local directories for users
> -------------------------------------------------
>
>                 Key: YARN-661
>                 URL: https://issues.apache.org/jira/browse/YARN-661
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.1.0-beta, 0.23.8
>            Reporter: Jason Lowe
>            Assignee: Omkar Vinit Joshi
>         Attachments: YARN-661-20130701.patch, YARN-661-20130708.patch, YARN-661-20130710.1.patch, YARN-661-20130711.1.patch, YARN-661-20130712.1.patch, YARN-661-20130715.1.patch
>
> YARN-71 added deletion of local directories on startup, but in practice it fails to delete the directories because of permission problems. The top-level usercache directory is owned by the user but is in a directory that is not writable by the user. Therefore the deletion of the user's usercache directory, as the user, fails due to lack of permissions.
[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
    [ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708750#comment-13708750 ]

Hadoop QA commented on YARN-521:
--------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12592362/YARN-521-6.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1481//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1481//console

This message is automatically generated.

> Augment AM - RM client module to be able to request containers only at specific locations
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-521
>                 URL: https://issues.apache.org/jira/browse/YARN-521
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521-5.patch, YARN-521-6.patch, YARN-521.patch
>
> When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality
[jira] [Commented] (YARN-903) DistributedShell throwing Errors in logs after successfull completion
    [ https://issues.apache.org/jira/browse/YARN-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708764#comment-13708764 ]

Omkar Vinit Joshi commented on YARN-903:
----------------------------------------

Planning to fix this. I am planning to remember completed containers (only the id) at the node manager for a predefined time (10 min). Does that time sound reasonable, or should we make it configurable? I don't really think adding a new configuration parameter would be a good idea, but I am open to a different approach / adding conf. Thoughts?

This will have a similar implementation to YARN-62, but the difference is that YARN-62 only tracks a container for a time after it starts, to avoid a duplicate launch, whereas this tries to avoid logging errors for valid stop attempts.

> DistributedShell throwing Errors in logs after successfull completion
> ----------------------------------------------------------------------
>
>                 Key: YARN-903
>                 URL: https://issues.apache.org/jira/browse/YARN-903
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: applications/distributed-shell
>    Affects Versions: 2.0.4-alpha
>         Environment: Ubuntu 11.10
>            Reporter: Abhishek Kapoor
>            Assignee: Omkar Vinit Joshi
>         Attachments: AppMaster.stderr, yarn-sunny-nodemanager-sunny-Inspiron.log
>
> I have tried running DistributedShell and also used the ApplicationMaster of the same for my test. The application runs successfully, though it logs some errors which would be useful to fix. Below are the logs from the NodeManager and ApplicationMaster.
>
> Log Snippet for NodeManager
> ===========================
> 2013-07-07 13:39:18,787 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting to ResourceManager at localhost/127.0.0.1:9990. current no. of attempts is 1
> 2013-07-07 13:39:19,050 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id -325382586
> 2013-07-07 13:39:19,052 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for nm-tokens, got key with id :1005046570
> 2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as sunny-Inspiron:9993 with total resource of memory:10240, vCores:8
> 2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests
> 2013-07-07 13:39:35,256 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1373184544832_0001_01 (auth:SIMPLE)
> 2013-07-07 13:39:35,492 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1373184544832_0001_01_01 by user sunny
> 2013-07-07 13:39:35,507 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1373184544832_0001
> 2013-07-07 13:39:35,511 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny IP=127.0.0.1 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1373184544832_0001 CONTAINERID=container_1373184544832_0001_01_01
> 2013-07-07 13:39:35,511 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from NEW to INITING
> 2013-07-07 13:39:35,512 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1373184544832_0001_01_01 to application application_1373184544832_0001
> 2013-07-07 13:39:35,518 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from INITING to RUNNING
> 2013-07-07 13:39:35,528 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_01 transitioned from NEW to LOCALIZING
> 2013-07-07 13:39:35,540 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://localhost:9000/application/test.jar transitioned from INIT to DOWNLOADING
> 2013-07-07 13:39:35,540 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1373184544832_0001_01_01
> 2013-07-07 13:39:35,675 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file
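[Editorial note: a sketch of the remembering scheme Omkar floats above, assuming the fixed 10-minute window from the comment. The class name and structure are illustrative, not the eventual patch: the NM would record completed container IDs with a timestamp and treat stop requests for remembered IDs as benign no-ops instead of logging errors.]

{code:java}
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative cache of recently completed container IDs. */
class CompletedContainerCacheSketch<ID> {
  private static final long RETENTION_MS = 10 * 60 * 1000L;  // proposed 10 min

  private final Map<ID, Long> completionTime = new ConcurrentHashMap<ID, Long>();

  void containerCompleted(ID containerId) {
    completionTime.put(containerId, System.currentTimeMillis());
  }

  /** True if the container finished recently: a stop request is a valid no-op. */
  boolean isRecentlyCompleted(ID containerId) {
    evictExpired();
    return completionTime.containsKey(containerId);
  }

  private void evictExpired() {
    long cutoff = System.currentTimeMillis() - RETENTION_MS;
    for (Iterator<Map.Entry<ID, Long>> it = completionTime.entrySet().iterator();
         it.hasNext(); ) {
      if (it.next().getValue() < cutoff) {
        it.remove();
      }
    }
  }
}
{code}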
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
    [ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708761#comment-13708761 ]

Zhijie Shen commented on YARN-744:
----------------------------------

The passed-in appAttemptId for an app currently seems to be the same object, such that it can be used for synchronized blocks, but I agree with the idea of a wrapper, because it is more predictable and stand-alone in ApplicationMasterService. BTW, is it convenient to write a test case for concurrent allocation? Like TestClientRMService#testConcurrentAppSubmit.

> Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-744
>                 URL: https://issues.apache.org/jira/browse/YARN-744
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Omkar Vinit Joshi
>         Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744.patch
>
> Looks like the lock taken in this is broken. It takes a lock on the lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object.
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
    [ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708796#comment-13708796 ]

Hadoop QA commented on YARN-353:
--------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12592363/YARN-353.6.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1482//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1482//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1482//console

This message is automatically generated.

> Add Zookeeper-based store implementation for RMStateStore
> ----------------------------------------------------------
>
>                 Key: YARN-353
>                 URL: https://issues.apache.org/jira/browse/YARN-353
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Hitesh Shah
>            Assignee: Bikas Saha
>         Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch
>
> Add a store that writes RM state data to ZK
[jira] [Resolved] (YARN-924) TestNMClient.testNMClientNoCleanupOnStop frequently failing due to timeout
    [ https://issues.apache.org/jira/browse/YARN-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen resolved YARN-924.
------------------------------

    Resolution: Duplicate

It is a duplicate of YARN-906.

> TestNMClient.testNMClientNoCleanupOnStop frequently failing due to timeout
> ---------------------------------------------------------------------------
>
>                 Key: YARN-924
>                 URL: https://issues.apache.org/jira/browse/YARN-924
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.1.0-beta
>            Reporter: Bikas Saha
>            Assignee: Zhijie Shen
>
> Error Message:
> test timed out after 18 milliseconds
>
> Stacktrace:
> java.lang.Exception: test timed out after 18 milliseconds
> 	at org.apache.maven.surefire.report.ConsoleOutputCapture$ForwardingPrintStream.println(ConsoleOutputCapture.java:87)
> 	at java.lang.Throwable.printStackTrace(Throwable.java:464)
> 	at java.lang.Throwable.printStackTrace(Throwable.java:451)
> 	at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:349)
> 	at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:317)
> 	at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClientNoCleanupOnStop(TestNMClient.java:182)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[jira] [Commented] (YARN-919) Setting default heap sizes in yarn env
    [ https://issues.apache.org/jira/browse/YARN-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708997#comment-13708997 ]

Mayank Bansal commented on YARN-919:
------------------------------------

Thanks [~hitesh] for the comments.

The primary intention of this JIRA is to provide some handy way for users to set the parameters. I think you are right that bin/yarn already sets the defaults, which I can remove from this patch. However, we need something like this in yarn-env.sh, maybe commented out for now, so that users don't have to dig around the documentation to increase the memory sizes. Thoughts?

Thanks,
Mayank

> Setting default heap sizes in yarn env
> ----------------------------------------
>
>                 Key: YARN-919
>                 URL: https://issues.apache.org/jira/browse/YARN-919
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.1.0-beta
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>            Priority: Minor
>         Attachments: YARN-919-trunk-1.patch
>
> Right now there are no defaults in the yarn env scripts for the resource manager and node manager, and if a user wants to override them, they have to go to the documentation, find the variables, and change the script. There is no straightforward way to change it in the script. Just updating the variables with defaults.
[jira] [Updated] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
    [ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-353:
-------------------------

    Attachment: YARN-353.7.patch

> Add Zookeeper-based store implementation for RMStateStore
> ----------------------------------------------------------
>
>                 Key: YARN-353
>                 URL: https://issues.apache.org/jira/browse/YARN-353
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Hitesh Shah
>            Assignee: Bikas Saha
>         Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch
>
> Add a store that writes RM state data to ZK
[jira] [Commented] (YARN-321) Generic application history service
    [ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709030#comment-13709030 ]

Mayank Bansal commented on YARN-321:
------------------------------------

Overall looks good. However, some points to consider:

bq. ResourceManager will push the data to HistoryStorage after an application finishes in a separate thread.

Is it per application, or only one thread in the RM? Wouldn't it be a good idea that as soon as an application starts we send the information to the AHS and let the AHS write all the data published by the RM for that application? In that case there would be much less overhead for the RM.

What about the cases where the RM restarts or crashes: does the RM have to republish all the running applications to the AHS, or just forget about the previously running apps?

Right now it's not clear what needs to be done for log aggregation.

> Generic application history service
> ------------------------------------
>
>                 Key: YARN-321
>                 URL: https://issues.apache.org/jira/browse/YARN-321
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Luke Lu
>            Assignee: Vinod Kumar Vavilapalli
>
> The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is the number of application types and V is the number of application versions) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics.
[jira] [Created] (YARN-925) Interface of HistoryStorage for AHS
Mayank Bansal created YARN-925:
----------------------------------

             Summary: Interface of HistoryStorage for AHS
                 Key: YARN-925
                 URL: https://issues.apache.org/jira/browse/YARN-925
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Mayank Bansal
            Assignee: Mayank Bansal
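[Editorial note: to make the YARN-321 discussion concrete, a hypothetical shape for the interface YARN-925 proposes. The method names here are guesses for illustration, not the agreed design: the RM would call the write side once an application finishes (per the quoted design statement), and the AHS would serve reads from the same store.]

{code:java}
import java.io.IOException;

/** Hypothetical sketch of a history store interface; not the YARN-925 design. */
interface HistoryStorageSketch<AppId, AppHistory> {
  /** Called by the RM (e.g. from a background thread) once an app finishes. */
  void applicationFinished(AppId appId, AppHistory data) throws IOException;

  /** Called by the application history server when serving UI/REST queries. */
  AppHistory getApplication(AppId appId) throws IOException;
}
{code}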
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
    [ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709060#comment-13709060 ]

Hadoop QA commented on YARN-353:
--------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12592413/YARN-353.7.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
        org.apache.hadoop.yarn.server.resourcemanager.recovery.TestRMStateStore
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1484//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1484//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1484//console

This message is automatically generated.

> Add Zookeeper-based store implementation for RMStateStore
> ----------------------------------------------------------
>
>                 Key: YARN-353
>                 URL: https://issues.apache.org/jira/browse/YARN-353
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Hitesh Shah
>            Assignee: Bikas Saha
>         Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch
>
> Add a store that writes RM state data to ZK
[jira] [Updated] (YARN-513) Create common proxy client for communicating with RM
    [ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-513:
-------------------------

    Attachment: YARN-513.15.patch

> Create common proxy client for communicating with RM
> -----------------------------------------------------
>
>                 Key: YARN-513
>                 URL: https://issues.apache.org/jira/browse/YARN-513
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Jian He
>         Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.15.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch
>
> When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up.
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
    [ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709063#comment-13709063 ]

Jian He commented on YARN-513:
------------------------------

The new patch removes rmaddress as a parameter from YarnClient.

> Create common proxy client for communicating with RM
> -----------------------------------------------------
>
>                 Key: YARN-513
>                 URL: https://issues.apache.org/jira/browse/YARN-513
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Jian He
>         Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.15.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch
>
> When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up.
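[Editorial note: what the change means for callers, sketched against the YarnClient API (assuming the {{YarnClient.createYarnClient()}} factory available in 2.x releases). The client is configured purely from YarnConfiguration; the RM address comes from {{yarn.resourcemanager.address}} in conf, with no address argument.]

{code:java}
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnClientFromConf {
  public static void main(String[] args) throws Exception {
    // The RM address is resolved internally from yarn.resourcemanager.address
    // in the configuration; callers no longer pass it explicitly.
    YarnConfiguration conf = new YarnConfiguration();
    YarnClient client = YarnClient.createYarnClient();
    client.init(conf);
    client.start();
    try {
      System.out.println("Applications: " + client.getApplications().size());
    } finally {
      client.stop();
    }
  }
}
{code}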
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
    [ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709064#comment-13709064 ]

Omkar Vinit Joshi commented on YARN-744:
----------------------------------------

bq. BTW, is it convenient to write a test case for concurrent allocation? Like TestClientRMService#testConcurrentAppSubmit.

Yeah, wrote one.

bq. The passed-in appAttemptId for an app currently seems to be the same object, such that it can be used for synchronized blocks, but I agree with the idea of a wrapper, because it is more predictable and stand-alone in ApplicationMasterService.

Locking on the appAttemptId in the case of the allocate / registerApplicationMaster calls won't work. They are coming from the client, so we can't guarantee that they are identical objects for the purpose of grabbing a lock. Thoughts?

> Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-744
>                 URL: https://issues.apache.org/jira/browse/YARN-744
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Omkar Vinit Joshi
>         Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744.patch
>
> Looks like the lock taken in this is broken. It takes a lock on the lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object.
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709067#comment-13709067 ] Aaron T. Myers commented on YARN-914: - Should we perhaps do an s/NN/NM/g in the description of this JIRA? Or does this have something to do with the Name Node and I'm completely missing it? Support graceful decommission of nodemanager Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du When NNs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NN is decommissioned, all running containers on the NN need to be rescheduled on other NNs. Further more, for finished map tasks, if their map output are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-744: --- Attachment: YARN-744-20130715.1.patch Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709072#comment-13709072 ] Zhijie Shen commented on YARN-744: -- bq. Locking on appAttemptId in the case of allocate / RegisterApplicationMaster calls won't work. They are coming from the client...can't guarantee that they are identical in terms of grabbing a lock... thoughts? I meant that AMRMClient uses the same appAttemptId, but the uniqueness is not guaranteed, so I agreed with the self-contained lock wrapper. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-919) Setting default heap sizes in yarn env
[ https://issues.apache.org/jira/browse/YARN-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709088#comment-13709088 ] Hitesh Shah commented on YARN-919: -- [~mayank_bansal] Yes, that sounds good. Something like: export YARN_RESOURCEMANAGER_HEAPSIZE=${YARN_RESOURCEMANAGER_HEAPSIZE:-default/recommended value} export YARN_RESOURCEMANAGER_OPTS=${YARN_RESOURCEMANAGER_OPTS:-default/recommended settings} should be enough to account for docs. yarn-env.sh is expected to be overwritten by the user in any case as part of a deployment. Setting default heap sizes in yarn env -- Key: YARN-919 URL: https://issues.apache.org/jira/browse/YARN-919 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Mayank Bansal Assignee: Mayank Bansal Priority: Minor Attachments: YARN-919-trunk-1.patch Right now there are no defaults in yarn env scripts for resource manager and node manager, and if the user wants to override that, then the user has to go to documentation, find the variables and change the script. There is no straightforward way to change it in the script. Just updating the variables with defaults. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709091#comment-13709091 ] Bikas Saha commented on YARN-744: - Why do we need a wrapper? We should not be locking on the app attempt id. We should try to find some internal RM object that's unique for the app attempt and lock on that. Also avoid locking the RMAppAttemptImpl object itself since it will block the internal async dispatcher. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-744: Priority: Minor (was: Major) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709096#comment-13709096 ] Bikas Saha commented on YARN-744: - btw. it does not look like this is a practical problem. Until we start seeing a few instances of this happening we should probably lower the priority of this jira. I will do that now. Please change it if you think otherwise. A bug that does not manifest itself is not a bug :P Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709098#comment-13709098 ] Zhijie Shen commented on YARN-321: -- bq. Is it per application or only one thread in RM? I think it should be one thread in RM. bq. Isn't it a good idea that as soon as the application starts we send the information to AHS and let AHS write all the data published by the RM for that application? I'm afraid a number of metrics cannot be determined when an application has just been started, such as the finish time and the final status. Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Vinod Kumar Vavilapalli The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is number of type of application, V is number of version of application) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709113#comment-13709113 ] Hadoop QA commented on YARN-513: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592424/YARN-513.15.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1486//console This message is automatically generated. Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.15.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Lu updated YARN-914: - Description: When NNs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Further more, for finished map tasks, if their map output are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. was: When NNs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NN is decommissioned, all running containers on the NN need to be rescheduled on other NNs. Further more, for finished map tasks, if their map output are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. Support graceful decommission of nodemanager Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du When NNs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Further more, for finished map tasks, if their map output are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-918) ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701
[ https://issues.apache.org/jira/browse/YARN-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-918: - Attachment: YARN-918-20130715.txt Here's a patch that works. Needs to be applied on top of YARN-701. It simply removes the ApplicationAttemptId from the protocol request objects. ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701 - Key: YARN-918 URL: https://issues.apache.org/jira/browse/YARN-918 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Attachments: YARN-918-20130715.txt Once we use AMRMToken irrespective of kerberos after YARN-701, we don't need ApplicationAttemptId in the RPC pay load. This is an API change, so doing it as a blocker for 2.1.0-beta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
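A hedged sketch of the resulting API shape (the newInstance signature and the variable values here are assumptions, not quotes from the patch): the attempt identity travels in the AMRMToken, so registration only needs the AM's coordinates.
{code:java}
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterRequest;

String trackingUrl = "http://am-host:8042/ui";   // illustrative value
// Before: the request also carried the ApplicationAttemptId.
// After: the RM derives the attempt from the AMRMToken on the connection.
RegisterApplicationMasterRequest req =
    RegisterApplicationMasterRequest.newInstance("am-host", 8042, trackingUrl);
{code}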
[jira] [Updated] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Lu updated YARN-914: - Description: When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Further more, for finished map tasks, if their map output are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. was: When NNs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Further more, for finished map tasks, if their map output are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. Support graceful decommission of nodemanager Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Further more, for finished map tasks, if their map output are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-926) ContainerManagerProtcol APIs should take in requests for multiple containers
Vinod Kumar Vavilapalli created YARN-926: Summary: ContainerManagerProtcol APIs should take in requests for multiple containers Key: YARN-926 URL: https://issues.apache.org/jira/browse/YARN-926 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli AMs typically have to launch multiple containers on a node and the current single container APIs aren't helping. We should have all the APIs take in multiple requests and return multiple responses. The client libraries could expose both the single and multi-container requests. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
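A rough sketch of what the batched shape could look like (the record names and the launchContext helper are assumptions for illustration; the committed API may differ):
{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.yarn.api.protocolrecords.StartContainerRequest;
import org.apache.hadoop.yarn.api.protocolrecords.StartContainersRequest;
import org.apache.hadoop.yarn.api.protocolrecords.StartContainersResponse;
import org.apache.hadoop.yarn.api.records.Container;

// One RPC starts several containers on the same node instead of one RPC each.
List<StartContainerRequest> requests = new ArrayList<StartContainerRequest>();
for (Container c : containersAllocatedOnThisNode) {      // hypothetical list
  requests.add(StartContainerRequest.newInstance(
      launchContext(c),                // hypothetical helper building the CLC
      c.getContainerToken()));
}
StartContainersResponse response =
    containerManager.startContainers(StartContainersRequest.newInstance(requests));
{code}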
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709147#comment-13709147 ] Hadoop QA commented on YARN-744: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592426/YARN-744-20130715.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1485//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1485//console This message is automatically generated. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709144#comment-13709144 ] Luke Lu commented on YARN-914: -- [~atm]: Nice catch! Of course :) Support graceful decommission of nodemanager Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Further more, for finished map tasks, if their map output are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-926) ContainerManagerProtcol APIs should take in requests for multiple containers
[ https://issues.apache.org/jira/browse/YARN-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-926: - Priority: Blocker (was: Major) Target Version/s: 2.1.0-beta Technically this isn't a blocker, and can be added as a new API in a compatible manner. But I'd like to avoid having multiple APIs as we still have a chance of getting this into 2.1.0. Thoughts? ContainerManagerProtcol APIs should take in requests for multiple containers Key: YARN-926 URL: https://issues.apache.org/jira/browse/YARN-926 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker AMs typically have to launch multiple containers on a node and the current single container APIs aren't helping. We should have all the APIs take in multiple requests and return multiple responses. The client libraries could expose both the single and multi-container requests. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-912) Create exceptions package in common/api for yarn and move client facing exceptions to them
[ https://issues.apache.org/jira/browse/YARN-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709157#comment-13709157 ] Vinod Kumar Vavilapalli commented on YARN-912: -- Can you also take care of NMNotYetReadyException and InvalidContainerException too? Create exceptions package in common/api for yarn and move client facing exceptions to them -- Key: YARN-912 URL: https://issues.apache.org/jira/browse/YARN-912 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Mayank Bansal Attachments: YARN-912-trunk-1.patch, YARN-912-trunk-2.patch, YARN-912-trunk-3.patch Exceptions like InvalidResourceBlacklistRequestException, InvalidResourceRequestException, InvalidApplicationMasterRequestException etc are currently inside ResourceManager and not visible to clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709192#comment-13709192 ] Karthik Kambatla commented on YARN-321: --- A few other considerations: bq. Running as service: By default, ApplicationHistoryService will be embedded inside ResourceManager but will be independent enough to run as a separate service for scaling purposes. Is there a reason to embed this inside the RM? I don't know if there were reasons for the JHS to be separate, other than it being MR-specific. If there were, this would be against those. No? That said, I agree it will be easier for the user if AHS starts along with the RM. Maybe that should be configurable and turned on by default? bq. Hosting/serving per-framework data is out of scope for this JIRA. I understand and agree it makes sense not to complicate it. However, during the design, it would be nice to outline (at least at a high level) how the plugins can work. For the plugins to serve application-specific information, I suspect the RM should write this information in addition to generic YARN information about that application (e.g. MapReduce counters). On completion, can we leave a provision for the AM to write a json blob (maybe via the RM) to {{HistoryStorage}}? In the AHS, can we leave a provision for app-plugins to access/use this information to render application specifics? Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Vinod Kumar Vavilapalli The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is number of type of application, V is number of version of application) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-906) TestNMClient.testNMClientNoCleanupOnStop fails occasionally
[ https://issues.apache.org/jira/browse/YARN-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709195#comment-13709195 ] Zhijie Shen commented on YARN-906: -- Did some investigation into this test failure. The test itself seems to have no problem. The test timed out because the container state stayed RUNNING after the container was stopped, which was not expected. Looked into the test log: after stopContainer was called, the container moved from LOCALIZED to KILLING, but didn't move any further. However, in my local test log of a successful run, the container moved from LOCALIZED to KILLING, and then from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL, during which the major work is cleaning the localized container resources (I observed the file deletions executing). The failed test log didn't show any file deletion. Therefore, I guess there's something blocking during container resource cleanup. Thoughts? More investigation is needed to further locate the problem. TestNMClient.testNMClientNoCleanupOnStop fails occasionally --- Key: YARN-906 URL: https://issues.apache.org/jira/browse/YARN-906 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen See https://builds.apache.org/job/PreCommit-YARN-Build/1435//testReport/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClientNoCleanupOnStop/ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
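For context, the kind of wait loop such a test relies on (a sketch with assumed variable names, not the actual test code): a stuck resource cleanup keeps the NM-internal state in KILLING and the client-visible state in RUNNING, so the loop spins until the harness times out.
{code:java}
import org.apache.hadoop.yarn.api.records.ContainerState;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

ContainerStatus status = nmClient.getContainerStatus(containerId, nodeId);
long deadline = System.currentTimeMillis() + 10000;
while (status.getState() == ContainerState.RUNNING
    && System.currentTimeMillis() < deadline) {
  Thread.sleep(100);
  status = nmClient.getContainerStatus(containerId, nodeId);
}
// On the failing runs described above, getState() never leaves RUNNING.
{code}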
[jira] [Assigned] (YARN-306) FIFO scheduler doesn't respect changing job priority
[ https://issues.apache.org/jira/browse/YARN-306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla reassigned YARN-306: - Assignee: (was: Karthik Kambatla) FIFO scheduler doesn't respect changing job priority Key: YARN-306 URL: https://issues.apache.org/jira/browse/YARN-306 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Nishan Shetty 1.Submit job 2.Change the job priority using setPriority() or CLI command ./mapred job-set-priority job-id priority Observe that Job priority is not changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709242#comment-13709242 ] Thomas Weise commented on YARN-896: --- We also identified the need for token renewal (app specific tokens). This should be a common need for long running services. Has it been discussed elsewhere? Roll up for long lived YARN --- Key: YARN-896 URL: https://issues.apache.org/jira/browse/YARN-896 Project: Hadoop YARN Issue Type: New Feature Reporter: Robert Joseph Evans YARN is intended to be general purpose, but it is missing some features to be able to truly support long lived applications and long lived containers. This ticket is intended to # discuss what is needed to support long lived processes # track the resulting JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-654) AMRMClient: Perform sanity checks for parameters of public methods
[ https://issues.apache.org/jira/browse/YARN-654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709243#comment-13709243 ] Bikas Saha commented on YARN-654: - Sorry, the minor change I made in the patch that changed >=0 to >0 is wrong. I did not follow the code correctly. Fixing it in the commit for YARN-521. AMRMClient: Perform sanity checks for parameters of public methods -- Key: YARN-654 URL: https://issues.apache.org/jira/browse/YARN-654 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Fix For: 2.1.0-beta Attachments: YARN-654.1.patch, YARN-654.2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
[ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-521: Attachment: YARN-521.final.patch Augment AM - RM client module to be able to request containers only at specific locations - Key: YARN-521 URL: https://issues.apache.org/jira/browse/YARN-521 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521-5.patch, YARN-521-6.patch, YARN-521.final.patch, YARN-521.patch When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-513: - Attachment: YARN-513.16.patch Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.15.patch, YARN-513.16.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception
[ https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-875: --- Attachment: YARN-875.1.patch Application can hang if AMRMClientAsync callback thread has exception - Key: YARN-875 URL: https://issues.apache.org/jira/browse/YARN-875 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-875.1.patch Currently that thread will die and then never callback. App can hang. Possible solution could be to catch Throwable in the callback and then call client.onError(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
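The fix suggested in the description can be sketched as follows (handler and response names assumed; this is not the attached patch): the essential part is that the dispatch thread survives any Throwable thrown by user code.
{code:java}
// Inside the async client's callback-dispatch loop:
try {
  callbackHandler.onContainersAllocated(response.getAllocatedContainers());
} catch (Throwable t) {
  // Without this, the dispatch thread dies silently and the AM never
  // receives another callback - the hang described in this JIRA.
  callbackHandler.onError(new RuntimeException(t));
}
{code}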
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709255#comment-13709255 ] Aaron T. Myers commented on YARN-914: - Thanks, Luke. Support graceful decommission of nodemanager Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Further more, for finished map tasks, if their map output are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709262#comment-13709262 ] Jason Lowe commented on YARN-321: - bq. Is there a reason to embed this inside the RM? I don't know if there were reasons for the JHS to be separate, other than it being MR-specific. IIRC the history server was embedded in the JT back in 1.x and was only split out as a separate daemon to keep the RM from having a dependency on MR. bq. That said, I agree it will be easier for the user if AHS starts along with the RM. Maybe that should be configurable and turned on by default? That'd be my preference, and the proxyserver is already done this way. One can run it either as part of the RM (default) or set up some configs and launch it separately via {{yarn proxyserver}}. Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Vinod Kumar Vavilapalli The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is number of type of application, V is number of version of application) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
[ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha resolved YARN-521. - Resolution: Fixed Fix Version/s: 2.1.0-beta Hadoop Flags: Reviewed Committed to trunk, branch-2 and branch-2.1-beta. I included the fix for the trivial error from YARN-654 in the commit. Thanks Sandy! Augment AM - RM client module to be able to request containers only at specific locations - Key: YARN-521 URL: https://issues.apache.org/jira/browse/YARN-521 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.0-beta Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521-5.patch, YARN-521-6.patch, YARN-521.final.patch, YARN-521.patch When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-523) Container localization failures aren't reported from NM to RM
[ https://issues.apache.org/jira/browse/YARN-523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709280#comment-13709280 ] Vinod Kumar Vavilapalli commented on YARN-523: -- Tx for the testing update Jian. The test changes are trivial and good. Checking this in. Container localization failures aren't reported from NM to RM - Key: YARN-523 URL: https://issues.apache.org/jira/browse/YARN-523 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-523.patch This is mainly a pain on crashing AMs, but once we fix this, containers also can benefit - same fix for both. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709304#comment-13709304 ] Omkar Vinit Joshi commented on YARN-744: bq. We should not be locking on the app attempt id. I am not locking on appAttemptId... or AppAttemptImpl...didn't understand your question. bq. Why do we need a wrapper? We don't have any explicit lock for an application attempt...I am creating a wrapped object to avoid maintaining a separate per-application-attempt lock. Thereby, all responses for the same application attempt lock on that specific attempt's wrapper. I think this is important, as we may otherwise end up allocating more containers than what were requested... Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
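A minimal sketch of the wrapper idea under discussion (class and method names assumed, not taken from the attached patch): the lock object lives in the map and never changes, while the response it guards can be swapped freely.
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;

class AllocateResponseLock {
  private AllocateResponse lastResponse;
  AllocateResponseLock(AllocateResponse initial) { lastResponse = initial; }
  synchronized AllocateResponse getLastResponse() { return lastResponse; }
  synchronized void setLastResponse(AllocateResponse r) { lastResponse = r; }
}

ConcurrentMap<ApplicationAttemptId, AllocateResponseLock> responseMap =
    new ConcurrentHashMap<ApplicationAttemptId, AllocateResponseLock>();

// In allocate(): the monitor is the stable wrapper, not the replaceable response.
AllocateResponseLock lock = responseMap.get(appAttemptId);
synchronized (lock) {
  AllocateResponse last = lock.getLastResponse();
  if (request.getResponseId() == last.getResponseId()) {
    return last;                  // duplicate request: resend, don't reallocate
  }
  AllocateResponse fresh = doAllocate(request);   // hypothetical scheduler call
  lock.setLastResponse(fresh);
  return fresh;
}
{code}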
[jira] [Resolved] (YARN-430) Add HDFS based store for RM which manages the store using directories
[ https://issues.apache.org/jira/browse/YARN-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha resolved YARN-430. - Resolution: Not A Problem YARN-922 adds this to the FileSystemRMStateStore. HDFS retry behavior can be configured from config. No need for this anymore. Add HDFS based store for RM which manages the store using directories - Key: YARN-430 URL: https://issues.apache.org/jira/browse/YARN-430 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He There is a generic FileSystem store but it does not take advantage of HDFS features like directories, replication, DFSClient advanced settings for HA, retries etc. Writing a store that's optimized for HDFS would be good. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-922) Change FileSystemRMStateStore to use directories
[ https://issues.apache.org/jira/browse/YARN-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709311#comment-13709311 ] Jian He commented on YARN-922: -- Existing test cases cover this; no need to add more tests. Change FileSystemRMStateStore to use directories Key: YARN-922 URL: https://issues.apache.org/jira/browse/YARN-922 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-922.patch Store each app and its attempts in the same directory so that removing application state is only one operation -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
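The win from the directory layout is easiest to see as a sketch (paths assumed; the patch defines the actual names): removing an application becomes one recursive delete instead of one delete per record.
{code:java}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Assumed layout: <rmAppRoot>/<appId>/ holds the app record plus one file
// per attempt (appattempt_..._000001, appattempt_..._000002, ...).
Path rmAppRoot = new Path("/rmstore/RMAppRoot");       // illustrative path
Path appDir = new Path(rmAppRoot, applicationId.toString());
fs.delete(appDir, true);   // one call removes the app and all its attempts
{code}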
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709329#comment-13709329 ] Hadoop QA commented on YARN-513: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592454/YARN-513.16.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1487//console This message is automatically generated. Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.15.patch, YARN-513.16.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-513: - Attachment: YARN-513.17.patch rebased the patch Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.15.patch, YARN-513.16.patch, YARN-513.17.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-922) Change FileSystemRMStateStore to use directories
[ https://issues.apache.org/jira/browse/YARN-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-922: - Attachment: YARN-922.1.patch Change FileSystemRMStateStore to use directories Key: YARN-922 URL: https://issues.apache.org/jira/browse/YARN-922 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-922.1.patch, YARN-922.patch Store each app and its attempts in the same directory so that removing application state is only one operation -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-927) Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest
Bikas Saha created YARN-927: --- Summary: Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest Key: YARN-927 URL: https://issues.apache.org/jira/browse/YARN-927 Project: Hadoop YARN Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha The downside is having to use more than 1 container request when requesting more than 1 container at * priority. For most other use cases that have specific locations, we anyway need to make multiple container requests. This will also remove unnecessary duplication caused by StoredContainerRequest. It will make getMatchingRequest() always available and make removeContainerRequest() easy to use. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-245) Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED
[ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709355#comment-13709355 ] Omkar Vinit Joshi commented on YARN-245: Thanks, Mayank... I just took a look at your patch. Here are a few comments: bq. + private int lastHeartBeatId; do we need this? Can we remove it? bq. +// Checking if the response id is the same one we just processed bq. +// If yes then ignore the update. bq. +if (lastHeartBeatID == response.getResponseId()) { bq. + ++lastHeartBeatID; bq. + continue; bq. +} I remember we talked about it some time back... correct me if I am wrong. Can we replace this with something like accepting only one and rejecting all others? if (lastHeartbeatID != response.getResponseId() - 1) { continue; } thoughts?? * For the test case, we can probably avoid creating certain classes there: MyNodeManager6 - MockNM with the existing NodeStatusUpdater? MyResourceTracker6 - with an existing resource tracker (maybe from another test?), only overriding the nodeHeartbeat method... Also I think we need to check 2 things: 1) on the node manager side, are we getting only one APP_FINISH event on the dispatcher queue? 2) if application state == DONE, then send a node status response containing the current application to be finished? Maybe this will actually test the current problem? It should fail without the patch... thoughts? Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED Key: YARN-245 URL: https://issues.apache.org/jira/browse/YARN-245 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch
{code:xml}
2012-11-25 12:56:11,795 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
	at java.lang.Thread.run(Thread.java:662)
2012-11-25 12:56:11,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1353818859056_0004 transitioned from FINISHED to null
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
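The stricter check proposed above can be sketched like this (field and protocol names assumed): accept only the direct successor of the last processed heartbeat response, so a replayed response - and the APP_FINISH events it carries - is handled exactly once.
{code:java}
import org.apache.hadoop.yarn.server.api.protocolrecords.NodeHeartbeatResponse;

while (isRunning) {                        // NM status-updater loop (sketch)
  NodeHeartbeatResponse response = resourceTracker.nodeHeartbeat(request);
  if (response.getResponseId() != lastHeartBeatID + 1) {
    continue;                              // replayed/stale response: skip it
  }
  lastHeartBeatID = response.getResponseId();
  // process the response, including applications-to-clean, exactly once
}
{code}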
[jira] [Commented] (YARN-927) Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest
[ https://issues.apache.org/jira/browse/YARN-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709359#comment-13709359 ] Sandy Ryza commented on YARN-927: - Removing the ability to request multiple containers would be an annoying regression for a large class of applications that merely want a non-locality-constrained bunch of processes on the cluster. Have you considered allowing StoredContainerRequest to have multiple containers and including a decrementContainerRequest method? Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest Key: YARN-927 URL: https://issues.apache.org/jira/browse/YARN-927 Project: Hadoop YARN Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha The downside is having to use more than 1 container request when requesting more than 1 container at * priority. For most other use cases that have specific locations we anyways need to make multiple container requests. This will also remove unnecessary duplication caused by StoredContainerRequest. It will make the getMatchingRequest() always available and easy to use removeContainerRequest(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
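For scale, the change would turn one counted request into a loop of single requests; a sketch against the YARN-521-era AMRMClient (constructor shape assumed):
{code:java}
// Today: one ContainerRequest with containerCount = n.
// After YARN-927: n requests, each with an implicit count of 1; null node
// and rack arrays mean no locality constraint.
for (int i = 0; i < n; i++) {
  amrmClient.addContainerRequest(
      new AMRMClient.ContainerRequest(capability, null, null, priority, 1));
}
{code}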
[jira] [Commented] (YARN-523) Container localization failures aren't reported from NM to RM
[ https://issues.apache.org/jira/browse/YARN-523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709366#comment-13709366 ] Hudson commented on YARN-523: - SUCCESS: Integrated in Hadoop-trunk-Commit #4086 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4086/]) YARN-523. Modified a test-case to validate container diagnostics on localization failures. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1503532) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java Container localization failures aren't reported from NM to RM - Key: YARN-523 URL: https://issues.apache.org/jira/browse/YARN-523 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Fix For: 2.1.1-beta Attachments: YARN-523.patch This is mainly a pain on crashing AMs, but once we fix this, containers also can benefit - same fix for both. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
[ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709367#comment-13709367 ] Hudson commented on YARN-521: - SUCCESS: Integrated in Hadoop-trunk-Commit #4086 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4086/]) YARN-521. Augment AM - RM client module to be able to request containers only at specific locations (Sandy Ryza via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1503526) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AMRMClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/InvalidContainerRequestException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClientContainerRequest.java Augment AM - RM client module to be able to request containers only at specific locations - Key: YARN-521 URL: https://issues.apache.org/jira/browse/YARN-521 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.0-beta Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521-5.patch, YARN-521-6.patch, YARN-521.final.patch, YARN-521.patch When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception
[ https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709370#comment-13709370 ] Hadoop QA commented on YARN-875: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592456/YARN-875.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell org.apache.hadoop.yarn.client.api.impl.TestAMRMClientContainerRequest org.apache.hadoop.yarn.client.api.impl.TestAMRMClient org.apache.hadoop.yarn.client.api.impl.TestNMClient {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1489//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1489//console This message is automatically generated. Application can hang if AMRMClientAsync callback thread has exception - Key: YARN-875 URL: https://issues.apache.org/jira/browse/YARN-875 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-875.1.patch Currently that thread will die and then never callback. App can hang. Possible solution could be to catch Throwable in the callback and then call client.onError(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
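The fix the YARN-875 description suggests is easy to picture with a small sketch. The following is a hypothetical, simplified callback loop; CallbackHandler here is a stand-in interface, not the actual AMRMClientAsync.CallbackHandler. Without the catch of Throwable, an exception thrown by a handler kills the thread and no further callbacks are ever delivered.
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Simplified sketch of the suggested fix; not the actual AMRMClientAsync code.
public class CallbackThreadSketch {
    interface CallbackHandler {
        void onEvent(Object event);
        void onError(Throwable t); // surfaced instead of a silently dead thread
    }

    private final BlockingQueue<Object> events = new LinkedBlockingQueue<Object>();
    private volatile boolean stopped = false;

    void post(Object event) {
        events.add(event);
    }

    Thread startCallbackThread(final CallbackHandler handler) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                while (!stopped) {
                    Object event;
                    try {
                        event = events.take();
                    } catch (InterruptedException ie) {
                        return; // normal shutdown
                    }
                    try {
                        handler.onEvent(event);
                    } catch (Throwable th) {
                        // Without this catch the thread dies and the app hangs.
                        stopped = true;
                        handler.onError(th);
                    }
                }
            }
        }, "callback-thread");
        t.start();
        return t;
    }
}
{code}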
[jira] [Commented] (YARN-927) Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest
[ https://issues.apache.org/jira/browse/YARN-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709380#comment-13709380 ] Bikas Saha commented on YARN-927: - IMO calling addContainerRequest(new ContainerRequest(5)) is only a little bit less work than calling for(int i=0; i<5; ++i) {addContainerRequest(new ContainerRequest(1));} Other than that there should be no change in any of those applications. They should continue to work as is after that. Book-keeping for stored containers is next to impossible when add(pri1, 5), add(pri1, 4) is called followed by remove(pri1, 2). Internally, we don't know whether to remove from the first CR or the second. That makes a general getMatchingRequest API a non-starter. That is why getMatchingRequest is restricted to StoredContainerRequest. It makes the API confusing. Allowing users to get some CR and change its container count outside of the AMRMClient would lead to other correctness issues. The API is messy the way it is right now. I had always wanted to do this but lost track of it because of TEZ stabilization work. Reviewing YARN-521 recently reminded me of this when I noticed the duplication of stuff between CR and StoredCR and how easy it is to miss them. From what I see, the downside of this is very minimal and the upside is a much cleaner API. Hence I want to get this in before beta. I held back on it since YARN-521 was close and didn't want to cause unnecessary massive merge conflicts because of this simple refactor. Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest Key: YARN-927 URL: https://issues.apache.org/jira/browse/YARN-927 Project: Hadoop YARN Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha The downside is having to use more than 1 container request when requesting more than 1 container at * priority. For most other use cases that have specific locations we anyway need to make multiple container requests. This will also remove unnecessary duplication caused by StoredContainerRequest. It will make getMatchingRequest() always available and removeContainerRequest() easy to use. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
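The ambiguity described above can be seen in a few lines. This toy sketch (the Request class is a stand-in for ContainerRequest; none of this is the real AMRMClient bookkeeping) shows why a partial remove against counted requests has no well-defined target, while single-count requests make removal trivial.
{code:java}
import java.util.ArrayList;
import java.util.List;

// Toy model (not the real AMRMClient types) of the bookkeeping problem above:
// with per-request counts, a partial remove cannot tell which stored request
// to decrement; with the count fixed at 1, removal is unambiguous.
public class ContainerRequestSketch {
    static class Request {
        final int priority;
        int count;
        Request(int priority, int count) { this.priority = priority; this.count = count; }
    }

    public static void main(String[] args) {
        // Counted requests: add(pri1, 5) followed by add(pri1, 4).
        List<Request> counted = new ArrayList<Request>();
        counted.add(new Request(1, 5));
        counted.add(new Request(1, 4));
        // remove(pri1, 2): decrement the first request? the second? split it?
        // Any choice silently changes which stored request a later
        // getMatchingRequest() would return.

        // Single-count requests: removing two is just dropping two entries.
        List<Request> singles = new ArrayList<Request>();
        for (int i = 0; i < 5; ++i) {
            singles.add(new Request(1, 1));
        }
        singles.remove(0);
        singles.remove(0);
        System.out.println("remaining requests: " + singles.size()); // prints 3
    }
}
{code}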
[jira] [Commented] (YARN-922) Change FileSystemRMStateStore to use directories
[ https://issues.apache.org/jira/browse/YARN-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709382#comment-13709382 ] Hadoop QA commented on YARN-922: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592465/YARN-922.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestRMStateStore {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1488//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1488//console This message is automatically generated. Change FileSystemRMStateStore to use directories Key: YARN-922 URL: https://issues.apache.org/jira/browse/YARN-922 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-922.1.patch, YARN-922.patch Store each app and its attempts in the same directory so that removing application state is only one operation -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-62) AM should not be able to abuse container tokens for repetitive container launches
[ https://issues.apache.org/jira/browse/YARN-62?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709388#comment-13709388 ] Omkar Vinit Joshi commented on YARN-62: --- Thanks Vinod. bq. Though it works in most cases, it isn't logically correct to expire old token only if a new container comes in or succeeds. We should perform the expiry in a thread. I thought about a thread earlier, but starting an additional thread just to maintain this seems like overhead. Thoughts? bq. Can you also write a specific test which launches a container that very quickly exits, turns around and launches another container with same ID and token and gets rejected? bq. Also, please write a test which makes sure that old tokens are expired after 10 mins. Yeah, will add one... AM should not be able to abuse container tokens for repetitive container launches - Key: YARN-62 URL: https://issues.apache.org/jira/browse/YARN-62 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-62-20130621.1.patch, YARN-62-20130621.patch, YARN-62-20130628.patch Clone of YARN-51. ApplicationMaster should not be able to store container tokens and use the same set of tokens for repetitive container launches. The possibility of such abuse exists in the current code for a duration of 1d+10mins; we need to fix this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
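For what the lazy-expiry alternative being debated might look like, here is a hypothetical sketch; the class and method names are invented for illustration and this is not the NM's actual token bookkeeping. Each container launch first sweeps entries older than the expiry window, so no dedicated thread is needed, at the cost of stale tokens lingering until the next launch triggers a sweep.
{code:java}
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of lazy token expiry; not the NM's real implementation.
public class TokenExpirySketch {
    private final Map<String, Long> recentlyUsedTokens =
        new ConcurrentHashMap<String, Long>();
    private static final long EXPIRY_MS = 10L * 60 * 1000; // 10-minute window

    /** Called on every container launch; rejects a replayed container token. */
    boolean registerLaunch(String containerId, long now) {
        sweepExpired(now); // lazy expiry instead of a background thread
        return recentlyUsedTokens.putIfAbsent(containerId, now) == null;
    }

    private void sweepExpired(long now) {
        Iterator<Map.Entry<String, Long>> it =
            recentlyUsedTokens.entrySet().iterator();
        while (it.hasNext()) {
            if (now - it.next().getValue() > EXPIRY_MS) {
                it.remove();
            }
        }
    }
}
{code}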
[jira] [Updated] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception
[ https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-875: Attachment: YARN-875.1.patch Kicking test again. Application can hang if AMRMClientAsync callback thread has exception - Key: YARN-875 URL: https://issues.apache.org/jira/browse/YARN-875 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-875.1.patch, YARN-875.1.patch Currently that thread will die and then never callback. App can hang. Possible solution could be to catch Throwable in the callback and then call client.onError(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-744: --- Description: Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. was: Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. As a part of this ticket also fixing the problem which is present in Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-744: --- Description: Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. As a part of this ticket also fixing the problem which is present in was:Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. As a part of this ticket also fixing the problem which is present in -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
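The broken locking pattern in the YARN-744 description is worth spelling out. This toy model (not the actual ApplicationMasterService code; the map and method names are stand-ins) shows how locking the lastResponse object while replacing it in the map lets a second thread acquire the new object's monitor and enter the critical section concurrently, alongside the stable per-key lock along the lines the description suggests.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the race described above; not the real ApplicationMasterService.
public class AllocateLockSketch {
    private final Map<String, Object> lastResponse =
        new ConcurrentHashMap<String, Object>();

    void register(String attemptId) {
        lastResponse.put(attemptId, new Object());
    }

    // Broken: the monitor guards an object that is swapped out underneath it,
    // so the next caller locks the *new* object and races right past us.
    Object allocateBroken(String attemptId, Object newResponse) {
        Object last = lastResponse.get(attemptId); // assumes attempt registered
        synchronized (last) {
            lastResponse.put(attemptId, newResponse);
            return newResponse;
        }
    }

    // Along the lines the description suggests: serialize on a stable
    // per-attempt lock (the interned key here; a dedicated lock object per
    // ApplicationAttemptId works just as well).
    Object allocateFixed(String attemptId, Object newResponse) {
        synchronized (attemptId.intern()) {
            lastResponse.put(attemptId, newResponse);
            return newResponse;
        }
    }
}
{code}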
[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
[ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709398#comment-13709398 ] Hudson commented on YARN-521: - SUCCESS: Integrated in Hadoop-trunk-Commit #4087 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4087/]) Trivial fix for minor refactor error for YARN-521 (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1503543) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AMRMClient.java Augment AM - RM client module to be able to request containers only at specific locations - Key: YARN-521 URL: https://issues.apache.org/jira/browse/YARN-521 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.0-beta Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521-4.patch, YARN-521-4.patch, YARN-521-5.patch, YARN-521-6.patch, YARN-521.final.patch, YARN-521.patch When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception
[ https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709408#comment-13709408 ] Hadoop QA commented on YARN-875: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592471/YARN-875.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1491//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1491//console This message is automatically generated. Application can hang if AMRMClientAsync callback thread has exception - Key: YARN-875 URL: https://issues.apache.org/jira/browse/YARN-875 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-875.1.patch, YARN-875.1.patch Currently that thread will die and then never callback. App can hang. Possible solution could be to catch Throwable in the callback and then call client.onError(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-927) Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest
[ https://issues.apache.org/jira/browse/YARN-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-927: Attachment: YARN-927.1.patch Attaching the refactoring patch. The code change in DistributedShell reflects the trivial changes needed for apps that would have used a >1 container count in a single ContainerRequest. Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest Key: YARN-927 URL: https://issues.apache.org/jira/browse/YARN-927 Project: Hadoop YARN Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha Attachments: YARN-927.1.patch The downside is having to use more than 1 container request when requesting more than 1 container at * priority. For most other use cases that have specific locations we anyway need to make multiple container requests. This will also remove unnecessary duplication caused by StoredContainerRequest. It will make getMatchingRequest() always available and removeContainerRequest() easy to use. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-451) Add more metrics to RM page
[ https://issues.apache.org/jira/browse/YARN-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709420#comment-13709420 ] Joep Rottinghuis commented on YARN-451: --- It would certainly be very useful to be able to see application size/weight (and order by this) when many applications run. If it were added, various YARN applications would each need their own specific implementation. At the moment only memory is tracked, so #slot Gigabytes would be a possible number that would be more generic than simply #mappers+#reducers. Either would be more useful than having no data at all. Being able to see the size of applications is really helpful to understand what is going on in one view. Is somebody running many small applications, a few large ones, many large ones, etc.? Add more metrics to RM page --- Key: YARN-451 URL: https://issues.apache.org/jira/browse/YARN-451 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.3-alpha Reporter: Lohit Vijayarenu Priority: Minor The ResourceManager web UI shows the list of RUNNING applications, but it does not tell which applications are requesting more resources than others. With a cluster running hundreds of applications at once, it would be useful to have some kind of metric to show high-resource-usage applications vs low-resource-usage ones. At the minimum, showing the number of containers is a good option. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
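As a rough illustration of the "#slot Gigabytes" idea floated above (the method is hypothetical; the RM does not expose such a metric today), the number reduces to containers times per-container memory:
{code:java}
// Back-of-envelope sketch of the "#slot Gigabytes" metric discussed above.
public class AppSizeMetricSketch {
    /** Only memory is tracked, so app size = containers * GB per container. */
    static double slotGigabytes(int numContainers, int containerMemoryMb) {
        return numContainers * (containerMemoryMb / 1024.0);
    }

    public static void main(String[] args) {
        // e.g. 200 containers of 1536 MB each -> 300.0 slot-GB
        System.out.println(slotGigabytes(200, 1536));
    }
}
{code}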
[jira] [Updated] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-513: Attachment: YARN-513.17.patch Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.15.patch, YARN-513.16.patch, YARN-513.17.patch, YARN-513.17.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-927) Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest
[ https://issues.apache.org/jira/browse/YARN-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709433#comment-13709433 ] Hadoop QA commented on YARN-927: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592478/YARN-927.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1492//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1492//console This message is automatically generated. Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest Key: YARN-927 URL: https://issues.apache.org/jira/browse/YARN-927 Project: Hadoop YARN Issue Type: Task Reporter: Bikas Saha Assignee: Bikas Saha Attachments: YARN-927.1.patch The downside is having to use more than 1 container request when requesting more than 1 container at * priority. For most other use cases that have specific locations we anyway need to make multiple container requests. This will also remove unnecessary duplication caused by StoredContainerRequest. It will make getMatchingRequest() always available and removeContainerRequest() easy to use. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709443#comment-13709443 ] Hadoop QA commented on YARN-513: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592464/YARN-513.17.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.mapreduce.v2.TestMROldApiJobs org.apache.hadoop.yarn.client.api.impl.TestAMRMClientContainerRequest org.apache.hadoop.yarn.client.api.impl.TestAMRMClient org.apache.hadoop.yarn.client.api.impl.TestNMClient {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1490//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1490//console This message is automatically generated. Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.15.patch, YARN-513.16.patch, YARN-513.17.patch, YARN-513.17.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-922) Change FileSystemRMStateStore to use directories
[ https://issues.apache.org/jira/browse/YARN-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-922: - Attachment: YARN-922.2.patch fixed test failure. Change FileSystemRMStateStore to use directories Key: YARN-922 URL: https://issues.apache.org/jira/browse/YARN-922 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-922.1.patch, YARN-922.2.patch, YARN-922.patch Store each app and its attempts in the same directory so that removing application state is only one operation -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
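The directory layout YARN-922 moves to is easy to picture. Here is a sketch using plain java.io (not the Hadoop FileSystem API the real store uses; all paths are illustrative): the app and every attempt record live under one application directory, so deleting that directory removes the whole application state in one logical operation.
{code:java}
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

// Sketch of the layout change using plain java.io; not the real
// FileSystemRMStateStore code, and the paths are illustrative.
public class RMStateDirSketch {
    static void deleteRecursively(File f) {
        File[] children = f.listFiles();
        if (children != null) {
            for (File c : children) {
                deleteRecursively(c);
            }
        }
        f.delete();
    }

    public static void main(String[] args) throws IOException {
        File appDir = new File("rmstate/application_0001");
        appDir.mkdirs();
        // The app and each of its attempts stored under the same directory.
        new FileWriter(new File(appDir, "appattempt_0001_000001")).close();
        new FileWriter(new File(appDir, "appattempt_0001_000002")).close();

        // Removing application state is one operation: drop the directory.
        deleteRecursively(appDir);
        System.out.println("app dir removed: " + !appDir.exists());
    }
}
{code}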
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709457#comment-13709457 ] Jian He commented on YARN-513: -- The above test failures are not related to this patch: TestMROldApiJobs is also failing on trunk, and the other test failures are related to YARN-521, which should already be fixed by now. Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.15.patch, YARN-513.16.patch, YARN-513.17.patch, YARN-513.17.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-922) Change FileSystemRMStateStore to use directories
[ https://issues.apache.org/jira/browse/YARN-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709458#comment-13709458 ] Hadoop QA commented on YARN-922: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592484/YARN-922.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1494//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1494//console This message is automatically generated. Change FileSystemRMStateStore to use directories Key: YARN-922 URL: https://issues.apache.org/jira/browse/YARN-922 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-922.1.patch, YARN-922.2.patch, YARN-922.patch Store each app and its attempts in the same directory so that removing application state is only one operation -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception
[ https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709461#comment-13709461 ] Xuan Gong commented on YARN-875: Fixed the -1 on javadoc. Application can hang if AMRMClientAsync callback thread has exception - Key: YARN-875 URL: https://issues.apache.org/jira/browse/YARN-875 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-875.1.patch, YARN-875.1.patch, YARN-875.2.patch Currently that thread will die and then never callback. App can hang. Possible solution could be to catch Throwable in the callback and then call client.onError(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception
[ https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-875: --- Attachment: YARN-875.2.patch Application can hang if AMRMClientAsync callback thread has exception - Key: YARN-875 URL: https://issues.apache.org/jira/browse/YARN-875 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-875.1.patch, YARN-875.1.patch, YARN-875.2.patch Currently that thread will die and then never callback. App can hang. Possible solution could be to catch Throwable in the callback and then call client.onError(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception
[ https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709470#comment-13709470 ] Hadoop QA commented on YARN-875: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592488/YARN-875.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1495//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1495//console This message is automatically generated. Application can hang if AMRMClientAsync callback thread has exception - Key: YARN-875 URL: https://issues.apache.org/jira/browse/YARN-875 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-875.1.patch, YARN-875.1.patch, YARN-875.2.patch Currently that thread will die and then never callback. App can hang. Possible solution could be to catch Throwable in the callback and then call client.onError(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Moved] (YARN-928) While killing attempt for a task which got succeeded, task transition happens from SUCCEEDED to SCHEDULED and InvalidStateTransitonException is thrown
[ https://issues.apache.org/jira/browse/YARN-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina moved MAPREDUCE-5389 to YARN-928: Component/s: (was: task) applications Affects Version/s: (was: 2.0.5-alpha) 2.0.5-alpha Key: YARN-928 (was: MAPREDUCE-5389) Project: Hadoop YARN (was: Hadoop Map/Reduce) While killing attempt for a task which got succeeded, task transition happens from SUCCEEDED to SCHEDULED and InvalidStateTransitonException is thrown Key: YARN-928 URL: https://issues.apache.org/jira/browse/YARN-928 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 2.0.5-alpha Reporter: J.Andreina Priority: Minor Step 1: Install a cluster with HDFS and MR. Step 2: Execute a job. Step 3: Issue a kill for a task attempt whose task has already completed. Rex@HOST-10-18-91-55:~/NodeAgentTmpDir/installations/hadoop-2.0.5.tar/hadoop-2.0.5/bin ./mapred job -kill-task attempt_1373875322959_0032_m_00_0 No GC_PROFILE is given. Defaults to medium. 13/07/15 14:46:32 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited. 13/07/15 14:46:32 INFO proxy.ResourceManagerProxies: HA Proxy Creation with xface : interface org.apache.hadoop.yarn.api.ClientRMProtocol 13/07/15 14:46:33 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started. Killed task attempt_1373875322959_0032_m_00_0 Observation: === 1. The task state transitioned from SUCCEEDED to SCHEDULED. 2. For a succeeded attempt, when the client issues a kill, the client is notified that the succeeded attempt was killed. 3. A second task attempt was launched, which succeeded and was then killed later on client request. 4. Even after the job state transitioned from SUCCEEDED to ERROR, the UI still shows the state as succeeded. Issue : = 1. The client has been notified that the attempt was killed, but the attempt actually succeeded, and that is what the JHS UI displays. 2. At the App Master an InvalidStateTransitonException is thrown. 3. At the client side and the JHS, the job exited with state Finished/Succeeded; at the RM side the state is Finished/Failed.
AM Logs: 2013-07-15 14:46:25,461 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1373875322959_0032_m_00_0 TaskAttempt Transitioned from RUNNING to SUCCEEDED 2013-07-15 14:46:25,468 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1373875322959_0032_m_00_0 2013-07-15 14:46:25,470 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1373875322959_0032_m_00 Task Transitioned from RUNNING to SUCCEEDED 2013-07-15 14:46:33,810 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1373875322959_0032_m_00 Task Transitioned from SUCCEEDED to SCHEDULED 2013-07-15 14:46:37,344 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1373875322959_0032_m_00_1 2013-07-15 14:46:37,344 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1373875322959_0032_m_00 Task Transitioned from RUNNING to SUCCEEDED 2013-07-15 14:46:37,345 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_COMPLETED at SUCCEEDED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:866) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:128) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1095) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1091) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) at java.lang.Thread.run(Thread.java:662) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
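The InvalidStateTransitonException in the log above falls out of the state-machine style used here. The following is a minimal sketch in which the states, events, and transition table are simplified stand-ins for Hadoop's StateMachineFactory: a terminal state like SUCCEEDED registers no transition for a late kill/completion event, so dispatching one can only throw; the underlying bug in this report is that the kill was allowed to move a SUCCEEDED task back to SCHEDULED in the first place.
{code:java}
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// Simplified stand-in for Hadoop's StateMachineFactory behavior.
public class TaskStateSketch {
    enum State { SCHEDULED, RUNNING, SUCCEEDED }
    enum Event { T_LAUNCH, T_COMPLETED, T_KILL }

    private static final Map<State, EnumSet<Event>> LEGAL =
        new EnumMap<State, EnumSet<Event>>(State.class);
    static {
        LEGAL.put(State.SCHEDULED, EnumSet.of(Event.T_LAUNCH, Event.T_KILL));
        LEGAL.put(State.RUNNING, EnumSet.of(Event.T_COMPLETED, Event.T_KILL));
        // Terminal state: no events registered, like SUCCEEDED in the log.
        LEGAL.put(State.SUCCEEDED, EnumSet.noneOf(Event.class));
    }

    static void dispatch(State current, Event event) {
        if (!LEGAL.get(current).contains(event)) {
            throw new IllegalStateException(
                "Invalid event: " + event + " at " + current);
        }
        // ... perform the registered transition (elided in this sketch)
    }

    public static void main(String[] args) {
        dispatch(State.SUCCEEDED, Event.T_KILL); // throws, as in the AM log
    }
}
{code}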
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13709480#comment-13709480 ] Hadoop QA commented on YARN-513: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592481/YARN-513.17.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.mapreduce.v2.TestMROldApiJobs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1493//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1493//console This message is automatically generated. Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.14.patch, YARN-513.15.patch, YARN-513.16.patch, YARN-513.17.patch, YARN-513.17.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira