[jira] [Commented] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI "list" command

2014-01-27 Thread Kenji Kikushima (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882636#comment-13882636
 ] 

Kenji Kikushima commented on YARN-1480:
---

I tried "mvn test -Dtest=org.apache.hadoop.yarn.client.api.impl.TestNMClient", 
but no error occurred. Hmm...

> RM web services getApps() accepts many more filters than ApplicationCLI 
> "list" command
> --
>
> Key: YARN-1480
> URL: https://issues.apache.org/jira/browse/YARN-1480
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Kenji Kikushima
> Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480-4.patch, 
> YARN-1480.patch
>
>
> Nowadays RM web services getApps() accepts many more filters than 
> ApplicationCLI "list" command, which only accepts "state" and "type". IMHO, 
> ideally, different interfaces should provide consistent functionality. Is it 
> better to allow more filters in ApplicationCLI?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1656) Return type of YarnRPC.getProxy() should be the given protocol class instead of Object

2014-01-27 Thread Hiroshi Ikeda (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroshi Ikeda updated YARN-1656:


Description: 
Writing code with explicit cast such as:
{code}
((ApplicationClientProtocol) rpc.getProxy(ApplicationClientProtocol.class, 
rmAddress, appsManagerServerConf));
{code}
is tedious.


> Return type of YarnRPC.getProxy() should be the given protocol class instead 
> of Object
> --
>
> Key: YARN-1656
> URL: https://issues.apache.org/jira/browse/YARN-1656
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Hiroshi Ikeda
>Priority: Minor
>
> Writing code with explicit cast such as:
> {code}
> ((ApplicationClientProtocol) rpc.getProxy(ApplicationClientProtocol.class, 
> rmAddress, appsManagerServerConf));
> {code}
> is tedious.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1656) Return type of YarnRPC.getProxy() should be the given protocol class instead of Object

2014-01-27 Thread Hiroshi Ikeda (JIRA)
Hiroshi Ikeda created YARN-1656:
---

 Summary: Return type of YarnRPC.getProxy() should be the given 
protocol class instead of Object
 Key: YARN-1656
 URL: https://issues.apache.org/jira/browse/YARN-1656
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0
 Environment: Writing code with explicit cast such as:
{code}
((ApplicationClientProtocol) rpc.getProxy(ApplicationClientProtocol.class, 
rmAddress, appsManagerServerConf));
{code}
is tedious.

Reporter: Hiroshi Ikeda
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1656) Return type of YarnRPC.getProxy() should be the given protocol class instead of Object

2014-01-27 Thread Hiroshi Ikeda (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroshi Ikeda updated YARN-1656:


Environment: (was: Writing code with explicit cast such as:
{code}
((ApplicationClientProtocol) rpc.getProxy(ApplicationClientProtocol.class, 
rmAddress, appsManagerServerConf));
{code}
is tedious.
)

> Return type of YarnRPC.getProxy() should be the given protocol class instead 
> of Object
> --
>
> Key: YARN-1656
> URL: https://issues.apache.org/jira/browse/YARN-1656
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Hiroshi Ikeda
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1656) Return type of YarnRPC.getProxy() should be the given protocol class instead of Object

2014-01-27 Thread Hiroshi Ikeda (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroshi Ikeda updated YARN-1656:


Attachment: YARN-1656.patch

Added a sample patch for 2.2.0, which also fixes some similar issues for 
classes around YarnRPC.
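
For illustration only, the kind of generic signature this issue is asking for 
might look like the following; this is a hedged sketch of the idea, not the 
contents of the attached patch:

{code}
// Hypothetical generic variant; today YarnRPC.getProxy() returns Object.
public abstract <T> T getProxy(Class<T> protocol, InetSocketAddress address,
    Configuration conf);

// Callers could then drop the explicit cast:
ApplicationClientProtocol client =
    rpc.getProxy(ApplicationClientProtocol.class, rmAddress, appsManagerServerConf);
{code}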

> Return type of YarnRPC.getProxy() should be the given protocol class instead 
> of Object
> --
>
> Key: YARN-1656
> URL: https://issues.apache.org/jira/browse/YARN-1656
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Hiroshi Ikeda
>Priority: Minor
> Attachments: YARN-1656.patch
>
>
> Writing code with explicit cast such as:
> {code}
> ((ApplicationClientProtocol) rpc.getProxy(ApplicationClientProtocol.class, 
> rmAddress, appsManagerServerConf));
> {code}
> is tedious.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI "list" command

2014-01-27 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-1480:


Hadoop Flags: Reviewed

> RM web services getApps() accepts many more filters than ApplicationCLI 
> "list" command
> --
>
> Key: YARN-1480
> URL: https://issues.apache.org/jira/browse/YARN-1480
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Kenji Kikushima
> Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480-4.patch, 
> YARN-1480.patch
>
>
> Nowadays RM web services getApps() accepts many more filters than 
> ApplicationCLI "list" command, which only accepts "state" and "type". IMHO, 
> ideally, different interfaces should provide consistent functionality. Is it 
> better to allow more filters in ApplicationCLI?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI "list" command

2014-01-27 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882798#comment-13882798
 ] 

Akira AJISAKA commented on YARN-1480:
-

+1, the timeout in TestNMClient is not related to the patch.

> RM web services getApps() accepts many more filters than ApplicationCLI 
> "list" command
> --
>
> Key: YARN-1480
> URL: https://issues.apache.org/jira/browse/YARN-1480
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Kenji Kikushima
> Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480-4.patch, 
> YARN-1480.patch
>
>
> Nowadays RM web services getApps() accepts many more filters than 
> ApplicationCLI "list" command, which only accepts "state" and "type". IMHO, 
> ideally, different interfaces should provide consistent functionality. Is it 
> better to allow more filters in ApplicationCLI?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1657) Timeout occurs in TestNMClient

2014-01-27 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created YARN-1657:
---

 Summary: Timeout occurs in TestNMClient
 Key: YARN-1657
 URL: https://issues.apache.org/jira/browse/YARN-1657
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Akira AJISAKA


A timeout occurs in TestNMClient when a patch is tested by Jenkins.

The following comment can be seen in YARN-1480, YARN-1611, and YARN-888.
{code}
{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

org.apache.hadoop.yarn.client.api.impl.TestNMClient
{code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1632) TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package

2014-01-27 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated YARN-1632:
--

Attachment: yarn-1632v2.patch

> TestApplicationMasterServices should be under 
> org.apache.hadoop.yarn.server.resourcemanager package
> ---
>
> Key: YARN-1632
> URL: https://issues.apache.org/jira/browse/YARN-1632
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 0.23.9, 2.2.0
>Reporter: Chen He
>Assignee: Chen He
>Priority: Minor
> Attachments: yarn-1632.patch, yarn-1632v2.patch
>
>
> ApplicationMasterService is under 
> org.apache.hadoop.yarn.server.resourcemanager package. However, its unit test 
> file TestApplicationMasterService is placed under 
> org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice 
> package which only contains one file (TestApplicationMasterService). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1632) TestApplicationMasterServices should be under org.apache.hadoop.yarn.server.resourcemanager package

2014-01-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882965#comment-13882965
 ] 

Hadoop QA commented on YARN-1632:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625373/yarn-1632v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2942//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2942//console

This message is automatically generated.

> TestApplicationMasterServices should be under 
> org.apache.hadoop.yarn.server.resourcemanager package
> ---
>
> Key: YARN-1632
> URL: https://issues.apache.org/jira/browse/YARN-1632
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 0.23.9, 2.2.0
>Reporter: Chen He
>Assignee: Chen He
>Priority: Minor
> Attachments: yarn-1632.patch, yarn-1632v2.patch
>
>
> ApplicationMasterService is under 
> org.apache.hadoop.yarn.server.resourcemanager package. However, its unit test 
> file TestApplicationMasterService is placed under 
> org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice 
> package which only contains one file (TestApplicationMasterService). 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882997#comment-13882997
 ] 

Karthik Kambatla commented on YARN-1618:


[~bikassaha], [~vinodkv] - will you be able to take a look at the patch? It 
would be nice to include this in 2.3 if possible, though I wouldn't call it a 
blocker for 2.3.

> Applications transition from NEW to FINAL_SAVING, and try to update 
> non-existing entries in the state-store
> ---
>
> Key: YARN-1618
> URL: https://issues.apache.org/jira/browse/YARN-1618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-1618-1.patch, yarn-1618-2.patch
>
>
> YARN-891 augments the RMStateStore to store information on completed 
> applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
> This leads to the RM trying to update entries in the state-store that do not 
> exist. On ZKRMStateStore, this leads to the RM crashing. 
> Previous description:
> ZKRMStateStore fails to handle updates to znodes that don't exist. For 
> instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
> In these cases, the store should create the missing znode and handle the 
> update.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-745) Move UnmanagedAMLauncher to yarn client package

2014-01-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883018#comment-13883018
 ] 

Bikas Saha commented on YARN-745:
-

That was the original plan of action for the unmanaged AM launcher. It's just a 
specialization of YarnClient. Under a flag, the YarnClient implementation should 
be able to submit an unmanaged AM. However, running in-process or forking a new 
process should both be possible. Running in-process would be easier for 
debugging. Launching a separate process works for cases where people want to run 
their app in unmanaged mode (e.g. the LAMA AM). Also, when one already has an AM 
jar, one could launch it in a separate process with java opts to enable 
debugging, instead of writing code to invoke YarnClient in unmanaged mode inside 
the AM.

> Move UnmanagedAMLauncher to yarn client package
> ---
>
> Key: YARN-745
> URL: https://issues.apache.org/jira/browse/YARN-745
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Fix For: 2.4.0
>
>
> Its currently sitting in yarn applications project which sounds wrong. client 
> project sounds better since it contains the utilities/libraries that clients 
> use to write and debug yarn applications.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1658) Webservice should redirect to active RM when HA is enabled.

2014-01-27 Thread Cindy Li (JIRA)
Cindy Li created YARN-1658:
--

 Summary: Webservice should redirect to active RM when HA is 
enabled.
 Key: YARN-1658
 URL: https://issues.apache.org/jira/browse/YARN-1658
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Cindy Li
Assignee: Cindy Li


When HA is enabled, web service requests sent to the standby RM should be 
redirected to the active RM. This JIRA is related to YARN-1525.
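
As a purely illustrative sketch of the redirection idea (not a design decision 
for this JIRA), a standby RM could answer web-service calls with an HTTP 
redirect to the active RM; the filter class and the way the active address is 
obtained below are assumptions:

{code}
import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/** Illustrative only: redirect web-service calls that hit a standby RM. */
public class StandbyRedirectFilter implements Filter {
  private volatile String activeRMWebAddress;   // assumed to be refreshed elsewhere

  @Override
  public void init(FilterConfig conf) {
    activeRMWebAddress = conf.getInitParameter("active.rm.webapp.address");
  }

  @Override
  public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
      throws IOException, ServletException {
    HttpServletRequest httpReq = (HttpServletRequest) req;
    HttpServletResponse httpRes = (HttpServletResponse) res;
    // 307 preserves the HTTP method and body for PUT/POST web-service calls.
    httpRes.setStatus(HttpServletResponse.SC_TEMPORARY_REDIRECT);
    httpRes.setHeader("Location",
        "http://" + activeRMWebAddress + httpReq.getRequestURI());
  }

  @Override
  public void destroy() { }
}
{code}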



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1658) Webservice should redirect to active RM when HA is enabled.

2014-01-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883230#comment-13883230
 ] 

Karthik Kambatla commented on YARN-1658:


Shouldn't this be a part of YARN-1525? IOW, what do we plan to include here 
that doesn't go into YARN-1525? 

> Webservice should redirect to active RM when HA is enabled.
> ---
>
> Key: YARN-1658
> URL: https://issues.apache.org/jira/browse/YARN-1658
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Cindy Li
>Assignee: Cindy Li
>  Labels: YARN
>
> When HA is enabled, web service to standby RM should be redirected to the 
> active RM. This is a related Jira to YARN-1525.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue

2014-01-27 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated YARN-1582:


Attachment: YARN-1582-branch-0.23.patch

Preliminary patch for branch-0.23.  The downside to this is that when the 
application first gets an application id, it is told the cluster-level setting, 
which might be bigger than the per-queue setting.  This shouldn't be a problem 
as the application just fails later on, but it does make the API less clean.

> Capacity Scheduler: add a maximum-allocation-mb setting per queue 
> --
>
> Key: YARN-1582
> URL: https://issues.apache.org/jira/browse/YARN-1582
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.0.0, 0.23.10, 2.2.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Attachments: YARN-1582-branch-0.23.patch
>
>
> We want to allow certain queues to use larger container sizes while limiting 
> other queues to smaller container sizes.  Setting it per queue will help 
> prevent abuse, help limit the impact of reservations, and allow changes in 
> the maximum container size to be rolled out more easily.
> One reason this is needed is more application types are becoming available on 
> yarn and certain applications require more memory to run efficiently. While 
> we want to allow for that we don't want other applications to abuse that and 
> start requesting bigger containers than what they really need.  
> Note that we could have this based on application type, but that might not be 
> totally accurate either since for example you might want to allow certain 
> users on MapReduce to use larger containers, while limiting other users of 
> MapReduce to smaller containers.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue

2014-01-27 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883238#comment-13883238
 ] 

Thomas Graves commented on YARN-1582:
-

Note the attached patch leaves the cluster-level setting in place. The per-queue 
settings must be less than or equal to the cluster-level setting.  It also 
allows both the cluster-level and per-queue values to be refreshed (yarn rmadmin 
-refreshQueues) as long as the value increases.  We can't allow it to decrease, 
since we've already told the AMs the max size and letting it decrease could mess 
them up.
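
As a rough illustration of those semantics (hypothetical names, not the actual 
patch), validating a per-queue maximum against the cluster-level maximum might 
look like this, assuming a Configuration object conf and a queue path string 
queuePath are in scope:

{code}
// Illustrative only: a per-queue maximum-allocation-mb must not exceed the cluster max.
int clusterMaxMB = conf.getInt("yarn.scheduler.maximum-allocation-mb", 8192);
int queueMaxMB = conf.getInt(
    "yarn.scheduler.capacity." + queuePath + ".maximum-allocation-mb",  // hypothetical key
    clusterMaxMB);   // default to the cluster-level value
if (queueMaxMB > clusterMaxMB) {
  throw new IllegalArgumentException("Per-queue maximum-allocation-mb ("
      + queueMaxMB + ") must be less than or equal to the cluster-level maximum ("
      + clusterMaxMB + ")");
}
{code}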

> Capacity Scheduler: add a maximum-allocation-mb setting per queue 
> --
>
> Key: YARN-1582
> URL: https://issues.apache.org/jira/browse/YARN-1582
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.0.0, 0.23.10, 2.2.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Attachments: YARN-1582-branch-0.23.patch
>
>
> We want to allow certain queues to use larger container sizes while limiting 
> other queues to smaller container sizes.  Setting it per queue will help 
> prevent abuse, help limit the impact of reservations, and allow changes in 
> the maximum container size to be rolled out more easily.
> One reason this is needed is more application types are becoming available on 
> yarn and certain applications require more memory to run efficiently. While 
> we want to allow for that we don't want other applications to abuse that and 
> start requesting bigger containers than what they really need.  
> Note that we could have this based on application type, but that might not be 
> totally accurate either since for example you might want to allow certain 
> users on MapReduce to use larger containers, while limiting other users of 
> MapReduce to smaller containers.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever

2014-01-27 Thread Aditya Acharya (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Acharya updated YARN-1630:
-

Attachment: diff-1.txt

> Unbounded waiting for response in YarnClientImpl.java causes thread to hang 
> forever
> ---
>
> Key: YARN-1630
> URL: https://issues.apache.org/jira/browse/YARN-1630
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.2.0
>Reporter: Aditya Acharya
>Assignee: Aditya Acharya
> Attachments: diff-1.txt, diff.txt
>
>
> I ran an MR2 application that would have been long running, and killed it 
> programmatically using a YarnClient. The app was killed, but the client hung 
> forever. The message that I saw, which spammed the logs, was "Watiting for 
> application application_1389036507624_0018 to be killed."
> The RM log indicated that the app had indeed transitioned from RUNNING to 
> KILLED, but for some reason future responses to the RPC to kill the 
> application did not indicate that the app had been terminated.
> I tracked this down to YarnClientImpl.java, and though I was unable to 
> reproduce the bug, I wrote a patch to introduce a bound on the number of 
> times that YarnClientImpl retries the RPC before giving up.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever

2014-01-27 Thread Aditya Acharya (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883242#comment-13883242
 ] 

Aditya Acharya commented on YARN-1630:
--

Added updated diff with requested changes.
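
For context, a minimal sketch of the bounded-retry idea this issue describes 
(purely illustrative; it is not the contents of the attached diff, and the bound 
and sleep interval are made-up values):

{code}
// Illustrative only: poll the kill status a bounded number of times instead of forever.
// Assumes this runs inside a method declared to throw YarnException, IOException and
// InterruptedException, with rmClient, appId and killCheckIntervalMs in scope.
int maxKillRetries = 60;   // hypothetical bound
for (int attempt = 0; attempt < maxKillRetries; attempt++) {
  KillApplicationResponse response =
      rmClient.forceKillApplication(KillApplicationRequest.newInstance(appId));
  if (response.getIsKillCompleted()) {
    return;                // the RM confirmed the application was killed
  }
  Thread.sleep(killCheckIntervalMs);
}
throw new YarnException("Timed out waiting for application " + appId + " to be killed");
{code}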

> Unbounded waiting for response in YarnClientImpl.java causes thread to hang 
> forever
> ---
>
> Key: YARN-1630
> URL: https://issues.apache.org/jira/browse/YARN-1630
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.2.0
>Reporter: Aditya Acharya
>Assignee: Aditya Acharya
> Attachments: diff-1.txt, diff.txt
>
>
> I ran an MR2 application that would have been long running, and killed it 
> programmatically using a YarnClient. The app was killed, but the client hung 
> forever. The message that I saw, which spammed the logs, was "Watiting for 
> application application_1389036507624_0018 to be killed."
> The RM log indicated that the app had indeed transitioned from RUNNING to 
> KILLED, but for some reason future responses to the RPC to kill the 
> application did not indicate that the app had been terminated.
> I tracked this down to YarnClientImpl.java, and though I was unable to 
> reproduce the bug, I wrote a patch to introduce a bound on the number of 
> times that YarnClientImpl retries the RPC before giving up.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1658) Webservice should redirect to active RM when HA is enabled.

2014-01-27 Thread Cindy Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883262#comment-13883262
 ] 

Cindy Li commented on YARN-1658:


Talked with Vinod@hortonworks offline. We would like to do this separately from 
the web UI part. 

> Webservice should redirect to active RM when HA is enabled.
> ---
>
> Key: YARN-1658
> URL: https://issues.apache.org/jira/browse/YARN-1658
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Cindy Li
>Assignee: Cindy Li
>  Labels: YARN
>
> When HA is enabled, web service to standby RM should be redirected to the 
> active RM. This is a related Jira to YARN-1525.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1636) Implement timeline related web-services inside AHS for storing and retrieving entities+events

2014-01-27 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1636:
--

Assignee: Zhijie Shen  (was: Vinod Kumar Vavilapalli)

> Implement timeline related web-services inside AHS for storing and retrieving 
> entities+events
> ---
>
> Key: YARN-1636
> URL: https://issues.apache.org/jira/browse/YARN-1636
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
>




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (YARN-1637) Implement a client library for java users to post entities+events

2014-01-27 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reassigned YARN-1637:
-

Assignee: Zhijie Shen  (was: Vinod Kumar Vavilapalli)

> Implement a client library for java users to post entities+events
> -
>
> Key: YARN-1637
> URL: https://issues.apache.org/jira/browse/YARN-1637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
>
> This is a wrapper around the web-service to facilitate easy posting of 
> entity+event data to the time-line server.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1634) Define an ApplicationTimelineStore interface and an in-memory implementation

2014-01-27 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1634:
--

Assignee: Zhijie Shen  (was: Vinod Kumar Vavilapalli)

> Define an ApplicationTimelineStore interface and an in-memory implementation 
> 
>
> Key: YARN-1634
> URL: https://issues.apache.org/jira/browse/YARN-1634
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
>
> As per the design doc, the store needs to be pluggable. We need a base 
> interface, and an in-memory implementation for testing.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1635) Implement a Leveldb based ApplicationTimelineStore

2014-01-27 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1635:
--

Assignee: Vinod Kumar Vavilapalli  (was: Zhijie Shen)

> Implement a Leveldb based ApplicationTimelineStore
> --
>
> Key: YARN-1635
> URL: https://issues.apache.org/jira/browse/YARN-1635
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> As per the design doc, we need a levelDB + local-filesystem based 
> implementation to start with and for small deployments.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1633) Define user-facing entity, entity-info and event objects

2014-01-27 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1633:
--

Summary: Define user-facing entity, entity-info and event objects  (was: 
Define the entity, entity-info and event objects)

> Define user-facing entity, entity-info and event objects
> ---
>
> Key: YARN-1633
> URL: https://issues.apache.org/jira/browse/YARN-1633
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
>
> Define the core objects of the application-timeline effort.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (YARN-1635) Implement a Leveldb based ApplicationTimelineStore

2014-01-27 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reassigned YARN-1635:
-

Assignee: Zhijie Shen  (was: Vinod Kumar Vavilapalli)

> Implement a Leveldb based ApplicationTimelineStore
> --
>
> Key: YARN-1635
> URL: https://issues.apache.org/jira/browse/YARN-1635
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
>
> As per the design doc, we need a levelDB + local-filesystem based 
> implementation to start with and for small deployments.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (YARN-1633) Define the entity, entity-info and event objects

2014-01-27 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen reassigned YARN-1633:
-

Assignee: Zhijie Shen  (was: Vinod Kumar Vavilapalli)

> Define the entity, entity-info and event objects
> 
>
> Key: YARN-1633
> URL: https://issues.apache.org/jira/browse/YARN-1633
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
>
> Define the core objects of the application-timeline effort.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever

2014-01-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883269#comment-13883269
 ] 

Hadoop QA commented on YARN-1630:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625422/diff-1.txt
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2943//console

This message is automatically generated.

> Unbounded waiting for response in YarnClientImpl.java causes thread to hang 
> forever
> ---
>
> Key: YARN-1630
> URL: https://issues.apache.org/jira/browse/YARN-1630
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.2.0
>Reporter: Aditya Acharya
>Assignee: Aditya Acharya
> Attachments: diff-1.txt, diff.txt
>
>
> I ran an MR2 application that would have been long running, and killed it 
> programmatically using a YarnClient. The app was killed, but the client hung 
> forever. The message that I saw, which spammed the logs, was "Watiting for 
> application application_1389036507624_0018 to be killed."
> The RM log indicated that the app had indeed transitioned from RUNNING to 
> KILLED, but for some reason future responses to the RPC to kill the 
> application did not indicate that the app had been terminated.
> I tracked this down to YarnClientImpl.java, and though I was unable to 
> reproduce the bug, I wrote a patch to introduce a bound on the number of 
> times that YarnClientImpl retries the RPC before giving up.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (YARN-1635) Implement a Leveldb based ApplicationTimelineStore

2014-01-27 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi reassigned YARN-1635:


Assignee: Billie Rinaldi  (was: Vinod Kumar Vavilapalli)

> Implement a Leveldb based ApplicationTimelineStore
> --
>
> Key: YARN-1635
> URL: https://issues.apache.org/jira/browse/YARN-1635
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Billie Rinaldi
>
> As per the design doc, we need a levelDB + local-filesystem based 
> implementation to start with and for small deployments.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1659) Define store-facing entity, entity-info and event objects

2014-01-27 Thread Billie Rinaldi (JIRA)
Billie Rinaldi created YARN-1659:


 Summary: Define store-facing entity, entity-info and event objects
 Key: YARN-1659
 URL: https://issues.apache.org/jira/browse/YARN-1659
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi


These will be used by the ApplicationTimelineStore interface.  The web services 
will convert the store-facing objects to the user-facing objects.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-950) Ability to limit or avoid aggregating logs beyond a certain size

2014-01-27 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883294#comment-13883294
 ] 

Jason Lowe commented on YARN-950:
-

Ran into another case where a user filled a disk with a large stdout/stderr, 
and the NM took forever to recover the disk since it was trying to aggregate 
the huge file to HDFS.  Not only was this a waste of HDFS space and network 
bandwidth, but ops could not easily recover manually by removing the large 
logfile.  The NM process was holding the file open during log aggregation, so 
the disk space could not be freed until either the NM finished aggregating or 
the NM process exited.

Many users would prefer the ability to grab a configurable number of bytes at 
the head of a large log and a number of bytes at the end of the large log.  Of 
course the NM would need to inject some text into the log to indicate it was 
truncated, and bonus points if it includes the original log size and/or the 
amount that was truncated.
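
To make the head-plus-tail idea concrete, here is a small, purely illustrative 
sketch of copying only the first and last N bytes of a local log with a 
truncation marker in between; nothing here reflects the NM's actual aggregation 
code:

{code}
import java.io.*;
import java.nio.charset.StandardCharsets;

/** Illustrative only: copy the head and tail of a large log, marking the omitted gap. */
final class LogTruncationSketch {
  static void copyHeadAndTail(File log, OutputStream out, long headBytes, long tailBytes)
      throws IOException {
    long len = log.length();
    try (RandomAccessFile raf = new RandomAccessFile(log, "r")) {
      if (len <= headBytes + tailBytes) {
        copyRange(raf, out, 0, len);          // small enough: copy everything
        return;
      }
      copyRange(raf, out, 0, headBytes);      // head of the log
      long omitted = len - headBytes - tailBytes;
      out.write(("\n...[log truncated: " + omitted + " of " + len
          + " bytes omitted]...\n").getBytes(StandardCharsets.UTF_8));
      copyRange(raf, out, len - tailBytes, tailBytes);   // tail of the log
    }
  }

  private static void copyRange(RandomAccessFile in, OutputStream out, long pos, long count)
      throws IOException {
    in.seek(pos);
    byte[] buf = new byte[8192];
    while (count > 0) {
      int n = in.read(buf, 0, (int) Math.min(buf.length, count));
      if (n < 0) {
        break;                                // unexpected EOF: stop copying
      }
      out.write(buf, 0, n);
      count -= n;
    }
  }
}
{code}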

> Ability to limit or avoid aggregating logs beyond a certain size
> 
>
> Key: YARN-950
> URL: https://issues.apache.org/jira/browse/YARN-950
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 0.23.9
>Reporter: Jason Lowe
>
> It would be nice if ops could configure a cluster such that any container log 
> beyond a configured size would either only have a portion of the log 
> aggregated or not aggregated at all.  This would help speed up the recovery 
> path for cases where a container creates an enormous log and fills a disk, as 
> currently it tries to aggregate the entire, enormous log rather than only 
> aggregating a small portion or simply deleting it.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1629) IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer

2014-01-27 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-1629:
-

Target Version/s: 2.3.0

> IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer
> --
>
> Key: YARN-1629
> URL: https://issues.apache.org/jira/browse/YARN-1629
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-1629-1.patch, YARN-1629-2.patch, YARN-1629.patch
>
>
> This can occur when the second-to-last app in a queue's pending app list is 
> made runnable.  The app is pulled out from under the iterator. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1600) RM does not startup when security is enabled without spnego configured

2014-01-27 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883388#comment-13883388
 ] 

Jason Lowe commented on YARN-1600:
--

+1, lgtm.  Will commit this shortly.

> RM does not startup when security is enabled without spnego configured
> --
>
> Key: YARN-1600
> URL: https://issues.apache.org/jira/browse/YARN-1600
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Jason Lowe
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: YARN-1600.000.patch
>
>
> We have a custom auth filter in front of our various UI pages that handles 
> user authentication.  However currently the RM assumes that if security is 
> enabled then the user must have configured spnego as well for the RM web 
> pages which is not true in our case.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1600) RM does not startup when security is enabled without spnego configured

2014-01-27 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883390#comment-13883390
 ] 

Jason Lowe commented on YARN-1600:
--

On second thought, holding off the commit until the recent branch-2.3 
re-swizzle is sorted out.

> RM does not startup when security is enabled without spnego configured
> --
>
> Key: YARN-1600
> URL: https://issues.apache.org/jira/browse/YARN-1600
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Jason Lowe
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: YARN-1600.000.patch
>
>
> We have a custom auth filter in front of our various UI pages that handles 
> user authentication.  However currently the RM assumes that if security is 
> enabled then the user must have configured spnego as well for the RM web 
> pages which is not true in our case.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1600) RM does not startup when security is enabled without spnego configured

2014-01-27 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1600:
-

 Target Version/s: 2.3.0  (was: 2.4.0)
Affects Version/s: (was: 2.4.0)
   2.3.0
 Hadoop Flags: Reviewed

> RM does not startup when security is enabled without spnego configured
> --
>
> Key: YARN-1600
> URL: https://issues.apache.org/jira/browse/YARN-1600
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Jason Lowe
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: YARN-1600.000.patch
>
>
> We have a custom auth filter in front of our various UI pages that handles 
> user authentication.  However currently the RM assumes that if security is 
> enabled then the user must have configured spnego as well for the RM web 
> pages which is not true in our case.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1642) RMDTRenewer#getRMClient should use ClientRMProxy

2014-01-27 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883392#comment-13883392
 ] 

Sandy Ryza commented on YARN-1642:
--

+1

> RMDTRenewer#getRMClient should use ClientRMProxy
> 
>
> Key: YARN-1642
> URL: https://issues.apache.org/jira/browse/YARN-1642
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-1642-1.patch
>
>
> RMDTRenewer#getRMClient gets a proxy to the RM in the conf directly instead 
> of going through ClientRMProxy. 
> {code}
>   final YarnRPC rpc = YarnRPC.create(conf);
>   return 
> (ApplicationClientProtocol)rpc.getProxy(ApplicationClientProtocol.class, 
> addr, conf);
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1600) RM does not startup when security is enabled without spnego configured

2014-01-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883470#comment-13883470
 ] 

Hadoop QA commented on YARN-1600:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625140/YARN-1600.000.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2944//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2944//console

This message is automatically generated.

> RM does not startup when security is enabled without spnego configured
> --
>
> Key: YARN-1600
> URL: https://issues.apache.org/jira/browse/YARN-1600
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Jason Lowe
>Assignee: Haohui Mai
>Priority: Blocker
> Attachments: YARN-1600.000.patch
>
>
> We have a custom auth filter in front of our various UI pages that handles 
> user authentication.  However currently the RM assumes that if security is 
> enabled then the user must have configured spnego as well for the RM web 
> pages which is not true in our case.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883536#comment-13883536
 ] 

Karthik Kambatla commented on YARN-1618:


Made this a blocker for 2.3, as this leads to the RM going down when recovery 
is enabled. 

> Applications transition from NEW to FINAL_SAVING, and try to update 
> non-existing entries in the state-store
> ---
>
> Key: YARN-1618
> URL: https://issues.apache.org/jira/browse/YARN-1618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-1618-1.patch, yarn-1618-2.patch
>
>
> YARN-891 augments the RMStateStore to store information on completed 
> applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
> This leads to the RM trying to update entries in the state-store that do not 
> exist. On ZKRMStateStore, this leads to the RM crashing. 
> Previous description:
> ZKRMStateStore fails to handle updates to znodes that don't exist. For 
> instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
> In these cases, the store should create the missing znode and handle the 
> update.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-27 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1618:
---

Target Version/s: 2.3.0  (was: 2.4.0)

> Applications transition from NEW to FINAL_SAVING, and try to update 
> non-existing entries in the state-store
> ---
>
> Key: YARN-1618
> URL: https://issues.apache.org/jira/browse/YARN-1618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-1618-1.patch, yarn-1618-2.patch
>
>
> YARN-891 augments the RMStateStore to store information on completed 
> applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
> This leads to the RM trying to update entries in the state-store that do not 
> exist. On ZKRMStateStore, this leads to the RM crashing. 
> Previous description:
> ZKRMStateStore fails to handle updates to znodes that don't exist. For 
> instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
> In these cases, the store should create the missing znode and handle the 
> update.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever

2014-01-27 Thread Aditya Acharya (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Acharya updated YARN-1630:
-

Attachment: (was: diff-1.txt)

> Unbounded waiting for response in YarnClientImpl.java causes thread to hang 
> forever
> ---
>
> Key: YARN-1630
> URL: https://issues.apache.org/jira/browse/YARN-1630
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.2.0
>Reporter: Aditya Acharya
>Assignee: Aditya Acharya
> Attachments: diff-1.txt, diff.txt
>
>
> I ran an MR2 application that would have been long running, and killed it 
> programmatically using a YarnClient. The app was killed, but the client hung 
> forever. The message that I saw, which spammed the logs, was "Watiting for 
> application application_1389036507624_0018 to be killed."
> The RM log indicated that the app had indeed transitioned from RUNNING to 
> KILLED, but for some reason future responses to the RPC to kill the 
> application did not indicate that the app had been terminated.
> I tracked this down to YarnClientImpl.java, and though I was unable to 
> reproduce the bug, I wrote a patch to introduce a bound on the number of 
> times that YarnClientImpl retries the RPC before giving up.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever

2014-01-27 Thread Aditya Acharya (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Acharya updated YARN-1630:
-

Attachment: diff-1.txt

> Unbounded waiting for response in YarnClientImpl.java causes thread to hang 
> forever
> ---
>
> Key: YARN-1630
> URL: https://issues.apache.org/jira/browse/YARN-1630
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.2.0
>Reporter: Aditya Acharya
>Assignee: Aditya Acharya
> Attachments: diff-1.txt, diff.txt
>
>
> I ran an MR2 application that would have been long running, and killed it 
> programmatically using a YarnClient. The app was killed, but the client hung 
> forever. The message that I saw, which spammed the logs, was "Watiting for 
> application application_1389036507624_0018 to be killed."
> The RM log indicated that the app had indeed transitioned from RUNNING to 
> KILLED, but for some reason future responses to the RPC to kill the 
> application did not indicate that the app had been terminated.
> I tracked this down to YarnClientImpl.java, and though I was unable to 
> reproduce the bug, I wrote a patch to introduce a bound on the number of 
> times that YarnClientImpl retries the RPC before giving up.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever

2014-01-27 Thread Aditya Acharya (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883585#comment-13883585
 ] 

Aditya Acharya commented on YARN-1630:
--

Updated the patch, including a unit test this time.

> Unbounded waiting for response in YarnClientImpl.java causes thread to hang 
> forever
> ---
>
> Key: YARN-1630
> URL: https://issues.apache.org/jira/browse/YARN-1630
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.2.0
>Reporter: Aditya Acharya
>Assignee: Aditya Acharya
> Attachments: diff-1.txt, diff.txt
>
>
> I ran an MR2 application that would have been long running, and killed it 
> programmatically using a YarnClient. The app was killed, but the client hung 
> forever. The message that I saw, which spammed the logs, was "Watiting for 
> application application_1389036507624_0018 to be killed."
> The RM log indicated that the app had indeed transitioned from RUNNING to 
> KILLED, but for some reason future responses to the RPC to kill the 
> application did not indicate that the app had been terminated.
> I tracked this down to YarnClientImpl.java, and though I was unable to 
> reproduce the bug, I wrote a patch to introduce a bound on the number of 
> times that YarnClientImpl retries the RPC before giving up.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883587#comment-13883587
 ] 

Bikas Saha commented on YARN-1618:
--

Is this related? Does not look like a compatible change. If it was valid 
earlier then we should not change the logic now.
{code}
-Assert.assertTrue("application finish time is not greater then 0",
-(application.getFinishTime() > 0)); 
+Assert.assertTrue("application start time is less than 0",
+(application.getStartTime() >= 0));
{code}

I am not sure this would happen in real life since only a START event would 
trigger going to the scheduler and introduce the possibility of a REJECTED 
event. If that is the case then this transition should not exist since it would 
be a bug if this got triggered.
{code}
+.addTransition(RMAppState.NEW, RMAppState.FAILED,
+RMAppEventType.APP_REJECTED, new AppRejectedTransition())
{code}

We should add a testAppNewKilled() test and possibly remove the 
testAppNewReject() if the previous comment is correct.


> Applications transition from NEW to FINAL_SAVING, and try to update 
> non-existing entries in the state-store
> ---
>
> Key: YARN-1618
> URL: https://issues.apache.org/jira/browse/YARN-1618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-1618-1.patch, yarn-1618-2.patch
>
>
> YARN-891 augments the RMStateStore to store information on completed 
> applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
> This leads to the RM trying to update entries in the state-store that do not 
> exist. On ZKRMStateStore, this leads to the RM crashing. 
> Previous description:
> ZKRMStateStore fails to handle updates to znodes that don't exist. For 
> instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
> In these cases, the store should create the missing znode and handle the 
> update.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883598#comment-13883598
 ] 

Karthik Kambatla commented on YARN-1618:


bq. Is this related? Does not look like a compatible change. If it was valid 
earlier then we should not change the logic now.
This isn't related. However, the test fails for me on trunk too occasionally. I 
can leave the fix out.

Agree NEW -> FAILED shouldn't exist. Thanks for catching this. Will fix up the 
patch shortly. 

> Applications transition from NEW to FINAL_SAVING, and try to update 
> non-existing entries in the state-store
> ---
>
> Key: YARN-1618
> URL: https://issues.apache.org/jira/browse/YARN-1618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-1618-1.patch, yarn-1618-2.patch
>
>
> YARN-891 augments the RMStateStore to store information on completed 
> applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
> This leads to the RM trying to update entries in the state-store that do not 
> exist. On ZKRMStateStore, this leads to the RM crashing. 
> Previous description:
> ZKRMStateStore fails to handle updates to znodes that don't exist. For 
> instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
> In these cases, the store should create the missing znode and handle the 
> update.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (YARN-1655) [YARN-1197] Add implementations to FairScheduler to support increase/decrease container resource

2014-01-27 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza reassigned YARN-1655:


Assignee: Sandy Ryza

> [YARN-1197] Add implementations to FairScheduler to support increase/decrease 
> container resource
> 
>
> Key: YARN-1655
> URL: https://issues.apache.org/jira/browse/YARN-1655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Sandy Ryza
>




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1630) Unbounded waiting for response in YarnClientImpl.java causes thread to hang forever

2014-01-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883614#comment-13883614
 ] 

Hadoop QA commented on YARN-1630:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625490/diff-1.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

org.apache.hadoop.yarn.client.api.impl.TestNMClient

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2945//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2945//console

This message is automatically generated.

> Unbounded waiting for response in YarnClientImpl.java causes thread to hang 
> forever
> ---
>
> Key: YARN-1630
> URL: https://issues.apache.org/jira/browse/YARN-1630
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.2.0
>Reporter: Aditya Acharya
>Assignee: Aditya Acharya
> Attachments: diff-1.txt, diff.txt
>
>
> I ran an MR2 application that would have been long running, and killed it 
> programmatically using a YarnClient. The app was killed, but the client hung 
> forever. The message that I saw, which spammed the logs, was "Watiting for 
> application application_1389036507624_0018 to be killed."
> The RM log indicated that the app had indeed transitioned from RUNNING to 
> KILLED, but for some reason future responses to the RPC to kill the 
> application did not indicate that the app had been terminated.
> I tracked this down to YarnClientImpl.java, and though I was unable to 
> reproduce the bug, I wrote a patch to introduce a bound on the number of 
> times that YarnClientImpl retries the RPC before giving up.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1660) add the ability to set yarn.resourcemanager.hostname.rm-id instead of setting all the various host:port properties for RM

2014-01-27 Thread Arpit Gupta (JIRA)
Arpit Gupta created YARN-1660:
-

 Summary: add the ability to set 
yarn.resourcemanager.hostname.rm-id instead of setting all the various 
host:port properties for RM
 Key: YARN-1660
 URL: https://issues.apache.org/jira/browse/YARN-1660
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Arpit Gupta


Currently the user has to specify all the various host:port properties for RM. 
We should follow the pattern that we do for non HA setup where we can specify 
yarn.resourcemanager.hostname.rm-id and the defaults are used for all other 
affected properties.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1660) add the ability to set yarn.resourcemanager.hostname.rm-id instead of setting all the various host:port properties for RM

2014-01-27 Thread Arpit Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883658#comment-13883658
 ] 

Arpit Gupta commented on YARN-1660:
---

Here is a list of properties that one needs to set for each RM (a sketch of the proposed simplification follows the list):

yarn.resourcemanager.address.rm1
yarn.resourcemanager.scheduler.address.rm1
yarn.resourcemanager.webapp.address.rm1
yarn.resourcemanager.webapp.https.address.rm1
yarn.resourcemanager.resource-tracker.address.rm1
yarn.resourcemanager.admin.address.rm1
yarn.resourcemanager.ha.admin.address.rm1
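
A minimal sketch of the proposed simplification (the per-rm-id hostname property mirrors the non-HA yarn.resourcemanager.hostname pattern and is an assumption here, not a committed name):
{code}
import org.apache.hadoop.conf.Configuration;

public class HaHostnameExample {
  public static Configuration haConf() {
    Configuration conf = new Configuration();
    conf.setBoolean("yarn.resourcemanager.ha.enabled", true);
    conf.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
    // Proposed: one hostname per rm-id; the address, scheduler, webapp, admin
    // and resource-tracker addresses listed above would then fall back to
    // their default ports on that host.
    conf.set("yarn.resourcemanager.hostname.rm1", "rm1.example.com");
    conf.set("yarn.resourcemanager.hostname.rm2", "rm2.example.com");
    return conf;
  }
}
{code}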

> add the ability to set yarn.resourcemanager.hostname.rm-id instead of setting 
> all the various host:port properties for RM
> -
>
> Key: YARN-1660
> URL: https://issues.apache.org/jira/browse/YARN-1660
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Arpit Gupta
>
> Currently the user has to specify all the various host:port properties for 
> RM. We should follow the pattern that we do for non HA setup where we can 
> specify yarn.resourcemanager.hostname.rm-id and the defaults are used for all 
> other affected properties.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883659#comment-13883659
 ] 

Karthik Kambatla commented on YARN-1618:


bq. I am not sure this would happen in real life since only a START event would 
trigger going to the scheduler and introduce the possibility of a REJECTED 
event.
Actually, looking at all the places where an APP_REJECTED event is sent, I found 
that YARN-674 triggers APP_REJECTED on NEW. We could either change this to KILL, 
or update the comments in RMAppEventType to reflect that APP_REJECTED can come 
from places other than the scheduler. [~bikassaha] - thoughts?

> Applications transition from NEW to FINAL_SAVING, and try to update 
> non-existing entries in the state-store
> ---
>
> Key: YARN-1618
> URL: https://issues.apache.org/jira/browse/YARN-1618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-1618-1.patch, yarn-1618-2.patch
>
>
> YARN-891 augments the RMStateStore to store information on completed 
> applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
> This leads to the RM trying to update entries in the state-store that do not 
> exist. On ZKRMStateStore, this leads to the RM crashing. 
> Previous description:
> ZKRMStateStore fails to handle updates to znodes that don't exist. For 
> instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
> In these cases, the store should create the missing znode and handle the 
> update.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1660) add the ability to set yarn.resourcemanager.hostname.rm-id instead of setting all the various host:port properties for RM

2014-01-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883660#comment-13883660
 ] 

Karthik Kambatla commented on YARN-1660:


+1 to doing this. Thanks for filing this, Arpit. 

> add the ability to set yarn.resourcemanager.hostname.rm-id instead of setting 
> all the various host:port properties for RM
> -
>
> Key: YARN-1660
> URL: https://issues.apache.org/jira/browse/YARN-1660
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Arpit Gupta
>
> Currently the user has to specify all the various host:port properties for 
> RM. We should follow the pattern that we do for non HA setup where we can 
> specify yarn.resourcemanager.hostname.rm-id and the defaults are used for all 
> other affected properties.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1660) add the ability to set yarn.resourcemanager.hostname.rm-id instead of setting all the various host:port properties for RM

2014-01-27 Thread Arpit Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Gupta updated YARN-1660:
--

Assignee: Xuan Gong

> add the ability to set yarn.resourcemanager.hostname.rm-id instead of setting 
> all the various host:port properties for RM
> -
>
> Key: YARN-1660
> URL: https://issues.apache.org/jira/browse/YARN-1660
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Arpit Gupta
>Assignee: Xuan Gong
>
> Currently the user has to specify all the various host:port properties for 
> RM. We should follow the pattern that we do for non HA setup where we can 
> specify yarn.resourcemanager.hostname.rm-id and the defaults are used for all 
> other affected properties.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1661) AppMaster logs says failing even if an application does succeed.

2014-01-27 Thread Tassapol Athiapinya (JIRA)
Tassapol Athiapinya created YARN-1661:
-

 Summary: AppMaster logs says failing even if an application does 
succeed.
 Key: YARN-1661
 URL: https://issues.apache.org/jira/browse/YARN-1661
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Affects Versions: 2.4.0
Reporter: Tassapol Athiapinya
 Fix For: 2.4.0


Run:
/usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
 -shell_command ls

Open the AM logs. The last line indicates an AM failure even though the 
container logs show a correct ls result.

{code}
2014-01-24 21:45:29,592 INFO  [main] distributedshell.ApplicationMaster 
(ApplicationMaster.java:finish(599)) - Application completed. Signalling finish 
to RM
2014-01-24 21:45:29,612 INFO  [main] impl.AMRMClientImpl 
(AMRMClientImpl.java:unregisterApplicationMaster(315)) - Waiting for 
application to be successfully unregistered.
2014-01-24 21:45:29,816 INFO  [main] distributedshell.ApplicationMaster 
(ApplicationMaster.java:main(267)) - Application Master failed. exiting
{code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-27 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1618:
---

Attachment: yarn-1618-3.patch

Attaching a patch that retains receiving APP_REJECTED on NEW. Fixed the tests accordingly.

> Applications transition from NEW to FINAL_SAVING, and try to update 
> non-existing entries in the state-store
> ---
>
> Key: YARN-1618
> URL: https://issues.apache.org/jira/browse/YARN-1618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-1618-1.patch, yarn-1618-2.patch, yarn-1618-3.patch
>
>
> YARN-891 augments the RMStateStore to store information on completed 
> applications. In the process, it adds transitions from NEW to FINAL_SAVING. 
> This leads to the RM trying to update entries in the state-store that do not 
> exist. On ZKRMStateStore, this leads to the RM crashing. 
> Previous description:
> ZKRMStateStore fails to handle updates to znodes that don't exist. For 
> instance, this can happen when an app transitions from NEW to FINAL_SAVING. 
> In these cases, the store should create the missing znode and handle the 
> update.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1578) Fix how to handle ApplicationHistory about the container

2014-01-27 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883816#comment-13883816
 ] 

Zhijie Shen commented on YARN-1578:
---

It should be fine if the container is not finished. In that case, the finish 
information will not be persisted into the store, so the finish information 
entry should not exist in the history file.

However, the exception shown in the description indicates that either historyData 
or finishData is null. historyData cannot be null, because it has just been 
constructed in this method. That leaves finishData as the only one that can be 
null, though it shouldn't be, because mergeContainerHistoryData is only called 
when the finish information entry is found.

Would you please debug the following code in FileSystemApplicationHistoryStore 
again? Or could you provide more of the log around the bug?
{code}
while ((!readStartData || !readFinishData) && hfReader.hasNext()) {
  HistoryFileReader.Entry entry = hfReader.next();
  if (entry.key.id.equals(containerId.toString())) {
    if (entry.key.suffix.equals(START_DATA_SUFFIX)) {
      ContainerStartData startData =
          parseContainerStartData(entry.value);
      mergeContainerHistoryData(historyData, startData);
      readStartData = true;
    } else if (entry.key.suffix.equals(FINISH_DATA_SUFFIX)) {
      ContainerFinishData finishData =
          parseContainerFinishData(entry.value);
      mergeContainerHistoryData(historyData, finishData);
      readFinishData = true;
    }
  }
}
{code}

> Fix how to handle ApplicationHistory about the container
> 
>
> Key: YARN-1578
> URL: https://issues.apache.org/jira/browse/YARN-1578
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-321
>Reporter: Shinichi Yamashita
>Assignee: Shinichi Yamashita
> Attachments: YARN-1578.patch, screenshot.png
>
>
> I ran a PiEstimator job on a Hadoop cluster with YARN-321 applied.
> After the job ended, when I accessed the HistoryServer Web UI, it displayed 
> "500", and the HistoryServer daemon log showed the following.
> {code}
> 2014-01-09 13:31:12,227 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
> handling URI: 
> /applicationhistory/appattempt/appattempt_1389146249925_0008_01
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> (snip...)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.mergeContainerHistoryData(FileSystemApplicationHistoryStore.java:696)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getContainers(FileSystemApplicationHistoryStore.java:429)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainers(ApplicationHistoryManagerImpl.java:201)
> at 
> org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:110)
> (snip...)
> {code}
> From the ApplicationHistory file, I confirmed that there was a container which 
> was never finished.
> According to the ResourceManager daemon log, the ResourceManager reserved this 
> container but did not allocate it.
> Therefore, ApplicationHistory needs to change how it handles containers that 
> are never allocated.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-925) Augment HistoryStorage Reader Interface to Support Filters When Getting Applications

2014-01-27 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883828#comment-13883828
 ] 

Zhijie Shen commented on YARN-925:
--

[~sinchii], thanks for taking care of the filters. I had a quick look at the 
patch. IMO, it's on the right track. However, the major task of this issue is 
to optimize the filtering inside the implementation of the application history 
store, in particular FileSystemApplicationHistoryStore. The current patch still 
reads each individual history file and loads the full historical information of 
an application before applying a number of filtering conditions. That is no 
different from doing the filtering in ApplicationHistoryManager. Given a 
million history files, it would be a disaster to read all of them.

By pushing the filters down into the implementation of the application history 
store, the implementation, which knows best how the historical data is stored, 
can do the optimization. In the FS implementation, ideally, we should build an 
index in some way and only read the history files that hit the filters.
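
A rough sketch of the kind of pushdown being suggested (the filter holder and the reader method are hypothetical, not the YARN-321 interfaces):
{code}
import java.io.IOException;
import java.util.Map;
import java.util.Set;

// Hypothetical filter holder; null fields mean "match anything".
class AppHistoryFilter {
  String user;
  String queue;
  Set<String> appStates;
  long startedTimeBegin = 0L;
  long startedTimeEnd = Long.MAX_VALUE;
}

// Reader side of the history store: the implementation receives the filters
// and decides how to satisfy them, e.g. by consulting its own index instead
// of loading the full history of every application.
interface FilteredHistoryReader<K, V> {
  Map<K, V> getApplications(AppHistoryFilter filter) throws IOException;
}
{code}
In the FS implementation, the filter fields could be matched against a small index (application id, user, queue, start time) so that only the history files that can possibly match are opened and parsed.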

> Augment HistoryStorage Reader Interface to Support Filters When Getting 
> Applications
> 
>
> Key: YARN-925
> URL: https://issues.apache.org/jira/browse/YARN-925
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Fix For: YARN-321
>
> Attachments: YARN-925-1.patch, YARN-925-2.patch, YARN-925-3.patch, 
> YARN-925-4.patch, YARN-925-5.patch, YARN-925-6.patch, YARN-925-7.patch, 
> YARN-925-8.patch
>
>
> We need to allow filter parameters for getApplications, pushing filtering to 
> the implementations of the interface. The implementations should know the 
> best about optimizing filtering. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-925) Augment HistoryStorage Reader Interface to Support Filters When Getting Applications

2014-01-27 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-925:
-

Assignee: Shinichi Yamashita  (was: Mayank Bansal)

> Augment HistoryStorage Reader Interface to Support Filters When Getting 
> Applications
> 
>
> Key: YARN-925
> URL: https://issues.apache.org/jira/browse/YARN-925
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Mayank Bansal
>Assignee: Shinichi Yamashita
> Fix For: YARN-321
>
> Attachments: YARN-925-1.patch, YARN-925-2.patch, YARN-925-3.patch, 
> YARN-925-4.patch, YARN-925-5.patch, YARN-925-6.patch, YARN-925-7.patch, 
> YARN-925-8.patch
>
>
> We need to allow filter parameters for getApplications, pushing filtering to 
> the implementations of the interface. The implementations should know the 
> best about optimizing filtering. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (YARN-1662) Capacity Scheduler reservation issue cause Job Hang

2014-01-27 Thread Sunil G (JIRA)
Sunil G created YARN-1662:
-

 Summary: Capacity Scheduler reservation issue cause Job Hang
 Key: YARN-1662
 URL: https://issues.apache.org/jira/browse/YARN-1662
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.2.0
 Environment: Suse 11 SP1 + Linux
Reporter: Sunil G


There are 2 node managers in my cluster.
NM1 with 8GB
NM2 with 8GB

I am submitting a job with the following details:
AM with 2GB
Map needs 5GB
Reducer needs 3GB
slowstart is enabled with 0.5
10 maps and 50 reducers are assigned.

5 maps have completed, and a few reducers have now been scheduled.

Now NM1 has the 2GB AM and a 3GB Reducer_1 [5GB used];
NM2 has a 3GB Reducer_2 [3GB used].

A map has now reserved 5GB on NM1, which has only 3GB free.
It hangs forever.

The potential issue is that the reservation is now blocked on NM1 for a map 
which needs 5GB, while Reducer_1 hangs waiting for more map outputs.

Reducer-side preemption also did not happen because some headroom is still 
available.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1639) YARM RM HA requires different configs on different RM hosts

2014-01-27 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883863#comment-13883863
 ] 

Xuan Gong commented on YARN-1639:
-

My proposal:
1. yarn.resourcemanager.ha.id becomes optional.
2. When the RM starts (if ha.id is not specified), it automatically figures out 
its rm_id by checking whether a configured RM_ADDRESS matches its local address. 
For example, if the RM's local address is 1.1.1.1 and 
yarn.resourcemanager.address.rm2 is configured as 1.1.1.1, this RM can conclude 
that its rm_id is rm2. (There is an assumption here: one node can only launch 
one RM.) A rough sketch of this matching follows below.
3. We can still explicitly specify the ha.id. If this value is explicitly 
specified, the RM uses it directly. This is mostly for testing purposes, such as 
MiniYarnCluster, etc.
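
A minimal sketch of the matching described in item 2, assuming one RM per host (the helper name and the use of the local IP are illustrative assumptions, not the attached patch):
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;
import org.apache.hadoop.conf.Configuration;

public class RmIdGuesser {
  // Return the rm-id whose configured RM address points at this host, or null
  // so the caller can fall back to an explicit yarn.resourcemanager.ha.id.
  static String guessLocalRmId(Configuration conf) throws UnknownHostException {
    String localAddress = InetAddress.getLocalHost().getHostAddress();
    String[] ids = conf.getStrings("yarn.resourcemanager.ha.rm-ids");
    if (ids == null) {
      return null;
    }
    for (String id : ids) {
      // e.g. yarn.resourcemanager.address.rm2 = 1.1.1.1:8032
      String address = conf.get("yarn.resourcemanager.address." + id.trim());
      if (address != null && address.split(":")[0].equals(localAddress)) {
        return id.trim();
      }
    }
    return null;
  }
}
{code}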

> YARM RM HA requires different configs on different RM hosts
> ---
>
> Key: YARN-1639
> URL: https://issues.apache.org/jira/browse/YARN-1639
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Arpit Gupta
>Assignee: Xuan Gong
>
> We need to set yarn.resourcemanager.ha.id to rm1 or rm2 based on which RM you 
> want to be first or second.
> This means we have different configs on different RM nodes. This is unlike 
> HDFS HA where the same configs are pushed to both NN's and it would be better 
> to have the same setup for RM as this would make installation and managing 
> easier.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (YARN-1639) YARM RM HA requires different configs on different RM hosts

2014-01-27 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883864#comment-13883864
 ] 

Xuan Gong commented on YARN-1639:
-

Tested the patch in a two-node HA cluster

> YARM RM HA requires different configs on different RM hosts
> ---
>
> Key: YARN-1639
> URL: https://issues.apache.org/jira/browse/YARN-1639
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Arpit Gupta
>Assignee: Xuan Gong
> Attachments: YARN-1639.1.patch
>
>
> We need to set yarn.resourcemanager.ha.id to rm1 or rm2 based on which RM you 
> want to be first or second.
> This means we have different configs on different RM nodes. This is unlike 
> HDFS HA where the same configs are pushed to both NN's and it would be better 
> to have the same setup for RM as this would make installation and managing 
> easier.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1639) YARM RM HA requires different configs on different RM hosts

2014-01-27 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1639:


Attachment: YARN-1639.1.patch

> YARM RM HA requires different configs on different RM hosts
> ---
>
> Key: YARN-1639
> URL: https://issues.apache.org/jira/browse/YARN-1639
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Arpit Gupta
>Assignee: Xuan Gong
> Attachments: YARN-1639.1.patch
>
>
> We need to set yarn.resourcemanager.ha.id to rm1 or rm2 based on which RM you 
> want to be first or second.
> This means we have different configs on different RM nodes. This is unlike 
> HDFS HA where the same configs are pushed to both NN's and it would be better 
> to have the same setup for RM as this would make installation and managing 
> easier.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)