[jira] [Commented] (YARN-3236) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.

2015-02-19 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328641#comment-14328641
 ] 

zhihai xu commented on YARN-3236:
-

This is a code cleanup (removing an unused variable), so I think a test case is not 
needed.

> cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
> -
>
> Key: YARN-3236
> URL: https://issues.apache.org/jira/browse/YARN-3236
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Trivial
>  Labels: cleanup, maintenance
> Attachments: YARN-3236.000.patch
>
>
> cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
> RMAuthenticationFilter#AUTH_HANDLER_PROPERTY was added in YARN-2247, but the 
> code that used AUTH_HANDLER_PROPERTY was removed in YARN-2656. We had better 
> remove it to avoid confusion, since it was only introduced for a very short 
> time and no one uses it now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3236) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.

2015-02-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328638#comment-14328638
 ] 

Hadoop QA commented on YARN-3236:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12699819/YARN-3236.000.patch
  against trunk revision c0d9b93.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6680//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6680//console

This message is automatically generated.

> cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
> -
>
> Key: YARN-3236
> URL: https://issues.apache.org/jira/browse/YARN-3236
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Trivial
>  Labels: cleanup, maintenance
> Attachments: YARN-3236.000.patch
>
>
> cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
> RMAuthenticationFilter#AUTH_HANDLER_PROPERTY was added in YARN-2247, but the 
> code that used AUTH_HANDLER_PROPERTY was removed in YARN-2656. We had better 
> remove it to avoid confusion, since it was only introduced for a very short 
> time and no one uses it now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI

2015-02-19 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328631#comment-14328631
 ] 

Sunil G commented on YARN-3225:
---

Yes [~devaraj.k]. Thank you for the clarification.

> New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
> ---
>
> Key: YARN-3225
> URL: https://issues.apache.org/jira/browse/YARN-3225
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Devaraj K
>
> New CLI (or existing CLI with parameters) should put each node on the 
> decommission list into decommissioning status and track a timeout to terminate 
> the nodes that haven't finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI

2015-02-19 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328629#comment-14328629
 ] 

Devaraj K commented on YARN-3225:
-

Thanks [~djp] for clarification.

> New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
> ---
>
> Key: YARN-3225
> URL: https://issues.apache.org/jira/browse/YARN-3225
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Devaraj K
>
> New CLI (or existing CLI with parameters) should put each node on the 
> decommission list into decommissioning status and track a timeout to terminate 
> the nodes that haven't finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI

2015-02-19 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328627#comment-14328627
 ] 

Devaraj K commented on YARN-3225:
-

I see the same mentioned in the design doc 
https://issues.apache.org/jira/secure/attachment/12699496/GracefullyDecommissionofNodeManagerv3.pdf
{quote} Before NMs get decommissioned, the timeout can be updated to shorter or
longer. e.g. admin can terminate the CLI and resubmit it with a different 
timeout
value.{quote}


> New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
> ---
>
> Key: YARN-3225
> URL: https://issues.apache.org/jira/browse/YARN-3225
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Devaraj K
>
> New CLI (or existing CLI with parameters) should put each node on the 
> decommission list into decommissioning status and track a timeout to terminate 
> the nodes that haven't finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3195) [YARN]Missing uniformity In Yarn Queue CLI command

2015-02-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328619#comment-14328619
 ] 

Hadoop QA commented on YARN-3195:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12699685/YARN-3195.patch
  against trunk revision c0d9b93.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

  org.apache.hadoop.yarn.client.cli.TestLogsCLI
  org.apache.hadoop.yarn.client.cli.TestYarnCLI

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

org.apache.hadoop.yarn.client.TestResourceTrackerOnHA

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6679//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6679//console

This message is automatically generated.

> [YARN]Missing uniformity  In Yarn Queue CLI command
> ---
>
> Key: YARN-3195
> URL: https://issues.apache.org/jira/browse/YARN-3195
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.6.0
> Environment: SUSE Linux SP3
>Reporter: Jagadesh Kiran N
>Assignee: Jagadesh Kiran N
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: Helptobe removed in Queue.png, YARN-3195.patch
>
>
> Help is a generic command and should not be placed here; because of this, 
> uniformity is missing compared to other commands. Remove the -help command 
> inside ./yarn queue for uniformity with respect to the other commands: 
> {code}
> SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn 
> queue -help
> 15/02/13 19:30:20 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> usage: queue
> * -help  Displays help for all commands.*
>  -statusList queue information about given queue.
> SO486LDPag65:/home/OpenSource/HA/install/hadoop/resourcemanager/bin # ./yarn 
> queue
> 15/02/13 19:33:14 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> Invalid Command Usage :
> usage: queue
> * -help  Displays help for all commands.*
>  -statusList queue information about given queue.
> {code}
> * -help  Displays help for all commands.*
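For context, a minimal sketch of the proposed direction, assuming the sub-command
builds its options with Apache Commons CLI (as the yarn client commands do); the
class and option wiring below are illustrative only, not the actual YARN-3195 patch:

{code}
import org.apache.commons.cli.HelpFormatter;
import org.apache.commons.cli.Options;

public class QueueUsageSketch {
  public static void main(String[] args) {
    // Build the "queue" options without a -help entry, so the printed usage
    // matches the other yarn sub-commands (only -status remains).
    Options opts = new Options();
    opts.addOption("status", true, "List queue information about given queue.");
    new HelpFormatter().printHelp("queue", opts);
  }
}
{code}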



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3003) Provide API for client to retrieve label to node mapping

2015-02-19 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena resolved YARN-3003.

Resolution: Fixed

Thanks [~tedyu] for reporting.
Resolving it as fixed by YARN-3075 and YARN-3076. Not sure if it needs to be 
marked as Duplicate or some other resolution status. 

> Provide API for client to retrieve label to node mapping
> 
>
> Key: YARN-3003
> URL: https://issues.apache.org/jira/browse/YARN-3003
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, resourcemanager
>Reporter: Ted Yu
>Assignee: Varun Saxena
> Attachments: YARN-3003.001.patch, YARN-3003.002.patch
>
>
> Currently YarnClient#getNodeToLabels() returns the mapping from NodeId to set 
> of labels associated with the node.
> Client (such as Slider) may be interested in label to node mapping - given 
> label, return the nodes with this label.
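For illustration, a minimal client-side sketch of the label-to-node lookup added
by YARN-3075/YARN-3076; the getLabelsToNodes() name and Map<String, Set<NodeId>>
return type are assumed from those JIRAs, and the "GPU" label is just an example:

{code}
import java.io.IOException;
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class LabelToNodesExample {
  public static void main(String[] args) throws IOException, YarnException {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      // Label -> nodes, the inverse of YarnClient#getNodeToLabels().
      Map<String, Set<NodeId>> labelsToNodes = client.getLabelsToNodes();
      Set<NodeId> gpuNodes = labelsToNodes.get("GPU");
      System.out.println("Nodes with label GPU: " + gpuNodes);
    } finally {
      client.stop();
    }
  }
}
{code}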



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3076) Add API/Implementation to YarnClient to retrieve label-to-node mapping

2015-02-19 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328617#comment-14328617
 ] 

Varun Saxena commented on YARN-3076:


Thanks [~leftnoteasy] for the review and commit.

> Add API/Implementation to YarnClient to retrieve label-to-node mapping
> --
>
> Key: YARN-3076
> URL: https://issues.apache.org/jira/browse/YARN-3076
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 2.7.0
>
> Attachments: YARN-3076.001.patch, YARN-3076.002.patch, 
> YARN-3076.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3236) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.

2015-02-19 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3236:

Labels: maintenance  (was: )

> cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
> -
>
> Key: YARN-3236
> URL: https://issues.apache.org/jira/browse/YARN-3236
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Trivial
>  Labels: cleanup, maintenance
> Attachments: YARN-3236.000.patch
>
>
> cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
> RMAuthenticationFilter#AUTH_HANDLER_PROPERTY was added in YARN-2247, but the 
> code that used AUTH_HANDLER_PROPERTY was removed in YARN-2656. We had better 
> remove it to avoid confusion, since it was only introduced for a very short 
> time and no one uses it now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3236) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.

2015-02-19 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3236:

Labels: cleanup maintenance  (was: maintenance)

> cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
> -
>
> Key: YARN-3236
> URL: https://issues.apache.org/jira/browse/YARN-3236
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Trivial
>  Labels: cleanup, maintenance
> Attachments: YARN-3236.000.patch
>
>
> cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
> RMAuthenticationFilter#AUTH_HANDLER_PROPERTY was added in YARN-2247, but the 
> code that used AUTH_HANDLER_PROPERTY was removed in YARN-2656. We had better 
> remove it to avoid confusion, since it was only introduced for a very short 
> time and no one uses it now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3236) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.

2015-02-19 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3236:

Issue Type: Improvement  (was: Bug)

> cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
> -
>
> Key: YARN-3236
> URL: https://issues.apache.org/jira/browse/YARN-3236
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Trivial
> Attachments: YARN-3236.000.patch
>
>
> cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
> RMAuthenticationFilter#AUTH_HANDLER_PROPERTY was added in YARN-2247, but the 
> code that used AUTH_HANDLER_PROPERTY was removed in YARN-2656. We had better 
> remove it to avoid confusion, since it was only introduced for a very short 
> time and no one uses it now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3236) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.

2015-02-19 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3236:

Attachment: YARN-3236.000.patch

> cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
> -
>
> Key: YARN-3236
> URL: https://issues.apache.org/jira/browse/YARN-3236
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Trivial
> Attachments: YARN-3236.000.patch
>
>
> cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
> RMAuthenticationFilter#AUTH_HANDLER_PROPERTY was added in YARN-2247, but the 
> code that used AUTH_HANDLER_PROPERTY was removed in YARN-2656. We had better 
> remove it to avoid confusion, since it was only introduced for a very short 
> time and no one uses it now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3236) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.

2015-02-19 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3236:

Description: 
cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
RMAuthenticationFilter#AUTH_HANDLER_PROPERTY is added in YARN-2247. but the 
code which use  AUTH_HANDLER_PROPERTY is removed at YARN-2656. We would better 
remove it to avoid confusion since it is only introduced for a very short time 
and no one use it now.

  was:
cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
RMAuthenticationFilter#AUTH_HANDLER_PROPERTY is added in YARN-2247. but the 
code which use  AUTH_HANDLER_PROPERTY is removed at YARN-2656. We would better 
remove it to avoid confusion since it is only introduce for a very short time 
and no one use it now.


> cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
> -
>
> Key: YARN-3236
> URL: https://issues.apache.org/jira/browse/YARN-3236
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Trivial
> Attachments: YARN-3236.000.patch
>
>
> cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
> RMAuthenticationFilter#AUTH_HANDLER_PROPERTY was added in YARN-2247, but the 
> code that used AUTH_HANDLER_PROPERTY was removed in YARN-2656. We had better 
> remove it to avoid confusion, since it was only introduced for a very short 
> time and no one uses it now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3236) cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.

2015-02-19 Thread zhihai xu (JIRA)
zhihai xu created YARN-3236:
---

 Summary: cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
 Key: YARN-3236
 URL: https://issues.apache.org/jira/browse/YARN-3236
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Trivial


cleanup RMAuthenticationFilter#AUTH_HANDLER_PROPERTY.
RMAuthenticationFilter#AUTH_HANDLER_PROPERTY was added in YARN-2247, but the 
code that used AUTH_HANDLER_PROPERTY was removed in YARN-2656. We had better 
remove it to avoid confusion, since it was only introduced for a very short 
time and no one uses it now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.

2015-02-19 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328593#comment-14328593
 ] 

Tsuyoshi OZAWA commented on YARN-2820:
--

I'll take a look.

> Do retry in FileSystemRMStateStore for better error recovery when 
> update/store failure due to IOException.
> --
>
> Key: YARN-2820
> URL: https://issues.apache.org/jira/browse/YARN-2820
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2820.000.patch, YARN-2820.001.patch, 
> YARN-2820.002.patch, YARN-2820.003.patch
>
>
> Do retry in FileSystemRMStateStore for better error recovery when 
> update/store failure due to IOException.
> When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we 
> saw the following IOException cause the RM to shut down.
> {code}
> 2014-10-29 23:49:12,202 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
> Updating info for attempt: appattempt_1409135750325_109118_01 at: 
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01
> 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:46,283 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
> Error updating info for attempt: appattempt_1409135750325_109118_01
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas.
> 2014-10-29 23:49:46,284 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
> Error storing/updating appAttempt: appattempt_1409135750325_109118_01
> 2014-10-29 23:49:46,916 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
> Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause: 
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas. 
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
>  
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) 
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
>  
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) 
> at java.lang.Thread.run(Thread.java:744) 
> {code}
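For illustration, a minimal retry sketch around a state-store write; this only
sketches the idea (retry the IOException instead of escalating it to a fatal
STATE_STORE_OP_FAILED event) and is not taken from the attached patches, which
may use different retry policies or configuration knobs:

{code}
import java.io.IOException;

public final class StateStoreRetrySketch {

  /** A single store/update operation, e.g. a writeFile()/updateFile() call. */
  public interface StoreOp {
    void run() throws IOException;
  }

  /** Retry the operation on IOException before giving up. */
  public static void runWithRetries(StoreOp op, int maxRetries, long retryIntervalMs)
      throws IOException, InterruptedException {
    for (int attempt = 0; ; attempt++) {
      try {
        op.run();
        return;
      } catch (IOException e) {
        if (attempt >= maxRetries) {
          throw e;                      // only now does the failure reach the dispatcher
        }
        Thread.sleep(retryIntervalMs);  // back off before the next attempt
      }
    }
  }
}
{code}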
> As discussed at YARN-1778, TestFSRMStateStore failure is also due to  
> IOException in storeApplicationStateInternal.
> Stack trace from TestFSRMStateStore failure:
> {code}
>  2015-02-03 00:09:19,092 INFO  [Thread-110] recovery.TestFSRMStateStore 
> (TestFSRMStateStore.java:run(285)) - testFSRMStateStoreClientRetry: Exception
>  org.apache.hadoop.ipc.RemoteException(java.io.IOException): NameNode still 
> not started
>at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkNNStartup(NameNodeRpcServer.java:1876)
>at 
> org.apache.hadoop.hdf

[jira] [Commented] (YARN-2693) Priority Label Manager in RM to manage priority labels

2015-02-19 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328582#comment-14328582
 ] 

Sunil G commented on YARN-2693:
---

Hi [~leftnoteasy]
Thank you for the update.

The NodeLabels and AppPriority managers are more or less the same, but we can't 
merge them more closely as we have different PBs for each operation. However, a 
plan can be laid out to merge most of the FileSystem and Manager classes so that 
more of the common code can be shared. 

As mentioned, I will move the parsing and config support changes to 
RMAppManager (as a separate class), and will have a minimal implementation. I 
will still keep this JIRA open so as to handle the same after the major 
scheduler changes and API support are done. 

> Priority Label Manager in RM to manage priority labels
> --
>
> Key: YARN-2693
> URL: https://issues.apache.org/jira/browse/YARN-2693
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 
> 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch
>
>
> Focus of this JIRA is to have a centralized service to handle priority labels.
> Support operations such as
> * Add/Delete priority label to a specified queue
> * Manage integer mapping associated with each priority label
> * Support managing default priority label of a given queue
> * ACL support in queue level for priority label
> * Expose interface to RM to validate priority label
> Storage for this labels will be done in FileSystem and in Memory similar to 
> NodeLabel
> * FileSystem Based : persistent across RM restart
> * Memory Based: non-persistent across RM restart
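To make the listed operations concrete, a hypothetical interface sketch; all
names and signatures below are illustrative only and are not taken from the
attached YARN-2693 patches:

{code}
import java.io.IOException;
import java.util.Map;
import java.util.Set;

public interface PriorityLabelManager {
  /** Add priority labels (with their integer mapping) to a specified queue. */
  void addPriorityLabels(String queue, Map<String, Integer> labelToPriority) throws IOException;

  /** Delete priority labels from a specified queue. */
  void removePriorityLabels(String queue, Set<String> labels) throws IOException;

  /** Manage the default priority label of a given queue. */
  void setDefaultPriorityLabel(String queue, String label) throws IOException;

  /** Queue-level ACL check for a priority label. */
  boolean checkAccess(String user, String queue, String label);

  /** Exposed to the RM to validate a priority label submitted with an application. */
  boolean isValidPriorityLabel(String queue, String label);
}
{code}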



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.

2015-02-19 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328576#comment-14328576
 ] 

zhihai xu commented on YARN-2820:
-

None of these 5 findbugs warnings are related to my change.

> Do retry in FileSystemRMStateStore for better error recovery when 
> update/store failure due to IOException.
> --
>
> Key: YARN-2820
> URL: https://issues.apache.org/jira/browse/YARN-2820
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2820.000.patch, YARN-2820.001.patch, 
> YARN-2820.002.patch, YARN-2820.003.patch
>
>
> Do retry in FileSystemRMStateStore for better error recovery when 
> update/store failure due to IOException.
> When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we 
> saw the following IOException cause the RM to shut down.
> {code}
> 2014-10-29 23:49:12,202 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
> Updating info for attempt: appattempt_1409135750325_109118_01 at: 
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01
> 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:46,283 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
> Error updating info for attempt: appattempt_1409135750325_109118_01
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas.
> 2014-10-29 23:49:46,284 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
> Error storing/updating appAttempt: appattempt_1409135750325_109118_01
> 2014-10-29 23:49:46,916 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
> Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause: 
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas. 
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
>  
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) 
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
>  
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) 
> at java.lang.Thread.run(Thread.java:744) 
> {code}
> As discussed at YARN-1778, TestFSRMStateStore failure is also due to  
> IOException in storeApplicationStateInternal.
> Stack trace from TestFSRMStateStore failure:
> {code}
>  2015-02-03 00:09:19,092 INFO  [Thread-110] recovery.TestFSRMStateStore 
> (TestFSRMStateStore.java:run(285)) - testFSRMStateStoreClientRetry: Exception
>  org.apache.hadoop.ipc.RemoteException(java.io.IOException): NameNode still 
> not started
>at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkNNStartup(NameNodeRpcServer.java:1876)
>at 

[jira] [Commented] (YARN-3033) [Aggregator wireup] Implement NM starting the ATS writer companion

2015-02-19 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328453#comment-14328453
 ] 

Sangjin Lee commented on YARN-3033:
---

Thanks for the well-written proposal [~gtCarrera9]! It looks fine except for 
one thing IMO: whether the RM's aggregator needs to use the app-level 
aggregators.

I'm not convinced that an "Application Level Aggregator inside RM" is needed or 
beneficial. The main use case of RM writing application-related data is writing 
application life-cycle events. This doesn't represent much volume for each app 
(at most a few events per app). Furthermore, it does not require any 
batching/aggregation of metrics of any kind. But having the per-app 
aggregators would retain a lot of memory for as long as the apps are 
alive, which could mean significant memory pressure on a big/busy 
cluster. IMO, it would be a superfluous abstraction with little benefit. Does 
the RM aggregator have to use it? Do you see it being a useful abstraction? If 
so, how?

In my opinion, it would be far simpler and also perform better if the RM 
aggregator writes data to the storage outside the app-level context.

{quote}
If N_app > N_node or N_app >> N_node, we may consider to launch a constant
number of aggregators inside each NodeManager, so the total aggregator entities 
is
bounded by the number of NMs. The reason we’d like to avoid running too many
aggregators is the pressure on the storage - too many writers writing to say 
HBase
RegionServers. We can override the aggregator mapping in this case.
{quote}
+1 with Junping's comment to keep the model simple. This can also be handled in 
a different manner if it is for HBase: one can use a single shared HBase client 
for all app-level aggregators on a per-node aggregator, which would mitigate 
that concern. If app-level aggregators are separate processes, it's a different 
story of course. Also, it's been my observation that the nodes usually 
outnumber the active apps unless the apps are really tiny.
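A minimal sketch of that shared-client idea using the HBase 1.x Connection API;
the table and column names are invented for illustration and this is not
timeline service code:

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

/** One Connection per node-level aggregator, shared by all app-level writers. */
public class SharedHBaseWriter implements AutoCloseable {
  private final Connection connection;  // thread-safe and heavyweight: create once

  public SharedHBaseWriter(Configuration conf) throws IOException {
    this.connection = ConnectionFactory.createConnection(HBaseConfiguration.create(conf));
  }

  /** Write one row on behalf of an application-level aggregator. */
  public void write(String appId, String rowKey, byte[] value) throws IOException {
    // Table instances are lightweight; get one per call and close it.
    try (Table table = connection.getTable(TableName.valueOf("timeline_entity"))) {
      Put put = new Put(Bytes.toBytes(appId + "!" + rowKey));
      put.addColumn(Bytes.toBytes("i"), Bytes.toBytes("v"), value);
      table.put(put);
    }
  }

  @Override
  public void close() throws IOException {
    connection.close();
  }
}
{code}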

> [Aggregator wireup] Implement NM starting the ATS writer companion
> --
>
> Key: YARN-3033
> URL: https://issues.apache.org/jira/browse/YARN-3033
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Li Lu
> Attachments: MappingandlaunchingApplevelTimelineaggregators.pdf
>
>
> Per design in YARN-2928, implement node managers starting the ATS writer 
> companion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue

2015-02-19 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328450#comment-14328450
 ] 

Sunil G commented on YARN-1963:
---

Thank you Wangda and Jason for the input.

Yes, it's good to change the priority of an application at runtime. I had 
mentioned it in the design doc. 
I have created a user API JIRA already, and its client part can be handled 
there. 

> Support priorities across applications within the same queue 
> -
>
> Key: YARN-1963
> URL: https://issues.apache.org/jira/browse/YARN-1963
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Reporter: Arun C Murthy
>Assignee: Sunil G
> Attachments: YARN Application Priorities Design.pdf, YARN Application 
> Priorities Design_01.pdf
>
>
> It will be very useful to support priorities among applications within the 
> same queue, particularly in production scenarios. It allows for finer-grained 
> controls without having to force admins to create a multitude of queues, plus 
> allows existing applications to continue using existing queues which are 
> usually part of institutional memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3226) UI changes for decommissioning node

2015-02-19 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-3226:
-

Assignee: Sunil G

> UI changes for decommissioning node
> ---
>
> Key: YARN-3226
> URL: https://issues.apache.org/jira/browse/YARN-3226
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Sunil G
>
> Some initial thought is:
> decommissioning nodes should still show up in the active nodes list since 
> they are still running containers. 
> A separate decommissioning tab to filter for those nodes would be nice, 
> although I suppose users can also just use the jquery table to sort/search for
> nodes in that state from the active nodes list if it's too crowded to add yet 
> another node
> state tab (or maybe get rid of some effectively dead tabs like the reboot 
> state tab).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3232) Some application states are not necessarily exposed to users

2015-02-19 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-3232:
--

Assignee: Varun Saxena

> Some application states are not necessarily exposed to users
> 
>
> Key: YARN-3232
> URL: https://issues.apache.org/jira/browse/YARN-3232
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Varun Saxena
>
> The application NEW_SAVING and SUBMITTED states are not necessarily exposed to 
> users, as they are mostly internal to the system, transient, and not user-facing. 
> We may deprecate these two states and remove them from the web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI

2015-02-19 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328437#comment-14328437
 ] 

Sunil G commented on YARN-3225:
---

Another point: suppose we immediately fire the same command with a different 
timeout value, and the first timeout is still ongoing; do we need to update 
the timeout? 

> New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
> ---
>
> Key: YARN-3225
> URL: https://issues.apache.org/jira/browse/YARN-3225
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Devaraj K
>
> New CLI (or existing CLI with parameters) should put each node on the 
> decommission list into decommissioning status and track a timeout to terminate 
> the nodes that haven't finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3224) Notify AM with containers (on decommissioning node) could be preempted after timeout.

2015-02-19 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-3224:
-

Assignee: Sunil G

> Notify AM with containers (on decommissioning node) could be preempted after 
> timeout.
> -
>
> Key: YARN-3224
> URL: https://issues.apache.org/jira/browse/YARN-3224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Sunil G
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server

2015-02-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328426#comment-14328426
 ] 

Hadoop QA commented on YARN-2556:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12699764/YARN-2556.patch
  against trunk revision d49ae72.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

  org.apache.hadoop.conf.TestJobConf

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6675//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6675//console

This message is automatically generated.

> Tool to measure the performance of the timeline server
> --
>
> Key: YARN-2556
> URL: https://issues.apache.org/jira/browse/YARN-2556
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Jonathan Eagles
>Assignee: Chang Li
> Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, 
> YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch
>
>
> We need to be able to understand the capacity model for the timeline server 
> to give users the tools they need to deploy a timeline server with the 
> correct capacity.
> I propose we create a mapreduce job that can measure timeline server write 
> and read performance. Transactions per second, I/O for both read and write 
> would be a good start.
> This could be done as an example or test job that could be tied into gridmix.
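As a rough illustration of the write side, the core of such a measurement could
look like the sketch below, using the existing ATS v1 TimelineClient API; this is
not taken from the attached patches, and the entity type/id values are made up:

{code}
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelinePutResponse;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineWritePerfSketch {
  public static void main(String[] args) throws Exception {
    int numEntities = 1000;
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      long start = System.currentTimeMillis();
      for (int i = 0; i < numEntities; i++) {
        TimelineEntity entity = new TimelineEntity();
        entity.setEntityType("PERF_TEST");
        entity.setEntityId("entity_" + i);
        entity.setStartTime(System.currentTimeMillis());
        TimelinePutResponse response = client.putEntities(entity);
        if (!response.getErrors().isEmpty()) {
          System.err.println("Put errors: " + response.getErrors());
        }
      }
      long elapsedMs = System.currentTimeMillis() - start;
      System.out.println("Write throughput: " + (numEntities * 1000.0 / elapsedMs) + " puts/sec");
    } finally {
      client.stop();
    }
  }
}
{code}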



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users

2015-02-19 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328400#comment-14328400
 ] 

Robert Kanter commented on YARN-2423:
-

{quote}as we anyway need to make the REST APIs compatible, which is the 
internal stuff within the java wrapper.{quote}
exactly

> TimelineClient should wrap all GET APIs to facilitate Java users
> 
>
> Key: YARN-2423
> URL: https://issues.apache.org/jira/browse/YARN-2423
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Robert Kanter
> Attachments: YARN-2423.004.patch, YARN-2423.005.patch, 
> YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, 
> YARN-2423.patch
>
>
> TimelineClient provides the Java method to put timeline entities. It's also 
> good to wrap over all GET APIs (both entity and domain), and deserialize the 
> json response into Java POJO objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3235) Support uniformed scheduler configuration in FairScheduler

2015-02-19 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3235:


 Summary: Support uniformed scheduler configuration in FairScheduler
 Key: YARN-3235
 URL: https://issues.apache.org/jira/browse/YARN-3235
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3234) Add changes in CapacityScheduler to use the abstracted configuration layer

2015-02-19 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3234:


 Summary: Add changes in CapacityScheduler to use the abstracted 
configuration layer
 Key: YARN-3234
 URL: https://issues.apache.org/jira/browse/YARN-3234
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3233) Implement scheduler common configuration parser and create an abstraction CapacityScheduler configuration layer to support plain/hierarchy configuration.

2015-02-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned YARN-3233:


Assignee: Wangda Tan

> Implement scheduler common configuration parser and create an abstraction 
> CapacityScheduler configuration layer to support plain/hierarchy 
> configuration.
> -
>
> Key: YARN-3233
> URL: https://issues.apache.org/jira/browse/YARN-3233
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3233) Implement scheduler common configuration parser and create an abstraction CapacityScheduler configuration layer to support plain/hierarchy configuration.

2015-02-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3233:
-
Summary: Implement scheduler common configuration parser and create an 
abstraction CapacityScheduler configuration layer to support plain/hierarchy 
configuration.  (was: Implement scheduler common configuration parser and 
create abstraction layer in CapacityScheduler to support plain/hierarchy 
configuration.)

> Implement scheduler common configuration parser and create an abstraction 
> CapacityScheduler configuration layer to support plain/hierarchy 
> configuration.
> -
>
> Key: YARN-3233
> URL: https://issues.apache.org/jira/browse/YARN-3233
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3233) Implement scheduler common configuration parser and create abstraction layer in CapacityScheduler to support plain/hierarchy configuration.

2015-02-19 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-3233:


 Summary: Implement scheduler common configuration parser and 
create abstraction layer in CapacityScheduler to support plain/hierarchy 
configuration.
 Key: YARN-3233
 URL: https://issues.apache.org/jira/browse/YARN-3233
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2986) (Umbrella) Support hierarchical and unified scheduler configuration

2015-02-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2986:
-
Summary: (Umbrella) Support hierarchical and unified scheduler 
configuration  (was: Support hierarchical and unified scheduler configuration)

> (Umbrella) Support hierarchical and unified scheduler configuration
> ---
>
> Key: YARN-2986
> URL: https://issues.apache.org/jira/browse/YARN-2986
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Wangda Tan
> Attachments: YARN-2986.1.patch
>
>
> Today's scheduler configuration is fragmented and non-intuitive, and needs to 
> be improved. Details in comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users

2015-02-19 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328388#comment-14328388
 ] 

Zhijie Shen commented on YARN-2423:
---

Sure, I'll review the last patch. Whether or not the Java client lib exists, 
we have exposed the REST getter APIs and have users that depend on them. 
Having a Java client lib may put more of a burden on the backward compatibility 
of TS v2, but hopefully it's not going to be a big addition, as we need to keep 
the REST APIs compatible anyway, and they are the internal stuff within the Java 
wrapper.

> TimelineClient should wrap all GET APIs to facilitate Java users
> 
>
> Key: YARN-2423
> URL: https://issues.apache.org/jira/browse/YARN-2423
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Robert Kanter
> Attachments: YARN-2423.004.patch, YARN-2423.005.patch, 
> YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, 
> YARN-2423.patch
>
>
> TimelineClient provides the Java method to put timeline entities. It's also 
> good to wrap over all GET APIs (both entity and domain), and deserialize the 
> json response into Java POJO objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle

2015-02-19 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328387#comment-14328387
 ] 

Sangjin Lee commented on YARN-3047:
---

Thanks for the response [~varun_saxena]. I'm OK with 
TimelineClientServiceManager.

> [Data Serving] Set up ATS reader with basic request serving structure and 
> lifecycle
> ---
>
> Key: YARN-3047
> URL: https://issues.apache.org/jira/browse/YARN-3047
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3047.001.patch
>
>
> Per design in YARN-2938, set up the ATS reader as a service and implement the 
> basic structure as a service. It includes lifecycle management, request 
> serving, and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3230) Clarify application states on the web UI

2015-02-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328386#comment-14328386
 ] 

Wangda Tan commented on YARN-3230:
--

Since the new_saving issue seems hard to fit in this ticket, I suggest filing 
a separate one to track it.

The patch looks good to me; the findbugs warning is not related to this patch. I will 
commit it today.

> Clarify application states on the web UI
> 
>
> Key: YARN-3230
> URL: https://issues.apache.org/jira/browse/YARN-3230
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3230.1.patch, YARN-3230.2.patch, YARN-3230.3.patch, 
> YARN-3230.3.patch, application page.png
>
>
> Today, application states are simply surfaced as a single word on the web UI. 
> Not everyone understands the meaning of "NEW_SAVING, SUBMITTED, ACCEPTED". 
> This JIRA is to clarify the meaning of these states, e.g. what the 
> application is waiting for in a given state. 
> In addition, the difference between application state and FinalStatus is 
> fairly confusing to users, especially when state=FINISHED but 
> FinalStatus=FAILED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3230) Clarify application states on the web UI

2015-02-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328380#comment-14328380
 ] 

Hadoop QA commented on YARN-3230:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12699770/YARN-3230.3.patch
  against trunk revision d49ae72.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6677//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6677//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6677//console

This message is automatically generated.

> Clarify application states on the web UI
> 
>
> Key: YARN-3230
> URL: https://issues.apache.org/jira/browse/YARN-3230
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3230.1.patch, YARN-3230.2.patch, YARN-3230.3.patch, 
> YARN-3230.3.patch, application page.png
>
>
> Today, application states are simply surfaced as a single word on the web UI. 
> Not everyone understands the meaning of "NEW_SAVING, SUBMITTED, ACCEPTED". 
> This JIRA is to clarify the meaning of these states, e.g. what the 
> application is waiting for in a given state. 
> In addition, the difference between application state and FinalStatus is 
> fairly confusing to users, especially when state=FINISHED but 
> FinalStatus=FAILED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-02-19 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328376#comment-14328376
 ] 

Hitesh Shah commented on YARN-3131:
---

[~lichangleo] I do not think that continuously polling until RUNNING is a good 
idea. The most common case on a busy cluster is that an app can be submitted at 
time X but not start running until a long time later. Making client code block 
until then is not a good idea especially for cases where jobs are submitted in 
a fire-n-forget manner. 

I think for now, we should probably not address this JIRA in this manner. As it 
stands today, it might be better to live with these issues in the short term 
(so as not to break the current expected behavior).

As I mentioned earlier, I still believe that doing some basic checks in-line in 
ClientRMService itself and throwing an exception back straight away is probably 
a better idea than polling for any RUNNING/FAILED state. 


> YarnClientImpl should check FAILED and KILLED state in submitApplication
> 
>
> Key: YARN-3131
> URL: https://issues.apache.org/jira/browse/YARN-3131
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: yarn_3131_v1.patch
>
>
> Just ran into an issue when submitting a job to a non-existent queue: 
> YarnClient raised no exception. Though that job did get submitted 
> successfully and just failed immediately after, it would be better if 
> YarnClient could handle the immediate-failure situation like YarnRunner does.
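For illustration, a caller-side sketch of the kind of immediate check being
discussed: a single report lookup right after submission rather than blocking
until RUNNING. This is not the attached patch, and on a real cluster a single
check like this can race with the RM's asynchronous handling:

{code}
import java.io.IOException;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class SubmitAndCheck {
  public static ApplicationId submit(YarnClient client, ApplicationSubmissionContext ctx)
      throws YarnException, IOException {
    ApplicationId appId = client.submitApplication(ctx);
    // One immediate status check; avoids blocking until RUNNING on a busy cluster.
    ApplicationReport report = client.getApplicationReport(appId);
    YarnApplicationState state = report.getYarnApplicationState();
    if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) {
      throw new YarnException("Application " + appId + " failed immediately: "
          + report.getDiagnostics());
    }
    return appId;
  }
}
{code}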



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3033) [Aggregator wireup] Implement NM starting the ATS writer companion

2015-02-19 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328375#comment-14328375
 ] 

Li Lu commented on YARN-3033:
-

Thanks [~djp]! Some quick comments:

bq. I think we should mention our best-effort to place logical app aggregator 
to be on the same physical node with AM containers, given the assumption that 
AM could consume aggregator service heavier so we want to keep data transferred 
in local as much as possible. 
bq. For the reason I mentioned above, we should do our best effort to keep 
logical aggregator stay with AM container and we don't have to have different 
policies for different cases.
I agree. We should make every effort to improve the locality here. Will cover 
this later. 

bq.  I didn't see the discussion on launching (or we call binding - to physical 
aggregator) the logical aggregator (by who and how). I assume this is the scope 
of this JIRA. Isn't it?
Yes. Let's focus on launching aggregators for the whole flow in this JIRA. We 
may then consider if we need a separate JIRA for the standalone mode. 

> [Aggregator wireup] Implement NM starting the ATS writer companion
> --
>
> Key: YARN-3033
> URL: https://issues.apache.org/jira/browse/YARN-3033
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Li Lu
> Attachments: MappingandlaunchingApplevelTimelineaggregators.pdf
>
>
> Per design in YARN-2928, implement node managers starting the ATS writer 
> companion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-02-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328370#comment-14328370
 ] 

Wangda Tan commented on YARN-2495:
--

[~cwelch],
Thanks for your thoughtful inputs. The configuration switches, including the 
changes proposed in this patch, are:
{code}
                 node-label
                 /        \
             enable      disable
            /      \
   centralized    distributed
        |              |
 node-label-store   node-manager script
 need to be         need to be
 configured         configured in NM
{code}

You can take a look at the discussion of why we did YARN-2800; we need to 
enable/disable this feature (not enable/disable the centralized API).
And there are some differences between the centralized/distributed configurations:
- We don't need to store node->label mappings, since the NM will store them locally
- We cannot mix them together, as I mentioned in 
https://issues.apache.org/jira/browse/YARN-2495?focusedCommentId=14317048&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14317048.

So in the RM's view there are 3 options in total for the node-label feature itself: 
\{disabled, centralized, distributed\}. I don't think we can eliminate any of 
them, and I don't see how your suggestion could do that :).

So if you agree, I suggest we move this forward with the following changes:
- Change {{decentralized-configuration.enabled}} to {{input}}, accepting a value of 
\{centralized, distributed\} (centralized by default)
- Combine the suggestion in 
https://issues.apache.org/jira/browse/YARN-2495?focusedCommentId=14317048&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14317048.
- Not sure if https://issues.apache.org/jira/browse/YARN-2980 will impact the 
implementation of this patch (I haven't looked at it), but it's best to reuse 
existing code.

Sounds good? [~Naganarasimha] please let me know if you have any other ideas :).
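
To make the three RM-side modes concrete, here is a minimal sketch of how the RM 
could read such a switch. The property names and values below are only 
illustrative placeholders for whatever keys we finally agree on, not the final 
configuration:
{code}
import org.apache.hadoop.conf.Configuration;

public class NodeLabelConfigModeExample {
  // Illustrative property names only; the actual keys are still under discussion.
  private static final String ENABLED_KEY = "yarn.node-labels.enabled";
  private static final String CONFIG_TYPE_KEY = "yarn.node-labels.configuration-type";

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    if (!conf.getBoolean(ENABLED_KEY, false)) {
      System.out.println("node-label feature disabled");
      return;
    }
    String type = conf.get(CONFIG_TYPE_KEY, "centralized");
    if ("centralized".equals(type)) {
      System.out.println("labels come from the node-label store (RM admin API)");
    } else if ("distributed".equals(type)) {
      System.out.println("labels come from each NM (yarn-site.xml or NM script)");
    } else {
      throw new IllegalArgumentException("Unknown " + CONFIG_TYPE_KEY + ": " + type);
    }
  }
}
{code}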

> Allow admin specify labels from each NM (Distributed configuration)
> ---
>
> Key: YARN-2495
> URL: https://issues.apache.org/jira/browse/YARN-2495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
> YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
> YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
> YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
> YARN-2495_20141022.1.patch
>
>
> The target of this JIRA is to allow admins to specify labels on each NM. This covers:
> - Users can set labels on each NM (by setting yarn-site.xml (YARN-2923) or 
> using the script suggested by [~aw] (YARN-2729))
> - The NM will send labels to the RM via the ResourceTracker API
> - The RM will set labels in NodeLabelManager when the NM registers/updates labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3033) [Aggregator wireup] Implement NM starting the ATS writer companion

2015-02-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328365#comment-14328365
 ] 

Junping Du commented on YARN-3033:
--

Thanks [~gtCarrera9] for delivering a proposal here, which looks pretty good. A 
couple of comments:
bq. In the discussion below, we start with a notion of AMs/NMs not making any 
assumptions about the physical locations of the aggregators: an aggregator can 
sit next to the AM in a separate container, it can sit on the local node, etc.
I think we should mention our best effort to place the logical app aggregator 
on the same physical node as the AM container, given the assumption that the AM 
is likely the heaviest consumer of the aggregator service, so we want to keep 
data transfers local as much as possible. The other case, where the AM does not 
stay with the logical aggregator, is when the AM fails over to another node but 
the NM's physical aggregator service is still alive, so the logical aggregator 
stays in the same place. We'd better mention that here.

bq. Depending on the number of running applications ( N_app ) and the number of 
nodes ( N_node ) or on the size of individual applications, we may choose 
different mapping policies (implemented as different aggregator collections).
For the reason I mentioned above, we should make our best effort to keep the 
logical aggregator with the AM container, and we don't have to have different 
policies for different cases. Even for the case N_app >> N_node, we should make 
sure AMs don't get skewed (aggregated) across the cluster.

Also, I didn't see the discussion on launching (or, as we call it, binding to a 
physical aggregator) the logical aggregator (by whom and how). I assume this is 
in the scope of this JIRA, isn't it?
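
As a rough illustration of the best-effort placement described above (all class 
and method names below are hypothetical, not a proposed API):
{code}
import java.util.Arrays;
import java.util.List;

public class AggregatorPlacementSketch {
  // Hypothetical helper: prefer the node hosting the AM container; only fall
  // back to another live node (e.g. after AM failover) so the logical
  // aggregator can keep its existing physical binding.
  static String chooseAggregatorNode(String amNode, List<String> liveNodes) {
    if (liveNodes.contains(amNode)) {
      return amNode;                       // best effort: stay local to the AM
    }
    return liveNodes.isEmpty() ? null : liveNodes.get(0);
  }

  public static void main(String[] args) {
    List<String> live = Arrays.asList("nm-03", "nm-17");
    System.out.println(chooseAggregatorNode("nm-17", live));   // prints nm-17
  }
}
{code}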

> [Aggregator wireup] Implement NM starting the ATS writer companion
> --
>
> Key: YARN-3033
> URL: https://issues.apache.org/jira/browse/YARN-3033
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Li Lu
> Attachments: MappingandlaunchingApplevelTimelineaggregators.pdf
>
>
> Per design in YARN-2928, implement node managers starting the ATS writer 
> companion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-02-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328350#comment-14328350
 ] 

Hadoop QA commented on YARN-3131:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12699763/yarn_3131_v1.patch
  against trunk revision d49ae72.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

  
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6676//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6676//console

This message is automatically generated.

> YarnClientImpl should check FAILED and KILLED state in submitApplication
> 
>
> Key: YARN-3131
> URL: https://issues.apache.org/jira/browse/YARN-3131
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: yarn_3131_v1.patch
>
>
> Just ran into an issue when submitting a job to a non-existent queue: 
> YarnClient raises no exception. Although the job does get submitted 
> successfully and simply fails immediately afterwards, it would be better if 
> YarnClient handled the immediate-failure situation the way YarnRunner does



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-933) Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING

2015-02-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328348#comment-14328348
 ] 

Hudson commented on YARN-933:
-

FAILURE: Integrated in Hadoop-trunk-Commit #7158 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7158/])
YARN-933. Fixed InvalidStateTransitonException at FINAL_SAVING state in RMApp. 
Contributed by Rohith Sharmaks (jianhe: rev 
c0d9b93953767608dfe429ddb9bd4c1c3bd3debf)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java


> Potential InvalidStateTransitonException: Invalid event: LAUNCHED at 
> FINAL_SAVING
> -
>
> Key: YARN-933
> URL: https://issues.apache.org/jira/browse/YARN-933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.5-alpha
>Reporter: J.Andreina
>Assignee: Rohith
> Fix For: 2.7.0
>
> Attachments: 0001-YARN-933.patch, 0001-YARN-933.patch, 
> 0004-YARN-933.patch, YARN-933.3.patch, YARN-933.patch
>
>
> AM max retries configured as 3 on both the client and RM side.
> Step 1: Install a cluster with NMs on 2 machines.
> Step 2: Make ping from the RM machine to the NM1 machine succeed by IP, but 
> fail by hostname.
> Step 3: Execute a job.
> Step 4: After the AM [ AppAttempt_1 ] is allocated to the NM1 machine, a 
> connection loss happens.
> Observation :
> ==
> After AppAttempt_1 has moved to the failed state, release of the container for 
> AppAttempt_1 and application removal are successful. A new AppAttempt_2 is 
> spawned.
> 1. Then a retry for AppAttempt_1 happens again.
> 2. The RM side again tries to launch AppAttempt_1, hence it fails with 
> InvalidStateTransitonException.
> 3. The client exited after AppAttempt_1 finished [but the job is actually 
> still running], while the configured number of appattempts is 3 and the 
> remaining appattempts are all spawned and running.
> RMLogs:
> ==
> 2013-07-17 16:22:51,013 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED
> 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); 
> maxRetries=45
> 2013-07-17 16:36:07,091 INFO 
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
> Expired:container_1373952096466_0056_01_01 Timed out after 600 secs
> 2013-07-17 16:36:07,093 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED 
> to EXPIRED
> 2013-07-17 16:36:07,093 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
> Registering appattempt_1373952096466_0056_02
> 2013-07-17 16:36:07,131 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Application appattempt_1373952096466_0056_01 is done. finalState=FAILED
> 2013-07-17 16:36:07,131 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Application removed - appId: application_1373952096466_0056 user: Rex 
> leaf-queue of parent: root #applications: 35
> 2013-07-17 16:36:07,132 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Application Submission: appattempt_1373952096466_0056_02, 
> 2013-07-17 16:36:07,138 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED
> 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); 
> maxRetries=45
> 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); 
> maxRetries=45
> 2013-07-17 16:38:56,207 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
> launching appattempt_1373952096466_0056_01. Got exception: 
> java.lang.reflect.UndeclaredThrowableException
> 2013-07-17 16:38:56,207 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> LAUNCH_FAILED at FAILED
>  at 
> org.apache.ha
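
For background on the exception above, a small self-contained illustration (not 
Hadoop's actual StateMachineFactory classes) of why an event with no registered 
transition for the current state blows up, and how registering an explicit 
self-transition lets a late event be ignored:
{code}
import java.util.HashMap;
import java.util.Map;

// Toy state machine: transitions are looked up by (state, event); an event with
// no entry for the current state throws, the analogue of
// InvalidStateTransitonException in the log above.
public class TinyStateMachine {
  enum State { FINAL_SAVING, FAILED }
  enum Event { LAUNCHED, LAUNCH_FAILED }

  private final Map<State, Map<Event, State>> table = new HashMap<State, Map<Event, State>>();
  private State current = State.FINAL_SAVING;

  void addTransition(State from, Event on, State to) {
    Map<Event, State> row = table.get(from);
    if (row == null) {
      row = new HashMap<Event, State>();
      table.put(from, row);
    }
    row.put(on, to);
  }

  void handle(Event e) {
    Map<Event, State> row = table.get(current);
    if (row == null || !row.containsKey(e)) {
      throw new IllegalStateException("Invalid event: " + e + " at " + current);
    }
    current = row.get(e);
  }

  public static void main(String[] args) {
    TinyStateMachine sm = new TinyStateMachine();
    // Without this registration, a late LAUNCHED event at FINAL_SAVING throws.
    sm.addTransition(State.FINAL_SAVING, Event.LAUNCHED, State.FINAL_SAVING);
    sm.handle(Event.LAUNCHED);   // now simply ignored (self-transition)
  }
}
{code}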

[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer

2015-02-19 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328346#comment-14328346
 ] 

Li Lu commented on YARN-3034:
-

I agree that the RM may have a derived type of aggregator. Meanwhile, maybe 
we'd like to consider reusing the code for the web server/data storage layer 
connections? BTW, I've done a simple write-up for app-level aggregators and 
their relationships with the RM/NMs, posted in YARN-3033. To make sure we're on 
the same page, could one of you take a look at it? Thanks!

> [Aggregator wireup] Implement RM starting its ATS writer
> 
>
> Key: YARN-3034
> URL: https://issues.apache.org/jira/browse/YARN-3034
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3034.20150205-1.patch
>
>
> Per design in YARN-2928, implement resource managers starting their own ATS 
> writers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI

2015-02-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328345#comment-14328345
 ] 

Junping Du commented on YARN-3225:
--

Hi [~devaraj.k], thanks for taking this on. I think seconds should be fine here, 
as the timeout usually takes minutes.

> New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
> ---
>
> Key: YARN-3225
> URL: https://issues.apache.org/jira/browse/YARN-3225
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Devaraj K
>
> A new CLI (or an existing CLI with new parameters) should put each node on the 
> decommission list into the decommissioning status and track a timeout to 
> terminate the nodes that haven't finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3166) [Source organization] Decide detailed package structures for timeline service v2 components

2015-02-19 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328344#comment-14328344
 ] 

Zhijie Shen commented on YARN-3166:
---

1. If YARN-3087 can be resolved, we can probably make the NM avoid the dependency 
on the timeline service module.

2. I'm not aware that the RM has something similar to the NM aux service to 
decouple the RM and the aggregator. It seems more elegant to have base-aggregator 
or server-oriented interfaces in server-common, but I'm okay with both. In 
addition to packaging, coupling with the RM means that we cannot deploy the RM 
aggregator with different versions of the RM.

3. We can add the methods that operate on the new data model to the existing 
client class, instead of creating a new one. The benefit is that we can reuse 
the whole skeleton, including the HTTP REST wrapper, security and retry code, 
and just need to handle some different data objects and direct the request to a 
different location. Instead of deprecating the whole class, we can deprecate the 
individual methods. Thoughts?
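
To make point 3 concrete, a sketch of what adding new-data-model methods to the 
existing client class while deprecating the old ones could look like. The class 
and method names below are made up for illustration and are not the real 
TimelineClient API:
{code}
// Hypothetical shape only; not the actual client class.
public abstract class TimelineClientSketch {

  /** Old (v1) data model entry point; kept for compatibility. */
  @Deprecated
  public abstract void putEntities(OldTimelineEntity... entities);

  /** New (v2) data model entry point, reusing the same REST/security/retry
      skeleton but directing the request to the per-app aggregator endpoint. */
  public abstract void putEntities(NewTimelineEntity... entities);

  // Placeholder types standing in for the v1 and v2 data model classes.
  public static class OldTimelineEntity { }
  public static class NewTimelineEntity { }
}
{code}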

> [Source organization] Decide detailed package structures for timeline service 
> v2 components
> ---
>
> Key: YARN-3166
> URL: https://issues.apache.org/jira/browse/YARN-3166
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
>
> Open this JIRA to track all discussions on detailed package structures for 
> timeline services v2. This JIRA is for discussion only.
> For our current timeline service v2 design, aggregator (previously called 
> "writer") implementation is in hadoop-yarn-server's:
> {{org.apache.hadoop.yarn.server.timelineservice.aggregator}}
> In YARN-2928's design, the next gen ATS reader is also a server. Maybe we 
> want to put reader related implementations into hadoop-yarn-server's:
> {{org.apache.hadoop.yarn.server.timelineservice.reader}}
> Both readers and aggregators will expose features that may be used by YARN 
> and other 3rd party components, such as aggregator/reader APIs. For those 
> features, maybe we would like to expose their interfaces to 
> hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? 
> Let's use this JIRA as a centralized place to track all related discussions. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3033) [Aggregator wireup] Implement NM starting the ATS writer companion

2015-02-19 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3033:

Attachment: MappingandlaunchingApplevelTimelineaggregators.pdf

Since there is some confusion about our current design of app-level aggregators, 
I wrote a simple document about my understanding of their structure. I hope 
this write-up helps us get on the same page.

> [Aggregator wireup] Implement NM starting the ATS writer companion
> --
>
> Key: YARN-3033
> URL: https://issues.apache.org/jira/browse/YARN-3033
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Li Lu
> Attachments: MappingandlaunchingApplevelTimelineaggregators.pdf
>
>
> Per design in YARN-2928, implement node managers starting the ATS writer 
> companion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3194) After NM restart, RM should handle NMCotainerStatuses sent by NM while registering if NM is Reconnected node

2015-02-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328333#comment-14328333
 ] 

Junping Du commented on YARN-3194:
--

lgtm three. :-)

> After NM restart, RM should handle NMCotainerStatuses sent by NM while 
> registering if NM is Reconnected node
> 
>
> Key: YARN-3194
> URL: https://issues.apache.org/jira/browse/YARN-3194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: NM restart is enabled
>Reporter: Rohith
>Assignee: Rohith
>Priority: Blocker
> Attachments: 0001-YARN-3194.patch, 0001-yarn-3194-v1.patch
>
>
> On NM restart, the NM sends all the outstanding NMContainerStatus to the RM 
> during registration. The registration can be treated by the RM as a new node 
> or a reconnecting node. The RM triggers the corresponding event on the basis 
> of the node-added or node-reconnected state. 
> # Node added event : again, 2 scenarios can occur 
> ## New node is registering with a different ip:port – NOT A PROBLEM
> ## Old node is re-registering because of a RESYNC command from the RM when the 
> RM restarts – NOT A PROBLEM
> # Node reconnected event : 
> ## Existing node is re-registering, i.e. the RM treats it as a reconnecting 
> node when the RM is not restarted 
> ### NM RESTART NOT Enabled – NOT A PROBLEM
> ### NM RESTART is Enabled 
> #### Some applications are running on this node – *Problem is here*
> #### Zero applications are running on this node – NOT A PROBLEM
> Since the NMContainerStatuses are not handled, the RM never gets to know about 
> completed containers and never releases the resources held by those containers. 
> The RM will not allocate new containers for pending resource requests until the 
> completedContainer event is triggered. This results in applications waiting 
> indefinitely because their pending containers are never served by the RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck

2015-02-19 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3231:
---
Priority: Critical  (was: Major)

> FairScheduler changing queueMaxRunningApps on the fly will cause all pending 
> job stuck
> --
>
> Key: YARN-3231
> URL: https://issues.apache.org/jira/browse/YARN-3231
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
>Priority: Critical
> Attachments: YARN-3231.v1.patch
>
>
> When a queue is piling up with a lot of pending jobs due to the 
> maxRunningApps limit, we want to increase this property on the fly to make 
> some of the pending jobs active. However, once we increased the limit, the 
> pending jobs were still not assigned any resources and were stuck forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck

2015-02-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328317#comment-14328317
 ] 

Hadoop QA commented on YARN-3231:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12699776/YARN-3231.v1.patch
  against trunk revision d49ae72.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6678//console

This message is automatically generated.

> FairScheduler changing queueMaxRunningApps on the fly will cause all pending 
> job stuck
> --
>
> Key: YARN-3231
> URL: https://issues.apache.org/jira/browse/YARN-3231
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-3231.v1.patch
>
>
> When a queue is piling up with a lot of pending jobs due to the 
> maxRunningApps limit, we want to increase this property on the fly to make 
> some of the pending jobs active. However, once we increased the limit, the 
> pending jobs were still not assigned any resources and were stuck forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3232) Some application states are not necessarily exposed to users

2015-02-19 Thread Jian He (JIRA)
Jian He created YARN-3232:
-

 Summary: Some application states are not necessarily exposed to 
users
 Key: YARN-3232
 URL: https://issues.apache.org/jira/browse/YARN-3232
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jian He


The application NEW_SAVING and SUBMITTED states do not necessarily need to be 
exposed to users, as they are mostly internal to the system, transient, and not 
user-facing. We may deprecate these two states and remove them from the web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck

2015-02-19 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li reassigned YARN-3231:
-

Assignee: Siqi Li

> FairScheduler changing queueMaxRunningApps on the fly will cause all pending 
> job stuck
> --
>
> Key: YARN-3231
> URL: https://issues.apache.org/jira/browse/YARN-3231
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-3231.v1.patch
>
>
> When a queue is piling up with a lot of pending jobs due to the 
> maxRunningApps limit, we want to increase this property on the fly to make 
> some of the pending jobs active. However, once we increased the limit, the 
> pending jobs were still not assigned any resources and were stuck forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck

2015-02-19 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328312#comment-14328312
 ] 

Siqi Li commented on YARN-3231:
---

The problem seems to come from MaxRunningAppsEnforcer. I will upload a patch 
shortly.
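
For reference, the intended behavior can be sketched as follows (purely 
hypothetical types, not the real FairScheduler/MaxRunningAppsEnforcer code): 
when maxRunningApps is raised on the fly, the pending apps should be re-examined 
and activated up to the new limit.
{code}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of re-activating pending apps after a limit increase.
public class MaxRunningAppsSketch {
  private final Deque<String> pending = new ArrayDeque<String>();
  private final List<String> running = new ArrayList<String>();
  private int maxRunningApps;

  MaxRunningAppsSketch(int maxRunningApps) {
    this.maxRunningApps = maxRunningApps;
  }

  void submit(String app) {
    if (running.size() < maxRunningApps) {
      running.add(app);
    } else {
      pending.add(app);
    }
  }

  // The key step: when the limit changes, promote newly-runnable pending apps.
  void setMaxRunningApps(int newLimit) {
    this.maxRunningApps = newLimit;
    while (running.size() < maxRunningApps && !pending.isEmpty()) {
      running.add(pending.poll());
    }
  }
}
{code}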

> FairScheduler changing queueMaxRunningApps on the fly will cause all pending 
> job stuck
> --
>
> Key: YARN-3231
> URL: https://issues.apache.org/jira/browse/YARN-3231
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>
> When a queue is piling up with a lot of pending jobs due to the 
> maxRunningApps limit, we want to increase this property on the fly to make 
> some of the pending jobs active. However, once we increased the limit, the 
> pending jobs were still not assigned any resources and were stuck forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck

2015-02-19 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-3231:
--
Attachment: YARN-3231.v1.patch

> FairScheduler changing queueMaxRunningApps on the fly will cause all pending 
> job stuck
> --
>
> Key: YARN-3231
> URL: https://issues.apache.org/jira/browse/YARN-3231
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
> Attachments: YARN-3231.v1.patch
>
>
> When a queue is piling up with a lot of pending jobs due to the 
> maxRunningApps limit, we want to increase this property on the fly to make 
> some of the pending jobs active. However, once we increased the limit, the 
> pending jobs were still not assigned any resources and were stuck forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck

2015-02-19 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-3231:
--
Description: When a queue is piling up with a lot of pending jobs due to 
the maxRunningApps limit, we want to increase this property on the fly to make 
some of the pending jobs active. However, once we increased the limit, the 
pending jobs were still not assigned any resources and were stuck forever.

> FairScheduler changing queueMaxRunningApps on the fly will cause all pending 
> job stuck
> --
>
> Key: YARN-3231
> URL: https://issues.apache.org/jira/browse/YARN-3231
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>
> When a queue is piling up with a lot of pending jobs due to the 
> maxRunningApps limit, we want to increase this property on the fly to make 
> some of the pending jobs active. However, once we increased the limit, the 
> pending jobs were still not assigned any resources and were stuck forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck

2015-02-19 Thread Siqi Li (JIRA)
Siqi Li created YARN-3231:
-

 Summary: FairScheduler changing queueMaxRunningApps on the fly 
will cause all pending job stuck
 Key: YARN-3231
 URL: https://issues.apache.org/jira/browse/YARN-3231
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2986) Support hierarchical and unified scheduler configuration

2015-02-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2986:
-
Attachment: YARN-2986.1.patch

Uploaded a WIP patch, which consists of:

1) Implementation of hierarchical CS configuration:
- Basic XML parser utilities
- Base implementation of hierarchical scheduler configuration (can be reused by 
Fair and other schedulers)
- Capacity Scheduler configuration implementation (*)

(*) Capacity Scheduler configuration implementation:
- Since we need to maintain two different configuration styles (plain & 
hierarchical), we need an abstraction layer so that the Capacity Scheduler is 
less impacted. So now:
- CapacitySchedulerConfiguration becomes an interface, with all the 
getters/setters needed
- The original CapacitySchedulerConfiguration is renamed CSPlainConfiguration, 
which implements CapacitySchedulerConfiguration
- The new hierarchical configuration is named CSHierarchyConfiguration

2) Changes in the Capacity Scheduler to use the new interface -- 
CapacitySchedulerConfiguration
- This is halfway done; calling CapacitySchedulerContext.getConfiguration now 
returns the interface type.
- Other parts of the CapacityScheduler class still use CSPlainConfiguration.

For 2), the pending work is:
- Add detection code to choose which configuration file style to read
- Eliminate direct use of CSPlainConfiguration; this is blocked by an issue in 
the existing CapacityScheduler implementation: CapacitySchedulerConfiguration 
automatically includes "yarn-site.xml", which is not correct, and we need to 
limit CapacitySchedulerConfiguration to only use "capacity-scheduler.xml".
- Add more tests to make sure CS works under both configuration file styles.

The config file style now looks like this (nested queues, each carrying its own 
properties):
{code}
<!-- nested queue configuration XML not preserved in the archive -->
{code}

Suggestion for moving forward:
I suggest committing the existing patch *if we agree on the config style and 
have verified that the changes to CS are safe*. We can then address the pending 
work in a follow-up ticket; the patch is already huge, and I'm afraid putting 
everything together would make the review harder.
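
A minimal sketch of the abstraction layer described above. The method names are 
simplified placeholders (the real interface would carry all of the existing 
getters/setters); only the structure is meant to match the description:
{code}
// Sketch only: CapacitySchedulerConfiguration as an interface, with one
// implementation per configuration file style.
public interface CapacitySchedulerConfiguration {
  float getCapacity(String queuePath);
  float getMaximumCapacity(String queuePath);
}

// Backed by the existing flat capacity-scheduler.xml property style.
class CSPlainConfiguration implements CapacitySchedulerConfiguration {
  public float getCapacity(String queuePath) { return 0f; /* read flat property */ }
  public float getMaximumCapacity(String queuePath) { return 0f; /* read flat property */ }
}

// Backed by the new nested (hierarchical) XML style.
class CSHierarchyConfiguration implements CapacitySchedulerConfiguration {
  public float getCapacity(String queuePath) { return 0f; /* walk nested queue elements */ }
  public float getMaximumCapacity(String queuePath) { return 0f; /* walk nested queue elements */ }
}
{code}
The scheduler code would then depend only on the interface returned by 
CapacitySchedulerContext.getConfiguration, so either file style can be plugged in.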

> Support hierarchical and unified scheduler configuration
> 
>
> Key: YARN-2986
> URL: https://issues.apache.org/jira/browse/YARN-2986
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Wangda Tan
> Attachments: YARN-2986.1.patch
>
>
> Today's scheduler configuration is fragmented and non-intuitive, and needs to 
> be improved. Details in comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3230) Clarify application states on the web UI

2015-02-19 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3230:
--
Attachment: YARN-3230.3.patch

> Clarify application states on the web UI
> 
>
> Key: YARN-3230
> URL: https://issues.apache.org/jira/browse/YARN-3230
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3230.1.patch, YARN-3230.2.patch, YARN-3230.3.patch, 
> YARN-3230.3.patch, application page.png
>
>
> Today, application states are simply surfaced as a single word on the web UI. 
> Not everyone understands the meaning of "NEW_SAVING, SUBMITTED, ACCEPTED". 
> This jira is to clarify the meaning of these states, such as what the 
> application is waiting for in each state. 
> In addition, the difference between application state and FinalStatus is 
> fairly confusing to users, especially when state=FINISHED but 
> FinalStatus=FAILED



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3230) Clarify application states on the web UI

2015-02-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328290#comment-14328290
 ] 

Hadoop QA commented on YARN-3230:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12699766/YARN-3230.3.patch
  against trunk revision d49ae72.

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6673//console

This message is automatically generated.

> Clarify application states on the web UI
> 
>
> Key: YARN-3230
> URL: https://issues.apache.org/jira/browse/YARN-3230
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3230.1.patch, YARN-3230.2.patch, YARN-3230.3.patch, 
> application page.png
>
>
> Today, application states are simply surfaced as a single word on the web UI. 
> Not everyone understands the meaning of "NEW_SAVING, SUBMITTED, ACCEPTED". 
> This jira is to clarify the meaning of these states, such as what the 
> application is waiting for in each state. 
> In addition, the difference between application state and FinalStatus is 
> fairly confusing to users, especially when state=FINISHED but 
> FinalStatus=FAILED



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3230) Clarify application states on the web UI

2015-02-19 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3230:
--
Attachment: application page.png

> Clarify application states on the web UI
> 
>
> Key: YARN-3230
> URL: https://issues.apache.org/jira/browse/YARN-3230
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3230.1.patch, YARN-3230.2.patch, YARN-3230.3.patch, 
> application page.png
>
>
> Today, application states are simply surfaced as a single word on the web UI. 
> Not everyone understands the meaning of "NEW_SAVING, SUBMITTED, ACCEPTED". 
> This jira is to clarify the meaning of these states, such as what the 
> application is waiting for in each state. 
> In addition, the difference between application state and FinalStatus is 
> fairly confusing to users, especially when state=FINISHED but 
> FinalStatus=FAILED



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3230) Clarify application states on the web UI

2015-02-19 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328287#comment-14328287
 ] 

Jason Lowe commented on YARN-3230:
--

bq. NEW_SAVING: is not necessary to be seen by client?

I agree, it's not a state that seems relevant to a client.  We're not even 
consistent about it, since FINAL_SAVING is not visible to clients and is 
silently translated to the previous state. However, it might be kind of hard to 
remove completely at this point since it's been published.

> Clarify application states on the web UI
> 
>
> Key: YARN-3230
> URL: https://issues.apache.org/jira/browse/YARN-3230
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3230.1.patch, YARN-3230.2.patch, YARN-3230.3.patch
>
>
> Today, application states are simply surfaced as a single word on the web UI. 
> Not everyone understands the meaning of "NEW_SAVING, SUBMITTED, ACCEPTED". 
> This jira is to clarify the meaning of these states, such as what the 
> application is waiting for in each state. 
> In addition, the difference between application state and FinalStatus is 
> fairly confusing to users, especially when state=FINISHED but 
> FinalStatus=FAILED



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3230) Clarify application states on the web UI

2015-02-19 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328286#comment-14328286
 ] 

Jian He commented on YARN-3230:
---

Thanks for reviewing the patch, Wangda!
bq. 2) NEW_SAVING: is not necessary to be seen by client?
Agreed. I think neither NEW_SAVING nor SUBMITTED needs to be exposed, as they 
are mostly transient and internal to the system. I can open a JIRA for this.
Addressed the other comments.

> Clarify application states on the web UI
> 
>
> Key: YARN-3230
> URL: https://issues.apache.org/jira/browse/YARN-3230
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3230.1.patch, YARN-3230.2.patch, YARN-3230.3.patch
>
>
> Today, application states are simply surfaced as a single word on the web UI. 
> Not everyone understands the meaning of "NEW_SAVING, SUBMITTED, ACCEPTED". 
> This jira is to clarify the meaning of these states, such as what the 
> application is waiting for in each state. 
> In addition, the difference between application state and FinalStatus is 
> fairly confusing to users, especially when state=FINISHED but 
> FinalStatus=FAILED



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3230) Clarify application states on the web UI

2015-02-19 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3230:
--
Attachment: YARN-3230.3.patch

> Clarify application states on the web UI
> 
>
> Key: YARN-3230
> URL: https://issues.apache.org/jira/browse/YARN-3230
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3230.1.patch, YARN-3230.2.patch, YARN-3230.3.patch
>
>
> Today, application states are simply surfaced as a single word on the web UI. 
> Not everyone understands the meaning of "NEW_SAVING, SUBMITTED, ACCEPTED". 
> This jira is to clarify the meaning of these states, such as what the 
> application is waiting for in each state. 
> In addition, the difference between application state and FinalStatus is 
> fairly confusing to users, especially when state=FINISHED but 
> FinalStatus=FAILED



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-02-19 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-3131:
---
Attachment: yarn_3131_v1.patch

> YarnClientImpl should check FAILED and KILLED state in submitApplication
> 
>
> Key: YARN-3131
> URL: https://issues.apache.org/jira/browse/YARN-3131
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: yarn_3131_v1.patch
>
>
> Just ran into an issue when submitting a job to a non-existent queue: 
> YarnClient raises no exception. Although the job does get submitted 
> successfully and simply fails immediately after, it would be better if 
> YarnClient handled the immediate-failure situation the way YarnRunner does
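
For illustration, a hedged sketch of the kind of check the summary describes 
(not necessarily what the attached patch does): after submission, poll the 
application report and fail fast if the application has already gone to FAILED 
or KILLED.
{code}
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class SubmitCheckSketch {
  // Sketch: surface an immediate FAILED/KILLED outcome as an exception,
  // so callers are not left believing the submission fully succeeded.
  static void checkAfterSubmit(YarnClient client, ApplicationId appId)
      throws YarnException, IOException {
    ApplicationReport report = client.getApplicationReport(appId);
    YarnApplicationState state = report.getYarnApplicationState();
    if (state == YarnApplicationState.FAILED
        || state == YarnApplicationState.KILLED) {
      throw new YarnException("Application " + appId + " ended in state " + state
          + ": " + report.getDiagnostics());
    }
  }
}
{code}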



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2556) Tool to measure the performance of the timeline server

2015-02-19 Thread Amit Tiwari (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amit Tiwari updated YARN-2556:
--
Attachment: YARN-2556.patch

Hi guys,
I've made the following enhancements to the previous patches that were posted:
1) Earlier, the payload was being set as the entityId. Since the entityId is 
used as a key by LevelDB, it was crashing under moderate loads, because each 
key was ~2MB. Hence I've changed it to send the payload as part of 
otherInfo. This is handled well.
2) Instead of posting a string of repeated 'a's as the payload, I choose from a 
set of characters. This ensures that LevelDB does not get away easily with 
compression (since algorithms can easily compress a string that consists of a 
single repeated character).

Here are some of the performance numbers that I've got:
I ran 20 concurrent jobs with the arguments -m 300 -s 10 -t 20.
On a 36-node cluster, this results in ~830 concurrent containers (e.g. maps), 
each firing 10KB of payload, 20 times.

LevelDB seems to hold up fine.

Would you have other ways that I could stress/load the system even more?
Thanks,
--amit
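
A rough sketch of the payload handling described in 1) and 2); the entity type, 
key name, and sizes are illustrative and may differ from the attached patch:
{code}
import java.util.Random;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;

public class PayloadSketch {
  private static final char[] CHARSET =
      "abcdefghijklmnopqrstuvwxyz0123456789".toCharArray();

  // Random characters so LevelDB cannot trivially compress the payload.
  static String randomPayload(int bytes, Random rnd) {
    StringBuilder sb = new StringBuilder(bytes);
    for (int i = 0; i < bytes; i++) {
      sb.append(CHARSET[rnd.nextInt(CHARSET.length)]);
    }
    return sb.toString();
  }

  static TimelineEntity buildEntity(String entityId, Random rnd) {
    TimelineEntity entity = new TimelineEntity();
    entity.setEntityId(entityId);            // small key, safe for LevelDB
    entity.setEntityType("LOAD_TEST");       // illustrative type name
    // The bulky payload goes into otherInfo instead of the entity id.
    entity.addOtherInfo("payload", randomPayload(10 * 1024, rnd));
    return entity;
  }
}
{code}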

> Tool to measure the performance of the timeline server
> --
>
> Key: YARN-2556
> URL: https://issues.apache.org/jira/browse/YARN-2556
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Jonathan Eagles
>Assignee: Chang Li
> Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, 
> YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch
>
>
> We need to be able to understand the capacity model for the timeline server 
> to give users the tools they need to deploy a timeline server with the 
> correct capacity.
> I propose we create a mapreduce job that can measure timeline server write 
> and read performance. Transactions per second, I/O for both read and write 
> would be a good start.
> This could be done as an example or test job that could be tied into gridmix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3230) Clarify application states on the web UI

2015-02-19 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3230:
--
Attachment: (was: application page.png)

> Clarify application states on the web UI
> 
>
> Key: YARN-3230
> URL: https://issues.apache.org/jira/browse/YARN-3230
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3230.1.patch, YARN-3230.2.patch
>
>
> Today, application states are simply surfaced as a single word on the web UI. 
> Not everyone understands the meaning of "NEW_SAVING, SUBMITTED, ACCEPTED". 
> This jira is to clarify the meaning of these states, such as what the 
> application is waiting for in each state. 
> In addition, the difference between application state and FinalStatus is 
> fairly confusing to users, especially when state=FINISHED but 
> FinalStatus=FAILED



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3230) Clarify application states on the web UI

2015-02-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328264#comment-14328264
 ] 

Wangda Tan commented on YARN-3230:
--

[~jianhe], thanks for working on this, generally looks good to me, some minor 
comments:
1) "FinalStatus from Application's POV:" to "Final State Reported by 
Application Master"?
2) NEW_SAVING: is not necessary to be seen by client?
3) RUNNING: AM container has registered to RM and started running.

Wangda

> Clarify application states on the web UI
> 
>
> Key: YARN-3230
> URL: https://issues.apache.org/jira/browse/YARN-3230
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3230.1.patch, YARN-3230.2.patch, application 
> page.png
>
>
> Today, application states are simply surfaced as a single word on the web UI. 
> Not everyone understands the meaning of "NEW_SAVING, SUBMITTED, ACCEPTED". 
> This jira is to clarify the meaning of these states, such as what the 
> application is waiting for in each state. 
> In addition, the difference between application state and FinalStatus is 
> fairly confusing to users, especially when state=FINISHED but 
> FinalStatus=FAILED



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3230) Clarify application states on the web UI

2015-02-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328253#comment-14328253
 ] 

Hadoop QA commented on YARN-3230:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12699760/application%20page.png
  against trunk revision d49ae72.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6672//console

This message is automatically generated.

> Clarify application states on the web UI
> 
>
> Key: YARN-3230
> URL: https://issues.apache.org/jira/browse/YARN-3230
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3230.1.patch, YARN-3230.2.patch, application 
> page.png
>
>
> Today, application states are simply surfaced as a single word on the web UI. 
> Not everyone understands the meaning of "NEW_SAVING, SUBMITTED, ACCEPTED". 
> This jira is to clarify the meaning of these states, such as what the 
> application is waiting for in each state. 
> In addition, the difference between application state and FinalStatus is 
> fairly confusing to users, especially when state=FINISHED but 
> FinalStatus=FAILED



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3230) Clarify application states on the web UI

2015-02-19 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3230:
--
Attachment: application page.png

Uploaded an application page screenshot.

> Clarify application states on the web UI
> 
>
> Key: YARN-3230
> URL: https://issues.apache.org/jira/browse/YARN-3230
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3230.1.patch, YARN-3230.2.patch, application 
> page.png
>
>
> Today, application states are simply surfaced as a single word on the web UI. 
> Not everyone understands the meaning of "NEW_SAVING, SUBMITTED, ACCEPTED". 
> This jira is to clarify the meaning of these states, such as what the 
> application is waiting for in each state. 
> In addition, the difference between application state and FinalStatus is 
> fairly confusing to users, especially when state=FINISHED but 
> FinalStatus=FAILED



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3230) Clarify application states on the web UI

2015-02-19 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3230:
--
Attachment: (was: application page.png)

> Clarify application states on the web UI
> 
>
> Key: YARN-3230
> URL: https://issues.apache.org/jira/browse/YARN-3230
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3230.1.patch, YARN-3230.2.patch
>
>
> Today, application states are simply surfaced as a single word on the web UI. 
> Not everyone understands the meaning of "NEW_SAVING, SUBMITTED, ACCEPTED". 
> This jira is to clarify the meaning of these states, such as what the 
> application is waiting for in each state. 
> In addition, the difference between application state and FinalStatus is 
> fairly confusing to users, especially when state=FINISHED but 
> FinalStatus=FAILED



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3230) Clarify application states on the web UI

2015-02-19 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3230:
--
Attachment: application page.png

> Clarify application states on the web UI
> 
>
> Key: YARN-3230
> URL: https://issues.apache.org/jira/browse/YARN-3230
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3230.1.patch, YARN-3230.2.patch
>
>
> Today, application states are simply surfaced as a single word on the web UI. 
> Not everyone understands the meaning of "NEW_SAVING, SUBMITTED, ACCEPTED". 
> This jira is to clarify the meaning of these states, such as what the 
> application is waiting for in each state. 
> In addition, the difference between application state and FinalStatus is 
> fairly confusing to users, especially when state=FINISHED but 
> FinalStatus=FAILED



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3230) Clarify application states on the web UI

2015-02-19 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3230:
--
Attachment: YARN-3230.2.patch

> Clarify application states on the web UI
> 
>
> Key: YARN-3230
> URL: https://issues.apache.org/jira/browse/YARN-3230
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3230.1.patch, YARN-3230.2.patch
>
>
> Today, application states are simply surfaced as a single word on the web UI. 
> Not everyone understands the meaning of "NEW_SAVING, SUBMITTED, ACCEPTED". 
> This jira is to clarify the meaning of these states, such as what the 
> application is waiting for in each state. 
> In addition, the difference between application state and FinalStatus is 
> fairly confusing to users, especially when state=FINISHED but 
> FinalStatus=FAILED



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3229) Incorrect processing of container as LOST on Interruption during NM shutdown

2015-02-19 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot reassigned YARN-3229:
---

Assignee: Anubhav Dhoot

> Incorrect processing of container as LOST on Interruption during NM shutdown
> 
>
> Key: YARN-3229
> URL: https://issues.apache.org/jira/browse/YARN-3229
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>
> YARN-2846 fixed the issue of incorrectly writing to the state store that the 
> process is LOST. But even after that, we still process the ContainerExitEvent. 
> If notInterrupted is false in RecoveredContainerLaunch#call, we should skip 
> the following:
> {noformat}
>  if (retCode != 0) {
>   LOG.warn("Recovered container exited with a non-zero exit code "
>   + retCode);
>   this.dispatcher.getEventHandler().handle(new ContainerExitEvent(
>   containerId,
>   ContainerEventType.CONTAINER_EXITED_WITH_FAILURE, retCode,
>   "Container exited with a non-zero exit code " + retCode));
>   return retCode;
> }
> {noformat}
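
One possible shape of the skip described above, shown as a hedged sketch against 
the quoted snippet (variable names are taken from that snippet; this is not the 
committed fix):
{code}
// Sketch: only report a failure exit code if the launch was not interrupted
// by NM shutdown; otherwise skip sending the ContainerExitEvent entirely.
if (notInterrupted && retCode != 0) {
  LOG.warn("Recovered container exited with a non-zero exit code " + retCode);
  this.dispatcher.getEventHandler().handle(new ContainerExitEvent(
      containerId,
      ContainerEventType.CONTAINER_EXITED_WITH_FAILURE, retCode,
      "Container exited with a non-zero exit code " + retCode));
  return retCode;
}
{code}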



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3230) Clarify application states on the web UI

2015-02-19 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3230:
--
Attachment: YARN-3230.1.patch

Uploaded a patch to add more text to clarify the application state

> Clarify application states on the web UI
> 
>
> Key: YARN-3230
> URL: https://issues.apache.org/jira/browse/YARN-3230
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3230.1.patch
>
>
> Today, application states are simply surfaced as a single word on the web UI. 
> Not everyone understands the meaning of "NEW_SAVING, SUBMITTED, ACCEPTED". 
> This jira is to clarify the meaning of these states, such as what the 
> application is waiting for in each state. 
> In addition, the difference between application state and FinalStatus is 
> fairly confusing to users, especially when state=FINISHED but 
> FinalStatus=FAILED



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3230) Clarify application states on the web UI

2015-02-19 Thread Jian He (JIRA)
Jian He created YARN-3230:
-

 Summary: Clarify application states on the web UI
 Key: YARN-3230
 URL: https://issues.apache.org/jira/browse/YARN-3230
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jian He
Assignee: Jian He


Today, application states are simply surfaced as a single word on the web UI. 
Not everyone understands the meaning of "NEW_SAVING, SUBMITTED, ACCEPTED". This 
jira is to clarify the meaning of these states, such as what the application is 
waiting for in each state. 

In addition, the difference between application state and FinalStatus is fairly 
confusing to users, especially when state=FINISHED but FinalStatus=FAILED



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue

2015-02-19 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328071#comment-14328071
 ] 

Jason Lowe commented on YARN-1963:
--

I'd like to see changing app priorities addressed as it is a common ask from 
users.  In many cases jobs are submitted to the cluster via some 
workflow/pipeline, and they would like to change the priority of apps already 
submitted.  Otherwise they have to update their workflow/pipeline to change the 
submit-time priority, kill the active jobs, and resubmit the apps for the 
priority to take effect.  Then eventually they need to change it all back to 
normal priorities later.

> Support priorities across applications within the same queue 
> -
>
> Key: YARN-1963
> URL: https://issues.apache.org/jira/browse/YARN-1963
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Reporter: Arun C Murthy
>Assignee: Sunil G
> Attachments: YARN Application Priorities Design.pdf, YARN Application 
> Priorities Design_01.pdf
>
>
> It will be very useful to support priorities among applications within the 
> same queue, particularly in production scenarios. It allows for finer-grained 
> controls without having to force admins to create a multitude of queues, plus 
> allows existing applications to continue using existing queues which are 
> usually part of institutional memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3040) [Data Model] Implement client-side API for handling flows

2015-02-19 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328059#comment-14328059
 ] 

Sangjin Lee commented on YARN-3040:
---

[~Naganarasimha], the scope of this JIRA is to provide a way for a YARN client 
to set and pass information about the flow along with the application. With 
this, we don't need to worry about actually creating a flow on the timeline 
service side of things yet.

Basically we need the ability to set flow id, flow version, and the flow run id 
as strings.

As for the cluster, I think trivially the cluster will be assumed to be the 
same cluster it is on, so the client doesn't need to set it. The RM and/or the 
aggregator will use the current cluster and write it out to the timeline 
service storage.
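
A hedged sketch of what the client-side piece could look like if the flow 
information is carried as YARN application tags, as the description suggests. 
The tag prefixes below are made up; the actual convention is still to be decided.
{code}
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;

public class FlowTagsSketch {
  // Hypothetical tag prefixes; the real names/format are not settled yet.
  static void setFlowInfo(ApplicationSubmissionContext ctx,
      String flowId, String flowVersion, long flowRunId) {
    Set<String> tags = new HashSet<String>();
    tags.add("TIMELINE_FLOW_NAME_TAG:" + flowId);
    tags.add("TIMELINE_FLOW_VERSION_TAG:" + flowVersion);
    tags.add("TIMELINE_FLOW_RUN_ID_TAG:" + flowRunId);
    ctx.setApplicationTags(tags);   // passed to the RM along with the app
  }
}
{code}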

> [Data Model] Implement client-side API for handling flows
> -
>
> Key: YARN-3040
> URL: https://issues.apache.org/jira/browse/YARN-3040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Robert Kanter
>
> Per design in YARN-2928, implement client-side API for handling *flows*. 
> Frameworks should be able to define and pass in all attributes of flows and 
> flow runs to YARN, and they should be passed into ATS writers.
> YARN tags were discussed as a way to handle this piece of information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3227) Timeline renew delegation token fails when RM user's TGT is expired

2015-02-19 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328043#comment-14328043
 ] 

Zhijie Shen commented on YARN-3227:
---

Does HDFS token renewal require kerberos authentication or just token 
authentication? Timeline token renewal requires kerberos authentication to be 
passed first.

> Timeline renew delegation token fails when RM user's TGT is expired
> ---
>
> Key: YARN-3227
> URL: https://issues.apache.org/jira/browse/YARN-3227
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Priority: Critical
>
> When the RM user's kerberos TGT is expired, the RM renew delegation token 
> operation fails as part of job submission. Expected behavior is that RM will 
> relogin to get a new TGT.
> {quote}
> 2015-02-06 18:54:05,617 [DelegationTokenRenewer #25954] WARN
> security.DelegationTokenRenewer: Unable to add the application to the
> delegation token renewer.
> java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN,
> Service: timelineserver.example.com:4080, Ident: (owner=user,
> renewer=rmuser, realUser=oozie, issueDate=1423248845528,
> maxDate=1423853645528, sequenceNumber=9716, masterKeyId=9)
> at
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:443)
> at
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$800(DelegationTokenRenewer.java:77)
> at
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:808)
> at
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:789)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: java.io.IOException: HTTP status [401], message [Unauthorized]
> at
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:169)
> at
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:286)
> at
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.renewDelegationToken(DelegationTokenAuthenticator.java:211)
> at
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.renewDelegationToken(DelegationTokenAuthenticatedURL.java:414)
> at
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:374)
> at
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:360)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$4.run(TimelineClientImpl.java:429)
> at
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:161)
> at
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:444)
> at
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.renewDelegationToken(TimelineClientImpl.java:378)
> at
> org.apache.hadoop.yarn.security.client.TimelineDelegationTokenIdentifier$Renewer.renew(TimelineDelegationTokenIdentifier.java:81)
> at org.apache.hadoop.security.token.Token.renew(Token.java:377)
> at
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:532)
> at
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:529)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer

2015-02-19 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328048#comment-14328048
 ] 

Sangjin Lee commented on YARN-3034:
---

Sorry for my delayed response...

As for the code organization, also see the discussion on YARN-3166.

{quote}
If we plan to handle similar to current approach i.e send the Entity data 
through a rest client to a timeline writer service(RMTimelineAggregator), where 
should this service be running i.e. as part of which process or should it be a 
daemon on its own?
{quote}
Since this is going to be exclusively used by the resource manager, we have two 
options: (1) start it inside the RM process as a service, or (2) start it as a 
standalone daemon on the RM machine. I think either is fine, although it would 
be good to have both as options.

{quote}
Is RMTimelineAggregator expected to do any primary (preliminary) aggregation of 
some metrics? Just wanted to know the reason to have a specific 
TimelineAggregator for RM separately.
{quote}
I do think the RM will simply push data directly to the timeline storage 
without much aggregation. The RM doesn’t really handle the app-level 
aggregation or separation (that’s why the RM’s aggregator service needs to 
extend the BaseAggregatorService directly).

{quote}
IIUC RM needs to add User and Queue Entities when an application is created if 
the specified user and queue don't exist as entities in ATS? Apart from this, 
the Queue Entity has Parent Queue information; is it something like when CS/FS 
is initialized we need to create Entities for new queues and hierarchies? Is it 
not sufficient to just have the Leaf Queue Entity with the parent path as its 
meta info, or is the hierarchy required?
{quote}
I would say let’s not worry about the queue and the user from the RM point of 
view. As discussed in YARN-2928, the user and queue entity classes are there 
primarily for serving reads.

{quote}
In the original design, it is proposed that we need to organize application 
level aggregators into collections (either on the NMs or on the RM, supposedly 
implemented as AppLevelServiceManager? ), and the servers launches its own 
collection.
{quote}
As mentioned above, the RM doesn’t really need to deal with the per-app 
boundary. From a practical perspective too, the RM’s aggregator could become a 
fairly big memory hotspot if it started collecting per-app data, since the RM 
deals with a large number of applications. We envision the RM aggregator 
pushing entities pretty much directly onto storage.
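
As a minimal sketch of option (1), assuming a hypothetical RMTimelineAggregator 
(which in the actual design would extend the BaseAggregatorService mentioned 
above rather than plain AbstractService), the aggregator could be wired into 
the RM as a child service so that it shares the RM lifecycle:

{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;
import org.apache.hadoop.service.CompositeService;

// Hypothetical aggregator service; RM entities would be pushed pretty much
// directly to timeline storage here, with no per-app aggregation.
class RMTimelineAggregator extends AbstractService {
  RMTimelineAggregator() {
    super("RMTimelineAggregator");
  }

  @Override
  protected void serviceStart() throws Exception {
    // Open the writer to the timeline storage backend here.
    super.serviceStart();
  }
}

// Option (1): start the aggregator inside the RM process as a child service.
class ResourceManagerSkeleton extends CompositeService {
  ResourceManagerSkeleton() {
    super("ResourceManagerSkeleton");
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    addService(new RMTimelineAggregator()); // shares the RM's service lifecycle
    super.serviceInit(conf);
  }
}
{noformat}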


> [Aggregator wireup] Implement RM starting its ATS writer
> 
>
> Key: YARN-3034
> URL: https://issues.apache.org/jira/browse/YARN-3034
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3034.20150205-1.patch
>
>
> Per design in YARN-2928, implement resource managers starting their own ATS 
> writers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue

2015-02-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327995#comment-14327995
 ] 

Wangda Tan commented on YARN-1963:
--

One more question: I didn't see an API proposed to update app priority. I think 
it may be very useful when a job has run for some time and needs to be 
completed as soon as possible.

Is this a valid use case that we should handle within the scope of YARN-1963?
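
For illustration only, a hypothetical shape such an update API could take on 
the client side (the method name and signature are assumptions, not part of the 
current YARN-1963 patches):

{noformat}
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.Priority;

// Hypothetical client-side interface for re-prioritizing a running application.
interface ApplicationPriorityUpdater {
  // Ask the RM to move an already-running application to a new priority
  // within its queue, e.g. to get a long-running job completed sooner.
  void updateApplicationPriority(ApplicationId applicationId, Priority priority)
      throws IOException;
}
{noformat}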

> Support priorities across applications within the same queue 
> -
>
> Key: YARN-1963
> URL: https://issues.apache.org/jira/browse/YARN-1963
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Reporter: Arun C Murthy
>Assignee: Sunil G
> Attachments: YARN Application Priorities Design.pdf, YARN Application 
> Priorities Design_01.pdf
>
>
> It will be very useful to support priorities among applications within the 
> same queue, particularly in production scenarios. It allows for finer-grained 
> controls without having to force admins to create a multitude of queues, plus 
> allows existing applications to continue using existing queues which are 
> usually part of institutional memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3166) [Source organization] Decide detailed package structures for timeline service v2 components

2015-02-19 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327987#comment-14327987
 ] 

Sangjin Lee commented on YARN-3166:
---

[~gtCarrera9], sorry for my delayed response.

It looks good mostly. I have some feedback:
- As [~zjshen] mentioned, we need to sort out the RM/NM dependency on the 
timelineservice module. The NM dependency is more of a fluke, but we need to 
think about the RM dependency because it needs to start its own aggregator 
service. I believe [~Naganarasimha] mentioned this in another JIRA. Perhaps 
this is unavoidable if RM is going to start the aggregator? I am not aware of 
any clean pluggable service mechanism for RM (like the aux services for NM). 
Another idea if we don't want that is to move the base aggregator class into 
yarn-server-common.
- I think as a rule, it would be good to make sure not to disturb the old ATS 
classes. IIUC we're deprecating the old ATS classes, but we're not going to 
modify them in an incompatible way (e.g. moving classes, removing classes, 
changing interfaces, etc.), as that would be extremely disruptive once this is 
merged.
- What is the difference between TimelineStorage and TimelineStorageImpl?

> [Source organization] Decide detailed package structures for timeline service 
> v2 components
> ---
>
> Key: YARN-3166
> URL: https://issues.apache.org/jira/browse/YARN-3166
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
>
> Open this JIRA to track all discussions on detailed package structures for 
> timeline services v2. This JIRA is for discussion only.
> For our current timeline service v2 design, aggregator (previously called 
> "writer") implementation is in hadoop-yarn-server's:
> {{org.apache.hadoop.yarn.server.timelineservice.aggregator}}
> In YARN-2928's design, the next gen ATS reader is also a server. Maybe we 
> want to put reader related implementations into hadoop-yarn-server's:
> {{org.apache.hadoop.yarn.server.timelineservice.reader}}
> Both readers and aggregators will expose features that may be used by YARN 
> and other 3rd party components, such as aggregator/reader APIs. For those 
> features, maybe we would like to expose their interfaces to 
> hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? 
> Let's use this JIRA as a centralized place to track all related discussions. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2693) Priority Label Manager in RM to manage priority labels

2015-02-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327981#comment-14327981
 ] 

Wangda Tan commented on YARN-2693:
--

[~sunilg],
After thinking about this, I feel this part may not be required before adding 
the major functionality.

I found the existing implementation of the priority label manager is very 
similar to the node label manager, but they serve two different use cases.

In the node label manager each node can be assigned labels, so there are lots 
of mappings in the cluster.
Priority labels, however, will be much simpler: fewer than two dozen text-based 
priority labels should satisfy most use cases, and priority labels are not 
likely to change frequently.

So what I suggest now is making simple configuration-based labels first. If RM 
HA needs to be supported, the admin can put the same priority-label 
configuration item on the different RM nodes: since we don't have a centralized 
configuration for Hadoop daemons, we already assume different RM nodes have the 
same yarn-site.xml settings.

After the major functionality is completed (say RM / scheduler / API / client 
side), more time could be spent on this part :).

Ideas?
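
For illustration, a minimal sketch of the configuration-based approach, 
assuming a hypothetical yarn-site.xml property (the key name and 
"label:integer" format below are assumptions, not actual YARN configuration):

{noformat}
import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;

public class PriorityLabelConfig {
  // Hypothetical key; under RM HA the same entry would simply be copied into
  // the yarn-site.xml of every RM node.
  static final String PRIORITY_LABELS_KEY = "yarn.scheduler.priority.labels";

  // Parses e.g. "low:1,normal:5,high:10" into an ordered label -> integer map.
  static Map<String, Integer> load(Configuration conf) {
    Map<String, Integer> labels = new LinkedHashMap<>();
    for (String entry : conf.getTrimmedStrings(PRIORITY_LABELS_KEY)) {
      String[] parts = entry.split(":");
      labels.put(parts[0], Integer.parseInt(parts[1]));
    }
    return labels;
  }
}
{noformat}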

> Priority Label Manager in RM to manage priority labels
> --
>
> Key: YARN-2693
> URL: https://issues.apache.org/jira/browse/YARN-2693
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 
> 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch
>
>
> Focus of this JIRA is to have a centralized service to handle priority labels.
> Support operations such as
> * Add/Delete priority label to a specified queue
> * Manage integer mapping associated with each priority label
> * Support managing default priority label of a given queue
> * ACL support in queue level for priority label
> * Expose interface to RM to validate priority label
> Storage for this labels will be done in FileSystem and in Memory similar to 
> NodeLabel
> * FileSystem Based : persistent across RM restart
> * Memory Based: non-persistent across RM restart



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3076) Add API/Implementation to YarnClient to retrieve label-to-node mapping

2015-02-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327969#comment-14327969
 ] 

Hudson commented on YARN-3076:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7157 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7157/])
YARN-3076. Add API/Implementation to YarnClient to retrieve label-to-node 
mapping (Varun Saxena via wangda) (wangda: rev 
d49ae725d5fa3eecf879ac42c42a368dd811f854)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestPBImplRecords.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetLabelsToNodesRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetLabelsToNodesResponse.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java


> Add API/Implementation to YarnClient to retrieve label-to-node mapping
> --
>
> Key: YARN-3076
> URL: https://issues.apache.org/jira/browse/YARN-3076
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 2.7.0
>
> Attachments: YARN-3076.001.patch, YARN-3076.002.patch, 
> YARN-3076.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3076) Add API/Implementation to YarnClient to retrieve label-to-node mapping

2015-02-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3076:
-
Summary: Add API/Implementation to YarnClient to retrieve label-to-node 
mapping  (was: YarnClient implementation to retrieve label to node mapping)

> Add API/Implementation to YarnClient to retrieve label-to-node mapping
> --
>
> Key: YARN-3076
> URL: https://issues.apache.org/jira/browse/YARN-3076
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3076.001.patch, YARN-3076.002.patch, 
> YARN-3076.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI

2015-02-19 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327941#comment-14327941
 ] 

Devaraj K commented on YARN-3225:
-

What would be the timeout units here? Are we thinking of any constrained range 
for the timeout value? Thanks

> New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
> ---
>
> Key: YARN-3225
> URL: https://issues.apache.org/jira/browse/YARN-3225
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>
> New CLI (or existing CLI with parameters) should put each node on 
> decommission list to decommissioning status and track timeout to terminate 
> the nodes that haven't get finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI

2015-02-19 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K reassigned YARN-3225:
---

Assignee: Devaraj K

> New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
> ---
>
> Key: YARN-3225
> URL: https://issues.apache.org/jira/browse/YARN-3225
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Devaraj K
>
> New CLI (or existing CLI with parameters) should put each node on 
> decommission list to decommissioning status and track timeout to terminate 
> the nodes that haven't get finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3194) After NM restart, RM should handle NMCotainerStatuses sent by NM while registering if NM is Reconnected node

2015-02-19 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327937#comment-14327937
 ] 

Jian He commented on YARN-3194:
---

lgtm too

> After NM restart, RM should handle NMCotainerStatuses sent by NM while 
> registering if NM is Reconnected node
> 
>
> Key: YARN-3194
> URL: https://issues.apache.org/jira/browse/YARN-3194
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: NM restart is enabled
>Reporter: Rohith
>Assignee: Rohith
>Priority: Blocker
> Attachments: 0001-YARN-3194.patch, 0001-yarn-3194-v1.patch
>
>
> On NM restart, the NM sends all the outstanding NMContainerStatuses to the RM 
> during registration. The RM can treat the registration as a new node or a 
> reconnecting node, and triggers the corresponding event based on the node 
> added or node reconnected state. 
> # Node added event : Again here 2 scenario's can occur 
> ## New node is registering with different ip:port – NOT A PROBLEM
> ## Old node is re-registering because of RESYNC command from RM when RM 
> restart – NOT A PROBLEM
> # Node reconnected event : 
> ## Existing node is re-registering i.e RM treat it as reconnecting node when 
> RM is not restarted 
> ### NM RESTART NOT Enabled – NOT A PROBLEM
> ### NM RESTART is Enabled 
>  Some applications are running on this node – *Problem is here*
>  Zero applications are running on this node – NOT A PROBLEM
> Since the NMContainerStatuses are not handled, the RM never gets to know 
> about completed containers and never releases the resources held by those 
> containers. The RM will not allocate new containers for the pending resource 
> requests until the completedContainer event is triggered. This results in 
> applications waiting indefinitely because their pending containers are not 
> served by the RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3229) Incorrect processing of container as LOST on Interruption during NM shutdown

2015-02-19 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-3229:
---

 Summary: Incorrect processing of container as LOST on Interruption 
during NM shutdown
 Key: YARN-3229
 URL: https://issues.apache.org/jira/browse/YARN-3229
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot


YARN-2846 fixed the issue of incorrectly recording in the state store that the 
process is LOST. But even after that, we still process the ContainerExitEvent. 
If notInterrupted is false in RecoveredContainerLaunch#call, we should skip the 
following:
{noformat}
if (retCode != 0) {
  LOG.warn("Recovered container exited with a non-zero exit code "
      + retCode);
  this.dispatcher.getEventHandler().handle(new ContainerExitEvent(
      containerId,
      ContainerEventType.CONTAINER_EXITED_WITH_FAILURE, retCode,
      "Container exited with a non-zero exit code " + retCode));
  return retCode;
}
{noformat}
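
A minimal sketch of the proposed guard, reusing the names from the snippet 
above (the surrounding RecoveredContainerLaunch#call context is assumed):

{noformat}
// Only raise the failure event when the recovery was not interrupted by an
// NM shutdown; otherwise skip the block entirely.
if (notInterrupted && retCode != 0) {
  LOG.warn("Recovered container exited with a non-zero exit code " + retCode);
  this.dispatcher.getEventHandler().handle(new ContainerExitEvent(
      containerId, ContainerEventType.CONTAINER_EXITED_WITH_FAILURE, retCode,
      "Container exited with a non-zero exit code " + retCode));
  return retCode;
}
{noformat}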



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3204) Fix new findbug warnings in hadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair)

2015-02-19 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3204:
---
Summary: Fix new findbug warnings in 
hadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair)  (was: Fix 
new findbug warnings 
inhadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair))

> Fix new findbug warnings in 
> hadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair)
> --
>
> Key: YARN-3204
> URL: https://issues.apache.org/jira/browse/YARN-3204
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: YARN-3204-001.patch, YARN-3204-002.patch
>
>
> Please check following findbug report..
> https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3204) Fix new findbug warnings inhadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair)

2015-02-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327816#comment-14327816
 ] 

Hadoop QA commented on YARN-3204:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12699701/YARN-3204-002.patch
  against trunk revision 2fd02af.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6669//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6669//console

This message is automatically generated.

> Fix new findbug warnings 
> inhadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair)
> -
>
> Key: YARN-3204
> URL: https://issues.apache.org/jira/browse/YARN-3204
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: YARN-3204-001.patch, YARN-3204-002.patch
>
>
> Please check following findbug report..
> https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3224) Notify AM with containers (on decommissioning node) could be preempted after timeout.

2015-02-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327811#comment-14327811
 ] 

Junping Du commented on YARN-3224:
--

Sure. Please go ahead and take on this JIRA. Thanks [~sunilg]!

> Notify AM with containers (on decommissioning node) could be preempted after 
> timeout.
> -
>
> Key: YARN-3224
> URL: https://issues.apache.org/jira/browse/YARN-3224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI

2015-02-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327807#comment-14327807
 ] 

Junping Du commented on YARN-3225:
--

Thanks [~sunilg] for the comments! Yes, I mean the rmadmin command line. I 
think it would be better to pass a timeout by adding a parameter, something 
like "-t". Without this parameter, the node will be decommissioned forcefully 
just like before. Thoughts?

> New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
> ---
>
> Key: YARN-3225
> URL: https://issues.apache.org/jira/browse/YARN-3225
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>
> New CLI (or existing CLI with parameters) should put each node on 
> decommission list to decommissioning status and track timeout to terminate 
> the nodes that haven't get finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3227) Timeline renew delegation token fails when RM user's TGT is expired

2015-02-19 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327798#comment-14327798
 ] 

Vinod Kumar Vavilapalli commented on YARN-3227:
---

Is it only the Timeline delegation token that fails renewal or all the tokens?

> Timeline renew delegation token fails when RM user's TGT is expired
> ---
>
> Key: YARN-3227
> URL: https://issues.apache.org/jira/browse/YARN-3227
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Priority: Critical
>
> When the RM user's kerberos TGT is expired, the RM renew delegation token 
> operation fails as part of job submission. Expected behavior is that RM will 
> relogin to get a new TGT.
> {quote}
> 2015-02-06 18:54:05,617 [DelegationTokenRenewer #25954] WARN
> security.DelegationTokenRenewer: Unable to add the application to the
> delegation token renewer.
> java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN,
> Service: timelineserver.example.com:4080, Ident: (owner=user,
> renewer=rmuser, realUser=oozie, issueDate=1423248845528,
> maxDate=1423853645528, sequenceNumber=9716, masterKeyId=9)
> at
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:443)
> at
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$800(DelegationTokenRenewer.java:77)
> at
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:808)
> at
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:789)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: java.io.IOException: HTTP status [401], message [Unauthorized]
> at
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:169)
> at
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:286)
> at
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.renewDelegationToken(DelegationTokenAuthenticator.java:211)
> at
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.renewDelegationToken(DelegationTokenAuthenticatedURL.java:414)
> at
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:374)
> at
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:360)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$4.run(TimelineClientImpl.java:429)
> at
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:161)
> at
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:444)
> at
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.renewDelegationToken(TimelineClientImpl.java:378)
> at
> org.apache.hadoop.yarn.security.client.TimelineDelegationTokenIdentifier$Renewer.renew(TimelineDelegationTokenIdentifier.java:81)
> at org.apache.hadoop.security.token.Token.renew(Token.java:377)
> at
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:532)
> at
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:529)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3228) Deadlock altering user resource queue

2015-02-19 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved YARN-3228.
---
Resolution: Incomplete

Not sure how/why this is related to Hadoop. In any case, please first try to 
resolve user issues in the user mailing lists 
(http://hadoop.apache.org/mailing_lists.html).

The JIRA is a place to address existing bugs/new features in the project. 
Closing this for now. Thanks.

> Deadlock altering user resource queue
> -
>
> Key: YARN-3228
> URL: https://issues.apache.org/jira/browse/YARN-3228
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager, scheduler
>Affects Versions: 2.0.1-alpha
> Environment: hadoop yarn, postgresql 
>Reporter: Christian Hott
>Priority: Blocker
>  Labels: newbie
>   Original Estimate: 203h
>  Remaining Estimate: 203h
>
> Let me introduce my problem:
> All of this began after we created some resource queues on PostgreSQL. We 
> created them, assigned them to the users, and all was fine...
> until we ran a process (a large iterative query) and I did an ALTER ROLE on 
> the user and the resource queue he was using. After that I couldn't log in 
> with the user and got a message saying "deadlock detection, locking against 
> self".
> Do you have any idea why this happens, or is there any comprehensible log I 
> can search for more information?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3228) Deadlock altering user resource queue

2015-02-19 Thread Christian Hott (JIRA)
Christian Hott created YARN-3228:


 Summary: Deadlock altering user resource queue
 Key: YARN-3228
 URL: https://issues.apache.org/jira/browse/YARN-3228
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler, resourcemanager, scheduler
Affects Versions: 2.0.1-alpha
 Environment: hadoop yarn, postgresql 
Reporter: Christian Hott
Priority: Blocker


Let me introduce my problem:
All of this began after we created some resource queues on PostgreSQL. We 
created them, assigned them to the users, and all was fine...
until we ran a process (a large iterative query) and I did an ALTER ROLE on the 
user and the resource queue he was using. After that I couldn't log in with the 
user and got a message saying "deadlock detection, locking against self".
Do you have any idea why this happens, or is there any comprehensible log I can 
search for more information?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-02-19 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-2004:
--
Attachment: 0002-YARN-2004.patch

> Priority scheduling support in Capacity scheduler
> -
>
> Key: YARN-2004
> URL: https://issues.apache.org/jira/browse/YARN-2004
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-2004.patch, 0002-YARN-2004.patch
>
>
> Based on the priority of the application, Capacity Scheduler should be able 
> to give preference to application while doing scheduling.
> Comparator applicationComparator can be changed as below.   
> 
> 1.Check for Application priority. If priority is available, then return 
> the highest priority job.
> 2.Otherwise continue with existing logic such as App ID comparison and 
> then TimeStamp comparison.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2004) Priority scheduling support in Capacity scheduler

2015-02-19 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327730#comment-14327730
 ] 

Sunil G commented on YARN-2004:
---

As per YARN-2003, RMAppManager#submitApplication processes input from the 
submissionContext. I will add a case there to handle the scenario where the 
priority is NULL in the submission context; it can be updated with the default 
priority from the queue.

As for this patch, I can remove the NULL check and only have a direct compareTo 
check for priority.
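
For illustration, a minimal sketch of the comparator change described above and 
in the description below; SchedulableApp and its accessors are stand-ins for 
the scheduler's application type, and "larger value means higher priority" is 
an assumption for this example only:

{noformat}
import java.util.Comparator;

// Stand-in for the scheduler's application type (illustrative only).
interface SchedulableApp {
  int getPriority();
  long getApplicationId();
  long getSubmitTime();
}

class ApplicationPriorityComparator implements Comparator<SchedulableApp> {
  @Override
  public int compare(SchedulableApp a1, SchedulableApp a2) {
    // 1. Compare priority directly (no NULL check, since RMAppManager fills in
    //    the queue's default priority when none was submitted).
    int byPriority = Integer.compare(a2.getPriority(), a1.getPriority());
    if (byPriority != 0) {
      return byPriority;
    }
    // 2. Otherwise fall back to the existing logic: app id, then timestamp.
    int byAppId = Long.compare(a1.getApplicationId(), a2.getApplicationId());
    return byAppId != 0 ? byAppId
        : Long.compare(a1.getSubmitTime(), a2.getSubmitTime());
  }
}
{noformat}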

> Priority scheduling support in Capacity scheduler
> -
>
> Key: YARN-2004
> URL: https://issues.apache.org/jira/browse/YARN-2004
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-2004.patch
>
>
> Based on the priority of the application, Capacity Scheduler should be able 
> to give preference to application while doing scheduling.
> Comparator applicationComparator can be changed as below.   
> 
> 1.Check for Application priority. If priority is available, then return 
> the highest priority job.
> 2.Otherwise continue with existing logic such as App ID comparison and 
> then TimeStamp comparison.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3227) Timeline renew delegation token fails when RM user's TGT is expired

2015-02-19 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created YARN-3227:
-

 Summary: Timeline renew delegation token fails when RM user's TGT 
is expired
 Key: YARN-3227
 URL: https://issues.apache.org/jira/browse/YARN-3227
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Eagles
Priority: Critical


When the RM user's kerberos TGT is expired, the RM renew delegation token 
operation fails as part of job submission. Expected behavior is that RM will 
relogin to get a new TGT.

{quote}
2015-02-06 18:54:05,617 [DelegationTokenRenewer #25954] WARN
security.DelegationTokenRenewer: Unable to add the application to the
delegation token renewer.
java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN,
Service: timelineserver.example.com:4080, Ident: (owner=user,
renewer=rmuser, realUser=oozie, issueDate=1423248845528,
maxDate=1423853645528, sequenceNumber=9716, masterKeyId=9)
at
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:443)
at
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$800(DelegationTokenRenewer.java:77)
at
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:808)
at
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:789)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException: HTTP status [401], message [Unauthorized]
at
org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:169)
at
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:286)
at
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.renewDelegationToken(DelegationTokenAuthenticator.java:211)
at
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.renewDelegationToken(DelegationTokenAuthenticatedURL.java:414)
at
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:374)
at
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:360)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
at
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$4.run(TimelineClientImpl.java:429)
at
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:161)
at
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:444)
at
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.renewDelegationToken(TimelineClientImpl.java:378)
at
org.apache.hadoop.yarn.security.client.TimelineDelegationTokenIdentifier$Renewer.renew(TimelineDelegationTokenIdentifier.java:81)
at org.apache.hadoop.security.token.Token.renew(Token.java:377)
at
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:532)
at
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:529)
{quote}
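
For illustration, a minimal sketch of the expected relogin step, assuming the 
renewal runs under the RM's keytab-based login user (where exactly this would 
be placed in DelegationTokenRenewer is left open):

{noformat}
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

public class RenewWithRelogin {
  // Re-acquire the TGT from the keytab if it is stale, then proceed to renew.
  static void reloginBeforeRenew() throws IOException {
    UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
    // No-op unless the login came from a keytab and the TGT needs refreshing.
    loginUser.checkTGTAndReloginFromKeytab();
    // ... then call token.renew(conf) as before ...
  }
}
{noformat}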



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-02-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327718#comment-14327718
 ] 

Hadoop QA commented on YARN-2495:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12685787/YARN-2495.20141208-1.patch
  against trunk revision 2fd02af.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6670//console

This message is automatically generated.

> Allow admin specify labels from each NM (Distributed configuration)
> ---
>
> Key: YARN-2495
> URL: https://issues.apache.org/jira/browse/YARN-2495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
> YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
> YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
> YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
> YARN-2495_20141022.1.patch
>
>
> Target of this JIRA is to allow admin specify labels in each NM, this covers
> - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or 
> using script suggested by [~aw] (YARN-2729) )
> - NM will send labels to RM via ResourceTracker API
> - RM will set labels in NodeLabelManager when NM register/update labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-02-19 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327716#comment-14327716
 ] 

Craig Welch commented on YARN-2495:
---

So, here's my proposal, [~Naganarasimha] [~leftnoteasy]: take a minute and 
consider whether DECENTRALIZED_CONFIGURATION_ENABLED is more likely to cause 
difficulty than to prevent it, as I'm suggesting, and then you all can decide 
to keep it or not as you wish. I don't want to hold up the way forward over 
something which is, on the whole, a detail...

> Allow admin specify labels from each NM (Distributed configuration)
> ---
>
> Key: YARN-2495
> URL: https://issues.apache.org/jira/browse/YARN-2495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
> YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
> YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
> YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
> YARN-2495_20141022.1.patch
>
>
> Target of this JIRA is to allow admin specify labels in each NM, this covers
> - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or 
> using script suggested by [~aw] (YARN-2729) )
> - NM will send labels to RM via ResourceTracker API
> - RM will set labels in NodeLabelManager when NM register/update labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission

2015-02-19 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327711#comment-14327711
 ] 

Varun Saxena commented on YARN-3223:


Junping Du, please reassign if you plan to work on this.

> Resource update during NM graceful decommission
> ---
>
> Key: YARN-3223
> URL: https://issues.apache.org/jira/browse/YARN-3223
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Junping Du
>Assignee: Varun Saxena
>
> During NM graceful decommission, we should handle resource updates properly, 
> including: making RMNode keep track of the old resource for a possible 
> rollback, keeping the available resource at 0, and updating the used resource 
> when a container finishes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3204) Fix new findbug warnings inhadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair)

2015-02-19 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3204:
---
Attachment: YARN-3204-002.patch

> Fix new findbug warnings 
> inhadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair)
> -
>
> Key: YARN-3204
> URL: https://issues.apache.org/jira/browse/YARN-3204
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: YARN-3204-001.patch, YARN-3204-002.patch
>
>
> Please check following findbug report..
> https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

