[jira] [Updated] (YARN-1172) Convert *SecretManagers in the RM to services
[ https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1172: - Attachment: YARN-1172.7.patch Convert *SecretManagers in the RM to services - Key: YARN-1172 URL: https://issues.apache.org/jira/browse/YARN-1172 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-1172.1.patch, YARN-1172.2.patch, YARN-1172.3.patch, YARN-1172.4.patch, YARN-1172.5.patch, YARN-1172.6.patch, YARN-1172.7.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services
[ https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798875#comment-13798875 ] Hadoop QA commented on YARN-1172: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12609100/YARN-1172.7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 21 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2218//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2218//console This message is automatically generated. Convert *SecretManagers in the RM to services - Key: YARN-1172 URL: https://issues.apache.org/jira/browse/YARN-1172 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-1172.1.patch, YARN-1172.2.patch, YARN-1172.3.patch, YARN-1172.4.patch, YARN-1172.5.patch, YARN-1172.6.patch, YARN-1172.7.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1319) Documentation has wrong entry
[ https://issues.apache.org/jira/browse/YARN-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798961#comment-13798961 ] Siddharth Tiwari commented on YARN-1319: The installation documentation for Hadoop YARN at this link http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html has an error in the yarn-site snippet for the property yarn.nodemanager.aux-services. It should be mapreduce_shuffle rather than mapreduce.shuffle. Documentation has wrong entry -- Key: YARN-1319 URL: https://issues.apache.org/jira/browse/YARN-1319 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.2.0 Environment: Linux Reporter: Siddharth Tiwari Priority: Minor Fix For: 2.2.0 -- This message was sent by Atlassian JIRA (v6.1#6144)
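For reference, the corrected property from the comment above, shown through the Hadoop Configuration API (an illustrative sketch only; in yarn-site.xml the property simply takes the value mapreduce_shuffle):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AuxServicesConfigExample {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // Correct value for 2.2.0: underscores, not dots, in the aux-service name.
    conf.set("yarn.nodemanager.aux-services", "mapreduce_shuffle");
    // The old value "mapreduce.shuffle" from the documentation is no longer accepted.
    System.out.println(conf.get("yarn.nodemanager.aux-services"));
  }
}
{code}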
[jira] [Created] (YARN-1319) Documentation has wrong entry
Siddharth Tiwari created YARN-1319: -- Summary: Documentation has wrong entry Key: YARN-1319 URL: https://issues.apache.org/jira/browse/YARN-1319 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.2.0 Environment: Linux Reporter: Siddharth Tiwari Priority: Minor Fix For: 2.2.0 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1319) Documentation has wrong entry
[ https://issues.apache.org/jira/browse/YARN-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Tiwari updated YARN-1319: --- Description: The installation documentation for Hadoop yarn at this link http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html has error in the yarn-site for property yarn.nodemanager.aux-services. it should be mapreduce_shuffle rather than mapreduce.shuffle. Documentation has wrong entry -- Key: YARN-1319 URL: https://issues.apache.org/jira/browse/YARN-1319 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.2.0 Environment: Linux Reporter: Siddharth Tiwari Priority: Minor Fix For: 2.2.0 The installation documentation for Hadoop yarn at this link http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html has error in the yarn-site for property yarn.nodemanager.aux-services. it should be mapreduce_shuffle rather than mapreduce.shuffle. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (YARN-1319) Documentation has wrong entry
[ https://issues.apache.org/jira/browse/YARN-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta resolved YARN-1319. --- Resolution: Duplicate Documentation has wrong entry -- Key: YARN-1319 URL: https://issues.apache.org/jira/browse/YARN-1319 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.2.0 Environment: Linux Reporter: Siddharth Tiwari Priority: Minor Fix For: 2.2.0 The installation documentation for Hadoop yarn at this link http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html has error in the yarn-site for property yarn.nodemanager.aux-services. it should be mapreduce_shuffle rather than mapreduce.shuffle. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799170#comment-13799170 ] Steve Loughran commented on YARN-913: - # your design reminds me a lot of Bonjour, just a thought. # my use case is ensuring that no other instance of my application name exists, e.g. {{steve/hoya/cluster4}}, so avoiding race conditions. I'd have the server attempt to register on startup -and if it could not, fail. Implication: atomic registration by name. # Hadoop now ships with the ZK JAR for the HA NN, and soon the RM will use it too. This will let us assume that ZK is a live service, and make use of it. Add a way to register long-lived services in a YARN cluster --- Key: YARN-913 URL: https://issues.apache.org/jira/browse/YARN-913 Project: Hadoop YARN Issue Type: New Feature Components: api Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Robert Joseph Evans Attachments: RegistrationServiceDetails.txt In a YARN cluster you can't predict where services will come up -or on what ports. The services need to work those things out as they come up and then publish them somewhere. Applications need to be able to find the service instance they are to bond to -and not any others in the cluster. Some kind of service registry -in the RM, in ZK, could do this. If the RM held the write access to the ZK nodes, it would be more secure than having apps register with ZK themselves. -- This message was sent by Atlassian JIRA (v6.1#6144)
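A minimal sketch of the atomic-registration-by-name idea in point 2, using plain ZooKeeper create semantics (the path and method names are illustrative assumptions, not a proposed implementation):
{code}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class ServiceNameRegistrationSketch {
  // Try to claim a unique name such as /services/steve/hoya/cluster4.
  // ZooKeeper's create() is atomic: if another instance already holds the name,
  // NodeExistsException is thrown and the server can fail fast on startup.
  public static void registerOrFail(ZooKeeper zk, String path, byte[] serviceRecord)
      throws KeeperException, InterruptedException {
    try {
      zk.create(path, serviceRecord, Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    } catch (KeeperException.NodeExistsException e) {
      throw new IllegalStateException("Another instance is already registered at " + path, e);
    }
  }
}
{code}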
[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799180#comment-13799180 ] Steve Loughran commented on YARN-614: - Chris -are you using this? For long lived services we'd need that sliding window of failures Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1 -- Key: YARN-614 URL: https://issues.apache.org/jira/browse/YARN-614 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Chris Riccomini Fix For: 2.3.0 Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch Attempts can fail due to a large number of user errors and they should not be retried unnecessarily. The only reason YARN should retry an attempt is when the hardware fails or YARN has an error. NM failing, lost NM and NM disk errors are the hardware errors that come to mind. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799189#comment-13799189 ] Steve Loughran commented on YARN-796: - I'd like to be able to allocate different labels to different queues, so that analytics workloads could go to one set of machines, network ingress/egress applications to another pool. You don't want to add label awareness to these applications, whereas queue-level would seem more appropriate, as it puts the cluster admins in charge Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Arun C Murthy It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-896) Roll up for long-lived services in YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799199#comment-13799199 ] Steve Loughran commented on YARN-896: - Link to YARN-810, CGroup limits for CPU Roll up for long-lived services in YARN --- Key: YARN-896 URL: https://issues.apache.org/jira/browse/YARN-896 Project: Hadoop YARN Issue Type: New Feature Reporter: Robert Joseph Evans YARN is intended to be general purpose, but it is missing some features to be able to truly support long lived applications and long lived containers. This ticket is intended to # discuss what is needed to support long lived processes # track the resulting JIRA. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1305) RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException
[ https://issues.apache.org/jira/browse/YARN-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1305: - Attachment: YARN-1305.4.patch Thank you, Bikas. Updated the patch based on your review. * Updated getRMHAId()/setConfValue() to show the invalid value. * Updated getRMHAIds() to handle the case where RM_HA_IDS is empty. Added a test to testGetRMServiceId for this. * Updated getRMHAId() to raise an error when HA is enabled but RM_HA_IDS is not set to multiple RM ids. Additionally, I noticed that HAUtil cannot handle configs with spaces/tabs/carriage returns, because HAUtil uses Configuration#get(), not getTrimmed(). This patch fixes it. The log messages with this patch are as follows. The case where RM_HA_ID is empty: {code} org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Invalid configuration! yarn.resourcemanager.ha.id needs to be set in a HA configuration {code} The case where RM_HA_ID is invalid: {code} org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Invalid configuration! Invalid value of yarn.resourcemanager.ha.id. Current value is .rm1 {code} The case where RM_HA_IDS is empty or invalid: {code} org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Invalid configuration! yarn.resourcemanager.ha.rm-ids is invalid. Current value is null {code} The case where RM_HA_IDS doesn't contain RM_HA_ID: {code} org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Invalid configuration! yarn.resourcemanager.ha.rm-ids([rm2, rm3]) need to contain yarn.resourcemanager.ha.id(rm1) in a HA configuration. {code} The case where the HAUtil.RPC_ADDRESS_CONF_KEYS related configuration is not set: {code} org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Invalid configuration! yarn.resourcemanager.address.rm1 needs to be set in a HA configuration. {code} RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException --- Key: YARN-1305 URL: https://issues.apache.org/jira/browse/YARN-1305 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Labels: ha Attachments: YARN-1305.1.patch, YARN-1305.2.patch, YARN-1305.3.patch, YARN-1305.4.patch When yarn.resourcemanager.ha.enabled is true, RMHAProtocolService#serviceInit calls HAUtil.setAllRpcAddresses. If the configuration values are null, it just throws IllegalArgumentException. It's messy to analyse which keys are null, so we should handle it and log the name of keys which are null. 
A current log dump is as follows: {code} 2013-10-15 06:24:53,431 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT] 2013-10-15 06:24:54,203 INFO org.apache.hadoop.service.AbstractService: Service RMHAProtocolService failed in state INITED; cause: java.lang.IllegalArgumentException: Property value must not be null java.lang.IllegalArgumentException: Property value must not be null at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) at org.apache.hadoop.conf.Configuration.set(Configuration.java:816) at org.apache.hadoop.conf.Configuration.set(Configuration.java:798) at org.apache.hadoop.yarn.conf.HAUtil.setConfValue(HAUtil.java:100) at org.apache.hadoop.yarn.conf.HAUtil.setAllRpcAddresses(HAUtil.java:105) at org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService.serviceInit(RMHAProtocolService.java:60) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:940) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-614) Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1
[ https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799262#comment-13799262 ] Chris Riccomini commented on YARN-614: -- Hey Steve, Sadly, no. I haven't had time to rebase/make the tests work. Sorry :/ Retry attempts automatically for hardware failures or YARN issues and set default app retries to 1 -- Key: YARN-614 URL: https://issues.apache.org/jira/browse/YARN-614 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Chris Riccomini Fix For: 2.3.0 Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch Attempts can fail due to a large number of user errors and they should not be retried unnecessarily. The only reason YARN should retry an attempt is when the hardware fails or YARN has an error. NM failing, lost NM and NM disk errors are the hardware errors that come to mind. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1305) RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException
[ https://issues.apache.org/jira/browse/YARN-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799281#comment-13799281 ] Hadoop QA commented on YARN-1305: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12609167/YARN-1305.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2219//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2219//console This message is automatically generated. RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException --- Key: YARN-1305 URL: https://issues.apache.org/jira/browse/YARN-1305 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Labels: ha Attachments: YARN-1305.1.patch, YARN-1305.2.patch, YARN-1305.3.patch, YARN-1305.4.patch When yarn.resourcemanager.ha.enabled is true, RMHAProtocolService#serviceInit calls HAUtil.setAllRpcAddresses. If the configuration values are null, it just throws IllegalArgumentException. It's messy to analyse which keys are null, so we should handle it and log the name of keys which are null. 
A current log dump is as follows: {code} 2013-10-15 06:24:53,431 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT] 2013-10-15 06:24:54,203 INFO org.apache.hadoop.service.AbstractService: Service RMHAProtocolService failed in state INITED; cause: java.lang.IllegalArgumentException: Property value must not be null java.lang.IllegalArgumentException: Property value must not be null at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) at org.apache.hadoop.conf.Configuration.set(Configuration.java:816) at org.apache.hadoop.conf.Configuration.set(Configuration.java:798) at org.apache.hadoop.yarn.conf.HAUtil.setConfValue(HAUtil.java:100) at org.apache.hadoop.yarn.conf.HAUtil.setAllRpcAddresses(HAUtil.java:105) at org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService.serviceInit(RMHAProtocolService.java:60) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:940) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1320) Custom log4j properties does not work properly.
Tassapol Athiapinya created YARN-1320: - Summary: Custom log4j properties does not work properly. Key: YARN-1320 URL: https://issues.apache.org/jira/browse/YARN-1320 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Fix For: 2.2.1 Distributed shell cannot pick up custom log4j properties (specified with -log_properties). It always uses default log4j properties. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1052) Enforce submit application queue ACLs outside the scheduler
[ https://issues.apache.org/jira/browse/YARN-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1052: -- Issue Type: Sub-task (was: Improvement) Parent: YARN-1317 Enforce submit application queue ACLs outside the scheduler --- Key: YARN-1052 URL: https://issues.apache.org/jira/browse/YARN-1052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Xuan Gong Per discussion in YARN-899, schedulers should not need to enforce queue ACLs on their own. Currently schedulers do this for application submission, and this should be done in the RM code instead. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1317) Make Queue, QueueACLs and QueueMetrics first class citizens in YARN
[ https://issues.apache.org/jira/browse/YARN-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799296#comment-13799296 ] Vinod Kumar Vavilapalli commented on YARN-1317: --- Thanks Sandy, didn't see that before. Made it a sub-task. Make Queue, QueueACLs and QueueMetrics first class citizens in YARN --- Key: YARN-1317 URL: https://issues.apache.org/jira/browse/YARN-1317 Project: Hadoop YARN Issue Type: Improvement Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Today, we are duplicating the exact same code in all the schedulers. Queue is a top class concept - clientService, web-services etc already recognize queue as a top level concept. We need to move Queue, QueueMetrics and QueueACLs to be top level. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799331#comment-13799331 ] Arun C Murthy commented on YARN-415: I'm sorry to come in late, I just did a cursory look. One question: Do we really need to track ResourceUsage for each Container? Can't we just add it up when a container finishes? Maybe I'm missing something? But, I'd like to not have a lot of per-container state if possible. Thanks. Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1320) Custom log4j properties does not work properly.
[ https://issues.apache.org/jira/browse/YARN-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1320: --- Assignee: Xuan Gong Custom log4j properties does not work properly. --- Key: YARN-1320 URL: https://issues.apache.org/jira/browse/YARN-1320 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Distributed shell cannot pick up custom log4j properties (specified with -log_properties). It always uses default log4j properties. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1305) RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException
[ https://issues.apache.org/jira/browse/YARN-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799336#comment-13799336 ] Bikas Saha commented on YARN-1305: -- Nice update! We are almost there. After seeing the patch I am feeling that we should consolidate all these verifications into a single method that we call in HAService.serviceInit(). That way the get* methods will be simple and will not be performing checks all the time (its unnecessary after the first time). After the verification method has passed then we can confidently proceed in the remaining code. We can add more verifications of conf in the same method and ensure that we give a clean and user friendly YARN HA setup experience to users. What do you think? I did not see a test that verifies that more than 1 RM id must be specified in RM-HA-IDs? RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException --- Key: YARN-1305 URL: https://issues.apache.org/jira/browse/YARN-1305 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Labels: ha Attachments: YARN-1305.1.patch, YARN-1305.2.patch, YARN-1305.3.patch, YARN-1305.4.patch When yarn.resourcemanager.ha.enabled is true, RMHAProtocolService#serviceInit calls HAUtil.setAllRpcAddresses. If the configuration values are null, it just throws IllegalArgumentException. It's messy to analyse which keys are null, so we should handle it and log the name of keys which are null. A current log dump is as follows: {code} 2013-10-15 06:24:53,431 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT] 2013-10-15 06:24:54,203 INFO org.apache.hadoop.service.AbstractService: Service RMHAProtocolService failed in state INITED; cause: java.lang.IllegalArgumentException: Property value must not be null java.lang.IllegalArgumentException: Property value must not be null at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) at org.apache.hadoop.conf.Configuration.set(Configuration.java:816) at org.apache.hadoop.conf.Configuration.set(Configuration.java:798) at org.apache.hadoop.yarn.conf.HAUtil.setConfValue(HAUtil.java:100) at org.apache.hadoop.yarn.conf.HAUtil.setAllRpcAddresses(HAUtil.java:105) at org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService.serviceInit(RMHAProtocolService.java:60) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:940) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1320) Custom log4j properties does not work properly.
[ https://issues.apache.org/jira/browse/YARN-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799354#comment-13799354 ] Hadoop QA commented on YARN-1320: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12609175/YARN-1320.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2220//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2220//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2220//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2220//console This message is automatically generated. Custom log4j properties does not work properly. --- Key: YARN-1320 URL: https://issues.apache.org/jira/browse/YARN-1320 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1320.1.patch Distributed shell cannot pick up custom log4j properties (specified with -log_properties). It always uses default log4j properties. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799363#comment-13799363 ] Andrey Klochkov commented on YARN-415: -- Arun, the idea is to have the stats being updated in real time while the app is running. Is there a way to get a list of running containers assigned to the app, with their start times, without tracking it explicitly? Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1320) Custom log4j properties does not work properly.
[ https://issues.apache.org/jira/browse/YARN-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799372#comment-13799372 ] Xuan Gong commented on YARN-1320: - Fix the -1 findbugs warning and the -1 release audit warning. Custom log4j properties does not work properly. --- Key: YARN-1320 URL: https://issues.apache.org/jira/browse/YARN-1320 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1320.1.patch, YARN-1320.2.patch Distributed shell cannot pick up custom log4j properties (specified with -log_properties). It always uses default log4j properties. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1320) Custom log4j properties does not work properly.
[ https://issues.apache.org/jira/browse/YARN-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1320: Attachment: YARN-1320.2.patch Custom log4j properties does not work properly. --- Key: YARN-1320 URL: https://issues.apache.org/jira/browse/YARN-1320 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1320.1.patch, YARN-1320.2.patch Distributed shell cannot pick up custom log4j properties (specified with -log_properties). It always uses default log4j properties. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1319) Documentation has wrong entry
[ https://issues.apache.org/jira/browse/YARN-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799383#comment-13799383 ] Tsuyoshi OZAWA commented on YARN-1319: -- Thank you for reporting, Siddharth. The point you mentioned is now being fixed on HADOOP-10050. Please watch it and have discussion there if you have any comments. Again, thanks! Documentation has wrong entry -- Key: YARN-1319 URL: https://issues.apache.org/jira/browse/YARN-1319 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.2.0 Environment: Linux Reporter: Siddharth Tiwari Priority: Minor Fix For: 2.2.0 The installation documentation for Hadoop yarn at this link http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html has error in the yarn-site for property yarn.nodemanager.aux-services. it should be mapreduce_shuffle rather than mapreduce.shuffle. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1320) Custom log4j properties does not work properly.
[ https://issues.apache.org/jira/browse/YARN-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799391#comment-13799391 ] Hadoop QA commented on YARN-1320: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12609181/YARN-1320.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2221//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2221//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2221//console This message is automatically generated. Custom log4j properties does not work properly. --- Key: YARN-1320 URL: https://issues.apache.org/jira/browse/YARN-1320 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1320.1.patch, YARN-1320.2.patch Distributed shell cannot pick up custom log4j properties (specified with -log_properties). It always uses default log4j properties. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Reopened] (YARN-925) HistoryStorage Reader Interface for Application History Server
[ https://issues.apache.org/jira/browse/YARN-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reopened YARN-925: -- The getAllApplications method will not work if we have tons of applications stored. Users should be allowed to add some filters. [~mayank_bansal], would you mind improving the reader interface? HistoryStorage Reader Interface for Application History Server -- Key: YARN-925 URL: https://issues.apache.org/jira/browse/YARN-925 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Fix For: YARN-321 Attachments: YARN-925-1.patch, YARN-925-2.patch, YARN-925-3.patch, YARN-925-4.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1305) RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException
[ https://issues.apache.org/jira/browse/YARN-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799417#comment-13799417 ] Tsuyoshi OZAWA commented on YARN-1305: -- Validation in RMHAProtocolService#serviceInit is a better idea. I also believe your proposal makes the get* methods much simpler. I'll add HAUtil#validateConfiguration() and remove the runtime verifications. bq. I did not see a test that verifies that more than 1 RM id must be specified in RM-HA-IDs? I missed that point. I'll reflect this comment in the next update. Thank you for your good suggestions! RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException --- Key: YARN-1305 URL: https://issues.apache.org/jira/browse/YARN-1305 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Labels: ha Attachments: YARN-1305.1.patch, YARN-1305.2.patch, YARN-1305.3.patch, YARN-1305.4.patch When yarn.resourcemanager.ha.enabled is true, RMHAProtocolService#serviceInit calls HAUtil.setAllRpcAddresses. If the configuration values are null, it just throws IllegalArgumentException. It's messy to analyse which keys are null, so we should handle it and log the name of keys which are null. A current log dump is as follows: {code} 2013-10-15 06:24:53,431 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT] 2013-10-15 06:24:54,203 INFO org.apache.hadoop.service.AbstractService: Service RMHAProtocolService failed in state INITED; cause: java.lang.IllegalArgumentException: Property value must not be null java.lang.IllegalArgumentException: Property value must not be null at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) at org.apache.hadoop.conf.Configuration.set(Configuration.java:816) at org.apache.hadoop.conf.Configuration.set(Configuration.java:798) at org.apache.hadoop.yarn.conf.HAUtil.setConfValue(HAUtil.java:100) at org.apache.hadoop.yarn.conf.HAUtil.setAllRpcAddresses(HAUtil.java:105) at org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService.serviceInit(RMHAProtocolService.java:60) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:940) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
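For illustration, a rough sketch of such a consolidated check (key names and messages are copied from the comments above; this is not the actual patch):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.exceptions.YarnRuntimeException;

public final class HAConfigValidationSketch {
  // Called once from RMHAProtocolService#serviceInit so that the get* helpers
  // no longer have to repeat these checks on every call.
  public static void validateConfiguration(Configuration conf) {
    String rmIds = conf.getTrimmed("yarn.resourcemanager.ha.rm-ids");
    if (rmIds == null || rmIds.isEmpty()) {
      throw new YarnRuntimeException("Invalid configuration! "
          + "yarn.resourcemanager.ha.rm-ids is invalid. Current value is " + rmIds);
    }
    String rmId = conf.getTrimmed("yarn.resourcemanager.ha.id");
    if (rmId == null || rmId.isEmpty()) {
      throw new YarnRuntimeException("Invalid configuration! "
          + "yarn.resourcemanager.ha.id needs to be set in a HA configuration");
    }
    // Further checks would follow here: rm-ids contains the local id, more than one
    // id is configured, and every per-id RPC address key is set.
  }
}
{code}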
[jira] [Created] (YARN-1321) NMTokenCache should not be a singleton
Alejandro Abdelnur created YARN-1321: Summary: NMTokenCache should not be a singleton Key: YARN-1321 URL: https://issues.apache.org/jira/browse/YARN-1321 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.2.1 NMTokenCache is a singleton. Because of this, if running multiple AMs in a single JVM NMTokens for the same node from different AMs step on each other and starting containers fail due to mismatch tokens. The error observed in the client side is something like: {code} ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. NMToken for application attempt : appattempt_1382038445650_0002_01 was used for starting container with container token issued for application attempt : appattempt_1382038445650_0001_01 {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1321) NMTokenCache should not be a singleton
[ https://issues.apache.org/jira/browse/YARN-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799423#comment-13799423 ] Alejandro Abdelnur commented on YARN-1321: -- NMTokens are set in the YARN AMRMClientImpl and the MR RMContainerAllocator, and retrieved in the YARNContainerManagementProtocolProxy via the NMTokenCache. We need to make the NMTokenCache instantiable and make sure each AM uses its own instance of it. In the case of the YARN API, the AMRMClientImpl and the NMClientImpl should share the same instance. NMTokenCache should not be a singleton -- Key: YARN-1321 URL: https://issues.apache.org/jira/browse/YARN-1321 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.2.1 NMTokenCache is a singleton. Because of this, if running multiple AMs in a single JVM NMTokens for the same node from different AMs step on each other and starting containers fail due to mismatch tokens. The error observed in the client side is something like: {code} ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. NMToken for application attempt : appattempt_1382038445650_0002_01 was used for starting container with container token issued for application attempt : appattempt_1382038445650_0001_01 {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
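A hypothetical usage sketch of the proposed change (the public constructor and the setNMTokenCache setters below do not exist yet; they are assumptions about what the API could look like):
{code}
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.client.api.NMTokenCache;

public class PerAmTokenCacheSketch {
  public static void wireUpOneAm() {
    // One cache per AM instead of the JVM-wide singleton (assumed public constructor).
    NMTokenCache cacheForThisAM = new NMTokenCache();

    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    NMClient nmClient = NMClient.createNMClient();

    // Assumed setters: both clients of the same AM share the same cache, so NMTokens
    // from different AMs in the same JVM no longer overwrite each other.
    rmClient.setNMTokenCache(cacheForThisAM);
    nmClient.setNMTokenCache(cacheForThisAM);
  }
}
{code}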
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799448#comment-13799448 ] Jason Lowe commented on YARN-415: - It's not just a real-time issue, it's also a correctness issue. When a container finishes we need to know the time it was allocated. So regardless of whether we want to compute the usage in real-time, the start time of a container and its resource sizes need to be tracked somewhere in the RM. ResourceUsage is just a Resource plus a start time, and the Resource should be referencing the same object already referenced by the Container inside RMContainerImpl. To implement this feature we need to track the containers that are allocated/running (already being done by RMContainerImpl) and what time they started (which we are not currently doing and why ResourceUsage was created). There is the issue of the HashMap to map a container ID to its resource and start time. We could remove the need for this if we stored the container start time in RMContainerImpl and had a safe way to lookup containers for an application attempt. We can get the containers for an application via scheduler.getSchedulerAppInfo, and RMAppAttemptImpl already does this when generating an app report. However since RMAppAttemptImpl and the scheduler are running in separate threads, I could see the scheduler already removing the container before RMAppAttemptImpl received the container completion event and tried to lookup the container for usage calculation. Given the race, along with the fact that getSchedulerAppInfo is not necessarily cheap, it seems reasonable to have RMAppAttemptImpl track what it needs for running containers directly. Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.1#6144)
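As a rough sketch of the accumulation being discussed (names are illustrative, not the patch): remember each running container's reserved memory and allocation time, and fold the product into a per-attempt MB-seconds total when the container completes.
{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ContainerId;

public class AppMemorySecondsSketch {
  // containerId -> {reserved memory in MB, allocation time in ms}
  private final Map<ContainerId, long[]> running = new HashMap<ContainerId, long[]>();
  private long memorySeconds = 0;

  public synchronized void containerAllocated(ContainerId id, long memoryMB, long startMs) {
    running.put(id, new long[] {memoryMB, startMs});
  }

  public synchronized void containerFinished(ContainerId id, long finishMs) {
    long[] info = running.remove(id);
    if (info != null) {
      // reserved MB * lifetime in seconds, matching the formula in the issue description
      memorySeconds += info[0] * ((finishMs - info[1]) / 1000);
    }
  }

  public synchronized long getMemorySeconds() {
    return memorySeconds;
  }
}
{code}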
[jira] [Updated] (YARN-1322) AHS History Store Cache Implementation
[ https://issues.apache.org/jira/browse/YARN-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-1322: Description: AHS History Store Cache Implementation (was: Maybe we should include AHS classes as well (for developer usage) in yarn and yarn.cmd) AHS History Store Cache Implementation -- Key: YARN-1322 URL: https://issues.apache.org/jira/browse/YARN-1322 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal AHS History Store Cache Implementation -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1322) AHS History Store Cache Implementation
Mayank Bansal created YARN-1322: --- Summary: AHS History Store Cache Implementation Key: YARN-1322 URL: https://issues.apache.org/jira/browse/YARN-1322 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Maybe we should include AHS classes as well (for developer usage) in yarn and yarn.cmd -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (YARN-884) AM expiry interval should be set to smaller of {am, nm}.liveness-monitor.expiry-interval-ms
[ https://issues.apache.org/jira/browse/YARN-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla resolved YARN-884. --- Resolution: Won't Fix Target Version/s: (was: ) AM expiry interval should be set to smaller of {am, nm}.liveness-monitor.expiry-interval-ms --- Key: YARN-884 URL: https://issues.apache.org/jira/browse/YARN-884 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: configuration Attachments: yarn-884-1.patch As the AM can't outlive the NM on which it is running, it is a good idea to disallow setting the am.liveness-monitor.expiry-interval-ms to a value higher than nm.liveness-monitor.expiry-interval-ms -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1320) Custom log4j properties does not work properly.
[ https://issues.apache.org/jira/browse/YARN-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1320: Attachment: YARN-1320.3.patch Fix the -1 on findbugs Custom log4j properties does not work properly. --- Key: YARN-1320 URL: https://issues.apache.org/jira/browse/YARN-1320 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1320.1.patch, YARN-1320.2.patch, YARN-1320.3.patch Distributed shell cannot pick up custom log4j properties (specified with -log_properties). It always uses default log4j properties. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1320) Custom log4j properties does not work properly.
[ https://issues.apache.org/jira/browse/YARN-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799501#comment-13799501 ] Hadoop QA commented on YARN-1320: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12609201/YARN-1320.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build///testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build///console This message is automatically generated. Custom log4j properties does not work properly. --- Key: YARN-1320 URL: https://issues.apache.org/jira/browse/YARN-1320 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.2.1 Attachments: YARN-1320.1.patch, YARN-1320.2.patch, YARN-1320.3.patch Distributed shell cannot pick up custom log4j properties (specified with -log_properties). It always uses default log4j properties. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing
[ https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799504#comment-13799504 ] Sandy Ryza commented on YARN-1222: -- {code} LOG.error("Error in storing master key with KeyID: " + newKey.getKeyId()); + LOG.error("Exception stack trace", e); {code} Why not put the exception in the first LOG.error? Make improvements in ZKRMStateStore for fencing --- Key: YARN-1222 URL: https://issues.apache.org/jira/browse/YARN-1222 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Karthik Kambatla Attachments: yarn-1222-1.patch, yarn-1222-2.patch Using multi-operations for every ZK interaction. In every operation, automatically creating/deleting a lock znode that is the child of the root znode. This is to achieve fencing by modifying the create/delete permissions on the root znode. -- This message was sent by Atlassian JIRA (v6.1#6144)
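That is, something along the lines of the following (a sketch; the variable names follow the snippet quoted above):
{code}
LOG.error("Error in storing master key with KeyID: " + newKey.getKeyId(), e);
{code}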
[jira] [Updated] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1121: - Fix Version/s: (was: 2.2.0) 2.2.1 RMStateStore should flush all pending store events before closing - Key: YARN-1121 URL: https://issues.apache.org/jira/browse/YARN-1121 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Bikas Saha Fix For: 2.2.1 on serviceStop it should wait for all internal pending events to drain before stopping. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-740) Document the YARN service lifecycle development
[ https://issues.apache.org/jira/browse/YARN-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-740: Affects Version/s: (was: 2.0.4-alpha) 2.2.0 Document the YARN service lifecycle development - Key: YARN-740 URL: https://issues.apache.org/jira/browse/YARN-740 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Affects Versions: 2.2.0 Reporter: Steve Loughran Assignee: Steve Loughran Original Estimate: 4h Remaining Estimate: 4h Once the API is stable, document how to write YARN services. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1053) Diagnostic message from ContainerExitEvent is ignored in ContainerImpl
[ https://issues.apache.org/jira/browse/YARN-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1053: - Fix Version/s: (was: 2.2.0) 2.2.1 Diagnostic message from ContainerExitEvent is ignored in ContainerImpl -- Key: YARN-1053 URL: https://issues.apache.org/jira/browse/YARN-1053 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Labels: newbie Fix For: 2.3.0, 2.2.1 Attachments: YARN-1053.20130809.patch If the container launch fails then we send ContainerExitEvent. This event contains exitCode and diagnostic message. Today we are ignoring diagnostic message while handling this event inside ContainerImpl. Fixing it as it is useful in diagnosing the failure. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1158) ResourceManager UI has application stdout missing if application stdout is not in the same directory as AppMaster stdout
[ https://issues.apache.org/jira/browse/YARN-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1158: - Fix Version/s: (was: 2.2.0) 2.2.1 ResourceManager UI has application stdout missing if application stdout is not in the same directory as AppMaster stdout Key: YARN-1158 URL: https://issues.apache.org/jira/browse/YARN-1158 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Tassapol Athiapinya Fix For: 2.2.1 Configure yarn-site.xml's yarn.nodemanager.local-dirs to multiple directories. Turn on log aggregation. Run distributed shell application. If an application writes AppMaster.stdout in one directory and stdout in another directory. Goto ResourceManager web UI. Open up container logs. Only AppMaster.stdout would appear. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1022) Unnecessary INFO logs in AMRMClientAsync
[ https://issues.apache.org/jira/browse/YARN-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1022: - Fix Version/s: (was: 2.2.0) 2.2.1 Unnecessary INFO logs in AMRMClientAsync Key: YARN-1022 URL: https://issues.apache.org/jira/browse/YARN-1022 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Priority: Minor Labels: newbie Fix For: 2.2.1 Logs like the following should be debug or else every legitimate stop causes unnecessary exception traces in the logs. 464 2013-08-03 20:01:34,459 INFO [AMRM Heartbeater thread] org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl: Heartbeater interrupted 465 java.lang.InterruptedException: sleep interrupted 466 at java.lang.Thread.sleep(Native Method) 467 at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:249) 468 2013-08-03 20:01:34,460 INFO [AMRM Callback Handler Thread] org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl: Interrupted while waiting for queue 469 java.lang.InterruptedException 470 at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer. java:1961) 471 at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1996) 472 at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) 473 at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:275) -- This message was sent by Atlassian JIRA (v6.1#6144)
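The change being asked for is simply to log these expected interruptions at debug level; a minimal sketch (LOG and the sleep interval are placeholders, not the actual AMRMClientAsyncImpl code):
{code}
try {
  Thread.sleep(heartbeatIntervalMs);
} catch (InterruptedException e) {
  // Expected during a legitimate stop; DEBUG avoids a noisy stack trace at INFO.
  if (LOG.isDebugEnabled()) {
    LOG.debug("Heartbeater interrupted", e);
  }
}
{code}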
[jira] [Updated] (YARN-1142) MiniYARNCluster web ui does not work properly
[ https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1142: - Fix Version/s: (was: 2.2.0) 2.2.1 MiniYARNCluster web ui does not work properly - Key: YARN-1142 URL: https://issues.apache.org/jira/browse/YARN-1142 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Fix For: 2.2.1 When going to the RM http port, the NM web ui is displayed. It seems there is a singleton somewhere that breaks things when the RM and NMs run in the same process. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1234) Container localizer logs are not created in secured cluster
[ https://issues.apache.org/jira/browse/YARN-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1234: - Fix Version/s: (was: 2.2.0) 2.2.1 Container localizer logs are not created in secured cluster Key: YARN-1234 URL: https://issues.apache.org/jira/browse/YARN-1234 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Fix For: 2.2.1 When we are running ContainerLocalizer in secured cluster we potentially are not creating any log file to track log messages. This will be helpful in potentially identifying ContainerLocalization issues in secured cluster. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1053) Diagnostic message from ContainerExitEvent is ignored in ContainerImpl
[ https://issues.apache.org/jira/browse/YARN-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1053: Priority: Blocker (was: Major) Diagnostic message from ContainerExitEvent is ignored in ContainerImpl -- Key: YARN-1053 URL: https://issues.apache.org/jira/browse/YARN-1053 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Priority: Blocker Labels: newbie Fix For: 2.3.0, 2.2.1 Attachments: YARN-1053.20130809.patch If the container launch fails then we send ContainerExitEvent. This event contains exitCode and diagnostic message. Today we are ignoring diagnostic message while handling this event inside ContainerImpl. Fixing it as it is useful in diagnosing the failure. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-891) Store completed application information in RM state store
[ https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-891: - Attachment: YARN-891.1.patch New patch created a new RMAppRecoveredTransition for recover flow and get rid of the isFinalSavingRequestSent flag Store completed application information in RM state store - Key: YARN-891 URL: https://issues.apache.org/jira/browse/YARN-891 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-891.1.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch Add information like exit status etc for the completed attempt. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1305) RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException
[ https://issues.apache.org/jira/browse/YARN-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799549#comment-13799549 ] Karthik Kambatla commented on YARN-1305: bq. I did not see a test that verifies that more than 1 RM id must be specified in RM-HA-IDs? I don't think this needs to be a requirement. An empty value for RM-HA-IDs is a problem but having 1 RM id is not. We can may be warn the user, but continue to run. RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException --- Key: YARN-1305 URL: https://issues.apache.org/jira/browse/YARN-1305 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Labels: ha Attachments: YARN-1305.1.patch, YARN-1305.2.patch, YARN-1305.3.patch, YARN-1305.4.patch When yarn.resourcemanager.ha.enabled is true, RMHAProtocolService#serviceInit calls HAUtil.setAllRpcAddresses. If the configuration values are null, it just throws IllegalArgumentException. It's messy to analyse which keys are null, so we should handle it and log the name of keys which are null. A current log dump is as follows: {code} 2013-10-15 06:24:53,431 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT] 2013-10-15 06:24:54,203 INFO org.apache.hadoop.service.AbstractService: Service RMHAProtocolService failed in state INITED; cause: java.lang.IllegalArgumentException: Property value must not be null java.lang.IllegalArgumentException: Property value must not be null at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) at org.apache.hadoop.conf.Configuration.set(Configuration.java:816) at org.apache.hadoop.conf.Configuration.set(Configuration.java:798) at org.apache.hadoop.yarn.conf.HAUtil.setConfValue(HAUtil.java:100) at org.apache.hadoop.yarn.conf.HAUtil.setAllRpcAddresses(HAUtil.java:105) at org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService.serviceInit(RMHAProtocolService.java:60) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:940) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-891) Store completed application information in RM state store
[ https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799554#comment-13799554 ] Hadoop QA commented on YARN-891: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12609212/YARN-891.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2223//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2223//console This message is automatically generated. Store completed application information in RM state store - Key: YARN-891 URL: https://issues.apache.org/jira/browse/YARN-891 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-891.1.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch Add information like exit status etc for the completed attempt. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1139) [Umbrella] Convert all RM components to Services
[ https://issues.apache.org/jira/browse/YARN-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799553#comment-13799553 ] Tsuyoshi OZAWA commented on YARN-1139: -- [~ste...@apache.org], could you also check YARN-1305 and review the patch? That JIRA is a subtask of this one. [Umbrella] Convert all RM components to Services Key: YARN-1139 URL: https://issues.apache.org/jira/browse/YARN-1139 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Some of the RM components - state store, scheduler etc. are not services. Converting them to services goes well with the Always On and Active service separation proposed in YARN-1098. Given that some of them already have start(), stop() methods, it should not be too hard to convert them to services. That would also be a cleaner way of addressing YARN-1125. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1323) Set HTTPS webapp address along with other RPC addresses
Karthik Kambatla created YARN-1323: -- Summary: Set HTTPS webapp address along with other RPC addresses Key: YARN-1323 URL: https://issues.apache.org/jira/browse/YARN-1323 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla YARN-1232 adds the ability to configure multiple RMs, but missed the HTTPS webapp address. We need to add that in. -- This message was sent by Atlassian JIRA (v6.1#6144)
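For illustration, a sketch of the per-RM-id suffixing that this sub-task would extend to the HTTPS webapp address. The key name, hosts, port, and helper method here are assumptions for the example; the real logic lives in HAUtil/YarnConfiguration.
{code}
import org.apache.hadoop.conf.Configuration;

// Illustrative only: apply the same per-RM-id suffixing used for the other RM
// addresses to the HTTPS webapp address. Values below are made up for the example.
public class HttpsWebAppAddressSketch {
  private static final String RM_WEBAPP_HTTPS_ADDRESS =
      "yarn.resourcemanager.webapp.https.address";

  static String addSuffix(String key, String rmId) {
    return key + "." + rmId;  // e.g. yarn.resourcemanager.webapp.https.address.rm1
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set(addSuffix(RM_WEBAPP_HTTPS_ADDRESS, "rm1"), "rm1.example.com:8090");
    conf.set(addSuffix(RM_WEBAPP_HTTPS_ADDRESS, "rm2"), "rm2.example.com:8090");
    System.out.println(conf.get(addSuffix(RM_WEBAPP_HTTPS_ADDRESS, "rm1")));
  }
}
{code}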
[jira] [Commented] (YARN-1172) Convert *SecretManagers in the RM to services
[ https://issues.apache.org/jira/browse/YARN-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799563#comment-13799563 ] Tsuyoshi OZAWA commented on YARN-1172: -- [~kkambatl], do you have feedbacks about this JIRA? Convert *SecretManagers in the RM to services - Key: YARN-1172 URL: https://issues.apache.org/jira/browse/YARN-1172 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-1172.1.patch, YARN-1172.2.patch, YARN-1172.3.patch, YARN-1172.4.patch, YARN-1172.5.patch, YARN-1172.6.patch, YARN-1172.7.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1324) NodeManager should assign 1 local directory to a container
Bikas Saha created YARN-1324: Summary: NodeManager should assign 1 local directory directory to a container Key: YARN-1324 URL: https://issues.apache.org/jira/browse/YARN-1324 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Bikas Saha Currently, for every container, the NM creates a directory on every disk and expects the container-task to choose 1 of them and load balance the use of the disks across all containers. 1) This may have worked fine in the MR world where MR tasks would randomly choose dirs but in general we cannot expect every app/task writer to understand these nuances and randomly pick disks. So we could end up overloading the first disk if most people decide to use the first disk. 2) This makes a number of NM operations to scan every disk (thus randomizing that disk) to locate the dir which the task has actually chosen to use for its files. Makes all these operations expensive for the NM as well as disruptive for users of disks that did not have the real task working dirs. I propose that NM should up-front decide the disk it is assigning to tasks. It could choose to do so randomly or weighted-randomly by looking at space and load on each disk. So it could do a better job of load balancing. Then, it would associate the chosen working directory with the container context so that subsequent operations on the NM can directly seek to the correct location instead of having to seek on every disk. -- This message was sent by Atlassian JIRA (v6.1#6144)
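To make the weighted-random idea in the proposal concrete, a small stand-alone sketch (not NodeManager code) that picks a local directory with probability proportional to its free space:
{code}
import java.io.File;
import java.util.List;
import java.util.Random;

// Illustration of "weighted-random by free space" selection; this is not NM code,
// just the selection idea in isolation.
public class LocalDirPickerSketch {
  private final Random random = new Random();

  File pickDir(List<File> localDirs) {
    long totalFree = 0;
    for (File dir : localDirs) {
      totalFree += dir.getUsableSpace();
    }
    if (totalFree <= 0) {
      return localDirs.get(random.nextInt(localDirs.size()));  // fall back to uniform
    }
    // Pick a point in [0, totalFree) and walk the dirs: a disk with more free
    // space covers a larger slice and is therefore chosen more often.
    long point = (long) (random.nextDouble() * totalFree);
    long cumulative = 0;
    for (File dir : localDirs) {
      cumulative += dir.getUsableSpace();
      if (point < cumulative) {
        return dir;
      }
    }
    return localDirs.get(localDirs.size() - 1);
  }

  public static void main(String[] args) {
    LocalDirPickerSketch picker = new LocalDirPickerSketch();
    List<File> dirs = java.util.Arrays.asList(new File("/tmp"), new File("."));
    System.out.println("picked: " + picker.pickDir(dirs));
  }
}
{code}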
[jira] [Commented] (YARN-1305) RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException
[ https://issues.apache.org/jira/browse/YARN-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799638#comment-13799638 ] Tsuyoshi OZAWA commented on YARN-1305: -- After reading Karthik's comment, I was thinking use cases when we enable RM HA configuration without multiple RM ids. It's useful in following cases: 1. Developing. 2. Testing. 3. Manual failover(?) Therefore, we should support it IMO. I came up with another idea to support strict mode to stop RM with a wrong configuration when RM startup as Bikas mentioned. It's useful to detect wrong operations. However, it's not time to do this IMO, because we're still developing RM HA now. After getting stable, we should support the strict mode. Thoughts? RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException --- Key: YARN-1305 URL: https://issues.apache.org/jira/browse/YARN-1305 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Labels: ha Attachments: YARN-1305.1.patch, YARN-1305.2.patch, YARN-1305.3.patch, YARN-1305.4.patch When yarn.resourcemanager.ha.enabled is true, RMHAProtocolService#serviceInit calls HAUtil.setAllRpcAddresses. If the configuration values are null, it just throws IllegalArgumentException. It's messy to analyse which keys are null, so we should handle it and log the name of keys which are null. A current log dump is as follows: {code} 2013-10-15 06:24:53,431 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT] 2013-10-15 06:24:54,203 INFO org.apache.hadoop.service.AbstractService: Service RMHAProtocolService failed in state INITED; cause: java.lang.IllegalArgumentException: Property value must not be null java.lang.IllegalArgumentException: Property value must not be null at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) at org.apache.hadoop.conf.Configuration.set(Configuration.java:816) at org.apache.hadoop.conf.Configuration.set(Configuration.java:798) at org.apache.hadoop.yarn.conf.HAUtil.setConfValue(HAUtil.java:100) at org.apache.hadoop.yarn.conf.HAUtil.setAllRpcAddresses(HAUtil.java:105) at org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService.serviceInit(RMHAProtocolService.java:60) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:940) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1305) RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException
[ https://issues.apache.org/jira/browse/YARN-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799643#comment-13799643 ] Bikas Saha commented on YARN-1305: -- Sure. we can add the multiple RM's check later on. please open a sub-task for it. RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException --- Key: YARN-1305 URL: https://issues.apache.org/jira/browse/YARN-1305 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Labels: ha Attachments: YARN-1305.1.patch, YARN-1305.2.patch, YARN-1305.3.patch, YARN-1305.4.patch When yarn.resourcemanager.ha.enabled is true, RMHAProtocolService#serviceInit calls HAUtil.setAllRpcAddresses. If the configuration values are null, it just throws IllegalArgumentException. It's messy to analyse which keys are null, so we should handle it and log the name of keys which are null. A current log dump is as follows: {code} 2013-10-15 06:24:53,431 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT] 2013-10-15 06:24:54,203 INFO org.apache.hadoop.service.AbstractService: Service RMHAProtocolService failed in state INITED; cause: java.lang.IllegalArgumentException: Property value must not be null java.lang.IllegalArgumentException: Property value must not be null at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) at org.apache.hadoop.conf.Configuration.set(Configuration.java:816) at org.apache.hadoop.conf.Configuration.set(Configuration.java:798) at org.apache.hadoop.yarn.conf.HAUtil.setConfValue(HAUtil.java:100) at org.apache.hadoop.yarn.conf.HAUtil.setAllRpcAddresses(HAUtil.java:105) at org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService.serviceInit(RMHAProtocolService.java:60) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:940) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1305) RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException
[ https://issues.apache.org/jira/browse/YARN-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799655#comment-13799655 ] Tsuyoshi OZAWA commented on YARN-1305: -- Filed YARN-1325 for the multiple RM's check. RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException --- Key: YARN-1305 URL: https://issues.apache.org/jira/browse/YARN-1305 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Labels: ha Attachments: YARN-1305.1.patch, YARN-1305.2.patch, YARN-1305.3.patch, YARN-1305.4.patch When yarn.resourcemanager.ha.enabled is true, RMHAProtocolService#serviceInit calls HAUtil.setAllRpcAddresses. If the configuration values are null, it just throws IllegalArgumentException. It's messy to analyse which keys are null, so we should handle it and log the name of keys which are null. A current log dump is as follows: {code} 2013-10-15 06:24:53,431 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT] 2013-10-15 06:24:54,203 INFO org.apache.hadoop.service.AbstractService: Service RMHAProtocolService failed in state INITED; cause: java.lang.IllegalArgumentException: Property value must not be null java.lang.IllegalArgumentException: Property value must not be null at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) at org.apache.hadoop.conf.Configuration.set(Configuration.java:816) at org.apache.hadoop.conf.Configuration.set(Configuration.java:798) at org.apache.hadoop.yarn.conf.HAUtil.setConfValue(HAUtil.java:100) at org.apache.hadoop.yarn.conf.HAUtil.setAllRpcAddresses(HAUtil.java:105) at org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService.serviceInit(RMHAProtocolService.java:60) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:940) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1288) Make Fair Scheduler ACLs more user friendly
[ https://issues.apache.org/jira/browse/YARN-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1288: - Attachment: YARN-1288-3.patch Make Fair Scheduler ACLs more user friendly --- Key: YARN-1288 URL: https://issues.apache.org/jira/browse/YARN-1288 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1288-1.patch, YARN-1288-2.patch, YARN-1288-3.patch, YARN-1288.patch The Fair Scheduler currently defaults the root queue's acl to empty and all other queues' acl to *. Now that YARN-1258 enables configuring the root queue, we should reverse this. This will also bring the Fair Scheduler in line with the Capacity Scheduler. We should also not trim the acl strings, which makes it impossible to only specify groups in an acl. -- This message was sent by Atlassian JIRA (v6.1#6144)
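For context on the trimming point: a queue ACL string is "&lt;users&gt; &lt;groups&gt;", so an ACL that names only groups has to keep its leading space, and trimming it turns the group into a user. A small illustration using Hadoop's AccessControlList parser (not Fair Scheduler code):
{code}
import org.apache.hadoop.security.authorize.AccessControlList;

// Shows why trimming breaks groups-only ACLs: " admins" means "no users, group admins",
// while the trimmed "admins" means "user admins, no groups".
public class QueueAclSketch {
  public static void main(String[] args) {
    AccessControlList groupsOnly = new AccessControlList(" admins");        // groups only
    AccessControlList trimmed = new AccessControlList(" admins".trim());

    System.out.println("groups-only acl  users=" + groupsOnly.getUsers()
        + " groups=" + groupsOnly.getGroups());
    System.out.println("after trimming   users=" + trimmed.getUsers()
        + " groups=" + trimmed.getGroups());
  }
}
{code}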
[jira] [Commented] (YARN-1324) NodeManager should assign 1 local directory to a container
[ https://issues.apache.org/jira/browse/YARN-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799667#comment-13799667 ] Vinod Kumar Vavilapalli commented on YARN-1324: --- It wasn't done like that with MR world only in mind. Even outside MR, many apps want to write data in parallel and want to take advantage of multiple disks. We cannot make NM to decide one disk because of that. Apps/containers that don't care about load-balancing or multiple disks can chose to always write to the first disk and NM will eventually load balance them. To have true load-balancing all the time (and not just post container finish), YARN needs cooperative containers. And the better solution for that is to make apps ask the number of disks to write when they launch containers. That way YARN isn't overriding users intention to use/not use multiple disks. The title should be changed with problem description (and not the solution). NodeManager should assign 1 local directory directory to a container Key: YARN-1324 URL: https://issues.apache.org/jira/browse/YARN-1324 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Bikas Saha Currently, for every container, the NM creates a directory on every disk and expects the container-task to choose 1 of them and load balance the use of the disks across all containers. 1) This may have worked fine in the MR world where MR tasks would randomly choose dirs but in general we cannot expect every app/task writer to understand these nuances and randomly pick disks. So we could end up overloading the first disk if most people decide to use the first disk. 2) This makes a number of NM operations to scan every disk (thus randomizing that disk) to locate the dir which the task has actually chosen to use for its files. Makes all these operations expensive for the NM as well as disruptive for users of disks that did not have the real task working dirs. I propose that NM should up-front decide the disk it is assigning to tasks. It could choose to do so randomly or weighted-randomly by looking at space and load on each disk. So it could do a better job of load balancing. Then, it would associate the chosen working directory with the container context so that subsequent operations on the NM can directly seek to the correct location instead of having to seek on every disk. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-884) AM expiry interval should be set to smaller of {am, nm}.liveness-monitor.expiry-interval-ms
[ https://issues.apache.org/jira/browse/YARN-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799675#comment-13799675 ] Vinod Kumar Vavilapalli commented on YARN-884: -- bq. Vinod Kumar Vavilapalli, partly agree with you that they are two different knobs. However, at least in the current implementation, restarting an NM cleans up all the containers on it (correct me if I am wrong) including the AM. In that scenario, having a higher value for AM_EXPIRY will only delay starting the AM. No? That is just a temporary artifact of us not having work-preserving restart. That shouldn't change our meaning of long term configuration properties. AM expiry interval should be set to smaller of {am, nm}.liveness-monitor.expiry-interval-ms --- Key: YARN-884 URL: https://issues.apache.org/jira/browse/YARN-884 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: configuration Attachments: yarn-884-1.patch As the AM can't outlive the NM on which it is running, it is a good idea to disallow setting the am.liveness-monitor.expiry-interval-ms to a value higher than nm.liveness-monitor.expiry-interval-ms -- This message was sent by Atlassian JIRA (v6.1#6144)
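A minimal sketch of the clamping this JIRA proposes, not the attached yarn-884-1.patch; the key names assume the yarn.* properties from yarn-default.xml and the 600000 ms defaults.
{code}
import org.apache.hadoop.conf.Configuration;

// Cap the AM expiry interval at the NM expiry interval, since (without
// work-preserving restart) an AM cannot outlive the NM it runs on.
public class AmExpiryClampSketch {
  static final String AM_EXPIRY_MS = "yarn.am.liveness-monitor.expiry-interval-ms";
  static final String NM_EXPIRY_MS = "yarn.nm.liveness-monitor.expiry-interval-ms";

  static long effectiveAmExpiry(Configuration conf) {
    long amExpiry = conf.getLong(AM_EXPIRY_MS, 600000L);
    long nmExpiry = conf.getLong(NM_EXPIRY_MS, 600000L);
    // Use the smaller of the two; a real implementation would warn (or fail)
    // if the configured AM value was larger.
    return Math.min(amExpiry, nmExpiry);
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setLong(AM_EXPIRY_MS, 900000L);
    conf.setLong(NM_EXPIRY_MS, 600000L);
    System.out.println("effective AM expiry = " + effectiveAmExpiry(conf) + " ms");
  }
}
{code}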
[jira] [Commented] (YARN-1288) Make Fair Scheduler ACLs more user friendly
[ https://issues.apache.org/jira/browse/YARN-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799680#comment-13799680 ] Hadoop QA commented on YARN-1288: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12609237/YARN-1288-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2224//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2224//console This message is automatically generated. Make Fair Scheduler ACLs more user friendly --- Key: YARN-1288 URL: https://issues.apache.org/jira/browse/YARN-1288 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1288-1.patch, YARN-1288-2.patch, YARN-1288-3.patch, YARN-1288.patch The Fair Scheduler currently defaults the root queue's acl to empty and all other queues' acl to *. Now that YARN-1258 enables configuring the root queue, we should reverse this. This will also bring the Fair Scheduler in line with the Capacity Scheduler. We should also not trim the acl strings, which makes it impossible to only specify groups in an acl. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1305) RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException
[ https://issues.apache.org/jira/browse/YARN-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1305: - Attachment: YARN-1305.5.patch Updated patches based on Bikas' and Karthik's comments. * Moved all verifications to verifyConfiguration, and call it in RMHAProtocolService#serviceInit(). * Created verifyRMHAIds()/verifyRMHAId()/verifyAllRpcAddresses() methods. They verify configuration values and log verification error. * For now, a configuration contains only one RM-IDs, log it as warning as Karthik described. Log format is same at the last patch. Additionally, a case a configuration contains only one RM-IDs: {quote} 2013-10-19 00:18:29,698 WARN org.apache.hadoop.yarn.conf.HAUtil: Resource Manager HA is enabled, but yarn.resourcemanager.ha.rm-ids has only one id([rm1]) {quote} RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException --- Key: YARN-1305 URL: https://issues.apache.org/jira/browse/YARN-1305 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Labels: ha Attachments: YARN-1305.1.patch, YARN-1305.2.patch, YARN-1305.3.patch, YARN-1305.4.patch, YARN-1305.5.patch When yarn.resourcemanager.ha.enabled is true, RMHAProtocolService#serviceInit calls HAUtil.setAllRpcAddresses. If the configuration values are null, it just throws IllegalArgumentException. It's messy to analyse which keys are null, so we should handle it and log the name of keys which are null. A current log dump is as follows: {code} 2013-10-15 06:24:53,431 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT] 2013-10-15 06:24:54,203 INFO org.apache.hadoop.service.AbstractService: Service RMHAProtocolService failed in state INITED; cause: java.lang.IllegalArgumentException: Property value must not be null java.lang.IllegalArgumentException: Property value must not be null at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) at org.apache.hadoop.conf.Configuration.set(Configuration.java:816) at org.apache.hadoop.conf.Configuration.set(Configuration.java:798) at org.apache.hadoop.yarn.conf.HAUtil.setConfValue(HAUtil.java:100) at org.apache.hadoop.yarn.conf.HAUtil.setAllRpcAddresses(HAUtil.java:105) at org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService.serviceInit(RMHAProtocolService.java:60) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:940) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
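A rough, stand-alone sketch of the verification idea being discussed, not the attached patch: check the required HA keys up front and report every missing one by name, rather than letting Configuration.set() throw a bare IllegalArgumentException. The key list below is illustrative.
{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;

// Collects all missing keys and reports them together, which is what the
// improved log message should show.
public class HAConfigVerifierSketch {
  private static final String[] REQUIRED_KEYS = {
      "yarn.resourcemanager.ha.rm-ids",
      "yarn.resourcemanager.address",
      "yarn.resourcemanager.scheduler.address",
      "yarn.resourcemanager.resource-tracker.address",
      "yarn.resourcemanager.admin.address",
      "yarn.resourcemanager.webapp.address"
  };

  static void verifyConfiguration(Configuration conf) {
    List<String> missing = new ArrayList<String>();
    for (String key : REQUIRED_KEYS) {
      if (conf.getTrimmed(key) == null) {
        missing.add(key);
      }
    }
    if (!missing.isEmpty()) {
      throw new IllegalArgumentException(
          "RM HA is enabled but these properties are not set: " + missing);
    }
  }

  public static void main(String[] args) {
    verifyConfiguration(new Configuration());  // throws, listing all missing keys
  }
}
{code}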
[jira] [Commented] (YARN-1324) NodeManager should assign 1 local directory to a container
[ https://issues.apache.org/jira/browse/YARN-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799698#comment-13799698 ] Bikas Saha commented on YARN-1324: -- Are there current applications that want to write in parallel to multiple local disks? If not, then we should probably figure out how to support them well when they show up. In the meanwhile, we could look at the above mentioned drawbacks and decide whether the they are worth fixing or not, either by restricting solution above or some other solution. Are the above drawbacks worthwhile issues? If yes, are there alternative proposals for a solution? NodeManager should assign 1 local directory directory to a container Key: YARN-1324 URL: https://issues.apache.org/jira/browse/YARN-1324 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Bikas Saha Currently, for every container, the NM creates a directory on every disk and expects the container-task to choose 1 of them and load balance the use of the disks across all containers. 1) This may have worked fine in the MR world where MR tasks would randomly choose dirs but in general we cannot expect every app/task writer to understand these nuances and randomly pick disks. So we could end up overloading the first disk if most people decide to use the first disk. 2) This makes a number of NM operations to scan every disk (thus randomizing that disk) to locate the dir which the task has actually chosen to use for its files. Makes all these operations expensive for the NM as well as disruptive for users of disks that did not have the real task working dirs. I propose that NM should up-front decide the disk it is assigning to tasks. It could choose to do so randomly or weighted-randomly by looking at space and load on each disk. So it could do a better job of load balancing. Then, it would associate the chosen working directory with the container context so that subsequent operations on the NM can directly seek to the correct location instead of having to seek on every disk. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1324) NodeManager potentially causes unnecessary operations on all its disks
[ https://issues.apache.org/jira/browse/YARN-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1324: - Summary: NodeManager potentially causes unnecessary operations on all its disks (was: NodeManager should assign 1 local directory directory to a container) NodeManager potentially causes unnecessary operations on all its disks -- Key: YARN-1324 URL: https://issues.apache.org/jira/browse/YARN-1324 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Bikas Saha Currently, for every container, the NM creates a directory on every disk and expects the container-task to choose 1 of them and load balance the use of the disks across all containers. 1) This may have worked fine in the MR world where MR tasks would randomly choose dirs but in general we cannot expect every app/task writer to understand these nuances and randomly pick disks. So we could end up overloading the first disk if most people decide to use the first disk. 2) This makes a number of NM operations to scan every disk (thus randomizing that disk) to locate the dir which the task has actually chosen to use for its files. Makes all these operations expensive for the NM as well as disruptive for users of disks that did not have the real task working dirs. I propose that NM should up-front decide the disk it is assigning to tasks. It could choose to do so randomly or weighted-randomly by looking at space and load on each disk. So it could do a better job of load balancing. Then, it would associate the chosen working directory with the container context so that subsequent operations on the NM can directly seek to the correct location instead of having to seek on every disk. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery
[ https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799707#comment-13799707 ] Vinod Kumar Vavilapalli commented on YARN-1185: --- Patch looks good to me. Can you address the test-issue? FileSystemRMStateStore can leave partial files that prevent subsequent recovery --- Key: YARN-1185 URL: https://issues.apache.org/jira/browse/YARN-1185 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Attachments: YARN-1185.1.patch, YARN-1185.2.patch FileSystemRMStateStore writes directly to the destination file when storing state. However if the RM were to crash in the middle of the write, the recovery method could encounter a partially-written file and either outright crash during recovery or silently load incomplete state. To avoid this, the data should be written to a temporary file and renamed to the destination file afterwards. -- This message was sent by Atlassian JIRA (v6.1#6144)
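A stand-alone sketch of the write-to-temp-then-rename approach described above, not the attached patch; the path names are made up for the example.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Write state to a temporary file first, then rename it into place, so a
// recovering RM never reads a partially written file.
public class AtomicStateWriteSketch {
  static void writeAtomically(FileSystem fs, Path finalFile, byte[] data)
      throws IOException {
    Path tmpFile = new Path(finalFile.getParent(), finalFile.getName() + ".tmp");
    FSDataOutputStream out = fs.create(tmpFile, true);  // overwrite any stale temp file
    try {
      out.write(data);
    } finally {
      out.close();
    }
    // Expose the data under its real name only after the write completed. Note that
    // rename does not overwrite an existing destination, so an update would have to
    // delete the old file first.
    if (!fs.rename(tmpFile, finalFile)) {
      throw new IOException("Could not rename " + tmpFile + " to " + finalFile);
    }
  }

  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.getLocal(new Configuration());
    writeAtomically(fs, new Path("/tmp/rmstore-example/app_0001"),
        "application state".getBytes("UTF-8"));
  }
}
{code}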
[jira] [Commented] (YARN-1305) RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException
[ https://issues.apache.org/jira/browse/YARN-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799713#comment-13799713 ] Hadoop QA commented on YARN-1305: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12609241/YARN-1305.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.conf.TestHAUtil {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2225//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2225//console This message is automatically generated. RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException --- Key: YARN-1305 URL: https://issues.apache.org/jira/browse/YARN-1305 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Labels: ha Attachments: YARN-1305.1.patch, YARN-1305.2.patch, YARN-1305.3.patch, YARN-1305.4.patch, YARN-1305.5.patch When yarn.resourcemanager.ha.enabled is true, RMHAProtocolService#serviceInit calls HAUtil.setAllRpcAddresses. If the configuration values are null, it just throws IllegalArgumentException. It's messy to analyse which keys are null, so we should handle it and log the name of keys which are null. 
A current log dump is as follows: {code} 2013-10-15 06:24:53,431 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT] 2013-10-15 06:24:54,203 INFO org.apache.hadoop.service.AbstractService: Service RMHAProtocolService failed in state INITED; cause: java.lang.IllegalArgumentException: Property value must not be null java.lang.IllegalArgumentException: Property value must not be null at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) at org.apache.hadoop.conf.Configuration.set(Configuration.java:816) at org.apache.hadoop.conf.Configuration.set(Configuration.java:798) at org.apache.hadoop.yarn.conf.HAUtil.setConfValue(HAUtil.java:100) at org.apache.hadoop.yarn.conf.HAUtil.setAllRpcAddresses(HAUtil.java:105) at org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService.serviceInit(RMHAProtocolService.java:60) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:940) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1185) FileSystemRMStateStore can leave partial files that prevent subsequent recovery
[ https://issues.apache.org/jira/browse/YARN-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1185: Attachment: YARN-1185.3.patch FileSystemRMStateStore can leave partial files that prevent subsequent recovery --- Key: YARN-1185 URL: https://issues.apache.org/jira/browse/YARN-1185 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Attachments: YARN-1185.1.patch, YARN-1185.2.patch, YARN-1185.3.patch FileSystemRMStateStore writes directly to the destination file when storing state. However if the RM were to crash in the middle of the write, the recovery method could encounter a partially-written file and either outright crash during recovery or silently load incomplete state. To avoid this, the data should be written to a temporary file and renamed to the destination file afterwards. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1326) RM should log using RMStore at startup time
[ https://issues.apache.org/jira/browse/YARN-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1326: - Attachment: YARN-1326.1.patch This patch enables the RM to log which RMStore it is using, as follows. {code} org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore is used for ResourceManager HA {code} RM should log using RMStore at startup time --- Key: YARN-1326 URL: https://issues.apache.org/jira/browse/YARN-1326 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-1326.1.patch Original Estimate: 3h Remaining Estimate: 3h Currently there is no way to know which RMStore the RM uses. It is useful to log this information at RM startup time. -- This message was sent by Atlassian JIRA (v6.1#6144)
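The log statement itself is trivial; a stand-in sketch of the kind of line the patch adds (not the actual RMStateStore wiring, and the message text is only an approximation of the example above):
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Logs the concrete class of whichever store implementation was configured.
public class StoreLoggingSketch {
  private static final Log LOG = LogFactory.getLog(StoreLoggingSketch.class);

  static void logStoreClass(Object store) {
    LOG.info(store.getClass().getName() + " is used as the RM state store");
  }

  public static void main(String[] args) {
    logStoreClass(new java.util.Properties());  // placeholder object for the demo
  }
}
{code}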
[jira] [Commented] (YARN-1321) NMTokenCache should not be a singleton
[ https://issues.apache.org/jira/browse/YARN-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799736#comment-13799736 ] Omkar Vinit Joshi commented on YARN-1321: - Why are you running multiple AMs inside the same JVM in the first place? As far as YARN is concerned, multiple AMs per JVM/process were never expected. Definitely not a blocker. Please explain the use case for running multiple AMs inside the same process. If you really want to run them that way, why not just update NMTokenCache while keeping the single-AM case as the default? Still, I don't see why you are doing this. NMTokenCache should not be a singleton -- Key: YARN-1321 URL: https://issues.apache.org/jira/browse/YARN-1321 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.2.1 NMTokenCache is a singleton. Because of this, if running multiple AMs in a single JVM NMTokens for the same node from different AMs step on each other and starting containers fail due to mismatch tokens. The error observed in the client side is something like: {code} ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. NMToken for application attempt : appattempt_1382038445650_0002_01 was used for starting container with container token issued for application attempt : appattempt_1382038445650_0001_01 {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1326) RM should log using RMStore at startup time
[ https://issues.apache.org/jira/browse/YARN-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799734#comment-13799734 ] Hadoop QA commented on YARN-1326: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12609248/YARN-1326.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2227//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2227//console This message is automatically generated. RM should log using RMStore at startup time --- Key: YARN-1326 URL: https://issues.apache.org/jira/browse/YARN-1326 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-1326.1.patch Original Estimate: 3h Remaining Estimate: 3h Currently there are no way to know which RMStore RM uses. It's useful to log the information at RM's startup time. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1321) NMTokenCache should not be a singleton
[ https://issues.apache.org/jira/browse/YARN-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799735#comment-13799735 ] Vinod Kumar Vavilapalli commented on YARN-1321: --- Why is this a blocker? Don't think it is, multiple AMs in a JVM wasn't supported in a first class way - I'm sure you'll find more issues here. Also, please edit the title with the problem statement instead of the solution. Now as to more details: Don't know the internal details, so is llama running with multiple AMs one after another or in parallel? And is the context an unmanaged AM? NMTokenCache should not be a singleton -- Key: YARN-1321 URL: https://issues.apache.org/jira/browse/YARN-1321 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.2.1 NMTokenCache is a singleton. Because of this, if running multiple AMs in a single JVM NMTokens for the same node from different AMs step on each other and starting containers fail due to mismatch tokens. The error observed in the client side is something like: {code} ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. NMToken for application attempt : appattempt_1382038445650_0002_01 was used for starting container with container token issued for application attempt : appattempt_1382038445650_0001_01 {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1321) NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly.
[ https://issues.apache.org/jira/browse/YARN-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-1321: - Summary: NMTokenCache is a a singleton, prevents multiple AMs running in a single JVM to work correctly. (was: NMTokenCache should not be a singleton) NMTokenCache is a a singleton, prevents multiple AMs running in a single JVM to work correctly. --- Key: YARN-1321 URL: https://issues.apache.org/jira/browse/YARN-1321 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.2.1 NMTokenCache is a singleton. Because of this, if running multiple AMs in a single JVM NMTokens for the same node from different AMs step on each other and starting containers fail due to mismatch tokens. The error observed in the client side is something like: {code} ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. NMToken for application attempt : appattempt_1382038445650_0002_01 was used for starting container with container token issued for application attempt : appattempt_1382038445650_0001_01 {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1321) NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly.
[ https://issues.apache.org/jira/browse/YARN-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799747#comment-13799747 ] Alejandro Abdelnur commented on YARN-1321: -- We ran into this issue in Llama. Llama is a single JVM hosting multiple unmanaged ApplicationMasters that run at the same time (in parallel). Because NMTokenCache is a singleton, NMTokens for the same node from the different AMs step on each other. The patch I'm working on preserves the current behavior (singleton NMTokenCache) while allowing a client to set an NMTokenCache instance on the AMRMClient/NMClient (and Async versions). If an instance is set, then the NMTokens are stored in it instead of the singleton. This preserves backward compatibility both in behavior and in API. NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly. --- Key: YARN-1321 URL: https://issues.apache.org/jira/browse/YARN-1321 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.2.1 NMTokenCache is a singleton. Because of this, if running multiple AMs in a single JVM NMTokens for the same node from different AMs step on each other and starting containers fail due to mismatch tokens. The error observed in the client side is something like: {code} ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. NMToken for application attempt : appattempt_1382038445650_0002_01 was used for starting container with container token issued for application attempt : appattempt_1382038445650_0001_01 {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
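A hypothetical usage sketch of the proposed API: each AM in the JVM gets its own NMTokenCache, shared by that AM's AMRMClient and NMClient. The NMTokenCache constructor and the setNMTokenCache setters follow the proposal in this comment and may differ from what the final patch ships.
{code}
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.client.api.NMTokenCache;

// Proposed (not yet committed) per-instance wiring: one NMTokenCache per AM,
// so tokens for the same node from different AMs no longer overwrite each other.
public class PerAmTokenCacheSketch {
  public static void main(String[] args) {
    NMTokenCache cacheForAm1 = new NMTokenCache();
    AMRMClient<ContainerRequest> amrm1 = AMRMClient.createAMRMClient();
    amrm1.setNMTokenCache(cacheForAm1);
    NMClient nm1 = NMClient.createNMClient();
    nm1.setNMTokenCache(cacheForAm1);              // same cache instance as amrm1

    NMTokenCache cacheForAm2 = new NMTokenCache(); // second AM, isolated tokens
    AMRMClient<ContainerRequest> amrm2 = AMRMClient.createAMRMClient();
    amrm2.setNMTokenCache(cacheForAm2);
    NMClient nm2 = NMClient.createNMClient();
    nm2.setNMTokenCache(cacheForAm2);
  }
}
{code}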
[jira] [Updated] (YARN-1321) NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly.
[ https://issues.apache.org/jira/browse/YARN-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-1321: - Attachment: YARN-1321.patch Attached a patch with the proposed solution. So far this is the only issue we've run while using multiple AMs in a single JVM. NMTokenCache is a a singleton, prevents multiple AMs running in a single JVM to work correctly. --- Key: YARN-1321 URL: https://issues.apache.org/jira/browse/YARN-1321 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.2.1 Attachments: YARN-1321.patch NMTokenCache is a singleton. Because of this, if running multiple AMs in a single JVM NMTokens for the same node from different AMs step on each other and starting containers fail due to mismatch tokens. The error observed in the client side is something like: {code} ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. NMToken for application attempt : appattempt_1382038445650_0002_01 was used for starting container with container token issued for application attempt : appattempt_1382038445650_0001_01 {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1321) NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly
[ https://issues.apache.org/jira/browse/YARN-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated YARN-1321: - Summary: NMTokenCache is a a singleton, prevents multiple AMs running in a single JVM to work correctly (was: NMTokenCache is a a singleton, prevents multiple AMs running in a single JVM to work correctly.) NMTokenCache is a a singleton, prevents multiple AMs running in a single JVM to work correctly -- Key: YARN-1321 URL: https://issues.apache.org/jira/browse/YARN-1321 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.2.1 Attachments: YARN-1321.patch NMTokenCache is a singleton. Because of this, if running multiple AMs in a single JVM NMTokens for the same node from different AMs step on each other and starting containers fail due to mismatch tokens. The error observed in the client side is something like: {code} ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. NMToken for application attempt : appattempt_1382038445650_0002_01 was used for starting container with container token issued for application attempt : appattempt_1382038445650_0001_01 {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1326) RM should log using RMStore at startup time
[ https://issues.apache.org/jira/browse/YARN-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799753#comment-13799753 ] Tsuyoshi OZAWA commented on YARN-1326: -- This patch just adds a log statement, so no additional tests are needed. RM should log using RMStore at startup time --- Key: YARN-1326 URL: https://issues.apache.org/jira/browse/YARN-1326 Project: Hadoop YARN Issue Type: Sub-task Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-1326.1.patch Original Estimate: 3h Remaining Estimate: 3h Currently there is no way to know which RMStore the RM uses. It is useful to log this information at RM startup time. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1321) NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly
[ https://issues.apache.org/jira/browse/YARN-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799770#comment-13799770 ] Hadoop QA commented on YARN-1321: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12609251/YARN-1321.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2228//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2228//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2228//console This message is automatically generated. NMTokenCache is a a singleton, prevents multiple AMs running in a single JVM to work correctly -- Key: YARN-1321 URL: https://issues.apache.org/jira/browse/YARN-1321 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.2.1 Attachments: YARN-1321.patch NMTokenCache is a singleton. Because of this, if running multiple AMs in a single JVM NMTokens for the same node from different AMs step on each other and starting containers fail due to mismatch tokens. The error observed in the client side is something like: {code} ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. NMToken for application attempt : appattempt_1382038445650_0002_01 was used for starting container with container token issued for application attempt : appattempt_1382038445650_0001_01 {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1305) RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException
[ https://issues.apache.org/jira/browse/YARN-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1305: - Attachment: YARN-1305.6.patch RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException --- Key: YARN-1305 URL: https://issues.apache.org/jira/browse/YARN-1305 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.1 Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Labels: ha Attachments: YARN-1305.1.patch, YARN-1305.2.patch, YARN-1305.3.patch, YARN-1305.4.patch, YARN-1305.5.patch, YARN-1305.6.patch When yarn.resourcemanager.ha.enabled is true, RMHAProtocolService#serviceInit calls HAUtil.setAllRpcAddresses. If the configuration values are null, it just throws IllegalArgumentException. It's messy to analyse which keys are null, so we should handle it and log the name of keys which are null. A current log dump is as follows: {code} 2013-10-15 06:24:53,431 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT] 2013-10-15 06:24:54,203 INFO org.apache.hadoop.service.AbstractService: Service RMHAProtocolService failed in state INITED; cause: java.lang.IllegalArgumentException: Property value must not be null java.lang.IllegalArgumentException: Property value must not be null at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) at org.apache.hadoop.conf.Configuration.set(Configuration.java:816) at org.apache.hadoop.conf.Configuration.set(Configuration.java:798) at org.apache.hadoop.yarn.conf.HAUtil.setConfValue(HAUtil.java:100) at org.apache.hadoop.yarn.conf.HAUtil.setAllRpcAddresses(HAUtil.java:105) at org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService.serviceInit(RMHAProtocolService.java:60) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:940) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)