[jira] [Commented] (YARN-2666) TestFairScheduler.testContinuousScheduling fails Intermittently

2015-04-06 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481037#comment-14481037
 ] 

Tsuyoshi Ozawa commented on YARN-2666:
--

+1. It's better to call scheduler.continuousSchedulingAttempt() instead of 
waiting for scheduling. Committing this shortly.
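For reference, the idea is roughly the following (a sketch only, not the committed patch; the test scaffolding, node setup, and appAttemptId come from the existing TestFairScheduler and are assumed here):
{code}
// Instead of sleeping and hoping the background continuous-scheduling thread
// has run, the test triggers a scheduling pass itself and then asserts.
scheduler.update();
scheduler.continuousSchedulingAttempt();
FSAppAttempt app = scheduler.getSchedulerApp(appAttemptId);
assertEquals(2, app.getLiveContainers().size());
{code}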

> TestFairScheduler.testContinuousScheduling fails Intermittently
> ---
>
> Key: YARN-2666
> URL: https://issues.apache.org/jira/browse/YARN-2666
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: scheduler
>Reporter: Tsuyoshi Ozawa
>Assignee: zhihai xu
> Attachments: YARN-2666.000.patch
>
>
> The test fails on trunk.
> {code}
> Tests run: 79, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.698 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
> testContinuousScheduling(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
>   Time elapsed: 0.582 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3372)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3450) Application Master killed RPC Port AM Host not shown in CLI

2015-04-06 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-3450:
--

 Summary: Application Master killed RPC Port AM Host not shown in 
CLI
 Key: YARN-3450
 URL: https://issues.apache.org/jira/browse/YARN-3450
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, resourcemanager
Reporter: Bibin A Chundatt
Assignee: Naganarasimha G R
Priority: Minor


Start a Sleep job
Kill the Application Master process
Check the status of the application attempt
When the Application Master is killed, the RPC Port and AM Host are not shown in the CLI

{quote}
dsperf@host-10-19-92-117:~/HADOPV1R2/install/hadoop/nodemanager/bin> 
{color:red} ./yarn applicationattempt  -status 
appattempt_1428321793042_0005_01 {color}
15/04/06 13:40:52 INFO impl.TimelineClientImpl: Timeline service address: 
http://10.19.92.127:8188/ws/v1/timeline/
15/04/06 13:40:52 INFO client.AHSProxy: Connecting to Application History 
server at /10.19.92.127:45034
15/04/06 13:40:53 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
to rm2
Application Attempt Report :
ApplicationAttempt-Id : appattempt_1428321793042_0005_01
State : FAILED
AMContainer : container_1428321793042_0005_01_01
Tracking-URL : 
http://host-10-19-92-127:45020/cluster/app/application_1428321793042_0005
 {color:red}
RPC Port : -1
AM Host : N/A
 {color}
Diagnostics : AM Container for appattempt_1428321793042_0005_01 
exited with  exitCode: 137

{quote}





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3450) Application Master killed RPC Port AM Host not shown in CLI

2015-04-06 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481040#comment-14481040
 ] 

Bibin A Chundatt commented on YARN-3450:


/applicationhistory/appattempt/appattempt_1428321793042_0005_01

In the Web UI the same information is shown properly:

Application Attempt Overview

State:  FAILED
Master Container:   container_1428321793042_0005_01_01
Node:   host-10-19-92-143:49820
Tracking URL:   History

> Application Master killed RPC Port AM Host not shown in CLI
> ---
>
> Key: YARN-3450
> URL: https://issues.apache.org/jira/browse/YARN-3450
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Naganarasimha G R
>Priority: Minor
>
> Start a Sleep job
> Kill the Application Master process
> Check the status of the application attempt
> When the Application Master is killed, the RPC Port and AM Host are not shown in the CLI
> {quote}
> dsperf@host-10-19-92-117:~/HADOPV1R2/install/hadoop/nodemanager/bin> 
> {color:red} ./yarn applicationattempt  -status 
> appattempt_1428321793042_0005_01 {color}
> 15/04/06 13:40:52 INFO impl.TimelineClientImpl: Timeline service address: 
> http://10.19.92.127:8188/ws/v1/timeline/
> 15/04/06 13:40:52 INFO client.AHSProxy: Connecting to Application History 
> server at /10.19.92.127:45034
> 15/04/06 13:40:53 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> Application Attempt Report :
> ApplicationAttempt-Id : appattempt_1428321793042_0005_01
> State : FAILED
> AMContainer : container_1428321793042_0005_01_01
> Tracking-URL : 
> http://host-10-19-92-127:45020/cluster/app/application_1428321793042_0005
>  {color:red}
> RPC Port : -1
> AM Host : N/A
>  {color}
> Diagnostics : AM Container for appattempt_1428321793042_0005_01 
> exited with  exitCode: 137
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3451) Add start time and end time in ApplicationAttemptReport and display the same in RMAttemptBlock WebUI

2015-04-06 Thread Rohith (JIRA)
Rohith created YARN-3451:


 Summary: Add start time and end time in ApplicationAttemptReport 
and display the same in RMAttemptBlock WebUI
 Key: YARN-3451
 URL: https://issues.apache.org/jira/browse/YARN-3451
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api, webapp
Reporter: Rohith
Assignee: Rohith


ApplicationReport and ApplicationBlock already have *Started:* and *Elapsed:* 
times. It would be useful if the start time and elapsed time were also sent in 
ApplicationAttemptReport and displayed in ApplicationAttemptBlock. 
This gives more granular debugging ability when analyzing issues with multiple 
attempt failures, such as an attempt timing out.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3110) Few issues in ApplicationHistory web ui

2015-04-06 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481163#comment-14481163
 ] 

Xuan Gong commented on YARN-3110:
-

[~Naganarasimha] The patch looks good, but it no longer applies to trunk/branch-2. 
Could you rebase it, please?

> Few issues in ApplicationHistory web ui
> ---
>
> Key: YARN-3110
> URL: https://issues.apache.org/jira/browse/YARN-3110
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, timelineserver
>Affects Versions: 2.6.0
>Reporter: Bibin A Chundatt
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-3110.20150209-1.patch, YARN-3110.20150315-1.patch
>
>
> Application state and History link are wrong when the Application is in unassigned 
> state.
>  
> 1. Configure the capacity scheduler with queue size 1 and max Absolute Max 
> Capacity: 10.0%
> (Current application state is Accepted and Unassigned from the resource manager 
> side)
> 2. Submit an application to the queue and check the state and link in Application 
> history
> State = null and History link shown as N/A in the applicationhistory page
> Kill the same application. In the timeline server logs, the below is shown when 
> selecting the application link.
> {quote}
> 2015-01-29 15:39:50,956 ERROR org.apache.hadoop.yarn.webapp.View: Failed to 
> read the AM container of the application attempt 
> appattempt_1422467063659_0007_01.
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainer(ApplicationHistoryManagerOnTimelineStore.java:162)
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getAMContainer(ApplicationHistoryManagerOnTimelineStore.java:184)
>   at 
> org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:160)
>   at 
> org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:157)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:156)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
>   at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56)
>   at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
>   at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSController.app(AHSController.java:38)
>   at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>   at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$St

[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer

2015-04-06 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481225#comment-14481225
 ] 

Daryn Sharp commented on YARN-3055:
---

Correctly handling the "don't cancel" setting for jobs that submit other jobs has 
been a recurring issue.  We're internally testing a small patch to continue renewing 
until all jobs using the token(s) have finished.  Handling the auto-fetch of 
proxy tokens proved a bit more difficult, so I need to complete the internal 
patch.  I can take this over or post a partial patch if [~hitliuyi] would like 
to finish it.

> The token is not renewed properly if it's shared by jobs (oozie) in 
> DelegationTokenRenewer
> --
>
> Key: YARN-3055
> URL: https://issues.apache.org/jira/browse/YARN-3055
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Blocker
> Attachments: YARN-3055.001.patch, YARN-3055.002.patch
>
>
> After YARN-2964, there is only one timer to renew the token if it's shared by 
> jobs. 
> In {{removeApplicationFromRenewal}}, when we go to remove a token that is 
> shared by other jobs, we will not cancel the token. 
> Meanwhile, we should not cancel the _timerTask_, and we should not remove it 
> from {{allTokens}} either. Otherwise the existing submitted applications which 
> share this token will not get renewed any more, and for newly submitted 
> applications which share this token, the token will be renewed immediately.
> For example, we have 3 applications: app1, app2, app3, and they share 
> token1. See the following scenario:
> *1).* app1 is submitted first, then app2, and then app3. In this case, 
> there is only one token renewal timer for token1, and it is scheduled when app1 
> is submitted.
> *2).* app1 finishes, then the renewal timer is cancelled. token1 will not 
> be renewed any more, but app2 and app3 still use it, so there is a problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat

2015-04-06 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481239#comment-14481239
 ] 

Xuan Gong commented on YARN-1376:
-

Fixed the -1 on release audit.

The -1 on findbugs is not related.

> NM need to notify the log aggregation status to RM through Node heartbeat
> -
>
> Key: YARN-1376
> URL: https://issues.apache.org/jira/browse/YARN-1376
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-1376.1.patch, YARN-1376.2.patch, YARN-1376.2.patch, 
> YARN-1376.2015-04-04.patch, YARN-1376.2015-04-06.patch, YARN-1376.3.patch, 
> YARN-1376.4.patch
>
>
> Expose a client API to allow clients to figure out if log aggregation is 
> complete. This ticket is used to track the changes on the NM side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat

2015-04-06 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1376:

Attachment: YARN-1376.2015-04-06.patch

> NM need to notify the log aggregation status to RM through Node heartbeat
> -
>
> Key: YARN-1376
> URL: https://issues.apache.org/jira/browse/YARN-1376
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-1376.1.patch, YARN-1376.2.patch, YARN-1376.2.patch, 
> YARN-1376.2015-04-04.patch, YARN-1376.2015-04-06.patch, YARN-1376.3.patch, 
> YARN-1376.4.patch
>
>
> Expose a client API to allow clients to figure out if log aggregation is 
> complete. This ticket is used to track the changes on the NM side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3449) Recover appTokenKeepAliveMap upon nodemanager restart

2015-04-06 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481249#comment-14481249
 ] 

Jason Lowe commented on YARN-3449:
--

While the NM is aggregating logs the application is still present in the state 
store, and the application should be recovered as still active after an NM 
restart.  The NM will then register with those applications listed as still 
active.  When the RM later tells the NM that those applications should be 
cleaned up, the applications should be added to the keep alive list as normal.  
Thus I think the appTokenKeepAliveMap state should already be recovered 
properly without explicitly persisting it -- or am I missing something?

> Recover appTokenKeepAliveMap upon nodemanager restart
> -
>
> Key: YARN-3449
> URL: https://issues.apache.org/jira/browse/YARN-3449
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Junping Du
>Assignee: Junping Du
>
> appTokenKeepAliveMap in NodeStatusUpdaterImpl is used to keep an application 
> alive after the application has finished but the NM still needs the app token to 
> do log aggregation (when security and log aggregation are enabled). 
> Applications are only inserted into this map when 
> getApplicationsToCleanup() is received in the RM heartbeat response, and the RM 
> only sends this info one time in RMNodeImpl.updateNodeHeartbeatResponseForCleanup(). 
> NM work-preserving restart should put appTokenKeepAliveMap into the NMStateStore 
> and recover it after restart. Without doing this, the RM could terminate the 
> application earlier, so log aggregation could fail if security is 
> enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat

2015-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481327#comment-14481327
 ] 

Hadoop QA commented on YARN-1376:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12723354/YARN-1376.2015-04-06.patch
  against trunk revision 53959e6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7223//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7223//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7223//console

This message is automatically generated.

> NM need to notify the log aggregation status to RM through Node heartbeat
> -
>
> Key: YARN-1376
> URL: https://issues.apache.org/jira/browse/YARN-1376
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-1376.1.patch, YARN-1376.2.patch, YARN-1376.2.patch, 
> YARN-1376.2015-04-04.patch, YARN-1376.2015-04-06.patch, YARN-1376.3.patch, 
> YARN-1376.4.patch
>
>
> Expose a client API to allow clients to figure out if log aggregation is 
> complete. This ticket is used to track the changes on the NM side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3449) Recover appTokenKeepAliveMap upon nodemanager restart

2015-04-06 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481330#comment-14481330
 ] 

Junping Du commented on YARN-3449:
--

Thanks [~jlowe] for replying with comments!
I'm not quite sure about this. However, from what I learnt from the code, it 
looks like we keep renewing the delegation tokens on the RM side for finishing apps 
while the NM still needs them to do log aggregation. The way the NM keeps a token 
alive for log aggregation is to send appTokenKeepAliveMap in its heartbeat to the 
RM and keep the time value updated (currentTime + 0.7~0.9 * tokenRemovalDelayMs) in 
every heartbeat request/response. If appTokenKeepAliveMap doesn't get recovered 
after the NM is restarted, the NM will never add these apps to the keep-alive list 
(appsToCleanup is only sent once by the RM) and the RM won't renew the token after 
the time expires (based on the last heartbeat request before the NM restart), 
because it won't receive any new messages from the NM about these apps. 
In practice, this issue doesn't show up often because tokenRemovalDelayMs 
is usually quite large (10 minutes by default), and there are very few cases where 
the NM cannot finish log aggregation within this time (even counting NM restart time). 
However, we should still fix it because it makes the behavior of delegation token 
renewal inconsistent before and after NM restart (and causes a bug, at least 
theoretically). Isn't it?
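For illustration only, here is a minimal sketch of the keep-alive bookkeeping described above (class and member names are hypothetical, not the actual NodeStatusUpdaterImpl code; only the 0.7~0.9 * tokenRemovalDelayMs expiry comes from this discussion):
{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: each app still needing log aggregation gets an expiry of
// now + (0.7 .. 0.9) * tokenRemovalDelayMs, and the unexpired app ids form the
// keep-alive list reported to the RM on every heartbeat.
public class KeepAliveSketch {
  private final Map<String, Long> appTokenKeepAliveMap = new HashMap<String, Long>();
  private final Random jitter = new Random();

  public void trackApp(String appId, long tokenRemovalDelayMs) {
    long delay = (long) ((0.7 + 0.2 * jitter.nextDouble()) * tokenRemovalDelayMs);
    appTokenKeepAliveMap.put(appId, System.currentTimeMillis() + delay);
  }

  public List<String> createKeepAliveApplicationList() {
    long now = System.currentTimeMillis();
    // Drop expired entries; the remaining app ids go into the next heartbeat.
    Iterator<Map.Entry<String, Long>> it = appTokenKeepAliveMap.entrySet().iterator();
    while (it.hasNext()) {
      if (it.next().getValue() < now) {
        it.remove();
      }
    }
    return new ArrayList<String>(appTokenKeepAliveMap.keySet());
  }
}
{code}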

> Recover appTokenKeepAliveMap upon nodemanager restart
> -
>
> Key: YARN-3449
> URL: https://issues.apache.org/jira/browse/YARN-3449
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Junping Du
>Assignee: Junping Du
>
> appTokenKeepAliveMap in NodeStatusUpdaterImpl is used to keep an application 
> alive after the application has finished but the NM still needs the app token to 
> do log aggregation (when security and log aggregation are enabled). 
> Applications are only inserted into this map when 
> getApplicationsToCleanup() is received in the RM heartbeat response, and the RM 
> only sends this info one time in RMNodeImpl.updateNodeHeartbeatResponseForCleanup(). 
> NM work-preserving restart should put appTokenKeepAliveMap into the NMStateStore 
> and recover it after restart. Without doing this, the RM could terminate the 
> application earlier, so log aggregation could fail if security is 
> enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-04-06 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3044:

Attachment: YARN-3044.20150406-1.patch


On second thought, I have changed the approach for this jira (similar to the one 
mentioned by Zhijie), because creating a separate stack was not only inducing too 
many changes, but I was also skeptical that supporting the future removal of the 
timelineservice dependency from the RM project would again induce changes in the 
separate V2 stack. 
The approach I have taken is:
1> SMP remains, and SMP creates RMTimelineCollector; the RM will not be aware 
of RMTimelineCollector.
2> Segregate the V1 event handler code from SMP into TimelineV1Handler 
(simpler to remove V1 support in the future).
3> Modify SMP such that either TimelineV1Handler or RMTimelineCollector 
(V2) is created based on configuration, and the appropriate Dispatchers are 
selected (see the sketch below).
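For illustration, the selection in point 3 could look roughly like this (a sketch only; the configuration key, field names, and constructor signatures are placeholders, not taken from the patch):
{code}
// Hypothetical sketch of config-based selection inside SMP's service init.
// The config key and constructors below are placeholders.
boolean timelineV2Enabled =
    conf.getBoolean("yarn.timeline-service.version-2.enabled", false);
if (timelineV2Enabled) {
  handler = new RMTimelineCollector(rmContext);   // V2 path
} else {
  handler = new TimelineV1Handler(rmContext);     // existing V1 path
}
addIfService(handler);
{code}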

I have also incorporated the changes to support RMContainer metrics based on 
configuration (Junping's comments).

Pending tasks:
* Test cases for RMTimelineCollector are not completed. I did not want to take 
the approach of TestDistributedShell, since its tests mostly check 
whether the required files are created, whereas for TestRMTimelineCollector 
we need to check whether the entities are properly populated, which requires the 
Reader API, and that does not seem to be finalized yet. Also, even using 
FileSystemTimelineWriterImpl for testing requires TimelineCollectorContext, and 
hence the dependency on YARN-3390. 
* As mentioned earlier, AppConfig information is not completely available on the RM 
side, hence I have currently populated the Environment config available in RMApp. 
Shall I raise a new jira to support a method in the TimelineClient interface to 
load AppConfig?


> [Event producers] Implement RM writing app lifecycle events to ATS
> --
>
> Key: YARN-3044
> URL: https://issues.apache.org/jira/browse/YARN-3044
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch
>
>
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481357#comment-14481357
 ] 

Hadoop QA commented on YARN-3044:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12723366/YARN-3044.20150406-1.patch
  against trunk revision 28bebc8.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7224//console

This message is automatically generated.

> [Event producers] Implement RM writing app lifecycle events to ATS
> --
>
> Key: YARN-3044
> URL: https://issues.apache.org/jira/browse/YARN-3044
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch
>
>
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities

2015-04-06 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481362#comment-14481362
 ] 

Jason Lowe commented on YARN-3448:
--

Thanks for the patch, Jonathan.  Interesting approach, and this should 
drastically improve performance for retention processing.  Some comments on the 
patch so far:

I think the code would be easier to follow if we didn't abuse Map.Entry as a 
pair class to associate a WriteBatch to the corresponding DB.  Creating a 
custom utility class to associate these would make the code a lot more readable 
than always needing to deduce that getKey() is a database and getValue() is a 
WriteBatch.
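For example, a small utility class along those lines could look like this (a sketch under the assumption that the iq80 leveldb {{DB}}/{{WriteBatch}} types are used, as in the existing store; the class name is made up):
{code}
import org.iq80.leveldb.DB;
import org.iq80.leveldb.WriteBatch;

// Sketch of the suggested pair class: it ties a WriteBatch to the DB it was
// created from, so callers read getDB()/getWriteBatch() instead of deducing
// that Map.Entry.getKey() is the database and getValue() is the WriteBatch.
final class DBWriteBatch {
  private final DB db;
  private final WriteBatch writeBatch;

  DBWriteBatch(DB db) {
    this.db = db;
    this.writeBatch = db.createWriteBatch();
  }

  DB getDB() {
    return db;
  }

  WriteBatch getWriteBatch() {
    return writeBatch;
  }
}
{code}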

The underlying database throws a runtime exception, and the existing leveldb 
store translates these to IOExceptions.  I think we want to do the same here.  
For example, put has a try..finally block with no catch clauses yet the method 
says it does not throw exceptions like IOException.  Arguably it should throw 
IOException when the database has an error.
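Something along these lines, for illustration (the method is a placeholder, not code from the patch; the point is only the DBException-to-IOException translation):
{code}
// Sketch: leveldb's unchecked DBException is caught at the store boundary and
// rethrown as a checked IOException, as the existing leveldb store does.
private void writeToDb(DB db, byte[] key, byte[] value) throws IOException {
  try {
    db.put(key, value);
  } catch (DBException e) {
    throw new IOException(e);
  }
}
{code}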

The original leveldb code had locking around entities but I don't see it here.  
Since updating entities often involves a read/modify/write operation on the 
database, are we sure it's OK to remove that synchronization?

computeCheckMillis says it needs to be called synchronously, but it looks like 
it can be called without a lock via a number of routes, e.g.:
put -> putIndex -> computeCurrentCheckMillis -> computeCheckMillis
put -> putEntities -> computeCurrentCheckMillis -> computeCheckMillis

These should probably be debug statements, otherwise I think they could be 
quite spammy in the server log.  Also the latter one will always be followed by 
the former because of the loop and may not be that useful in practice, even at 
the debug level.
{code}
+LOG.info("Trying the  db" + db);
...
+LOG.info("Trying the previous db" + db);
{code}

This will NPE on entityUpdate if db is null, and the code explicitly checks for 
that possibility:
{code}
+  Map.Entry entityUpdate = 
entityUpdates.get(roundedStartTime);
+  if (entityUpdate == null) {
+DB db = entitydb.getDBForStartTime(startAndInsertTime.startTime);
+if (db != null) {
+  WriteBatch writeBatch = db.createWriteBatch();
+  entityUpdate = new AbstractMap.SimpleImmutableEntry(db, writeBatch);
+  entityUpdates.put(roundedStartTime, entityUpdate);
+};
+  }
+  WriteBatch writeBatch = entityUpdate.getValue();
{code}

In the following code we lookup relatedEntityUpdate but then after checking if 
it's null never use it again.  I think we're supposed to be setting up 
relatedEntityUpdate in the block if it's null rather than re-assigning 
entityUpdate.  Then after the null check we should be using relatedEntityUpdate 
rather than entityUpdate to get the proper write batch.
{code}
+Map.Entry relatedEntityUpdate = 
entityUpdates.get(relatedRoundedStartTime);
+if (relatedEntityUpdate == null) {
+  DB db = entitydb.getDBForStartTime(relatedStartTimeLong);
+  if (db != null) {
+WriteBatch relatedWriteBatch = db.createWriteBatch();
+entityUpdate = new AbstractMap.SimpleImmutableEntry(
+db, relatedWriteBatch);
+entityUpdates.put(relatedRoundedStartTime, entityUpdate);
+  }
+  ;
+}
+WriteBatch relatedWriteBatch = entityUpdate.getValue();
{code}
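In other words, something like the following is probably what was intended (a hedged sketch based on the snippet above, not the actual patch):
{code}
Map.Entry relatedEntityUpdate =
    entityUpdates.get(relatedRoundedStartTime);
if (relatedEntityUpdate == null) {
  DB db = entitydb.getDBForStartTime(relatedStartTimeLong);
  if (db != null) {
    relatedEntityUpdate = new AbstractMap.SimpleImmutableEntry(
        db, db.createWriteBatch());
    entityUpdates.put(relatedRoundedStartTime, relatedEntityUpdate);
  }
}
if (relatedEntityUpdate != null) {
  // Use the related entity's write batch, guarding against a missing DB
  // (per the NPE comment above).
  WriteBatch relatedWriteBatch = (WriteBatch) relatedEntityUpdate.getValue();
  // ... apply the related-entity puts to relatedWriteBatch ...
}
{code}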

This code is commented out.  Should it have been deleted, or is there something 
left to do here with respect to related entities?
{code}
+/*
+for (EntityIdentifier relatedEntity : relatedEntitiesWithoutStartTimes) {
+  try {
+StartAndInsertTime relatedEntityStartAndInsertTime =
+getAndSetStartTime(relatedEntity.getId(), relatedEntity.getType(),
+readReverseOrderedLong(revStartTime, 0), null);
+if (relatedEntityStartAndInsertTime == null) {
+  throw new IOException("Error setting start time for related entity");
+}
+byte[] relatedEntityStartTime = writeReverseOrderedLong(
+relatedEntityStartAndInsertTime.startTime);
+  // This is the new entity, the domain should be the same
+byte[] key = createDomainIdKey(relatedEntity.getId(),
+relatedEntity.getType(), relatedEntityStartTime);
+writeBatch.put(key, entity.getDomainId().getBytes());
+++putCount;
+writeBatch.put(createRelatedEntityKey(relatedEntity.getId(),
+relatedEntity.getType(), relatedEntityStartTime,
+entity.getEntityId(), entity.getEntityType()), EMPTY_BYTES);
+++putCount;
+writeBatch.put(createEntityMarkerKey(relatedEntity.getId(),
+relatedEntity.getType(), relatedEntityStartTime),
+writeReverseOrderedLong(relatedEntityStartAndInsertTime
+.ins

[jira] [Commented] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.

2015-04-06 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481371#comment-14481371
 ] 

Zhijie Shen commented on YARN-3334:
---

bq. Just filed YARN-3445 to track this issue. 

Yup, I think we can separate that issue. For this patch, the code comment is 
good for now. Will commit this patch.

> [Event Producers] NM TimelineClient life cycle handling and container metrics 
> posting to new timeline service.
> --
>
> Key: YARN-3334
> URL: https://issues.apache.org/jira/browse/YARN-3334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
> YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, 
> YARN-3334-v5.patch, YARN-3334-v6.patch, YARN-3334-v8.patch, YARN-3334.7.patch
>
>
> After YARN-3039, we have service discovery mechanism to pass app-collector 
> service address among collectors, NMs and RM. In this JIRA, we will handle 
> service address setting for TimelineClients in NodeManager, and put container 
> metrics to the backend storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3449) Recover appTokenKeepAliveMap upon nodemanager restart

2015-04-06 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481393#comment-14481393
 ] 

Jason Lowe commented on YARN-3449:
--

I believe the apps in the appTokenKeepAliveMap will be recovered per my first 
comment, but yes the relative delays stored in that map will not match what was 
there before.  However I'm not sure it matters that we have the exact times in 
there.  Again when the NM re-registers it will report all active applications, 
and the RM will attempt to correct this on the next heartbeat.  The NM will 
then add all apps that are still aggregating to the appTokenKeepAliveMap and 
report that to the RM, and the RM will delay the token removal accordingly.  I 
don't think this changes when the token is renewed on the RM, just when the 
token may be cancelled.

Is this JIRA tracking an actual failure that occurred or a theoretical 
occurrence?

> Recover appTokenKeepAliveMap upon nodemanager restart
> -
>
> Key: YARN-3449
> URL: https://issues.apache.org/jira/browse/YARN-3449
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Junping Du
>Assignee: Junping Du
>
> appTokenKeepAliveMap in NodeStatusUpdaterImpl is used to keep an application 
> alive after the application has finished but the NM still needs the app token to 
> do log aggregation (when security and log aggregation are enabled). 
> Applications are only inserted into this map when 
> getApplicationsToCleanup() is received in the RM heartbeat response, and the RM 
> only sends this info one time in RMNodeImpl.updateNodeHeartbeatResponseForCleanup(). 
> NM work-preserving restart should put appTokenKeepAliveMap into the NMStateStore 
> and recover it after restart. Without doing this, the RM could terminate the 
> application earlier, so log aggregation could fail if security is 
> enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3110) Few issues in ApplicationHistory web ui

2015-04-06 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3110:

Attachment: YARN-3110.20150406-1.patch

Attaching a patch with the rebased code.

> Few issues in ApplicationHistory web ui
> ---
>
> Key: YARN-3110
> URL: https://issues.apache.org/jira/browse/YARN-3110
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, timelineserver
>Affects Versions: 2.6.0
>Reporter: Bibin A Chundatt
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-3110.20150209-1.patch, YARN-3110.20150315-1.patch, 
> YARN-3110.20150406-1.patch
>
>
> Application state and History link are wrong when the Application is in unassigned 
> state.
>  
> 1. Configure the capacity scheduler with queue size 1 and max Absolute Max 
> Capacity: 10.0%
> (Current application state is Accepted and Unassigned from the resource manager 
> side)
> 2. Submit an application to the queue and check the state and link in Application 
> history
> State = null and History link shown as N/A in the applicationhistory page
> Kill the same application. In the timeline server logs, the below is shown when 
> selecting the application link.
> {quote}
> 2015-01-29 15:39:50,956 ERROR org.apache.hadoop.yarn.webapp.View: Failed to 
> read the AM container of the application attempt 
> appattempt_1422467063659_0007_01.
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainer(ApplicationHistoryManagerOnTimelineStore.java:162)
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getAMContainer(ApplicationHistoryManagerOnTimelineStore.java:184)
>   at 
> org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:160)
>   at 
> org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:157)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:156)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
>   at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56)
>   at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
>   at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSController.app(AHSController.java:38)
>   at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>   at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java

[jira] [Updated] (YARN-3334) [Event Producers] NM TimelineClient container metrics posting to new timeline service.

2015-04-06 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3334:
--
Summary: [Event Producers] NM TimelineClient container metrics posting to 
new timeline service.  (was: [Event Producers] NM TimelineClient life cycle 
handling and container metrics posting to new timeline service.)

> [Event Producers] NM TimelineClient container metrics posting to new timeline 
> service.
> --
>
> Key: YARN-3334
> URL: https://issues.apache.org/jira/browse/YARN-3334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
> YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, 
> YARN-3334-v5.patch, YARN-3334-v6.patch, YARN-3334-v8.patch, YARN-3334.7.patch
>
>
> After YARN-3039, we have service discovery mechanism to pass app-collector 
> service address among collectors, NMs and RM. In this JIRA, we will handle 
> service address setting for TimelineClients in NodeManager, and put container 
> metrics to the backend storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS

2015-04-06 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3045:
--
Summary: [Event producers] Implement NM writing container lifecycle events 
to ATS  (was: [Event producers] Implement NM writing container lifecycle events 
and container system metrics to ATS)

> [Event producers] Implement NM writing container lifecycle events to ATS
> 
>
> Key: YARN-3045
> URL: https://issues.apache.org/jira/browse/YARN-3045
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
>
> Per design in YARN-2928, implement NM writing container lifecycle events and 
> container system metrics to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3334) [Event Producers] NM TimelineClient container metrics posting to new timeline service.

2015-04-06 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-3334.
---
   Resolution: Fixed
Fix Version/s: YARN-2928
 Hadoop Flags: Reviewed

Committed the patch to branch YARN-2928. Thanks for the patch, Junping! Thanks 
for the review, Sangjin and Li!

> [Event Producers] NM TimelineClient container metrics posting to new timeline 
> service.
> --
>
> Key: YARN-3334
> URL: https://issues.apache.org/jira/browse/YARN-3334
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: YARN-2928
>
> Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
> YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, 
> YARN-3334-v5.patch, YARN-3334-v6.patch, YARN-3334-v8.patch, YARN-3334.7.patch
>
>
> After YARN-3039, we have service discovery mechanism to pass app-collector 
> service address among collectors, NMs and RM. In this JIRA, we will handle 
> service address setting for TimelineClients in NodeManager, and put container 
> metrics to the backend storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS

2015-04-06 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481410#comment-14481410
 ] 

Zhijie Shen commented on YARN-3045:
---

Container metrics publishing has been completed in YARN-3334; please continue 
the work on NM lifecycle events here. Changed the title accordingly.

> [Event producers] Implement NM writing container lifecycle events to ATS
> 
>
> Key: YARN-3045
> URL: https://issues.apache.org/jira/browse/YARN-3045
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
>
> Per design in YARN-2928, implement NM writing container lifecycle events and 
> container system metrics to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS

2015-04-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481412#comment-14481412
 ] 

Naganarasimha G R commented on YARN-3045:
-

[~djp], As YARN-3334 is in, can I start with this jira ?

> [Event producers] Implement NM writing container lifecycle events to ATS
> 
>
> Key: YARN-3045
> URL: https://issues.apache.org/jira/browse/YARN-3045
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
>
> Per design in YARN-2928, implement NM writing container lifecycle events and 
> container system metrics to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS

2015-04-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481424#comment-14481424
 ] 

Naganarasimha G R commented on YARN-3045:
-

:) Had commented in parallel ... Will start working on this!

> [Event producers] Implement NM writing container lifecycle events to ATS
> 
>
> Key: YARN-3045
> URL: https://issues.apache.org/jira/browse/YARN-3045
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
>
> Per design in YARN-2928, implement NM writing container lifecycle events and 
> container system metrics to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS

2015-04-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481425#comment-14481425
 ] 

Naganarasimha G R commented on YARN-3045:
-

:) Had commented in parallel ... Will start working on this!

> [Event producers] Implement NM writing container lifecycle events to ATS
> 
>
> Key: YARN-3045
> URL: https://issues.apache.org/jira/browse/YARN-3045
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
>
> Per design in YARN-2928, implement NM writing container lifecycle events and 
> container system metrics to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2666) TestFairScheduler.testContinuousScheduling fails Intermittently

2015-04-06 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481433#comment-14481433
 ] 

zhihai xu commented on YARN-2666:
-

thanks [~ywskycn] for the review and thanks [~ozawa] for reviewing and 
committing the patch!

> TestFairScheduler.testContinuousScheduling fails Intermittently
> ---
>
> Key: YARN-2666
> URL: https://issues.apache.org/jira/browse/YARN-2666
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: scheduler
>Reporter: Tsuyoshi Ozawa
>Assignee: zhihai xu
> Fix For: 2.8.0
>
> Attachments: YARN-2666.000.patch
>
>
> The test fails on trunk.
> {code}
> Tests run: 79, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.698 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
> testContinuousScheduling(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
>   Time elapsed: 0.582 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3372)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3110) Few issues in ApplicationHistory web ui

2015-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481450#comment-14481450
 ] 

Hadoop QA commented on YARN-3110:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12723370/YARN-3110.20150406-1.patch
  against trunk revision 28bebc8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7225//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7225//console

This message is automatically generated.

> Few issues in ApplicationHistory web ui
> ---
>
> Key: YARN-3110
> URL: https://issues.apache.org/jira/browse/YARN-3110
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, timelineserver
>Affects Versions: 2.6.0
>Reporter: Bibin A Chundatt
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-3110.20150209-1.patch, YARN-3110.20150315-1.patch, 
> YARN-3110.20150406-1.patch
>
>
> Application state and History link are wrong when the Application is in unassigned 
> state.
>  
> 1. Configure the capacity scheduler with queue size 1 and max Absolute Max 
> Capacity: 10.0%
> (Current application state is Accepted and Unassigned from the resource manager 
> side)
> 2. Submit an application to the queue and check the state and link in Application 
> history
> State = null and History link shown as N/A in the applicationhistory page
> Kill the same application. In the timeline server logs, the below is shown when 
> selecting the application link.
> {quote}
> 2015-01-29 15:39:50,956 ERROR org.apache.hadoop.yarn.webapp.View: Failed to 
> read the AM container of the application attempt 
> appattempt_1422467063659_0007_01.
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainer(ApplicationHistoryManagerOnTimelineStore.java:162)
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getAMContainer(ApplicationHistoryManagerOnTimelineStore.java:184)
>   at 
> org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:160)
>   at 
> org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:157)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:156)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
>   at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56)
>   at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
>   at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSController.app(AHSController.java:38)
>   at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav

[jira] [Commented] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM

2015-04-06 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481471#comment-14481471
 ] 

Sidharta Seethana commented on YARN-3443:
-

Thanks for the review, [~vvasudev] . Responses below :
1. I could change such log lines to use StringBuffer everywhere. However, I 
think a metric based on calls to LOG.warn()/LOG.error() does not accurately 
reflect the warn/error 'event' count. 
2. I'll add an info log line to the if block. I had added this block for better 
readability, since the behavior here is a little different from the current 
implementation in CgroupsLCEResourcesHandler.
3. - 4. Same comment as 1. 
5. Sure, I can make this change.
6. This is by design. Otherwise, every resource handler implementation that 
uses cgroups will have to check if cgroup mounting is enabled or not (which is 
error-prone). It seemed better to instead ignore a mount request when cgroup 
mounting is disabled.
7. I'll fix it.
8. I'll fix it.
9. I'll fix it.
10. Yikes. Yes, I'll fix it. Not sure how this one got through.
11. I had added it for clarity, but maybe it isn't necessary. I'll remove it.
12. I'll fix it.
13. I applied a formatter to all files before creating the patch, but I'll 
verify.

> Create a 'ResourceHandler' subsystem to ease addition of support for new 
> resource types on the NM
> -
>
> Key: YARN-3443
> URL: https://issues.apache.org/jira/browse/YARN-3443
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-3443.001.patch, YARN-3443.002.patch
>
>
> The current cgroups implementation is closely tied to supporting CPU as a 
> resource. We need to separate out CGroups support as well as provide a simple 
> ResourceHandler subsystem that will enable us to add support for new resource 
> types on the NM - e.g. Network, Disk, etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481496#comment-14481496
 ] 

Jian He commented on YARN-3021:
---

[~yzhangal], I was out last couple weeks. sorry for the late response. 
Patch looks good overall, one comment:
the {{skipTokenRenewal(token)}} check in {{requestNewHdfsDelegationToken}} may 
not be needed, because it explicitly passes 
{{UserGroupInformation.getLoginUser().getUserName()}} as the renewer, and so 
the token "renewer" won't be empty.

> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
>Assignee: Yongjun Zhang
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
> YARN-3021.006.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
> and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
> clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails because the B realm will not trust 
> A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously 
> and once the renewal attempt failed we simply ceased to schedule any further 
> attempts of renewals, rather than fail the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and skip the scheduling alone, rather than bubble back an error 
> to the client, failing the app submission. This way the old behaviour is 
> retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3404) View the queue name to YARN Application page

2015-04-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481511#comment-14481511
 ] 

Jian He commented on YARN-3404:
---

[~ryu_kobayashi], thanks for the patch!

Could you also add a link from the queue name to the actual scheduler queue 
page?
Similarly, the existing user name could also link to the "Active Users Info" 
section on the scheduler page.

> View the queue name to YARN Application page
> 
>
> Key: YARN-3404
> URL: https://issues.apache.org/jira/browse/YARN-3404
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Minor
> Attachments: YARN-3404.1.patch, screenshot.png
>
>
> We want to display the name of the queue used by the application on the YARN 
> Application page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-197) Add a separate log server

2015-04-06 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481525#comment-14481525
 ] 

Siddharth Seth commented on YARN-197:
-

Yes, as long as the logs are being served out by a sub-system other than the 
MapReduce history server.

> Add a separate log server
> -
>
> Key: YARN-197
> URL: https://issues.apache.org/jira/browse/YARN-197
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Siddharth Seth
>
> Currently, the job history server is being used for log serving. A separate 
> log server can be added which can deal with serving logs, along with other 
> functionality like log retention, merging, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging

2015-04-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481557#comment-14481557
 ] 

Jian He commented on YARN-3273:
---

sure, please go ahead.

> Improve web UI to facilitate scheduling analysis and debugging
> --
>
> Key: YARN-3273
> URL: https://issues.apache.org/jira/browse/YARN-3273
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Rohith
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 
> 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 
> 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, 
> YARN-3273-am-resource-used-AND-User-limit.PNG, 
> YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG
>
>
> Job may be stuck for reasons such as:
> - hitting queue capacity 
> - hitting user-limit, 
> - hitting AM-resource-percentage 
> The  first queueCapacity is already shown on the UI.
> We may surface things like:
> - what is user's current usage and user-limit; 
> - what is the AM resource usage and limit;
> - what is the application's current HeadRoom;
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3449) Recover appTokenKeepAliveMap upon nodemanager restart

2015-04-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du resolved YARN-3449.
--
Resolution: Invalid
  Assignee: (was: Junping Du)

> Recover appTokenKeepAliveMap upon nodemanager restart
> -
>
> Key: YARN-3449
> URL: https://issues.apache.org/jira/browse/YARN-3449
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Junping Du
>
> appTokenKeepAliveMap in NodeStatusUpdaterImpl is used to keep an application 
> alive after the application has finished but the NM still needs the app token 
> to do log aggregation (when security and log aggregation are enabled). 
> Applications are only inserted into this map when receiving 
> getApplicationsToCleanup() from the RM heartbeat response, and the RM only sends 
> this info once, in RMNodeImpl.updateNodeHeartbeatResponseForCleanup(). NM 
> restart work preserving should put appTokenKeepAliveMap into the NMStateStore 
> and recover it after restart. Without doing this, the RM could terminate the 
> application earlier, so log aggregation could fail if security is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3449) Recover appTokenKeepAliveMap upon nodemanager restart

2015-04-06 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481606#comment-14481606
 ] 

Junping Du commented on YARN-3449:
--

bq. Again when the NM re-registers it will report all active applications, and 
the RM will attempt to correct this on the next heartbeat. 
You are right, [~jlowe]. I think I missed that CLEANUP_APP would be resent on 
node reconnection (I totally forgot it for some reason), so that shouldn't be a 
problem. BTW, I didn't see any actual failure caused by this, so I will resolve 
it as invalid.

> Recover appTokenKeepAliveMap upon nodemanager restart
> -
>
> Key: YARN-3449
> URL: https://issues.apache.org/jira/browse/YARN-3449
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Junping Du
>Assignee: Junping Du
>
> appTokenKeepAliveMap in NodeStatusUpdaterImpl is used to keep an application 
> alive after the application has finished but the NM still needs the app token 
> to do log aggregation (when security and log aggregation are enabled). 
> Applications are only inserted into this map when receiving 
> getApplicationsToCleanup() from the RM heartbeat response, and the RM only sends 
> this info once, in RMNodeImpl.updateNodeHeartbeatResponseForCleanup(). NM 
> restart work preserving should put appTokenKeepAliveMap into the NMStateStore 
> and recover it after restart. Without doing this, the RM could terminate the 
> application earlier, so log aggregation could fail if security is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging

2015-04-06 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481673#comment-14481673
 ] 

Zhijie Shen commented on YARN-3273:
---

Thanks for your confirmation, Jian! Will do it.

> Improve web UI to facilitate scheduling analysis and debugging
> --
>
> Key: YARN-3273
> URL: https://issues.apache.org/jira/browse/YARN-3273
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Rohith
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 
> 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 
> 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, 
> YARN-3273-am-resource-used-AND-User-limit.PNG, 
> YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG
>
>
> Job may be stuck for reasons such as:
> - hitting queue capacity 
> - hitting user-limit, 
> - hitting AM-resource-percentage 
> The  first queueCapacity is already shown on the UI.
> We may surface things like:
> - what is user's current usage and user-limit; 
> - what is the AM resource usage and limit;
> - what is the application's current HeadRoom;
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so

2015-04-06 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-2890:

Attachment: YARN-2890.4.patch

[~hitesh], thanks for the comments. Attached an updated patch. Created a new test 
file, TestMiniYarnCluster, that tests the starting of the timeline server based on 
the configuration and the enableAHS flag.
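
For reference, the intended decision is roughly the following (a sketch of the 
behavior under test, not the exact cluster or test code):
{code}
// Rough sketch: the timeline server should start only when the configuration
// enables it or the enableAHS flag is passed to the mini cluster explicitly.
static boolean shouldStartTimelineService(Configuration conf, boolean enableAHS) {
  return enableAHS || conf.getBoolean(
      YarnConfiguration.TIMELINE_SERVICE_ENABLED,
      YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED);
}
{code}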

> MiniMRYarnCluster should turn on timeline service if configured to do so
> 
>
> Key: YARN-2890
> URL: https://issues.apache.org/jira/browse/YARN-2890
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-2890.1.patch, YARN-2890.2.patch, YARN-2890.3.patch, 
> YARN-2890.4.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, 
> YARN-2890.patch, YARN-2890.patch
>
>
> Currently the MiniMRYarnCluster does not consider the configuration value for 
> enabling timeline service before starting. The MiniYarnCluster should only 
> start the timeline service if it is configured to do so.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-06 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481881#comment-14481881
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~jianhe],

Thanks for taking a further look. No worries about the delay; I guessed you were 
out.

About your comment, the code 
{code}
  private void collectDelegationTokens(final String renewer,
      final Credentials credentials,
      final List<Token<?>> tokens) throws IOException {
    final String serviceName = getCanonicalServiceName();
    // Collect token of the this filesystem and then of its embedded children
    if (serviceName != null) { // fs has token, grab it
      final Text service = new Text(serviceName);
      Token<?> token = credentials.getToken(service);  // <===
      if (token == null) {
        token = getDelegationToken(renewer);
        if (token != null) {
          tokens.add(token);
          credentials.addToken(service, token);
        }
      }
    }
{code}
The line highlighted with "<===" indicates that a token could be retrieved from 
the token map. In that case, are we sure the token always has a non-empty 
renewer? In addition, it's possible that we might change the 
{{skipTokenRenewal}} method in the future to do some additional checking. It 
seems safer to have this check. Do you think we should just keep it?

Thanks.
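
For context, a defensive check along these lines (just a sketch, assuming the 
renewer is read from the decoded token identifier; this is not the actual 
YARN-3021 implementation) could look like:
{code}
// Sketch of a "skip renewal" style guard: skip tokens whose decoded identifier
// carries no renewer. Hypothetical helper, for illustration only.
private static boolean hasEmptyRenewer(Token<?> token) throws IOException {
  TokenIdentifier id = token.decodeIdentifier();
  if (id instanceof AbstractDelegationTokenIdentifier) {
    Text renewer = ((AbstractDelegationTokenIdentifier) id).getRenewer();
    return renewer == null || renewer.toString().isEmpty();
  }
  return false;
}
{code}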




> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
>Assignee: Yongjun Zhang
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
> YARN-3021.006.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
> and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
> clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails cause B realm will not trust 
> A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously 
> and once the renewal attempt failed we simply ceased to schedule any further 
> attempts of renewals, rather than fail the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and skip the scheduling alone, rather than bubble back an error 
> to the client, failing the app submission. This way the old behaviour is 
> retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3452) Bogus token usernames cause many invalid group lookups

2015-04-06 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-3452:


 Summary: Bogus token usernames cause many invalid group lookups
 Key: YARN-3452
 URL: https://issues.apache.org/jira/browse/YARN-3452
 Project: Hadoop YARN
  Issue Type: Bug
  Components: security
Reporter: Jason Lowe


YARN uses a number of bogus usernames for tokens, like application attempt IDs 
for NM tokens or even the hardcoded "testing" for the container localizer 
token.  These tokens cause the RPC layer to do group lookups on these bogus 
usernames which will never succeed but can take a long time to perform.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing

2015-04-06 Thread Ashwin Shankar (JIRA)
Ashwin Shankar created YARN-3453:


 Summary: Fair Scheduler : Parts of preemption logic uses 
DefaultResourceCalculator even in DRF mode causing thrashing
 Key: YARN-3453
 URL: https://issues.apache.org/jira/browse/YARN-3453
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Ashwin Shankar


There are two places in the preemption code flow where DefaultResourceCalculator 
is used, even in DRF mode.
This basically results in more resources being preempted than needed, and those 
extra preempted containers don’t even go to the “starved” queue, since the 
scheduling logic is based on DRF's calculator.

Following are the two places :
1. {code:title=FSLeafQueue.java|borderStyle=solid}
private boolean isStarved(Resource share)
{code}
A queue shouldn’t be marked as “starved” if the dominant resource usage
is >=  fair/minshare.

2. {code:title=FairScheduler.java|borderStyle=solid}
protected Resource resToPreempt(FSLeafQueue sched, long curTime)
{code}
--

One more thing that I believe needs to change in DRF mode: during a 
preemption round, if preempting a few containers satisfies the needs of 
a resource type, then we should exit that preemption round, since the 
containers that we just preempted should bring the dominant resource usage to 
min/fair share.
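
A rough sketch of the kind of DRF-aware starvation check being suggested 
(assuming the usual Resources/DominantResourceCalculator helpers; this is not 
the actual fix):
{code}
// Rough sketch only: mark a queue as starved only when the dominant share of
// its usage is below the dominant share of its fair/min share target.
boolean isStarvedDRF(Resource usage, Resource share, Resource clusterResource) {
  ResourceCalculator drf = new DominantResourceCalculator();
  return Resources.lessThan(drf, clusterResource, usage, share);
}
{code}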



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3426) Add jdiff support to YARN

2015-04-06 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3426:

Attachment: YARN-3426-040615.patch

In this patch I added jdiff support to the YARN maven files. We're checking API 
compatibility for yarn-api, yarn-common, yarn-client and yarn-server-common 
now. I'm also attaching the standard API files for those four components. 

The jdiff result is generated via {{mvn package -Pdocs}}, the same as for 
hadoop-common and hadoop-hdfs. 

> Add jdiff support to YARN
> -
>
> Key: YARN-3426
> URL: https://issues.apache.org/jira/browse/YARN-3426
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: YARN-3426-040615.patch
>
>
> Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs 
> to YARN as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM

2015-04-06 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana updated YARN-3443:

Attachment: YARN-3443.003.patch

Patch that incorporates code review feedback from [~vvasudev]

> Create a 'ResourceHandler' subsystem to ease addition of support for new 
> resource types on the NM
> -
>
> Key: YARN-3443
> URL: https://issues.apache.org/jira/browse/YARN-3443
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-3443.001.patch, YARN-3443.002.patch, 
> YARN-3443.003.patch
>
>
> The current cgroups implementation is closely tied to supporting CPU as a 
> resource. We need to separate out CGroups support as well as provide a simple 
> ResourceHandler subsystem that will enable us to add support for new resource 
> types on the NM - e.g. Network, Disk, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3429) TestAMRMTokens.testTokenExpiry fails Intermittently with error message:Invalid AMRMToken

2015-04-06 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481934#comment-14481934
 ] 

Robert Kanter commented on YARN-3429:
-

+1

> TestAMRMTokens.testTokenExpiry fails Intermittently with error 
> message:Invalid AMRMToken
> 
>
> Key: YARN-3429
> URL: https://issues.apache.org/jira/browse/YARN-3429
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3429.000.patch
>
>
> TestAMRMTokens.testTokenExpiry fails Intermittently with error 
> message:Invalid AMRMToken from appattempt_1427804754787_0001_01
> The error logs is at 
> https://builds.apache.org/job/PreCommit-YARN-Build/7172//testReport/org.apache.hadoop.yarn.server.resourcemanager.security/TestAMRMTokens/testTokenExpiry_1_/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3429) TestAMRMTokens.testTokenExpiry fails Intermittently with error message:Invalid AMRMToken

2015-04-06 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481963#comment-14481963
 ] 

Robert Kanter commented on YARN-3429:
-

Thanks Zhihai.  Committed to trunk and branch-2!

> TestAMRMTokens.testTokenExpiry fails Intermittently with error 
> message:Invalid AMRMToken
> 
>
> Key: YARN-3429
> URL: https://issues.apache.org/jira/browse/YARN-3429
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.8.0
>
> Attachments: YARN-3429.000.patch
>
>
> TestAMRMTokens.testTokenExpiry fails Intermittently with error 
> message:Invalid AMRMToken from appattempt_1427804754787_0001_01
> The error logs is at 
> https://builds.apache.org/job/PreCommit-YARN-Build/7172//testReport/org.apache.hadoop.yarn.server.resourcemanager.security/TestAMRMTokens/testTokenExpiry_1_/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging

2015-04-06 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3273:
--
Fix Version/s: (was: 2.8.0)
   2.7.0

Merged the commit to branch-2.7.

> Improve web UI to facilitate scheduling analysis and debugging
> --
>
> Key: YARN-3273
> URL: https://issues.apache.org/jira/browse/YARN-3273
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Rohith
> Fix For: 2.7.0
>
> Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 
> 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 
> 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, 
> YARN-3273-am-resource-used-AND-User-limit.PNG, 
> YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG
>
>
> Job may be stuck for reasons such as:
> - hitting queue capacity 
> - hitting user-limit, 
> - hitting AM-resource-percentage 
> The  first queueCapacity is already shown on the UI.
> We may surface things like:
> - what is user's current usage and user-limit; 
> - what is the AM resource usage and limit;
> - what is the application's current HeadRoom;
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3430) RMAppAttempt headroom data is missing in RM Web UI

2015-04-06 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-3430.
---
Resolution: Fixed

After pulling YARN-3273 into branch-2.7, committed this patch again into branch-2.7.

> RMAppAttempt headroom data is missing in RM Web UI
> --
>
> Key: YARN-3430
> URL: https://issues.apache.org/jira/browse/YARN-3430
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: YARN-3430.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2429) LCE should blacklist based upon group

2015-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481975#comment-14481975
 ] 

Hudson commented on YARN-2429:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7516 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7516/])
YARN-2429. TestAMRMTokens.testTokenExpiry fails Intermittently with error 
message:Invalid AMRMToken (zxu via rkanter) (rkanter: rev 
99b08a748e7b00a58b63330b353902a6da6aae27)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java
* hadoop-yarn-project/CHANGES.txt


> LCE should blacklist based upon group
> -
>
> Key: YARN-2429
> URL: https://issues.apache.org/jira/browse/YARN-2429
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Allen Wittenauer
>
> It should be possible to list a group to ban, not just individual users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging

2015-04-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481974#comment-14481974
 ] 

Hudson commented on YARN-3273:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7516 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7516/])
Move YARN-3273 from 2.8 to 2.7. (zjshen: rev 
3fb5abfc87953377f86e06578518801a181d7697)
* hadoop-yarn-project/CHANGES.txt


> Improve web UI to facilitate scheduling analysis and debugging
> --
>
> Key: YARN-3273
> URL: https://issues.apache.org/jira/browse/YARN-3273
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Rohith
> Fix For: 2.7.0
>
> Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 
> 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 
> 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, 
> YARN-3273-am-resource-used-AND-User-limit.PNG, 
> YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG
>
>
> Job may be stuck for reasons such as:
> - hitting queue capacity 
> - hitting user-limit, 
> - hitting AM-resource-percentage 
> The  first queueCapacity is already shown on the UI.
> We may surface things like:
> - what is user's current usage and user-limit; 
> - what is the AM resource usage and limit;
> - what is the application's current HeadRoom;
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3426) Add jdiff support to YARN

2015-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482024#comment-14482024
 ] 

Hadoop QA commented on YARN-3426:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12723432/YARN-3426-040615.patch
  against trunk revision 28bebc8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 4 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7227//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7227//artifact/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7227//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7227//console

This message is automatically generated.

> Add jdiff support to YARN
> -
>
> Key: YARN-3426
> URL: https://issues.apache.org/jira/browse/YARN-3426
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: YARN-3426-040615.patch
>
>
> Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs 
> to YARN as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM

2015-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482026#comment-14482026
 ] 

Hadoop QA commented on YARN-3443:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12723436/YARN-3443.003.patch
  against trunk revision 28bebc8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7228//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7228//console

This message is automatically generated.

> Create a 'ResourceHandler' subsystem to ease addition of support for new 
> resource types on the NM
> -
>
> Key: YARN-3443
> URL: https://issues.apache.org/jira/browse/YARN-3443
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-3443.001.patch, YARN-3443.002.patch, 
> YARN-3443.003.patch
>
>
> The current cgroups implementation is closely tied to supporting CPU as a 
> resource. We need to separate out CGroups support as well as provide a simple 
> ResourceHandler subsystem that will enable us to add support for new resource 
> types on the NM - e.g. Network, Disk, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined

2015-04-06 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482030#comment-14482030
 ] 

Robert Kanter commented on YARN-2942:
-

[~vinodkv], I was discussing this with some of our HDFS people, and they think 
using concat would result in smaller (potentially much smaller) NN metadata 
savings than the original design of using append and rereading the files. I 
agree that it would be best if HDFS supported atomic append (with concurrent 
writers), and rereading the files isn't ideal, but it seems like the original 
design is the best solution for the issue at hand for now. Thoughts?
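
For clarity, the concat-based alternative being weighed here amounts to 
something like the following (illustrative only; the paths are hypothetical, and 
concat comes with its own restrictions on the source and target files):
{code}
// Illustrative only: combining per-node aggregated log files with HDFS concat.
// 'conf' is assumed to be an existing Configuration pointing at the cluster.
DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
Path target = new Path("/tmp/logs/user/logs/app_1/combined.log");  // hypothetical
Path[] sources = {
    new Path("/tmp/logs/user/logs/app_1/node1.log"),                // hypothetical
    new Path("/tmp/logs/user/logs/app_1/node2.log")
};
dfs.concat(target, sources);  // sources are appended to target and then removed
{code}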

> Aggregated Log Files should be combined
> ---
>
> Key: YARN-2942
> URL: https://issues.apache.org/jira/browse/YARN-2942
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.6.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: CombinedAggregatedLogsProposal_v3.pdf, 
> CompactedAggregatedLogsProposal_v1.pdf, 
> CompactedAggregatedLogsProposal_v2.pdf, 
> ConcatableAggregatedLogsProposal_v4.pdf, 
> ConcatableAggregatedLogsProposal_v5.pdf, YARN-2942-preliminary.001.patch, 
> YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, 
> YARN-2942.003.patch
>
>
> Turning on log aggregation allows users to easily store container logs in 
> HDFS and subsequently view them in the YARN web UIs from a central place.  
> Currently, there is a separate log file for each Node Manager.  This can be a 
> problem for HDFS if you have a cluster with many nodes as you’ll slowly start 
> accumulating many (possibly small) files per YARN application.  The current 
> “solution” for this problem is to configure YARN (actually the JHS) to 
> automatically delete these files after some amount of time.  
> We should improve this by compacting the per-node aggregated log files into 
> one log file per application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3429) TestAMRMTokens.testTokenExpiry fails Intermittently with error message:Invalid AMRMToken

2015-04-06 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482034#comment-14482034
 ] 

zhihai xu commented on YARN-3429:
-

thanks [~rkanter] for reviewing and committing the patch.

> TestAMRMTokens.testTokenExpiry fails Intermittently with error 
> message:Invalid AMRMToken
> 
>
> Key: YARN-3429
> URL: https://issues.apache.org/jira/browse/YARN-3429
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.8.0
>
> Attachments: YARN-3429.000.patch
>
>
> TestAMRMTokens.testTokenExpiry fails Intermittently with error 
> message:Invalid AMRMToken from appattempt_1427804754787_0001_01
> The error logs is at 
> https://builds.apache.org/job/PreCommit-YARN-Build/7172//testReport/org.apache.hadoop.yarn.server.resourcemanager.security/TestAMRMTokens/testTokenExpiry_1_/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-2901) Add errors and warning metrics page to RM, NM web UI

2015-04-06 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu reopened YARN-2901:
-

The recent commit added two findbugs warnings on my local machine and in the 
build of YARN-3426:
{code}

CodeWarning
UrF Unread public/protected field: 
org.apache.hadoop.yarn.util.Log4jWarningErrorMetricsAppender$Element.count
UrF Unread public/protected field: 
org.apache.hadoop.yarn.util.Log4jWarningErrorMetricsAppender$Element.timestampSeconds
{code}

From git blame, this JIRA is the one that performed the most recent change. 
Reopening this JIRA to fix them. 

> Add errors and warning metrics page to RM, NM web UI
> 
>
> Key: YARN-2901
> URL: https://issues.apache.org/jira/browse/YARN-2901
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.8.0
>
> Attachments: Exception collapsed.png, Exception expanded.jpg, Screen 
> Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, 
> apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, 
> apache-yarn-2901.4.patch, apache-yarn-2901.5.patch
>
>
> It would be really useful to have statistics on the number of errors and 
> warnings in the RM and NM web UI. I'm thinking about -
> 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
> 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 
> hours/day
> By errors and warnings I'm referring to the log level.
> I suspect we can probably achieve this by writing a custom appender?(I'm open 
> to suggestions on alternate mechanisms for implementing this).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2901) Add errors and warning metrics page to RM, NM web UI

2015-04-06 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482051#comment-14482051
 ] 

Li Lu commented on YARN-2901:
-

BTW, Jenkins didn't report those two warnings in this JIRA, probably because it 
ran against another patch in the run 4 days ago. 

> Add errors and warning metrics page to RM, NM web UI
> 
>
> Key: YARN-2901
> URL: https://issues.apache.org/jira/browse/YARN-2901
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.8.0
>
> Attachments: Exception collapsed.png, Exception expanded.jpg, Screen 
> Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, 
> apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, 
> apache-yarn-2901.4.patch, apache-yarn-2901.5.patch
>
>
> It would be really useful to have statistics on the number of errors and 
> warnings in the RM and NM web UI. I'm thinking about -
> 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
> 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 
> hours/day
> By errors and warnings I'm referring to the log level.
> I suspect we can probably achieve this by writing a custom appender?(I'm open 
> to suggestions on alternate mechanisms for implementing this).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3426) Add jdiff support to YARN

2015-04-06 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482053#comment-14482053
 ] 

Li Lu commented on YARN-3426:
-

The findbugs warnings are unrelated here. Reopened YARN-2901 to trace it. 

> Add jdiff support to YARN
> -
>
> Key: YARN-3426
> URL: https://issues.apache.org/jira/browse/YARN-3426
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: YARN-3426-040615.patch
>
>
> Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs 
> to YARN as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage

2015-04-06 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3391:
--
Attachment: YARN-3391.2.patch

Rebase the patch after YARN-3334. Comments are welcome

> Clearly define flow ID/ flow run / flow version in API and storage
> --
>
> Key: YARN-3391
> URL: https://issues.apache.org/jira/browse/YARN-3391
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-3391.1.patch, YARN-3391.2.patch
>
>
> To continue the discussion in YARN-3040, let's figure out the best way to 
> describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed 
> into the collector and to the storage eventually?
> - Flow run id should be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not 
> set it)
> - How do we handle flow attributes in case of nested levels of flows?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.

2015-04-06 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3431:
--
Attachment: YARN-3431.2.patch

Rebase the patch after YARN-3334.

> Sub resources of timeline entity needs to be passed to a separate endpoint.
> ---
>
> Key: YARN-3431
> URL: https://issues.apache.org/jira/browse/YARN-3431
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-3431.1.patch, YARN-3431.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3110) Few issues in ApplicationHistory web ui

2015-04-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482203#comment-14482203
 ] 

Naganarasimha G R commented on YARN-3110:
-

Hi [~xgong],
I have rebased and manually tested against the trunk code and am able to see the 
modifications in the web UI. As the modifications are related to the web UI, I 
have not written test code.
Can you please check now?

> Few issues in ApplicationHistory web ui
> ---
>
> Key: YARN-3110
> URL: https://issues.apache.org/jira/browse/YARN-3110
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: applications, timelineserver
>Affects Versions: 2.6.0
>Reporter: Bibin A Chundatt
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-3110.20150209-1.patch, YARN-3110.20150315-1.patch, 
> YARN-3110.20150406-1.patch
>
>
> Application state and History link wrong when Application is in unassigned 
> state
>  
> 1. Configure the capacity scheduler with queue size as 1 and max Absolute Max 
> Capacity: 10.0%
> (Current application state is Accepted and Unassigned from the resource manager 
> side)
> 2. Submit an application to the queue and check the state and link in 
> Application history
> State = null and the History link is shown as N/A on the applicationhistory page
> Kill the same application. In the timeline server logs, the below is shown when 
> selecting the application link.
> {quote}
> 2015-01-29 15:39:50,956 ERROR org.apache.hadoop.yarn.webapp.View: Failed to 
> read the AM container of the application attempt 
> appattempt_1422467063659_0007_01.
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainer(ApplicationHistoryManagerOnTimelineStore.java:162)
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getAMContainer(ApplicationHistoryManagerOnTimelineStore.java:184)
>   at 
> org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:160)
>   at 
> org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:157)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:156)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
>   at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56)
>   at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
>   at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
>   at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSController.app(AHSController.java:38)
>   at sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>   at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 

[jira] [Updated] (YARN-3426) Add jdiff support to YARN

2015-04-06 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3426:

Attachment: YARN-3426-040615-1.patch

renamed one property for dev-support directory location. 

> Add jdiff support to YARN
> -
>
> Key: YARN-3426
> URL: https://issues.apache.org/jira/browse/YARN-3426
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch
>
>
> Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs 
> to YARN as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2901) Add errors and warning metrics page to RM, NM web UI

2015-04-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2901:
-
Attachment: YARN-2901.addendem.1.patch

Thanks for reporting, [~gtCarrera9]. This is a false alarm from findbugs. The 
fields of org.apache.hadoop.yarn.util.Log4jWarningErrorMetricsAppender$Element 
are used by ErrorsAndWarningsBlock. Simply exclude such warnings.

Uploaded addendum patch and pending Jenkins.

> Add errors and warning metrics page to RM, NM web UI
> 
>
> Key: YARN-2901
> URL: https://issues.apache.org/jira/browse/YARN-2901
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.8.0
>
> Attachments: Exception collapsed.png, Exception expanded.jpg, Screen 
> Shot 2015-03-19 at 7.40.02 PM.png, YARN-2901.addendem.1.patch, 
> apache-yarn-2901.0.patch, apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, 
> apache-yarn-2901.3.patch, apache-yarn-2901.4.patch, apache-yarn-2901.5.patch
>
>
> It would be really useful to have statistics on the number of errors and 
> warnings in the RM and NM web UI. I'm thinking about -
> 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
> 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 
> hours/day
> By errors and warnings I'm referring to the log level.
> I suspect we can probably achieve this by writing a custom appender?(I'm open 
> to suggestions on alternate mechanisms for implementing this).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2901) Add errors and warning metrics page to RM, NM web UI

2015-04-06 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482218#comment-14482218
 ] 

Li Lu commented on YARN-2901:
-

+1 pending Jenkins. 

> Add errors and warning metrics page to RM, NM web UI
> 
>
> Key: YARN-2901
> URL: https://issues.apache.org/jira/browse/YARN-2901
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.8.0
>
> Attachments: Exception collapsed.png, Exception expanded.jpg, Screen 
> Shot 2015-03-19 at 7.40.02 PM.png, YARN-2901.addendem.1.patch, 
> apache-yarn-2901.0.patch, apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, 
> apache-yarn-2901.3.patch, apache-yarn-2901.4.patch, apache-yarn-2901.5.patch
>
>
> It would be really useful to have statistics on the number of errors and 
> warnings in the RM and NM web UI. I'm thinking about -
> 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
> 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 
> hours/day
> By errors and warnings I'm referring to the log level.
> I suspect we can probably achieve this by writing a custom appender?(I'm open 
> to suggestions on alternate mechanisms for implementing this).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat

2015-04-06 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482248#comment-14482248
 ] 

Junping Du commented on YARN-1376:
--

Hi [~xgong], thanks for the patch! 
Some major comments after going through the patch:
- Shall we put LogAggregationStatus and LogAggregationReport (and the related PB 
impls) in server-api instead of yarn-api? We will not expose them to 
applications, so it's better to put them on the server side.
- I didn't see where we remove elements from logAggregationReportForApps. I 
think we need to remove entries when log aggregation finishes, or they will keep 
occupying (and may gradually eat up) the NM's memory.


> NM need to notify the log aggregation status to RM through Node heartbeat
> -
>
> Key: YARN-1376
> URL: https://issues.apache.org/jira/browse/YARN-1376
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-1376.1.patch, YARN-1376.2.patch, YARN-1376.2.patch, 
> YARN-1376.2015-04-04.patch, YARN-1376.2015-04-06.patch, YARN-1376.3.patch, 
> YARN-1376.4.patch
>
>
> Expose a client API to allow clients to figure if log aggregation is 
> complete. The ticket is used to track the changes on NM side



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.

2015-04-06 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482266#comment-14482266
 ] 

Li Lu commented on YARN-3431:
-

Hi [~zjshen], thanks for working on this! I reviewed your v2 patch, and the 
code LGTM. However, I'm a little bit confused about the big picture of this 
patch. In this patch you're setting up separate REST endpoints to post 
different types of timeline entities, yet all of the endpoints have exactly the 
same internal logic, redirecting the incoming entity to the collector's 
putEntity. Are those endpoints just placeholders so that we can specialize each 
of them? Otherwise, I'm not sure about the motivation behind this (there is 
currently no description for this JIRA...). Could you please elaborate a 
little bit more on this?

BTW, I agree we need to specialize for different types of timeline entities, 
but maybe we need to do this on the collector/storage side? For the storage 
layer design we need to write down the detailed timeline entities, so 
specialization would be helpful. 

> Sub resources of timeline entity needs to be passed to a separate endpoint.
> ---
>
> Key: YARN-3431
> URL: https://issues.apache.org/jira/browse/YARN-3431
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-3431.1.patch, YARN-3431.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3426) Add jdiff support to YARN

2015-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482281#comment-14482281
 ] 

Hadoop QA commented on YARN-3426:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12723488/YARN-3426-040615-1.patch
  against trunk revision 3fb5abf.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 4 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common:

  
org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7229//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7229//artifact/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7229//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7229//console

This message is automatically generated.

> Add jdiff support to YARN
> -
>
> Key: YARN-3426
> URL: https://issues.apache.org/jira/browse/YARN-3426
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch
>
>
> Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs 
> to YARN as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI

2015-04-06 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482298#comment-14482298
 ] 

Junping Du commented on YARN-3225:
--

Thanks [~devaraj.k] for the patch! The latest patch looks good in overall. 
Some minor comments:
{code}
+  private long validateTimeout(String strTimeout) {
+long timeout;
+try {
+  timeout = Long.parseLong(strTimeout);
+} catch (NumberFormatException ex) {
+  throw new IllegalArgumentException(INVALID_TIMEOUT_ERR_MSG + strTimeout);
+}
+if (timeout < 0) {
+  throw new IllegalArgumentException(INVALID_TIMEOUT_ERR_MSG + timeout);
+}
+return timeout;
+  }
{code}
I think we should support the case where an admin wants nodes to be 
decommissioned whenever all apps on those nodes have finished. If so, shall we 
support a negative value (any negative, or some special one like -1) to specify 
this case? A sketch of what that could look like follows.
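
For example, a sketch of that variant (the constant name is hypothetical and not 
from the patch):
{code}
// Sketch only: accept -1 as "wait until all apps finish"; reject all other
// negative values.
private static final long WAIT_FOR_APPS_TIMEOUT = -1L;

private long validateTimeout(String strTimeout) {
  long timeout;
  try {
    timeout = Long.parseLong(strTimeout);
  } catch (NumberFormatException ex) {
    throw new IllegalArgumentException(INVALID_TIMEOUT_ERR_MSG + strTimeout);
  }
  if (timeout < WAIT_FOR_APPS_TIMEOUT) {
    throw new IllegalArgumentException(INVALID_TIMEOUT_ERR_MSG + timeout);
  }
  // -1 means: decommission completes only when all apps on the node finish
  return timeout;
}
{code}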

javadoc in DecommissionType.java
{code}
+  /** Decomissioning nodes **/
+  NORMAL,
+
+  /** Graceful decommissioning of nodes **/
+  GRACEFUL,
+
+  /** Forceful decommissioning of nodes **/
+  FORCEFUL
{code}
For NORMAL, shall we use "Decommission nodes in the normal (old) way" instead, 
or something simpler like "Decommission nodes"?

{code}
+@Private
+@Unstable
+public abstract class CheckForDecommissioningNodesRequest {
+  @Public
+  @Unstable
+  public static CheckForDecommissioningNodesRequest newInstance() {
+CheckForDecommissioningNodesRequest request = Records
+.newRecord(CheckForDecommissioningNodesRequest.class);
+return request;
+  }
+}
{code}
IMO, the methods inside a class shouldn't be more public than the class itself. 
If we don't expect other projects to use the class, we also don't expect some of 
its methods to get used. The same problem exists in an old API, 
RefreshNodesRequest.java. I think we may need to fix both?

{code}
   @Test
   public void testRefreshNodes() throws Exception {
 resourceManager.getClientRMService();
-RefreshNodesRequest request = recordFactory
-.newRecordInstance(RefreshNodesRequest.class);
+RefreshNodesRequest request = RefreshNodesRequest
+.newInstance(DecommissionType.NORMAL);
 RefreshNodesResponse response = client.refreshNodes(request);
 assertNotNull(response);
   }
{code}
Why do we need this change? 
recordFactory.newRecordInstance(RefreshNodesRequest.class) will return 
something with DecommissionType.NORMAL as default. No?

> New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
> ---
>
> Key: YARN-3225
> URL: https://issues.apache.org/jira/browse/YARN-3225
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Devaraj K
> Attachments: YARN-3225-1.patch, YARN-3225-2.patch, YARN-3225-3.patch, 
> YARN-3225.patch, YARN-914.patch
>
>
> New CLI (or existing CLI with parameters) should put each node on 
> decommission list to decommissioning status and track timeout to terminate 
> the nodes that haven't get finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2901) Add errors and warning metrics page to RM, NM web UI

2015-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482303#comment-14482303
 ] 

Hadoop QA commented on YARN-2901:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12723490/YARN-2901.addendem.1.patch
  against trunk revision 3fb5abf.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7230//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7230//console

This message is automatically generated.

> Add errors and warning metrics page to RM, NM web UI
> 
>
> Key: YARN-2901
> URL: https://issues.apache.org/jira/browse/YARN-2901
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.8.0
>
> Attachments: Exception collapsed.png, Exception expanded.jpg, Screen 
> Shot 2015-03-19 at 7.40.02 PM.png, YARN-2901.addendem.1.patch, 
> apache-yarn-2901.0.patch, apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, 
> apache-yarn-2901.3.patch, apache-yarn-2901.4.patch, apache-yarn-2901.5.patch
>
>
> It would be really useful to have statistics on the number of errors and 
> warnings in the RM and NM web UI. I'm thinking about -
> 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
> 2. The top 'n' (20?) most common exceptions in the past 5 min/1 hour/12 
> hours/day
> By errors and warnings I'm referring to the log level.
> I suspect we can probably achieve this by writing a custom appender? (I'm open 
> to suggestions on alternative mechanisms for implementing this.)
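As a rough illustration of the custom-appender idea mentioned above (assuming a 
log4j 1.x appender; the class and method names below are made up for this sketch):
{code}
import java.util.concurrent.atomic.AtomicLong;

import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.Level;
import org.apache.log4j.spi.LoggingEvent;

/** Sketch only: counts WARN/ERROR events so a web page could expose the totals. */
public class ErrorWarningCountingAppender extends AppenderSkeleton {
  private static final AtomicLong ERRORS = new AtomicLong();
  private static final AtomicLong WARNINGS = new AtomicLong();

  @Override
  protected void append(LoggingEvent event) {
    if (event.getLevel().isGreaterOrEqual(Level.ERROR)) {
      ERRORS.incrementAndGet();
    } else if (event.getLevel().isGreaterOrEqual(Level.WARN)) {
      WARNINGS.incrementAndGet();
    }
  }

  public static long getErrorCount() { return ERRORS.get(); }
  public static long getWarningCount() { return WARNINGS.get(); }

  @Override
  public void close() { }

  @Override
  public boolean requiresLayout() { return false; }
}
{code}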



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.

2015-04-06 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482328#comment-14482328
 ] 

Zhijie Shen commented on YARN-3431:
---

bq. Are those endpoints just placeholders so that we can specialize each of 
them? Or else, I'm not sure about the motivation behind this (currently no 
description for this JIRA...). Could you please elaborate a little bit more on 
this?

The problem is that we have TimelineEntity and all of its subclasses. On the other 
side, we have a single endpoint, which consumes TimelineEntity. Therefore, this 
endpoint expects the incoming request body to contain exactly a TimelineEntity 
object. The JSON data serialized from a subclass object doesn't seem to be treated 
as a TimelineEntity object, and won't be deserialized into the corresponding 
subclass object.

I tried to figure out whether JAX-RS has a general approach for this, but didn't 
find an answer (please let me know if anyone has an idea). Alternatively, I chose 
to treat the predefined subclasses as sub-resources and put them on separate 
endpoints. Once deserialized on the server side, Java can identify the 
TimelineEntity objects' concrete classes and treat them accordingly, so we don't 
need separate Java APIs in the collector.
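A minimal JAX-RS sketch of the idea (the paths, class names, and the 
ApplicationEntity subclass below are placeholders, not the actual patch):
{code}
import javax.ws.rs.Consumes;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Path("/ws/v2/timeline")
public class TimelineCollectorWebServiceSketch {

  // Placeholder types standing in for the real timeline entity classes.
  public static class TimelineEntity { }
  public static class ApplicationEntity extends TimelineEntity { }

  // Generic endpoint: the body is deserialized as the base TimelineEntity.
  @POST
  @Path("/entities")
  @Consumes(MediaType.APPLICATION_JSON)
  public Response putEntity(TimelineEntity entity) {
    return Response.ok().build();
  }

  // Sub-resource endpoint: the body is deserialized as the ApplicationEntity
  // subclass, so the server knows the concrete type without a separate Java API.
  @POST
  @Path("/apps")
  @Consumes(MediaType.APPLICATION_JSON)
  public Response putApplicationEntity(ApplicationEntity entity) {
    return Response.ok().build();
  }
}
{code}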

> Sub resources of timeline entity needs to be passed to a separate endpoint.
> ---
>
> Key: YARN-3431
> URL: https://issues.apache.org/jira/browse/YARN-3431
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-3431.1.patch, YARN-3431.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3454) RLESparseResourceAllocation does not handle removal of partial intervals

2015-04-06 Thread Carlo Curino (JIRA)
Carlo Curino created YARN-3454:
--

 Summary: RLESparseResourceAllocation does not handle removal of 
partial intervals
 Key: YARN-3454
 URL: https://issues.apache.org/jira/browse/YARN-3454
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Carlo Curino


The RLESparseResourceAllocation.removeInterval(...) method handles exact-match 
interval removals well, but does not correctly handle partial overlaps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3454) RLESparseResourceAllocation does not handle removal of partial intervals

2015-04-06 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482370#comment-14482370
 ] 

Carlo Curino commented on YARN-3454:


Adding a capacity reservation (e.g., 10 containers) between time 10 and 20, and 
then removing the same containers over the sub-interval 12-18, does not work 
correctly. Only exact interval matches work. This is normally not exercised by the 
Reservation sub-system, but it is needed for further enhancements we are working 
on. More generally, this is a bug we should get rid of.
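To make the expected behaviour concrete, here is a small, self-contained 
illustration (not the actual RLESparseResourceAllocation code) of how removing 
capacity over a sub-interval should split the surrounding run:
{code}
import java.util.Map;
import java.util.TreeMap;

/** Sketch of a run-length-encoded capacity-over-time map; capacity is constant between keys. */
public class RleCapacitySketch {
  // time -> capacity starting at that time (0 before the first key)
  private final TreeMap<Long, Integer> steps = new TreeMap<>();

  private int capacityAt(long t) {
    Map.Entry<Long, Integer> e = steps.floorEntry(t);
    return e == null ? 0 : e.getValue();
  }

  /** Adds (or, with a negative amount, removes) capacity over [start, end). */
  public void changeInterval(long start, long end, int amount) {
    // Split runs at the boundaries so a partial overlap only affects [start, end).
    steps.putIfAbsent(start, capacityAt(start));
    steps.putIfAbsent(end, capacityAt(end));
    for (Map.Entry<Long, Integer> e
        : steps.subMap(start, true, end, false).entrySet()) {
      e.setValue(e.getValue() + amount);
    }
  }

  public int get(long t) {
    return capacityAt(t);
  }

  public static void main(String[] args) {
    RleCapacitySketch alloc = new RleCapacitySketch();
    alloc.changeInterval(10, 20, 10);   // reserve 10 containers over [10, 20)
    alloc.changeInterval(12, 18, -10);  // remove them over the sub-interval [12, 18)
    System.out.println(alloc.get(11));  // 10 -> still reserved
    System.out.println(alloc.get(15));  // 0  -> correctly removed on partial overlap
    System.out.println(alloc.get(19));  // 10 -> still reserved
  }
}
{code}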

> RLESparseResourceAllocation does not handle removal of partial intervals
> 
>
> Key: YARN-3454
> URL: https://issues.apache.org/jira/browse/YARN-3454
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Carlo Curino
>
> The RLESparseResourceAllocation.removeInterval(...) method handles exact-match 
> interval removals well, but does not correctly handle partial overlaps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so

2015-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482401#comment-14482401
 ] 

Hadoop QA commented on YARN-2890:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12723428/YARN-2890.4.patch
  against trunk revision 28bebc8.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

  org.apache.hadoop.mapred.pipes.TestPipeApplication
  org.apache.hadoop.mapred.TestMRTimelineEventHandling
  org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler
  org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService
  org.apache.hadoop.mapred.TestClusterMRNotification

  The following test timeouts occurred in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

org.apache.hadoop.mapred.TestJobCleanup
org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers
org.apache.hadoop.mapred.TestLazyOutput
org.apache.hadoop.mapred.TestMiniMRChildTask
org.apache.hadoop.mapreduce.v2.TestMRJobs
org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution
org.apache.hadoop.mapreduce.v2.TestUberAM
org.apache.hadoop.mapreduce.TestMRJobClient
org.apache.hadoop.mapreduce.TestMapReduceLazyOutput
org.apache.hadoop.mapreduce.TestLargeSort

  The test build failed in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7226//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7226//console

This message is automatically generated.

> MiniMRYarnCluster should turn on timeline service if configured to do so
> 
>
> Key: YARN-2890
> URL: https://issues.apache.org/jira/browse/YARN-2890
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-2890.1.patch, YARN-2890.2.patch, YARN-2890.3.patch, 
> YARN-2890.4.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, 
> YARN-2890.patch, YARN-2890.patch
>
>
> Currently the MiniMRYarnCluster does not consider the configuration value for 
> enabling the timeline service before starting. The MiniYarnCluster should only 
> start the timeline service if it is configured to do so.
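A sketch of the kind of guard the description asks for (assuming the standard 
YarnConfiguration flag; the wrapper class here is only for illustration):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineServiceGuard {
  /** Returns true only when the configuration asks for the timeline service. */
  public static boolean shouldStartTimelineService(Configuration conf) {
    return conf.getBoolean(
        YarnConfiguration.TIMELINE_SERVICE_ENABLED,
        YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED);
  }
}
{code}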



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2618) Avoid over-allocation of disk resources

2015-04-06 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2618:
--
Attachment: YARN-2618-7.patch

Fixed the test errors.

> Avoid over-allocation of disk resources
> ---
>
> Key: YARN-2618
> URL: https://issues.apache.org/jira/browse/YARN-2618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2618-1.patch, YARN-2618-2.patch, YARN-2618-3.patch, 
> YARN-2618-4.patch, YARN-2618-5.patch, YARN-2618-6.patch, YARN-2618-7.patch
>
>
> Subtask of YARN-2139. 
> This should include
> - Add API support for introducing disk I/O as a third resource type.
> - NM should report this information to the RM
> - RM should consider this to avoid over-allocation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2618) Avoid over-allocation of disk resources

2015-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482472#comment-14482472
 ] 

Hadoop QA commented on YARN-2618:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12723515/YARN-2618-7.patch
  against trunk revision 3fb5abf.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 22 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7231//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7231//console

This message is automatically generated.

> Avoid over-allocation of disk resources
> ---
>
> Key: YARN-2618
> URL: https://issues.apache.org/jira/browse/YARN-2618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2618-1.patch, YARN-2618-2.patch, YARN-2618-3.patch, 
> YARN-2618-4.patch, YARN-2618-5.patch, YARN-2618-6.patch, YARN-2618-7.patch
>
>
> Subtask of YARN-2139. 
> This should include
> - Add API support for introducing disk I/O as a third resource type.
> - NM should report this information to the RM
> - RM should consider this to avoid over-allocation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so

2015-04-06 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482488#comment-14482488
 ] 

Mit Desai commented on YARN-2890:
-

These test failures are not related to the patch.
They were also seen in MAPREDUCE-6293, where they were likewise not caused by the patch.

> MiniMRYarnCluster should turn on timeline service if configured to do so
> 
>
> Key: YARN-2890
> URL: https://issues.apache.org/jira/browse/YARN-2890
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-2890.1.patch, YARN-2890.2.patch, YARN-2890.3.patch, 
> YARN-2890.4.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, 
> YARN-2890.patch, YARN-2890.patch
>
>
> Currently the MiniMRYarnCluster does not consider the configuration value for 
> enabling the timeline service before starting. The MiniYarnCluster should only 
> start the timeline service if it is configured to do so.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode

2015-04-06 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482505#comment-14482505
 ] 

Harsh J commented on YARN-2424:
---

[~sidharta-s] - Yes, it appears the warning was skipped in the branch-2 patch, 
likely by accident. Thanks for spotting this!

Could you file a new YARN JIRA to port the warning back into branch-2?

> LCE should support non-cgroups, non-secure mode
> ---
>
> Key: YARN-2424
> URL: https://issues.apache.org/jira/browse/YARN-2424
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Blocker
> Fix For: 2.6.0
>
> Attachments: Y2424-1.patch, YARN-2424.patch
>
>
> After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
> This is a fairly serious regression, as turning on LCE prior to turning on 
> full-blown security is a fairly standard procedure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3455) Document CGroup support

2015-04-06 Thread Rohith (JIRA)
Rohith created YARN-3455:


 Summary: Document CGroup support 
 Key: YARN-3455
 URL: https://issues.apache.org/jira/browse/YARN-3455
 Project: Hadoop YARN
  Issue Type: Task
  Components: documentation
Reporter: Rohith


It would be very useful if CGroups support were documented with sections like the 
below:
# Introduction
# Configuring CGroups
# Any specific configuration that controls CPU scheduling
# How/when to use CGroups, with some use-case explanations




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3455) Document CGroup support

2015-04-06 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482564#comment-14482564
 ] 

Rohith commented on YARN-3455:
--

On further checking, I found that CGroups support is documented in YARN-2949. I 
will go through the document and then close this. If any further improvements can 
be made, I will add a comment.

> Document CGroup support 
> 
>
> Key: YARN-3455
> URL: https://issues.apache.org/jira/browse/YARN-3455
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: documentation
>Reporter: Rohith
>
> It would be very useful if CGroups support were documented with sections like 
> the below:
> # Introduction
> # Configuring CGroups
> # Any specific configuration that controls CPU scheduling
> # How/when to use CGroups, with some use-case explanations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM

2015-04-06 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482591#comment-14482591
 ] 

Varun Vasudev commented on YARN-3443:
-

Minor documentation fixes, everything else looks good.

# In PrivilegedOperationExecutor.java, getPrivilegedOperationExecutionCommand 
is documented as throwing ExitCodeException but it doesn't throw it.
# In CGroupsHandler.java, the documentation for createCGroup is missing 
descriptions for controller and path.
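For the second point, something along these lines would do (a javadoc sketch only; 
the wording is just a suggestion and the method signature is omitted here):
{code}
/**
 * Creates a cgroup under the given controller.
 *
 * @param controller the cgroup controller (subsystem), e.g. cpu
 * @param path the path of the cgroup to create, relative to the controller's mount point
 */
{code}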


> Create a 'ResourceHandler' subsystem to ease addition of support for new 
> resource types on the NM
> -
>
> Key: YARN-3443
> URL: https://issues.apache.org/jira/browse/YARN-3443
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-3443.001.patch, YARN-3443.002.patch, 
> YARN-3443.003.patch
>
>
> The current cgroups implementation is closely tied to supporting CPU as a 
> resource. We need to separate out CGroups support as well as provide a simple 
> ResourceHandler subsystem that will enable us to add support for new resource 
> types on the NM, e.g. network, disk, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3404) View the queue name to YARN Application page

2015-04-06 Thread Ryu Kobayashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryu Kobayashi updated YARN-3404:

Attachment: YARN-3404.2.patch

[~jianhe] Okay, I added the links for each.

> View the queue name to YARN Application page
> 
>
> Key: YARN-3404
> URL: https://issues.apache.org/jira/browse/YARN-3404
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Minor
> Attachments: YARN-3404.1.patch, YARN-3404.2.patch, screenshot.png
>
>
> I want to display the name of the queue that is used on the YARN Application 
> page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-06 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated YARN-3021:

Attachment: YARN-3021.007.patch

> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
>Assignee: Yongjun Zhang
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
> YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
> and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
> clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails because B's realm will not 
> trust A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously, 
> and once the renewal attempt failed we simply ceased scheduling any further 
> renewal attempts, rather than failing the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and skip the scheduling alone, rather than bubble back an error 
> to the client, failing the app submission. This way the old behaviour is 
> retained.
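A rough sketch of the "attempt the renewal but go easy on the failure" logic 
described above (the class and method names are illustrative only, not from the 
actual patch):
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.token.Token;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Sketch only: try to renew a token without failing app submission on error. */
public class LenientTokenRenewalSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(LenientTokenRenewalSketch.class);

  /**
   * Attempts the renewal; on failure the token is simply left out of the
   * automatic-renewal schedule instead of bubbling an error back to the client.
   */
  public static boolean tryScheduleRenewal(Token<?> token, Configuration conf) {
    try {
      long nextExpiry = token.renew(conf);
      LOG.info("Token renewed, next expiration at {}", nextExpiry);
      return true;  // caller adds the token to the renewal schedule
    } catch (IOException | InterruptedException e) {
      LOG.warn("Could not renew token {}; skipping automatic renewal", token, e);
      return false; // submission continues, renewal is just not scheduled
    }
  }
}
{code}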



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp

2015-04-06 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482635#comment-14482635
 ] 

Yongjun Zhang commented on YARN-3021:
-

Hi [~jianhe],

I uploaded rev 007 to address your latest comment. I agree that the token 
renewer won't be empty in that case, and if we need to modify the definition of 
{{skipTokenRenewal}} in the future, we can add back the check at that time. 

Would you please take a look?

Thanks.

> YARN's delegation-token handling disallows certain trust setups to operate 
> properly over DistCp
> ---
>
> Key: YARN-3021
> URL: https://issues.apache.org/jira/browse/YARN-3021
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Harsh J
>Assignee: Yongjun Zhang
> Attachments: YARN-3021.001.patch, YARN-3021.002.patch, 
> YARN-3021.003.patch, YARN-3021.004.patch, YARN-3021.005.patch, 
> YARN-3021.006.patch, YARN-3021.007.patch, YARN-3021.patch
>
>
> Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, 
> and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN 
> clusters.
> Now if one logs in with a COMMON credential, and runs a job on A's YARN that 
> needs to access B's HDFS (such as a DistCp), the operation fails in the RM, 
> as it attempts a renewDelegationToken(…) synchronously during application 
> submission (to validate the managed token before it adds it to a scheduler 
> for automatic renewal). The call obviously fails because B's realm will not 
> trust A's credentials (here, the RM's principal is the renewer).
> In the 1.x JobTracker the same call is present, but it is done asynchronously, 
> and once the renewal attempt failed we simply ceased scheduling any further 
> renewal attempts, rather than failing the job immediately.
> We should change the logic such that we attempt the renewal but go easy on 
> the failure and skip the scheduling alone, rather than bubble back an error 
> to the client, failing the app submission. This way the old behaviour is 
> retained.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM

2015-04-06 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana updated YARN-3443:

Attachment: YARN-3443.004.patch

Patch with documentation fixes.

> Create a 'ResourceHandler' subsystem to ease addition of support for new 
> resource types on the NM
> -
>
> Key: YARN-3443
> URL: https://issues.apache.org/jira/browse/YARN-3443
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-3443.001.patch, YARN-3443.002.patch, 
> YARN-3443.003.patch, YARN-3443.004.patch
>
>
> The current cgroups implementation is closely tied to supporting CPU as a 
> resource. We need to separate out CGroups support as well as provide a simple 
> ResourceHandler subsystem that will enable us to add support for new resource 
> types on the NM, e.g. network, disk, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers

2015-04-06 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana updated YARN-3366:

Attachment: YARN-3366.002.patch

Uploading a patch that includes changes to YarnConfiguration.java

> Outbound network bandwidth : classify/shape traffic originating from YARN 
> containers
> 
>
> Key: YARN-3366
> URL: https://issues.apache.org/jira/browse/YARN-3366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-3366.001.patch, YARN-3366.002.patch
>
>
> In order to be able to isolate based on/enforce outbound traffic bandwidth 
> limits, we need  a mechanism to classify/shape network traffic in the 
> nodemanager. For more information on the design, please see the attached 
> design document in the parent JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM

2015-04-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482688#comment-14482688
 ] 

Hadoop QA commented on YARN-3443:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12723549/YARN-3443.004.patch
  against trunk revision 3fb5abf.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7234//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7234//console

This message is automatically generated.

> Create a 'ResourceHandler' subsystem to ease addition of support for new 
> resource types on the NM
> -
>
> Key: YARN-3443
> URL: https://issues.apache.org/jira/browse/YARN-3443
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-3443.001.patch, YARN-3443.002.patch, 
> YARN-3443.003.patch, YARN-3443.004.patch
>
>
> The current cgroups implementation is closely tied to supporting CPU as a 
> resource. We need to separate out CGroups support as well as provide a simple 
> ResourceHandler subsystem that will enable us to add support for new resource 
> types on the NM, e.g. network, disk, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)