[jira] [Assigned] (YARN-5355) YARN Timeline Service v.2: alpha 2

2016-07-13 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C reassigned YARN-5355:


Assignee: Vrushali C  (was: Sangjin Lee)

> YARN Timeline Service v.2: alpha 2
> --
>
> Key: YARN-5355
> URL: https://issues.apache.org/jira/browse/YARN-5355
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
>Priority: Critical
> Attachments: Timeline Service v2_ Ideas for Next Steps.pdf
>
>
> This is an umbrella JIRA for the alpha 2 milestone for YARN Timeline Service 
> v.2.
> This is developed on feature branches: {{YARN-5355}} for the trunk-based 
> development and {{YARN-5355-branch-2}} to maintain backports to branch-2. Any 
> subtask work on this JIRA will be committed to those 2 branches.






[jira] [Updated] (YARN-5287) LinuxContainerExecutor fails to set proper permission

2016-07-13 Thread Ying Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying Zhang updated YARN-5287:
-
Attachment: YARN-5287.002.patch

> LinuxContainerExecutor fails to set proper permission
> -
>
> Key: YARN-5287
> URL: https://issues.apache.org/jira/browse/YARN-5287
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.2
>Reporter: Ying Zhang
>Assignee: Ying Zhang
>Priority: Minor
> Attachments: YARN-5287-naga.patch, YARN-5287.001.patch, 
> YARN-5287.002.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> LinuxContainerExecutor fails to set the proper permissions on the local 
> directories (i.e., /hadoop/yarn/local/usercache/... by default) if the cluster 
> has been configured with a restrictive umask, e.g., umask 077. Jobs fail with 
> the following error:
> Path /hadoop/yarn/local/usercache/ambari-qa/appcache/application_ has 
> permission 700 but needs permission 750
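
Not part of the original report, but as an illustration of the behavior being asked for: a minimal JDK-only sketch that creates a local appcache directory and then sets 750 explicitly, so the resulting mode does not depend on the process umask (the path and class name below are made up for the example).

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class AppCacheDirPermissions {
  public static void main(String[] args) throws IOException {
    // Hypothetical local appcache path, analogous to the one in the report.
    Path dir = Paths.get("/tmp/yarn-local/usercache/ambari-qa/appcache/application_0001");
    Files.createDirectories(dir); // the created mode is masked by the umask (700 under umask 077)

    // Explicitly force 750 afterwards, independent of the umask.
    Set<PosixFilePermission> perms = PosixFilePermissions.fromString("rwxr-x---");
    Files.setPosixFilePermissions(dir, perms);

    System.out.println(PosixFilePermissions.toString(Files.getPosixFilePermissions(dir)));
  }
}
{code}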






[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2016-07-13 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376314#comment-15376314
 ] 

Yufei Gu commented on YARN-4212:


Thanks [~kasha]. 

> FairScheduler: Parent queues is not allowed to be 'Fair' policy if its 
> children have the "drf" policy
> -
>
> Key: YARN-4212
> URL: https://issues.apache.org/jira/browse/YARN-4212
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Yufei Gu
>  Labels: fairscheduler
> Attachments: YARN-4212.002.patch, YARN-4212.003.patch, 
> YARN-4212.004.patch, YARN-4212.1.patch
>
>
> The Fair Scheduler, while performing a {{recomputeShares()}} during an 
> {{update()}} call, uses the parent queue's policy to distribute shares to its 
> children.
> If the parent queue's policy is 'fair', it only computes weights for memory and 
> sets the vcores fair share of its children to 0.
> In a situation where we have 1 parent queue with policy 'fair' and 
> multiple leaf queues with policy 'drf', any app submitted to the child queues 
> with a vcore requirement > 1 will always be above its fair share, since during the 
> recomputeShares process the child queues were all assigned 0 for fair-share 
> vcores.
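
To make the failure mode concrete, here is a toy calculation (not the FairScheduler code; all names and numbers are made up): a memory-only 'fair' parent hands out memory shares by weight but leaves the vcore share at 0, so any child app that needs even one vcore is permanently above its fair share.

{code}
public class FairShareToy {
  public static void main(String[] args) {
    int clusterMemoryMb = 8192;           // cluster capacity used by the toy calculation
    double[] childWeights = {1.0, 1.0};   // two 'drf' leaf queues under a 'fair' parent
    double totalWeight = 2.0;

    for (int i = 0; i < childWeights.length; i++) {
      long memShare = (long) (clusterMemoryMb * childWeights[i] / totalWeight);
      long vcoreShare = 0; // the 'fair' parent policy computes weights for memory only
      System.out.printf("child-%d fair share = <memory: %d MB, vCores: %d>%n",
          i, memShare, vcoreShare);
    }
    // With a vcore fair share of 0, using even a single vcore puts the app above
    // its fair share every time recomputeShares() runs.
  }
}
{code}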






[jira] [Commented] (YARN-4464) default value of yarn.resourcemanager.state-store.max-completed-applications should lower.

2016-07-13 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376295#comment-15376295
 ] 

Naganarasimha G R commented on YARN-4464:
-

Thanks [~vinodkv]. It looks ideal to have the default value as zero, but I am not sure 
all production clusters will adopt ATS immediately; with that in mind, I thought of 
keeping around the last 500 ~ 1000 completed apps in the RM.
If everyone is OK with no completed apps in RM memory as the default, then I am fine 
with it; it's a -0 from my side. And I am OK with no change in Hadoop 2.x.


> default value of yarn.resourcemanager.state-store.max-completed-applications 
> should lower.
> --
>
> Key: YARN-4464
> URL: https://issues.apache.org/jira/browse/YARN-4464
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: resourcemanager
>Reporter: KWON BYUNGCHANG
>Assignee: Daniel Templeton
>Priority: Blocker
> Attachments: YARN-4464.001.patch, YARN-4464.002.patch, 
> YARN-4464.003.patch, YARN-4464.004.patch
>
>
> My cluster has 120 nodes.
> I configured the RM Restart feature:
> {code}
> yarn.resourcemanager.recovery.enabled=true
> yarn.resourcemanager.store.class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
> yarn.resourcemanager.fs.state-store.uri=/system/yarn/rmstore
> {code}
> Unfortunately I did not configure 
> {{yarn.resourcemanager.state-store.max-completed-applications}},
> so that property took its default value of 10,000.
> I restarted the RM after changing another configuration.
> I expected the RM to restart immediately, but the recovery process was very 
> slow; I waited about 20 minutes before realizing that 
> {{yarn.resourcemanager.state-store.max-completed-applications}} was missing.
> Its default value is very large.
> We need to change it to a lower value, or document a notice on the [RM Restart 
> page|http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html].
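
For reference, until the default changes, the slow recovery can be avoided by setting the property explicitly; the value below is only an example, not a recommended default.

{code}
yarn.resourcemanager.state-store.max-completed-applications=1000
{code}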






[jira] [Commented] (YARN-5272) Handle queue names consistently in FairScheduler

2016-07-13 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376276#comment-15376276
 ] 

Ray Chiang commented on YARN-5272:
--

[~wilfreds], let me know if you'd prefer to abstract out the whitespace 
trimming in a follow up JIRA or if you plan to do it for this patch.

> Handle queue names consistently in FairScheduler
> 
>
> Key: YARN-5272
> URL: https://issues.apache.org/jira/browse/YARN-5272
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-5272.1.patch, YARN-5272.3.patch, YARN-5272.4.patch
>
>
> The fix used in YARN-3214 uses the JDK trim() method to remove leading and 
> trailing spaces. QueueMetrics uses a guava-based trim when it splits the 
> queues.
> The guava-based trim uses the Unicode definition of a white space, which is 
> different from the Java trim, as can be seen 
> [here|https://docs.google.com/a/cloudera.com/spreadsheets/d/1kq4ECwPjHX9B8QUCTPclgsDCXYaj7T-FlT4tB5q3ahk/pub].
> A queue name with a non-breaking white space will thus still cause the same 
> "Metrics source XXX already exists!" MetricsException.






[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking

2016-07-13 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376265#comment-15376265
 ] 

Robert Kanter commented on YARN-4676:
-

Thanks for pointing that out [~mingma].  Using the same file format makes sense 
to me.  Would it make sense to move some of that code (i.e. parsing, etc) to 
Common so that we can use the same implementation in HDFS and YARN?

[~danzhi], [~djp], what do you think?

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> 
>
> Key: YARN-4676
> URL: https://issues.apache.org/jira/browse/YARN-4676
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Zhi
>Assignee: Daniel Zhi
>  Labels: features
> Attachments: GracefulDecommissionYarnNode.pdf, 
> GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, YARN-4676.005.patch, 
> YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch, 
> YARN-4676.009.patch, YARN-4676.010.patch, YARN-4676.011.patch, 
> YARN-4676.012.patch, YARN-4676.013.patch, YARN-4676.014.patch, 
> YARN-4676.015.patch, YARN-4676.016.patch
>
>
> YARN-4676 implements an automatic, asynchronous and flexible mechanism to 
> gracefully decommission YARN nodes. After the user issues the refreshNodes 
> request, the ResourceManager automatically evaluates the status of all affected 
> nodes to kick off decommission or recommission actions. The RM asynchronously 
> tracks container and application status related to DECOMMISSIONING nodes in 
> order to decommission the nodes as soon as they are ready to be decommissioned.
> Decommissioning timeouts at individual-node granularity are supported and can be 
> dynamically updated. The mechanism naturally supports multiple independent 
> graceful decommissioning “sessions”, where each one involves different sets of 
> nodes with different timeout settings. Such support is ideal and necessary for 
> graceful decommission requests issued by external cluster management software 
> instead of a human.
> DecommissioningNodeWatcher inside ResourceTrackingService tracks 
> DECOMMISSIONING node status automatically and asynchronously after the 
> client/admin makes the graceful decommission request. It tracks 
> DECOMMISSIONING node status to decide when, after all running containers on 
> the node have completed, the node will be transitioned into the DECOMMISSIONED 
> state. NodesListManager detects and handles include and exclude list changes to 
> kick off decommission or recommission as necessary.






[jira] [Updated] (YARN-5376) capacity scheduler crashed while processing APP_ATTEMPT_REMOVED

2016-07-13 Thread sandflee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sandflee updated YARN-5376:
---
Issue Type: Bug  (was: Improvement)

> capacity scheduler crashed while processing APP_ATTEMPT_REMOVED
> ---
>
> Key: YARN-5376
> URL: https://issues.apache.org/jira/browse/YARN-5376
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
> Attachments: capacity-crash.log
>
>
> We are testing the capacity scheduler with an SLS-like client and saw the following 
> error; it seems the schedulerNode has been removed.
> {noformat}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1606)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:1416)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:903)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1265)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:121)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:677)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}






[jira] [Commented] (YARN-5156) YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state

2016-07-13 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376260#comment-15376260
 ] 

Varun Saxena commented on YARN-5156:


I am fine with removing it. We can anyway interpret what the container state 
will be from the event: it can be either RUNNING or COMPLETE, and it's COMPLETE 
only on the container-finished event.
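
A tiny sketch of that interpretation (illustrative only, not the actual publisher code): derive the state from the event id instead of publishing it with every event.

{code}
public class ContainerStateFromEvent {
  // COMPLETE only for the finished event, RUNNING for everything else.
  static String containerStateFor(String eventId) {
    return "YARN_CONTAINER_FINISHED".equals(eventId) ? "COMPLETE" : "RUNNING";
  }

  public static void main(String[] args) {
    System.out.println(containerStateFor("YARN_CONTAINER_CREATED"));  // RUNNING
    System.out.println(containerStateFor("YARN_CONTAINER_FINISHED")); // COMPLETE
  }
}
{code}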

> YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state
> -
>
> Key: YARN-5156
> URL: https://issues.apache.org/jira/browse/YARN-5156
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Vrushali C
>  Labels: YARN-5355
> Attachments: YARN-5156-YARN-2928.01.patch, 
> YARN-5156-YARN-5355.01.patch
>
>
> On container finished, we're reporting "YARN_CONTAINER_STATE: "RUNNING"". Was 
> this designed deliberately, or is it a bug?
> {code}
> {
> metrics: [ ],
> events: [
> {
> id: "YARN_CONTAINER_FINISHED",
> timestamp: 1464213765890,
> info: {
> YARN_CONTAINER_EXIT_STATUS: 0,
> YARN_CONTAINER_STATE: "RUNNING",
> YARN_CONTAINER_DIAGNOSTICS_INFO: ""
> }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED",
> timestamp: 1464213761133,
> info: { }
> },
> {
> id: "YARN_CONTAINER_CREATED",
> timestamp: 1464213761132,
> info: { }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED",
> timestamp: 1464213761132,
> info: { }
> }
> ],
> id: "container_e15_1464213707405_0001_01_18",
> type: "YARN_CONTAINER",
> createdtime: 1464213761132,
> info: {
> YARN_CONTAINER_ALLOCATED_PRIORITY: "20",
> YARN_CONTAINER_ALLOCATED_VCORE: 1,
> YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS: "10.22.16.164:0",
> UID: 
> "yarn_cluster!application_1464213707405_0001!YARN_CONTAINER!container_e15_1464213707405_0001_01_18",
> YARN_CONTAINER_ALLOCATED_HOST: "10.22.16.164",
> YARN_CONTAINER_ALLOCATED_MEMORY: 1024,
> SYSTEM_INFO_PARENT_ENTITY: {
> type: "YARN_APPLICATION_ATTEMPT",
> id: "appattempt_1464213707405_0001_01"
> },
> YARN_CONTAINER_ALLOCATED_PORT: 64694
> },
> configs: { },
> isrelatedto: { },
> relatesto: { }
> }
> {code}






[jira] [Commented] (YARN-5156) YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state

2016-07-13 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376240#comment-15376240
 ] 

Vrushali C commented on YARN-5156:
--

Thanks [~varun_saxena]! 

bq. We are not. Container state is published only in the Finished event. Maybe 
we can either include it everywhere or not have it anywhere.
I see; then I think we should just remove it (as part of this jira fix). What 
do you think?

> YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state
> -
>
> Key: YARN-5156
> URL: https://issues.apache.org/jira/browse/YARN-5156
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Vrushali C
>  Labels: YARN-5355
> Attachments: YARN-5156-YARN-2928.01.patch, 
> YARN-5156-YARN-5355.01.patch
>
>
> On container finished, we're reporting "YARN_CONTAINER_STATE: "RUNNING"". Was 
> this designed deliberately, or is it a bug?
> {code}
> {
> metrics: [ ],
> events: [
> {
> id: "YARN_CONTAINER_FINISHED",
> timestamp: 1464213765890,
> info: {
> YARN_CONTAINER_EXIT_STATUS: 0,
> YARN_CONTAINER_STATE: "RUNNING",
> YARN_CONTAINER_DIAGNOSTICS_INFO: ""
> }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED",
> timestamp: 1464213761133,
> info: { }
> },
> {
> id: "YARN_CONTAINER_CREATED",
> timestamp: 1464213761132,
> info: { }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED",
> timestamp: 1464213761132,
> info: { }
> }
> ],
> id: "container_e15_1464213707405_0001_01_18",
> type: "YARN_CONTAINER",
> createdtime: 1464213761132,
> info: {
> YARN_CONTAINER_ALLOCATED_PRIORITY: "20",
> YARN_CONTAINER_ALLOCATED_VCORE: 1,
> YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS: "10.22.16.164:0",
> UID: 
> "yarn_cluster!application_1464213707405_0001!YARN_CONTAINER!container_e15_1464213707405_0001_01_18",
> YARN_CONTAINER_ALLOCATED_HOST: "10.22.16.164",
> YARN_CONTAINER_ALLOCATED_MEMORY: 1024,
> SYSTEM_INFO_PARENT_ENTITY: {
> type: "YARN_APPLICATION_ATTEMPT",
> id: "appattempt_1464213707405_0001_01"
> },
> YARN_CONTAINER_ALLOCATED_PORT: 64694
> },
> configs: { },
> isrelatedto: { },
> relatesto: { }
> }
> {code}






[jira] [Comment Edited] (YARN-5342) Improve non-exclusive node partition resource allocation in Capacity Scheduler

2016-07-13 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375771#comment-15375771
 ] 

Naganarasimha G R edited comment on YARN-5342 at 7/14/16 3:11 AM:
--

Thanks for the patch [~wangda].
Given that the approach discussed in YARN-4425 (fallback-policy based) is going to 
take some time, as it would require significant modifications, I would agree to 
go for an interim modification to optimize non-exclusive mode scheduling.
The only concern I have is: if the size of the default partition is greater than the 
non-exclusive partition, then on one allocation in the default partition we are 
resetting the counter; would that be productive?


was (Author: naganarasimha):
Thanks for the patch [~wangda].
Given that the approach discussed in YARN-4225 (fallback-policy based) is going to 
take some time, as it would require significant modifications, I would agree to 
go for an interim modification to optimize non-exclusive mode scheduling.
The only concern I have is: if the size of the default partition is greater than the 
non-exclusive partition, then on one allocation in the default partition we are 
resetting the counter; would that be productive?

> Improve non-exclusive node partition resource allocation in Capacity Scheduler
> --
>
> Key: YARN-5342
> URL: https://issues.apache.org/jira/browse/YARN-5342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-5342.1.patch
>
>
> In the previous implementation, one non-exclusive container allocation is 
> possible when the missed-opportunity >= #cluster-nodes, and 
> missed-opportunity is reset when a container is allocated on any node.
> This slows down the frequency of container allocation on a non-exclusive 
> node partition: *when a non-exclusive partition=x has idle resources, we can 
> only allocate one container for this app every 
> X=nodemanagers.heartbeat-interval secs for the whole cluster.*
> In this JIRA, I propose a fix to reset missed-opportunity only if we have >0 
> pending resource for the non-exclusive partition OR we get an allocation from 
> the default partition.
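
A hedged sketch of the proposed reset rule (hypothetical names and a memory-only check, not the actual CapacityScheduler code):

{code}
public class MissedOpportunityReset {
  // Reset the missed-opportunity counter only if there is still pending demand for
  // the non-exclusive partition, or the allocation just made came from the default partition.
  static boolean shouldReset(long pendingMemOnPartition, boolean allocatedFromDefaultPartition) {
    return pendingMemOnPartition > 0 || allocatedFromDefaultPartition;
  }

  public static void main(String[] args) {
    System.out.println(shouldReset(0, true));     // allocation on the default partition -> reset
    System.out.println(shouldReset(2048, false)); // pending demand on partition=x -> reset
    System.out.println(shouldReset(0, false));    // neither -> keep accumulating missed opportunities
  }
}
{code}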






[jira] [Commented] (YARN-5156) YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state

2016-07-13 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376239#comment-15376239
 ] 

Varun Saxena commented on YARN-5156:


[~vrushalic], I think the warning log is not required because it will be 
printed every time. That's because in ContainerImpl the state will not be COMPLETE 
when the event to NMTimelinePublisher is posted.

bq.  I think we should include the container state in the finished event, if we 
are including other container states at other times in other events. 
We are not. Container state is published only in the Finished event. Maybe we 
can either include it everywhere or not have it anywhere.

> YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state
> -
>
> Key: YARN-5156
> URL: https://issues.apache.org/jira/browse/YARN-5156
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Vrushali C
>  Labels: YARN-5355
> Attachments: YARN-5156-YARN-2928.01.patch, 
> YARN-5156-YARN-5355.01.patch
>
>
> On container finished, we're reporting "YARN_CONTAINER_STATE: "RUNNING"". Was 
> this designed deliberately, or is it a bug?
> {code}
> {
> metrics: [ ],
> events: [
> {
> id: "YARN_CONTAINER_FINISHED",
> timestamp: 1464213765890,
> info: {
> YARN_CONTAINER_EXIT_STATUS: 0,
> YARN_CONTAINER_STATE: "RUNNING",
> YARN_CONTAINER_DIAGNOSTICS_INFO: ""
> }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED",
> timestamp: 1464213761133,
> info: { }
> },
> {
> id: "YARN_CONTAINER_CREATED",
> timestamp: 1464213761132,
> info: { }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED",
> timestamp: 1464213761132,
> info: { }
> }
> ],
> id: "container_e15_1464213707405_0001_01_18",
> type: "YARN_CONTAINER",
> createdtime: 1464213761132,
> info: {
> YARN_CONTAINER_ALLOCATED_PRIORITY: "20",
> YARN_CONTAINER_ALLOCATED_VCORE: 1,
> YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS: "10.22.16.164:0",
> UID: 
> "yarn_cluster!application_1464213707405_0001!YARN_CONTAINER!container_e15_1464213707405_0001_01_18",
> YARN_CONTAINER_ALLOCATED_HOST: "10.22.16.164",
> YARN_CONTAINER_ALLOCATED_MEMORY: 1024,
> SYSTEM_INFO_PARENT_ENTITY: {
> type: "YARN_APPLICATION_ATTEMPT",
> id: "appattempt_1464213707405_0001_01"
> },
> YARN_CONTAINER_ALLOCATED_PORT: 64694
> },
> configs: { },
> isrelatedto: { },
> relatesto: { }
> }
> {code}






[jira] [Commented] (YARN-5376) capacity scheduler crashed while processing APP_ATTEMPT_REMOVED

2016-07-13 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376233#comment-15376233
 ] 

sandflee commented on YARN-5376:


2.7.2; we did not change the capacity scheduler code.

> capacity scheduler crashed while processing APP_ATTEMPT_REMOVED
> ---
>
> Key: YARN-5376
> URL: https://issues.apache.org/jira/browse/YARN-5376
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: sandflee
> Attachments: capacity-crash.log
>
>
> We are testing the capacity scheduler with an SLS-like client and saw the following 
> error; it seems the schedulerNode has been removed.
> {noformat}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1606)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:1416)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:903)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1265)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:121)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:677)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}






[jira] [Updated] (YARN-5309) SSLFactory truststore reloader thread leak in TimelineClientImpl

2016-07-13 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-5309:
--
Priority: Blocker  (was: Major)

> SSLFactory truststore reloader thread leak in TimelineClientImpl
> 
>
> Key: YARN-5309
> URL: https://issues.apache.org/jira/browse/YARN-5309
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver, yarn
>Affects Versions: 2.7.1
>Reporter: Thomas Friedrich
>Assignee: Weiwei Yang
>Priority: Blocker
> Attachments: YARN-5309.001.patch, YARN-5309.002.patch, 
> YARN-5309.003.patch, YARN-5309.004.patch
>
>
> We found an issue in TimelineClientImpl similar to HADOOP-11368. The class 
> creates an instance of SSLFactory in newSslConnConfigurator and subsequently 
> creates the ReloadingX509TrustManager instance, which in turn starts a trust 
> store reloader thread.
> However, the SSLFactory is never destroyed, and hence the trust store reloader 
> threads are not killed.
> This problem was observed by a customer who had SSL enabled in Hadoop and 
> submitted many queries against HiveServer2. After a few days, the HS2 
> instance crashed, and from the Java dump we could see many (over 13,000) 
> threads like this:
> "Truststore reloader thread" #126 daemon prio=5 os_prio=0 
> tid=0x7f680d2e3000 nid=0x98fd waiting on 
> condition [0x7f67e482c000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run
> (ReloadingX509TrustManager.java:225)
> at java.lang.Thread.run(Thread.java:745)
> HiveServer2 uses the JobClient to submit a job:
> Thread [HiveServer2-Background-Pool: Thread-188] (Suspended (breakpoint at 
> line 89 in 
> ReloadingX509TrustManager))   
>   owns: Object  (id=464)  
>   owns: Object  (id=465)  
>   owns: Object  (id=466)  
>   owns: ServiceLoader  (id=210)
>   ReloadingX509TrustManager.(String, String, String, long) line: 89 
>   FileBasedKeyStoresFactory.init(SSLFactory$Mode) line: 209   
>   SSLFactory.init() line: 131 
>   TimelineClientImpl.newSslConnConfigurator(int, Configuration) line: 532 
>   TimelineClientImpl.newConnConfigurator(Configuration) line: 507 
>   TimelineClientImpl.serviceInit(Configuration) line: 269 
>   TimelineClientImpl(AbstractService).init(Configuration) line: 163   
>   YarnClientImpl.serviceInit(Configuration) line: 169 
>   YarnClientImpl(AbstractService).init(Configuration) line: 163   
>   ResourceMgrDelegate.serviceInit(Configuration) line: 102
>   ResourceMgrDelegate(AbstractService).init(Configuration) line: 163  
>   ResourceMgrDelegate.(YarnConfiguration) line: 96  
>   YARNRunner.(Configuration) line: 112  
>   YarnClientProtocolProvider.create(Configuration) line: 34   
>   Cluster.initialize(InetSocketAddress, Configuration) line: 95   
>   Cluster.(InetSocketAddress, Configuration) line: 82   
>   Cluster.(Configuration) line: 75  
>   JobClient.init(JobConf) line: 475   
>   JobClient.(JobConf) line: 454 
>   MapRedTask(ExecDriver).execute(DriverContext) line: 401 
>   MapRedTask.execute(DriverContext) line: 137 
>   MapRedTask(Task).executeTask() line: 160 
>   TaskRunner.runSequential() line: 88 
>   Driver.launchTask(Task, String, boolean, String, int, 
> DriverContext) line: 1653   
>   Driver.execute() line: 1412 
> For every job, a new instance of JobClient/YarnClientImpl/TimelineClientImpl 
> is created. But because the HS2 process stays up for days, the previous trust 
> store reloader threads are still hanging around in the HS2 process and 
> eventually use up all the available resources.
> It seems a fix similar to the one in HADOOP-11368 is needed in TimelineClientImpl, 
> but it doesn't have a destroy method to begin with.
> One option to avoid this problem is to disable the yarn timeline service 
> (yarn.timeline-service.enabled=false).
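
A minimal sketch of the kind of cleanup HADOOP-11368 added elsewhere, assuming TimelineClientImpl kept a reference to the SSLFactory it creates; the field and the serviceStop placement below are assumptions for illustration, not the current code.

{code}
// Hypothetical cleanup in TimelineClientImpl: destroy the SSLFactory on stop, which
// stops the "Truststore reloader thread" started by ReloadingX509TrustManager.
@Override
protected void serviceStop() throws Exception {
  if (sslFactory != null) {
    sslFactory.destroy(); // org.apache.hadoop.security.ssl.SSLFactory#destroy
    sslFactory = null;
  }
  super.serviceStop();
}
{code}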






[jira] [Updated] (YARN-5376) capacity scheduler crashed while processing APP_ATTEMPT_REMOVED

2016-07-13 Thread sandflee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sandflee updated YARN-5376:
---
Attachment: capacity-crash.log

> capacity scheduler crashed while processing APP_ATTEMPT_REMOVED
> ---
>
> Key: YARN-5376
> URL: https://issues.apache.org/jira/browse/YARN-5376
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: sandflee
> Attachments: capacity-crash.log
>
>
> We are testing the capacity scheduler with an SLS-like client and saw the following 
> error; it seems the schedulerNode has been removed.
> {noformat}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1606)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:1416)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:903)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1265)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:121)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:677)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}






[jira] [Commented] (YARN-5376) capacity scheduler crashed while processing APP_ATTEMPT_REMOVED

2016-07-13 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376222#comment-15376222
 ] 

Sunil G commented on YARN-5376:
---

Hi [~sandflee], which version of Hadoop are you using?

> capacity scheduler crashed while processing APP_ATTEMPT_REMOVED
> ---
>
> Key: YARN-5376
> URL: https://issues.apache.org/jira/browse/YARN-5376
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: sandflee
>
> We are testing the capacity scheduler with an SLS-like client and saw the following 
> error; it seems the schedulerNode has been removed.
> {noformat}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1606)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:1416)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:903)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1265)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:121)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:677)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}






[jira] [Commented] (YARN-5333) Some recovered apps are put into default queue when RM HA

2016-07-13 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376219#comment-15376219
 ] 

Jun Gong commented on YARN-5333:


The reason for the test case errors in TestRMWebServicesAppsModification (e.g., 
testAppMove) is that they reinitialize the CapacityScheduler with a new 
CapacitySchedulerConfiguration before {{rm.start()}}, and reinitializing it twice 
causes problems. However, from another point of view, I think the 
CapacityScheduler also needs this patch. [~vinodkv], [~vvasudev], could you 
please help confirm it? Thanks!

> Some recovered apps are put into default queue when RM HA
> -
>
> Key: YARN-5333
> URL: https://issues.apache.org/jira/browse/YARN-5333
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-5333.01.patch, YARN-5333.02.patch
>
>
> Enable RM HA and use FairScheduler, with 
> {{yarn.scheduler.fair.allow-undeclared-pools}} set to false and 
> {{yarn.scheduler.fair.user-as-default-queue}} set to false.
> Reproduce steps:
> 1. Start two RMs.
> 2. After the RMs are running, change both RMs' file 
> {{etc/hadoop/fair-scheduler.xml}} to add some queues.
> 3. Submit some apps to the newly added queues.
> 4. Stop the active RM; the standby RM will then transition to active and recover 
> the apps.
> However, the new active RM will put the recovered apps into the default queue because 
> it might not have loaded the new {{fair-scheduler.xml}} yet. We need to call 
> {{initScheduler}} before starting the active services, or move {{refreshAll()}} in 
> front of {{rm.transitionToActive()}}. *It seems this is also important for 
> other schedulers*.






[jira] [Created] (YARN-5376) capacity scheduler crashed while processing APP_ATTEMPT_REMOVED

2016-07-13 Thread sandflee (JIRA)
sandflee created YARN-5376:
--

 Summary: capacity scheduler crashed while processing 
APP_ATTEMPT_REMOVED
 Key: YARN-5376
 URL: https://issues.apache.org/jira/browse/YARN-5376
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: sandflee


We are testing the capacity scheduler with an SLS-like client and saw the following 
error; it seems the schedulerNode has been removed.
{noformat}
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1606)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:1416)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:903)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1265)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:121)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:677)
at java.lang.Thread.run(Thread.java:745)
{noformat}







[jira] [Updated] (YARN-5211) Supporting "priorities" in the ReservationSystem

2016-07-13 Thread Sean Po (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Po updated YARN-5211:
--
Issue Type: Improvement  (was: Sub-task)
Parent: (was: YARN-2572)

> Supporting "priorities" in the ReservationSystem
> 
>
> Key: YARN-5211
> URL: https://issues.apache.org/jira/browse/YARN-5211
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Sean Po
>
> The ReservationSystem currently has an implicit FIFO priority. This JIRA 
> tracks the effort to generalize this to arbitrary priorities. This is non-trivial, 
> as the greedy nature of our ReservationAgents might need to be revisited if 
> not enough space is found for late-arriving but higher-priority reservations.






[jira] [Commented] (YARN-5362) TestRMRestart#testFinishedAppRemovalAfterRMRestart can fail

2016-07-13 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376161#comment-15376161
 ] 

sandflee commented on YARN-5362:


Thanks [~rohithsharma] for the review and commit. I opened YARN-5375 to track 
implicitly invoking drainEvents in MockRM.

> TestRMRestart#testFinishedAppRemovalAfterRMRestart can fail
> ---
>
> Key: YARN-5362
> URL: https://issues.apache.org/jira/browse/YARN-5362
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jason Lowe
>Assignee: sandflee
> Fix For: 2.9.0
>
> Attachments: YARN-5362.01.patch
>
>
> Saw the following in a precommit build that only changed an unrelated unit 
> test:
> {noformat}
> Tests run: 29, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 101.265 sec 
> <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
> testFinishedAppRemovalAfterRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart)
>   Time elapsed: 0.411 sec  <<< FAILURE!
> java.lang.AssertionError: expected null, but 
> was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1653)
> {noformat}






[jira] [Created] (YARN-5375) invoke MockRM#drainEvents implicitly in MockRM methods to reduce test failures

2016-07-13 Thread sandflee (JIRA)
sandflee created YARN-5375:
--

 Summary: invoke MockRM#drainEvents implicitly in MockRM methods to 
reduce test failures
 Key: YARN-5375
 URL: https://issues.apache.org/jira/browse/YARN-5375
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: sandflee
Assignee: sandflee


We have seen many test failures where an RMApp/RMAppAttempt reaches some state but 
some events have not yet been processed in the RM event queue or the scheduler event 
queue, causing the test to fail. It seems we could implicitly invoke drainEvents 
(which should also drain scheduler events) in some MockRM methods like waitForState.
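
A sketch of the idea (a hypothetical test helper, not the actual MockRM change): drain outstanding events before asserting a state, so the assertion does not race the dispatcher.

{code}
// Hypothetical helper illustrating the proposal; MockRM#drainEvents and
// MockRM#waitForState already exist, the wrapper is only for illustration.
static void waitForStateDrained(MockRM rm, ApplicationId appId, RMAppState expected)
    throws Exception {
  rm.drainEvents();                 // flush the RM (and ideally scheduler) event queues first
  rm.waitForState(appId, expected); // then do the usual wait/assert
}
{code}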






[jira] [Commented] (YARN-5361) Obtaining logs for completed container says 'file belongs to a running container ' at the end

2016-07-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376109#comment-15376109
 ] 

Hadoop QA commented on YARN-5361:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 50s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 26s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 14s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 35s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 1 
new + 4 unchanged - 1 fixed = 5 total (was 5) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 16s 
{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 14s {color} 
| {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 34m 43s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.client.cli.TestLogsCLI |
|   | hadoop.yarn.client.api.impl.TestYarnClient |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12817838/YARN-5361.2.patch |
| JIRA Issue | YARN-5361 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 2fbc753c99e0 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 728bf7f |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/12318/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
 |
| unit | 

[jira] [Commented] (YARN-5156) YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state

2016-07-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376107#comment-15376107
 ] 

Hadoop QA commented on YARN-5156:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 
26s {color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s 
{color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
46s {color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 8 new + 0 unchanged - 0 fixed = 8 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch 5 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 54s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 30m 44s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12817847/YARN-5156-YARN-5355.01.patch
 |
| JIRA Issue | YARN-5156 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 63e565a26a7b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | YARN-5355 / 0fd3980 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/12319/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/12319/artifact/patchprocess/whitespace-tabs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/12319/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/12319/console |
| Powered by | Apache Yetus 0.3.0   

[jira] [Commented] (YARN-4759) Revisit signalContainer() for docker containers

2016-07-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376098#comment-15376098
 ] 

Hadoop QA commented on YARN-4759:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
1s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 12s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 3 new + 18 unchanged - 0 fixed = 21 total (was 18) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 55s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 35s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12817837/YARN-4759.003.patch |
| JIRA Issue | YARN-4759 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  cc  |
| uname | Linux 105334caf068 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 728bf7f |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/12317/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/12317/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/12317/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Revisit signalContainer() for docker containers
> 

[jira] [Commented] (YARN-5342) Improve non-exclusive node partition resource allocation in Capacity Scheduler

2016-07-13 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376095#comment-15376095
 ] 

Wangda Tan commented on YARN-5342:
--

[~Naganarasimha], that's a good point; actually I thought about this while 
working on the patch.
The only reason for doing it this way is simplicity. We could have better logic, 
like gradually decreasing the counter depending on the ratio of #nodes in the default 
partition to #nodes in the specific partitions, but that could be complex and could 
potentially cause a regression, since we don't know what would happen. Please share 
your thoughts.

Thanks,

> Improve non-exclusive node partition resource allocation in Capacity Scheduler
> --
>
> Key: YARN-5342
> URL: https://issues.apache.org/jira/browse/YARN-5342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-5342.1.patch
>
>
> In the previous implementation, one non-exclusive container allocation is 
> possible when the missed-opportunity >= #cluster-nodes, and 
> missed-opportunity is reset when a container is allocated on any node.
> This slows down the frequency of container allocation on a non-exclusive 
> node partition: *when a non-exclusive partition=x has idle resources, we can 
> only allocate one container for this app every 
> X=nodemanagers.heartbeat-interval secs for the whole cluster.*
> In this JIRA, I propose a fix to reset missed-opportunity only if we have >0 
> pending resource for the non-exclusive partition OR we get an allocation from 
> the default partition.






[jira] [Commented] (YARN-5159) Wrong Javadoc tag in MiniYarnCluster

2016-07-13 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376089#comment-15376089
 ] 

Akira Ajisaka commented on YARN-5159:
-

bq. I tried that locally before but if I remove the package name the javadoc 
engine will skip it.
Really? o.a.h.yarn.conf.YarnConfiguration is imported in MiniYarnCluster.java, 
so I think that works. I tried it and the following commands succeeded.
{noformat}
$ cd hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests
$ mvn javadoc:test-javadoc
{noformat}
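
For reference, a minimal sketch of the corrected tag form; the surrounding class 
and method are hypothetical, only the constant name comes from this issue:
{code:java}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

/** Hypothetical example class; only the {@value} tag usage is the point here. */
public class JavadocTagExample {
  /**
   * With YarnConfiguration imported, the package prefix can be dropped:
   * {@value YarnConfiguration#RM_SCHEDULER_INCLUDE_PORT_IN_NODE_NAME}
   */
  public void example() {
  }
}
{code}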

> Wrong Javadoc tag in MiniYarnCluster
> 
>
> Key: YARN-5159
> URL: https://issues.apache.org/jira/browse/YARN-5159
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: documentation
>Affects Versions: 2.6.0
>Reporter: Andras Bokor
>Assignee: Andras Bokor
> Fix For: 2.8.0
>
> Attachments: YARN-5159.01.patch, YARN-5159.02.patch, 
> YARN-5159.03.patch
>
>
> {@YarnConfiguration.RM_SCHEDULER_INCLUDE_PORT_IN_NODE_NAME} is wrong. Should 
> be changed to 
>  {@value YarnConfiguration#RM_SCHEDULER_INCLUDE_PORT_IN_NODE_NAME}
> Edit:
> I noted that due to java 8 javadoc restrictions the javadoc:test-javadoc goal 
> fails on hadoop-yarn-server-tests project.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5156) YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state

2016-07-13 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-5156:
-
Attachment: YARN-5156-YARN-5355.01.patch

Uploading patch rebased to new branch YARN-5355 and modifying the code as per 
Varun's points above.

Like I mentioned in an earlier comment, I think we should include the container 
state in the finished event, if we are including other container states at 
other times in other events. This has two purposes:
- ensuring consistency in information within an event
- allowing for easier scanning/filtering in the data when state information is 
present. 

I am still wondering what unit test to write. The patch is simple enough. 
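
If a test is added, one option might be an assertion along these lines; the 
classes below are generic stand-ins, not the actual timeline entity/event APIs:
{code:java}
import static org.junit.Assert.assertEquals;

import java.util.HashMap;
import java.util.Map;

import org.junit.Test;

/** Hypothetical sketch only; a stand-in for the real timeline event classes. */
public class TestContainerFinishedEventSketch {
  @Test
  public void finishedEventReportsFinalState() {
    // Stand-in for the info map of a YARN_CONTAINER_FINISHED event.
    Map<String, Object> info = new HashMap<String, Object>();
    info.put("YARN_CONTAINER_EXIT_STATUS", 0);
    info.put("YARN_CONTAINER_STATE", "COMPLETE");

    // The finished event should carry the final state, not RUNNING.
    assertEquals("COMPLETE", info.get("YARN_CONTAINER_STATE"));
  }
}
{code}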


> YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state
> -
>
> Key: YARN-5156
> URL: https://issues.apache.org/jira/browse/YARN-5156
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Vrushali C
>  Labels: YARN-5355
> Attachments: YARN-5156-YARN-2928.01.patch, 
> YARN-5156-YARN-5355.01.patch
>
>
> On container finished, we're reporting "YARN_CONTAINER_STATE: "RUNNING"". Do 
> we design this deliberately or it's a bug? 
> {code}
> {
> metrics: [ ],
> events: [
> {
> id: "YARN_CONTAINER_FINISHED",
> timestamp: 1464213765890,
> info: {
> YARN_CONTAINER_EXIT_STATUS: 0,
> YARN_CONTAINER_STATE: "RUNNING",
> YARN_CONTAINER_DIAGNOSTICS_INFO: ""
> }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED",
> timestamp: 1464213761133,
> info: { }
> },
> {
> id: "YARN_CONTAINER_CREATED",
> timestamp: 1464213761132,
> info: { }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED",
> timestamp: 1464213761132,
> info: { }
> }
> ],
> id: "container_e15_1464213707405_0001_01_18",
> type: "YARN_CONTAINER",
> createdtime: 1464213761132,
> info: {
> YARN_CONTAINER_ALLOCATED_PRIORITY: "20",
> YARN_CONTAINER_ALLOCATED_VCORE: 1,
> YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS: "10.22.16.164:0",
> UID: 
> "yarn_cluster!application_1464213707405_0001!YARN_CONTAINER!container_e15_1464213707405_0001_01_18",
> YARN_CONTAINER_ALLOCATED_HOST: "10.22.16.164",
> YARN_CONTAINER_ALLOCATED_MEMORY: 1024,
> SYSTEM_INFO_PARENT_ENTITY: {
> type: "YARN_APPLICATION_ATTEMPT",
> id: "appattempt_1464213707405_0001_01"
> },
> YARN_CONTAINER_ALLOCATED_PORT: 64694
> },
> configs: { },
> isrelatedto: { },
> relatesto: { }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5156) YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state

2016-07-13 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376057#comment-15376057
 ] 

Vrushali C edited comment on YARN-5156 at 7/14/16 12:25 AM:


Thanks [~varun_saxena] for the review discussion. 

Uploading patch rebased to new branch YARN-5355 and modifying the code as per 
Varun's points above.

Like I mentioned in an earlier comment, I think we should include the container 
state in the finished event, if we are including other container states at 
other times in other events. This has two purposes:
- ensuring consistency in information within an event
- allowing for easier scanning/filtering in the data when state information is 
present. 

I am still wondering what unit test to write. The patch is simple enough. 



was (Author: vrushalic):
Uploading patch rebased to new branch YARN-5355 and modifying the code as per 
Varun's points above.

Like I mentioned in an earlier comment, I think we should include the container 
state in the finished event, if we are including other container states at 
other times in other events. This has two purposes:
- ensuring consistency in information within an event
- allowing for easier scanning/filtering in the data when state information is 
present. 

I am still wondering what unit test to write. The patch is simple enough. 


> YARN_CONTAINER_FINISHED of YARN_CONTAINERs will always have running state
> -
>
> Key: YARN-5156
> URL: https://issues.apache.org/jira/browse/YARN-5156
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Vrushali C
>  Labels: YARN-5355
> Attachments: YARN-5156-YARN-2928.01.patch, 
> YARN-5156-YARN-5355.01.patch
>
>
> On container finished, we're reporting "YARN_CONTAINER_STATE: "RUNNING"". Do 
> we design this deliberately or it's a bug? 
> {code}
> {
> metrics: [ ],
> events: [
> {
> id: "YARN_CONTAINER_FINISHED",
> timestamp: 1464213765890,
> info: {
> YARN_CONTAINER_EXIT_STATUS: 0,
> YARN_CONTAINER_STATE: "RUNNING",
> YARN_CONTAINER_DIAGNOSTICS_INFO: ""
> }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED",
> timestamp: 1464213761133,
> info: { }
> },
> {
> id: "YARN_CONTAINER_CREATED",
> timestamp: 1464213761132,
> info: { }
> },
> {
> id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED",
> timestamp: 1464213761132,
> info: { }
> }
> ],
> id: "container_e15_1464213707405_0001_01_18",
> type: "YARN_CONTAINER",
> createdtime: 1464213761132,
> info: {
> YARN_CONTAINER_ALLOCATED_PRIORITY: "20",
> YARN_CONTAINER_ALLOCATED_VCORE: 1,
> YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS: "10.22.16.164:0",
> UID: 
> "yarn_cluster!application_1464213707405_0001!YARN_CONTAINER!container_e15_1464213707405_0001_01_18",
> YARN_CONTAINER_ALLOCATED_HOST: "10.22.16.164",
> YARN_CONTAINER_ALLOCATED_MEMORY: 1024,
> SYSTEM_INFO_PARENT_ENTITY: {
> type: "YARN_APPLICATION_ATTEMPT",
> id: "appattempt_1464213707405_0001_01"
> },
> YARN_CONTAINER_ALLOCATED_PORT: 64694
> },
> configs: { },
> isrelatedto: { },
> relatesto: { }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5361) Obtaining logs for completed container says 'file belongs to a running container ' at the end

2016-07-13 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-5361:

Attachment: YARN-5361.2.patch

> Obtaining logs for completed container says 'file belongs to a running 
> container ' at the end
> -
>
> Key: YARN-5361
> URL: https://issues.apache.org/jira/browse/YARN-5361
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sumana Sathish
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-5361.1.patch, YARN-5361.2.patch
>
>
> Obtaining logs via yarn CLI for completed container but running application 
> says "This log file belongs to a running container 
> (container_e32_1468319707096_0001_01_04) and so may not be complete" 
> which is not correct.
> {code}
> LogType:stdout
> Log Upload Time:Tue Jul 12 10:38:14 + 2016
> Log Contents:
> End of LogType:stdout. This log file belongs to a running container 
> (container_e32_1468319707096_0001_01_04) and so may not be complete.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4759) Revisit signalContainer() for docker containers

2016-07-13 Thread Shane Kumpf (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-4759:
--
Attachment: YARN-4759.003.patch

> Revisit signalContainer() for docker containers
> ---
>
> Key: YARN-4759
> URL: https://issues.apache.org/jira/browse/YARN-4759
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Shane Kumpf
> Attachments: YARN-4759.001.patch, YARN-4759.002.patch, 
> YARN-4759.003.patch
>
>
> The current signal handling (in the DockerContainerRuntime) needs to be 
> revisited for docker containers. For example, container reacquisition on NM 
> restart might not work, depending on which user the process in the container 
> runs as. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4759) Revisit signalContainer() for docker containers

2016-07-13 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376031#comment-15376031
 ] 

Shane Kumpf commented on YARN-4759:
---

Thanks for the review [~vvasudev]! I will upload a new patch shortly.

> Revisit signalContainer() for docker containers
> ---
>
> Key: YARN-4759
> URL: https://issues.apache.org/jira/browse/YARN-4759
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Shane Kumpf
> Attachments: YARN-4759.001.patch, YARN-4759.002.patch
>
>
> The current signal handling (in the DockerContainerRuntime) needs to be 
> revisited for docker containers. For example, container reacquisition on NM 
> restart might not work, depending on which user the process in the container 
> runs as. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5361) Obtaining logs for completed container says 'file belongs to a running container ' at the end

2016-07-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376027#comment-15376027
 ] 

Hadoop QA commented on YARN-5361:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
58s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 56s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
44s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 45s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 45s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 40s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 1 
new + 4 unchanged - 1 fixed = 5 total (was 5) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 23s 
{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 34s {color} 
| {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 36m 13s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.client.cli.TestLogsCLI |
|   | hadoop.yarn.client.api.impl.TestYarnClient |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12817826/YARN-5361.1.patch |
| JIRA Issue | YARN-5361 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 8e0895a44569 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / d180505 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/12316/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
 |
| unit | 

[jira] [Comment Edited] (YARN-4743) ResourceManager crash because TimSort

2016-07-13 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375989#comment-15375989
 ] 

sandflee edited comment on YARN-4743 at 7/13/16 11:37 PM:
--

I don't think a snapshot alone could resolve this: as in YARN-5371, nodes are 
sorted only by unused resource. This seems to be caused by a > b and b > c, yet 
a < c when a and c are compared. We should snapshot all the elements being sorted 
and then sort over that snapshot to avoid this, or we could add 
-Djava.util.Arrays.useLegacyMergeSort=true to YARN_OPTS to use merge sort instead 
of TimSort for Collections#sort.


was (Author: sandflee):
I don't think snapshot could resolve this, as in YARN-5371, node is only sorted 
with unused resource. this seems caused by a > b, and b > c, but while sorting 
a and c, a < c. we should snapshot all sorting element and then sort to avoid 
this, or could add -Djava.util.Arrays.useLegacyMergeSort=true to YARN_OPS to 
use mergeSort not TimSort for Collection#sort, I think capacity scheduler have 
similar problem.

> ResourceManager crash because TimSort
> -
>
> Key: YARN-4743
> URL: https://issues.apache.org/jira/browse/YARN-4743
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.4
>Reporter: Zephyr Guo
>Assignee: Yufei Gu
> Attachments: YARN-4743-cdh5.4.7.patch
>
>
> {code}
> 2016-02-26 14:08:50,821 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>  at java.util.TimSort.mergeHi(TimSort.java:868)
>  at java.util.TimSort.mergeAt(TimSort.java:485)
>  at java.util.TimSort.mergeCollapse(TimSort.java:410)
>  at java.util.TimSort.sort(TimSort.java:214)
>  at java.util.TimSort.sort(TimSort.java:173)
>  at java.util.Arrays.sort(Arrays.java:659)
>  at java.util.Collections.sort(Collections.java:217)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:316)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:240)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:989)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1185)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>  at java.lang.Thread.run(Thread.java:745)
> 2016-02-26 14:08:50,822 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {code}
> Actually, this issue found in 2.6.0-cdh5.4.7.
> I think the cause is that we modify {{Resource}} while we are sorting 
> {{runnableApps}}.
> {code:title=FSLeafQueue.java}
> Comparator comparator = policy.getComparator();
> writeLock.lock();
> try {
>   Collections.sort(runnableApps, comparator);
> } finally {
>   writeLock.unlock();
> }
> readLock.lock();
> {code}
> {code:title=FairShareComparator}
> public int compare(Schedulable s1, Schedulable s2) {
> ..
>   s1.getResourceUsage(), minShare1);
>   boolean s2Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,
>   s2.getResourceUsage(), minShare2);
>   minShareRatio1 = (double) s1.getResourceUsage().getMemory()
>   / Resources.max(RESOURCE_CALCULATOR, null, minShare1, 
> ONE).getMemory();
>   minShareRatio2 = (double) s2.getResourceUsage().getMemory()
>   / Resources.max(RESOURCE_CALCULATOR, null, minShare2, 
> ONE).getMemory();
> ..
> {code}
> {{getResourceUsage}} will return current Resource. The current Resource is 
> unstable. 
> {code:title=FSAppAttempt.java}
> @Override
>   public Resource getResourceUsage() {
> // Here the getPreemptedResources() always return zero, except in
> // a preemption round
> return Resources.subtract(getCurrentConsumption(), 
> getPreemptedResources());
>   }
> {code}
> {code:title=SchedulerApplicationAttempt}
>  public Resource getCurrentConsumption() {
> return currentConsumption;
>   }
> // This method may modify current Resource.
> public synchronized void recoverContainer(RMContainer rmContainer) {
> ..
> Resources.addTo(currentConsumption, rmContainer.getContainer()
>   .getResource());
> ..
> 

[jira] [Commented] (YARN-5363) For AM containers, or for containers of running-apps, "yarn logs" incorrectly only (tries to) shows syslog file-type by default

2016-07-13 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375999#comment-15375999
 ] 

Xuan Gong commented on YARN-5363:
-

[~vinodkv] Thanks for the patch.  Overall looks good.

I have one comment:
we already do a check on the input log files:
{code}
List<String> logs = new ArrayList<String>();
if (fetchAllLogFiles(logFiles)) {
  logs.add(".*");
} else if (logFiles != null && logFiles.length > 0) {
  logs = Arrays.asList(logFiles);
}
{code}
before we actually run any commands. I think we could add the logic there, so we 
do not need to repeat it separately inside several different functions.
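
For illustration, the consolidated check might look roughly like this; a hedged 
sketch with hypothetical names, not the actual LogsCLI code:
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper sketch; not the actual LogsCLI implementation. */
class LogFilePatternSketch {
  /**
   * Resolve the log-file patterns once, up front: everything (".*") when all
   * files are requested, the explicit names when given, otherwise the defaults.
   */
  static List<String> resolveLogFilePatterns(boolean fetchAll, String[] logFiles,
      List<String> defaults) {
    List<String> logs = new ArrayList<String>();
    if (fetchAll) {
      logs.add(".*");
    } else if (logFiles != null && logFiles.length > 0) {
      logs = Arrays.asList(logFiles);
    } else {
      logs.addAll(defaults);
    }
    return logs;
  }
}
{code}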

> For AM containers, or for containers of running-apps, "yarn logs" incorrectly 
> only (tries to) shows syslog file-type by default
> ---
>
> Key: YARN-5363
> URL: https://issues.apache.org/jira/browse/YARN-5363
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: YARN-5363-2016-07-12.txt, YARN-5363-2016-07-13.txt
>
>
> For e.g, for a running application, the following happens:
> {code}
> # yarn logs -applicationId application_1467838922593_0001
> 16/07/06 22:07:05 INFO impl.TimelineClientImpl: Timeline service address: 
> http://:8188/ws/v1/timeline/
> 16/07/06 22:07:06 INFO client.RMProxy: Connecting to ResourceManager at 
> /:8050
> 16/07/06 22:07:07 INFO impl.TimelineClientImpl: Timeline service address: 
> http://l:8188/ws/v1/timeline/
> 16/07/06 22:07:07 INFO client.RMProxy: Connecting to ResourceManager at 
> /:8050
> Can not find any log file matching the pattern: [syslog] for the container: 
> container_e03_1467838922593_0001_01_01 within the application: 
> application_1467838922593_0001
> Can not find any log file matching the pattern: [syslog] for the container: 
> container_e03_1467838922593_0001_01_02 within the application: 
> application_1467838922593_0001
> Can not find any log file matching the pattern: [syslog] for the container: 
> container_e03_1467838922593_0001_01_03 within the application: 
> application_1467838922593_0001
> Can not find any log file matching the pattern: [syslog] for the container: 
> container_e03_1467838922593_0001_01_04 within the application: 
> application_1467838922593_0001
> Can not find any log file matching the pattern: [syslog] for the container: 
> container_e03_1467838922593_0001_01_05 within the application: 
> application_1467838922593_0001
> Can not find any log file matching the pattern: [syslog] for the container: 
> container_e03_1467838922593_0001_01_06 within the application: 
> application_1467838922593_0001
> Can not find any log file matching the pattern: [syslog] for the container: 
> container_e03_1467838922593_0001_01_07 within the application: 
> application_1467838922593_0001
> Can not find the logs for the application: application_1467838922593_0001 
> with the appOwner: 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5373) NPE listing wildcard directory in containerLaunch

2016-07-13 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5373:
-
Summary: NPE listing wildcard directory in containerLaunch  (was: NPE 
introduced by YARN-4958 (The file localization process should allow...))

> NPE listing wildcard directory in containerLaunch
> -
>
> Key: YARN-5373
> URL: https://issues.apache.org/jira/browse/YARN-5373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Critical
>
> YARN-4958 added support for wildcards in file localization. It introduces a 
> NPE 
> at 
> {code:java}
> for (File wildLink : directory.listFiles()) {
> sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
> }
> {code}
> When directory.listFiles returns null (only happens in a secure cluster), NPE 
> will cause the container fail to launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)

2016-07-13 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5373:
-
Priority: Critical  (was: Major)

> NPE introduced by YARN-4958 (The file localization process should allow...)
> ---
>
> Key: YARN-5373
> URL: https://issues.apache.org/jira/browse/YARN-5373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Critical
>
> YARN-4958 added support for wildcards in file localization. It introduces a 
> NPE 
> at 
> {code:java}
> for (File wildLink : directory.listFiles()) {
> sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
> }
> {code}
> When directory.listFiles returns null (only happens in a secure cluster), NPE 
> will cause the container fail to launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4743) ResourceManager crash because TimSort

2016-07-13 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375989#comment-15375989
 ] 

sandflee commented on YARN-4743:


I don't think a snapshot alone could resolve this: as in YARN-5371, nodes are 
sorted only by unused resource. This seems to be caused by a > b and b > c, yet 
a < c when a and c are compared. We should snapshot all the elements being sorted 
and then sort over that snapshot to avoid this, or we could add 
-Djava.util.Arrays.useLegacyMergeSort=true to YARN_OPTS to use merge sort instead 
of TimSort for Collections#sort. I think the capacity scheduler has a similar 
problem.
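
A rough sketch of the "snapshot everything, then sort" idea, using generic 
stand-ins rather than the actual FairScheduler classes:
{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; App stands in for a Schedulable and its resource usage. */
class SnapshotSortSketch {
  static class App {
    volatile long usage; // mutated concurrently by the scheduler
  }

  /** Freeze the sort key once per sort so the comparator sees stable values. */
  static List<App> sortByUsageSnapshot(List<App> apps) {
    final Map<App, Long> frozen = new IdentityHashMap<App, Long>();
    for (App app : apps) {
      frozen.put(app, app.usage);
    }
    List<App> sorted = new ArrayList<App>(apps);
    sorted.sort(Comparator.comparingLong(app -> frozen.get(app)));
    return sorted;
  }
}
{code}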

> ResourceManager crash because TimSort
> -
>
> Key: YARN-4743
> URL: https://issues.apache.org/jira/browse/YARN-4743
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.4
>Reporter: Zephyr Guo
>Assignee: Yufei Gu
> Attachments: YARN-4743-cdh5.4.7.patch
>
>
> {code}
> 2016-02-26 14:08:50,821 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>  at java.util.TimSort.mergeHi(TimSort.java:868)
>  at java.util.TimSort.mergeAt(TimSort.java:485)
>  at java.util.TimSort.mergeCollapse(TimSort.java:410)
>  at java.util.TimSort.sort(TimSort.java:214)
>  at java.util.TimSort.sort(TimSort.java:173)
>  at java.util.Arrays.sort(Arrays.java:659)
>  at java.util.Collections.sort(Collections.java:217)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:316)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:240)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:989)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1185)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>  at java.lang.Thread.run(Thread.java:745)
> 2016-02-26 14:08:50,822 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {code}
> Actually, this issue found in 2.6.0-cdh5.4.7.
> I think the cause is that we modify {{Resource}} while we are sorting 
> {{runnableApps}}.
> {code:title=FSLeafQueue.java}
> Comparator comparator = policy.getComparator();
> writeLock.lock();
> try {
>   Collections.sort(runnableApps, comparator);
> } finally {
>   writeLock.unlock();
> }
> readLock.lock();
> {code}
> {code:title=FairShareComparator}
> public int compare(Schedulable s1, Schedulable s2) {
> ..
>   s1.getResourceUsage(), minShare1);
>   boolean s2Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,
>   s2.getResourceUsage(), minShare2);
>   minShareRatio1 = (double) s1.getResourceUsage().getMemory()
>   / Resources.max(RESOURCE_CALCULATOR, null, minShare1, 
> ONE).getMemory();
>   minShareRatio2 = (double) s2.getResourceUsage().getMemory()
>   / Resources.max(RESOURCE_CALCULATOR, null, minShare2, 
> ONE).getMemory();
> ..
> {code}
> {{getResourceUsage}} will return current Resource. The current Resource is 
> unstable. 
> {code:title=FSAppAttempt.java}
> @Override
>   public Resource getResourceUsage() {
> // Here the getPreemptedResources() always return zero, except in
> // a preemption round
> return Resources.subtract(getCurrentConsumption(), 
> getPreemptedResources());
>   }
> {code}
> {code:title=SchedulerApplicationAttempt}
>  public Resource getCurrentConsumption() {
> return currentConsumption;
>   }
> // This method may modify current Resource.
> public synchronized void recoverContainer(RMContainer rmContainer) {
> ..
> Resources.addTo(currentConsumption, rmContainer.getContainer()
>   .getResource());
> ..
>   }
> {code}
> I suggest that use stable Resource in comparator.
> Is there something i think wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)

2016-07-13 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375988#comment-15375988
 ] 

Haibo Chen commented on YARN-5373:
--

As per an offline discussion with Daniel, the cause is that in a secure cluster 
the node manager process that executes the container launch code runs as a user 
that has no permission to read or execute the local wildcard directory downloaded 
as a resource by the remote user. Thus, directory.listFiles() returns null.
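
A minimal sketch of the defensive check; illustrative only, the actual fix would 
live in the wildcard handling of the container launch code:
{code:java}
import java.io.File;
import java.io.IOException;

/** Illustrative guard only; not the actual node manager code. */
class WildcardListingSketch {
  static File[] listWildcardDir(File directory) throws IOException {
    File[] files = directory.listFiles();
    if (files == null) {
      // listFiles() returns null when the directory cannot be read, e.g. when
      // the launching user lacks permission in a secure cluster; fail with a
      // clear message instead of an NPE later in the symlink loop.
      throw new IOException("Unable to list files in " + directory
          + "; check permissions for the container launch user.");
    }
    return files;
  }
}
{code}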

> NPE introduced by YARN-4958 (The file localization process should allow...)
> ---
>
> Key: YARN-5373
> URL: https://issues.apache.org/jira/browse/YARN-5373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> YARN-4958 added support for wildcards in file localization. It introduces a 
> NPE 
> at 
> {code:java}
> for (File wildLink : directory.listFiles()) {
> sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
> }
> {code}
> When directory.listFiles returns null (only happens in a secure cluster), NPE 
> will cause the container fail to launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3649) Allow configurable prefix for hbase table names (like prod, exp, test etc)

2016-07-13 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375987#comment-15375987
 ] 

Vrushali C commented on YARN-3649:
--

Thanks Joep, yes, I need to rebase. Good point about the documentation; I will 
include updates to the docs as well.


> Allow configurable prefix for hbase table names (like prod, exp, test etc)
> --
>
> Key: YARN-3649
> URL: https://issues.apache.org/jira/browse/YARN-3649
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>  Labels: YARN-5355
> Attachments: YARN-3649-YARN-2928.01.patch
>
>
> As per [~jrottinghuis]'s suggestion in YARN-3411, it will be a good idea to 
> have a configurable prefix for hbase table names.  
> This way we can easily run a staging, a test, a production and whatever setup 
> in the same HBase instance / without having to override every single table in 
> the config.
> One could simply overwrite the default prefix and you're off and running.
> For prefix, potential candidates are "tst" "prod" "exp" etc. Once can then 
> still override one tablename if needed, but managing one whole setup will be 
> easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5164) CapacityOvertimePolicy does not take advantaged of plan RLE

2016-07-13 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375971#comment-15375971
 ] 

Chris Douglas commented on YARN-5164:
-

Only minor nits, otherwise +1:
{{CapacityOverTimePolicy}}
- Avoid importing java.util.\*
- Where the intermediate points are added, the code would be more readable if 
the key were assigned to a named variable (instead of multiple calls to 
{{e.getKey()}}). Same with the point-wise integral computation
- checkstyle (spacing): {{+  if(e.getValue()!=null) {}}
- A comment briefly sketching the algorithm would help future maintainers

{{NoOverCommitPolicy}}
- The exception message should be reformatted (it has some redundant string 
concatenations) and should omit references to the time it no longer reports
- Should the {{PlanningException}} be added as a cause, rather than concatenated 
with the ReservationID? (A sketch follows below.)
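
To illustrate the last point, a hedged sketch of attaching the failure as a 
cause; generic exception types here, not the actual NoOverCommitPolicy code:
{code:java}
/** Hypothetical sketch only; the real code would use the reservation exception types. */
class CauseChainingSketch {
  static void rejectReservation(String reservationId, Exception planningFailure)
      throws Exception {
    // Chain the original failure as the cause instead of concatenating its
    // message, so the full stack trace is preserved for debugging.
    throw new Exception("Reservation " + reservationId
        + " rejected by the over-commit check", planningFailure);
  }
}
{code}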

> CapacityOvertimePolicy does not take advantaged of plan RLE
> ---
>
> Key: YARN-5164
> URL: https://issues.apache.org/jira/browse/YARN-5164
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-5164-example.pdf, YARN-5164-inclusive.4.patch, 
> YARN-5164-inclusive.5.patch, YARN-5164.1.patch, YARN-5164.2.patch, 
> YARN-5164.5.patch, YARN-5164.6.patch
>
>
> As a consequence small time granularities (e.g., 1 sec) and long time horizon 
> for a reservation (e.g., months) run rather slow (10 sec). 
> Proposed resolution is to switch to interval math in checking, similar to how 
> YARN-4359 does for agents.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5361) Obtaining logs for completed container says 'file belongs to a running container ' at the end

2016-07-13 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375956#comment-15375956
 ] 

Xuan Gong commented on YARN-5361:
-

It's not straightforward to add a unit test. I have tested locally.

> Obtaining logs for completed container says 'file belongs to a running 
> container ' at the end
> -
>
> Key: YARN-5361
> URL: https://issues.apache.org/jira/browse/YARN-5361
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sumana Sathish
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-5361.1.patch
>
>
> Obtaining logs via yarn CLI for completed container but running application 
> says "This log file belongs to a running container 
> (container_e32_1468319707096_0001_01_04) and so may not be complete" 
> which is not correct.
> {code}
> LogType:stdout
> Log Upload Time:Tue Jul 12 10:38:14 + 2016
> Log Contents:
> End of LogType:stdout. This log file belongs to a running container 
> (container_e32_1468319707096_0001_01_04) and so may not be complete.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5361) Obtaining logs for completed container says 'file belongs to a running container ' at the end

2016-07-13 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-5361:

Attachment: YARN-5361.1.patch

> Obtaining logs for completed container says 'file belongs to a running 
> container ' at the end
> -
>
> Key: YARN-5361
> URL: https://issues.apache.org/jira/browse/YARN-5361
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sumana Sathish
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-5361.1.patch
>
>
> Obtaining logs via yarn CLI for completed container but running application 
> says "This log file belongs to a running container 
> (container_e32_1468319707096_0001_01_04) and so may not be complete" 
> which is not correct.
> {code}
> LogType:stdout
> Log Upload Time:Tue Jul 12 10:38:14 + 2016
> Log Contents:
> End of LogType:stdout. This log file belongs to a running container 
> (container_e32_1468319707096_0001_01_04) and so may not be complete.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3662) Federation Membership State APIs

2016-07-13 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375947#comment-15375947
 ] 

Wangda Tan commented on YARN-3662:
--

Hi [~subru],

I took a very quick look at this patch and also at YARN-3664/YARN-5367; I have 
put all my questions and comments here:

Questions:
- I'm not quite sure what FederationPolicy is or how to use the class. Is it a 
state or a configuration? And why compress parameters into a byte array instead 
of more meaningful fields?
- It would be better to add RPC service interface definitions for the 
FederationPolicy storage API for easier review; right now I cannot understand how 
these protocol definitions will be used.

(Highlevel) Comments:
- FederationMembershipState looks like a "state manager" since it supports 
operations to modify existing members. At first glance, it's a sub-cluster 
resource tracker similar to the existing RM resource tracker.
- Similarly, FederationApplicationState looks like a 
"federation-application-manager" instead of a "state".
- FederationMembershipState uses the same parameter, FederationSubClusterInfo, 
for register/heartbeat -- could we require different parameters for registration 
and heartbeat (just like the NM registration request and NM update request)? A 
sketch follows this list.
- FederationSubClusterInfo: fields like amRMAddress are actually service 
endpoints; the names of these fields are a little confusing to me.
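
To illustrate the register/heartbeat question, a purely hypothetical shape; this 
is not the proposed Federation API, just the kind of split being asked about:
{code:java}
/** Hypothetical shapes only, to illustrate the register/heartbeat question. */
interface FederationMembershipSketch {
  // Full sub-cluster details supplied once, at registration time.
  void registerSubCluster(SubClusterRegisterRequest request);

  // A lighter, periodic update afterwards.
  void heartbeat(SubClusterHeartbeatRequest request);
}

class SubClusterRegisterRequest {
  String clientRMAddress;
  String amRMAddress;
  String adminAddress;
  String capability;
}

class SubClusterHeartbeatRequest {
  String subClusterId;
  String state;
  long timestamp;
}
{code}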

Styles:
- redundant "public" in all interface definitions (consider switching to 
IntelliJ instead of Eclipse? :-p)

Thanks,

> Federation Membership State APIs
> 
>
> Key: YARN-3662
> URL: https://issues.apache.org/jira/browse/YARN-3662
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-3662-YARN-2915-v1.1.patch, 
> YARN-3662-YARN-2915-v1.patch, YARN-3662-YARN-2915-v2.patch
>
>
> The Federation Application State encapsulates the information about the 
> active RM of each sub-cluster that is participating in Federation. The 
> information includes addresses for ClientRM, ApplicationMaster and Admin 
> services along with the sub_cluster _capability_ which is currently defined 
> by *ClusterMetricsInfo*. Please refer to the design doc in parent JIRA for 
> further details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5298) Mount usercache and NM filecache directories into Docker container

2016-07-13 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375910#comment-15375910
 ] 

Sidharta Seethana commented on YARN-5298:
-

Thanks, [~vvasudev] and [~templedf] !

[~vvasudev], about the container-specific directories: the docker container 
runtime itself makes no assumptions about the location of container-specific or 
non-container-specific directories. It does not know of or assume a 
parent/sub-directory structure and explicitly mounts all required directories. I 
hope that answers your question.

> Mount usercache and NM filecache directories into Docker container
> --
>
> Key: YARN-5298
> URL: https://issues.apache.org/jira/browse/YARN-5298
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Varun Vasudev
>Assignee: Sidharta Seethana
> Attachments: YARN-5298.001.patch, YARN-5298.002.patch
>
>
> Currently, we don't mount the usercache and the NM filecache directories into 
> the Docker container. This can lead to issues with containers that rely on 
> public and application scope resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5181) ClusterNodeTracker: add method to get list of nodes matching a specific resourceName

2016-07-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375904#comment-15375904
 ] 

Hadoop QA commented on YARN-5181:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
52s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
57s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 30s {color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager
 generated 1 new + 2 unchanged - 1 fixed = 3 total (was 3) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 18s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 3 new + 1 unchanged - 0 fixed = 4 total (was 1) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
59s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 18s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager
 generated 2 new + 989 unchanged - 0 fixed = 991 total (was 989) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 33m 9s 
{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 47m 41s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12807049/yarn-5181-1.patch |
| JIRA Issue | YARN-5181 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 46276eaa1b34 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / af8f480 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/12315/artifact/patchprocess/diff-compile-javac-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/12315/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| javadoc | 
https://builds.apache.org/job/PreCommit-YARN-Build/12315/artifact/patchprocess/diff-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results 

[jira] [Commented] (YARN-5339) passing file to -out for YARN log CLI doesnt give warning or error code

2016-07-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375896#comment-15375896
 ] 

Hudson commented on YARN-5339:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #10093 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10093/])
YARN-5339. Fixed "yarn logs" to fail when a file is passed to -out (vinodkv: 
rev d18050522c5c6bd9e32eb9a1be4ffe2288624c40)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java


> passing file to -out for YARN log CLI doesnt give warning or error code
> ---
>
> Key: YARN-5339
> URL: https://issues.apache.org/jira/browse/YARN-5339
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Sumana Sathish
>Assignee: Xuan Gong
> Fix For: 2.9.0
>
> Attachments: YARN-5339.1.patch, YARN-5339.2.patch
>
>
> passing file to -out for YARN log CLI doesnt give warning or error code
> {code}
> yarn  logs -applicationId application_1467117709224_0003 -out 
> /grid/0/hadoopqe/artifacts/file.txt
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4464) default value of yarn.resourcemanager.state-store.max-completed-applications should lower.

2016-07-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375867#comment-15375867
 ] 

Vinod Kumar Vavilapalli commented on YARN-4464:
---

We need ATS in production - aka ATS V2. With that in the picture, I agree that 
we don't need to keep any completed applications in RM memory at all.

> default value of yarn.resourcemanager.state-store.max-completed-applications 
> should lower.
> --
>
> Key: YARN-4464
> URL: https://issues.apache.org/jira/browse/YARN-4464
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: resourcemanager
>Reporter: KWON BYUNGCHANG
>Assignee: Daniel Templeton
>Priority: Blocker
> Attachments: YARN-4464.001.patch, YARN-4464.002.patch, 
> YARN-4464.003.patch, YARN-4464.004.patch
>
>
> my cluster has 120 nodes.
> I configured RM Restart feature.
> {code}
> yarn.resourcemanager.recovery.enabled=true
> yarn.resourcemanager.store.class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
> yarn.resourcemanager.fs.state-store.uri=/system/yarn/rmstore
> {code}
> unfortunately I did not configure 
> {{yarn.resourcemanager.state-store.max-completed-applications}}.
> so that property configured default value 10,000.
> I have restarted RM due to changing another configuration.
> I expected that RM restart immediately.
> recovery process was very slow.  I have waited about 20min.  
> realize missing 
> {{yarn.resourcemanager.state-store.max-completed-applications}}.
> its default value is very huge.  
> need to change lower value or document notice on [RM Restart 
> page|http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5181) ClusterNodeTracker: add method to get list of nodes matching a specific resourceName

2016-07-13 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375866#comment-15375866
 ] 

Arun Suresh commented on YARN-5181:
---

Thanks for the patch [~kasha]

Some minor comments:
# Remove the unused import.
# Maybe rename getNodes(String) to getNodesWithName(String) so that we don't 
need to cast null to (NodeFilter) in getAllNodes()? (A sketch follows below.)
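
To illustrate point 2, a hypothetical sketch; not the real ClusterNodeTracker 
API, just the overload issue:
{code:java}
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the overload point; not the actual ClusterNodeTracker. */
class NodeTrackerSketch {
  interface NodeFilter {
    boolean accept(String nodeName);
  }

  List<String> getNodes(NodeFilter filter) {
    return Collections.emptyList();
  }

  // Overloading getNodes on String would force callers to write
  // getNodes((NodeFilter) null); a distinct name keeps the call unambiguous.
  List<String> getNodesWithName(String resourceName) {
    return Collections.emptyList();
  }

  List<String> getAllNodes() {
    return getNodes(null); // no cast needed with a single getNodes overload
  }
}
{code}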




> ClusterNodeTracker: add method to get list of nodes matching a specific 
> resourceName
> 
>
> Key: YARN-5181
> URL: https://issues.apache.org/jira/browse/YARN-5181
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-5181-1.patch
>
>
> ClusterNodeTracker should have a method to return the list of nodes matching 
> a particular resourceName. This is so we could identify what all nodes a 
> particular ResourceRequest is interested in, which in turn is useful in 
> YARN-5139 (global scheduler) and YARN-4752 (FairScheduler preemption 
> overhaul). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4464) default value of yarn.resourcemanager.state-store.max-completed-applications should lower.

2016-07-13 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375862#comment-15375862
 ] 

Daniel Templeton commented on YARN-4464:


With ATS, I don't see a lot of need to keep 10k completed apps lying about. Not 
only is it a startup burden, but it also is a ZK burden.  We regularly tell 
customers to set it lower because of ZK cache load.  Improving the recovery 
logic is something we should also do, but the best doesn't need to be the enemy 
of the good.  [~vinodkv], [~Naganarasimha], [~kasha], can we come to a 
conclusion?
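
For reference, lowering the value is a one-line configuration change; a hedged 
sketch using the property name from this issue (the value 1000 is just an 
example, not a recommendation):
{code:java}
import org.apache.hadoop.conf.Configuration;

/** Illustrative only: sets the property from this issue to a lower value. */
class StateStoreTuningSketch {
  static Configuration lowerCompletedAppsKeptInStateStore() {
    Configuration conf = new Configuration();
    // Example value only; pick a number appropriate for the cluster.
    conf.setInt("yarn.resourcemanager.state-store.max-completed-applications", 1000);
    return conf;
  }
}
{code}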

> default value of yarn.resourcemanager.state-store.max-completed-applications 
> should lower.
> --
>
> Key: YARN-4464
> URL: https://issues.apache.org/jira/browse/YARN-4464
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: resourcemanager
>Reporter: KWON BYUNGCHANG
>Assignee: Daniel Templeton
>Priority: Blocker
> Attachments: YARN-4464.001.patch, YARN-4464.002.patch, 
> YARN-4464.003.patch, YARN-4464.004.patch
>
>
> my cluster has 120 nodes.
> I configured RM Restart feature.
> {code}
> yarn.resourcemanager.recovery.enabled=true
> yarn.resourcemanager.store.class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
> yarn.resourcemanager.fs.state-store.uri=/system/yarn/rmstore
> {code}
> unfortunately I did not configure 
> {{yarn.resourcemanager.state-store.max-completed-applications}}.
> so that property configured default value 10,000.
> I have restarted RM due to changing another configuration.
> I expected that RM restart immediately.
> recovery process was very slow.  I have waited about 20min.  
> realize missing 
> {{yarn.resourcemanager.state-store.max-completed-applications}}.
> its default value is very huge.  
> need to change lower value or document notice on [RM Restart 
> page|http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5339) passing file to -out for YARN log CLI doesnt give warning or error code

2016-07-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375833#comment-15375833
 ] 

Vinod Kumar Vavilapalli commented on YARN-5339:
---

Looks good, +1. Checking this in.

> passing file to -out for YARN log CLI doesnt give warning or error code
> ---
>
> Key: YARN-5339
> URL: https://issues.apache.org/jira/browse/YARN-5339
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Sumana Sathish
>Assignee: Xuan Gong
> Attachments: YARN-5339.1.patch, YARN-5339.2.patch
>
>
> passing file to -out for YARN log CLI doesnt give warning or error code
> {code}
> yarn  logs -applicationId application_1467117709224_0003 -out 
> /grid/0/hadoopqe/artifacts/file.txt
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-5371) FairScheduer ContinuousScheduling thread throws Exception

2016-07-13 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla resolved YARN-5371.

Resolution: Duplicate

> FairScheduer ContinuousScheduling thread throws Exception
> -
>
> Key: YARN-5371
> URL: https://issues.apache.org/jira/browse/YARN-5371
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: sandflee
>Assignee: sandflee
>Priority: Critical
>
> {noformat}
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
> at java.util.TimSort.mergeLo(TimSort.java:777)
> at java.util.TimSort.mergeAt(TimSort.java:514)
> at java.util.TimSort.mergeCollapse(TimSort.java:441)
> at java.util.TimSort.sort(TimSort.java:245)
> at java.util.Arrays.sort(Arrays.java:1512)
> at java.util.ArrayList.sort(ArrayList.java:1454)
> at java.util.Collections.sort(Collections.java:175)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1002)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
> {noformat}
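
A common cause of this TimSort failure is a comparator whose inputs change 
while the sort is in progress (for example, node resources being updated by 
heartbeats during continuous scheduling). The sketch below is illustrative 
only, not the actual FairScheduler fix: it snapshots the sort key once per 
element so the ordering stays consistent for the duration of the sort.
{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SnapshotSort {

  /** Stand-in for a node whose available memory is updated concurrently. */
  static class Node {
    final String host;
    volatile long availableMB;
    Node(String host, long availableMB) {
      this.host = host;
      this.availableMB = availableMB;
    }
  }

  /** Pairs a node with the key value captured at the start of the sort. */
  private static class Keyed {
    final Node node;
    final long key;
    Keyed(Node node, long key) {
      this.node = node;
      this.key = key;
    }
  }

  /** Sorts nodes by descending available memory using a stable snapshot. */
  static List<Node> sortByAvailableMemory(List<Node> nodes) {
    List<Keyed> keyed = new ArrayList<>(nodes.size());
    for (Node n : nodes) {
      keyed.add(new Keyed(n, n.availableMB));   // read each key exactly once
    }
    keyed.sort(Comparator.comparingLong((Keyed k) -> k.key).reversed());
    List<Node> sorted = new ArrayList<>(keyed.size());
    for (Keyed k : keyed) {
      sorted.add(k.node);
    }
    return sorted;
  }
}
{code}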



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5371) FairScheduer ContinuousScheduling thread throws Exception

2016-07-13 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-5371:
---
Priority: Critical  (was: Major)

> FairScheduer ContinuousScheduling thread throws Exception
> -
>
> Key: YARN-5371
> URL: https://issues.apache.org/jira/browse/YARN-5371
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: sandflee
>Assignee: sandflee
>Priority: Critical
>
> {noformat}
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
> at java.util.TimSort.mergeLo(TimSort.java:777)
> at java.util.TimSort.mergeAt(TimSort.java:514)
> at java.util.TimSort.mergeCollapse(TimSort.java:441)
> at java.util.TimSort.sort(TimSort.java:245)
> at java.util.Arrays.sort(Arrays.java:1512)
> at java.util.ArrayList.sort(ArrayList.java:1454)
> at java.util.Collections.sort(Collections.java:175)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1002)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4767) Network issues can cause persistent RM UI outage

2016-07-13 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375812#comment-15375812
 ] 

Daniel Templeton commented on YARN-4767:


Ping [~xgong], [~vinodkv].  Would love feedback on the approach in this patch.  
Thanks!

> Network issues can cause persistent RM UI outage
> 
>
> Key: YARN-4767
> URL: https://issues.apache.org/jira/browse/YARN-4767
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.7.2
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-4767.001.patch, YARN-4767.002.patch, 
> YARN-4767.003.patch, YARN-4767.004.patch, YARN-4767.005.patch, 
> YARN-4767.006.patch, YARN-4767.007.patch
>
>
> If a network issue causes an AM web app to resolve the RM proxy's address to 
> something other than what's listed in the allowed proxies list, the 
> AmIpFilter will 302 redirect the RM proxy's request back to the RM proxy.  
> The RM proxy will then consume all available handler threads connecting to 
> itself over and over, resulting in an outage of the web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2016-07-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375809#comment-15375809
 ] 

Karthik Kambatla commented on YARN-4212:


I am perfectly open to working on YARN-5264 first. Happy to review. 

> FairScheduler: Parent queues is not allowed to be 'Fair' policy if its 
> children have the "drf" policy
> -
>
> Key: YARN-4212
> URL: https://issues.apache.org/jira/browse/YARN-4212
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Yufei Gu
>  Labels: fairscheduler
> Attachments: YARN-4212.002.patch, YARN-4212.003.patch, 
> YARN-4212.004.patch, YARN-4212.1.patch
>
>
> The Fair Scheduler, while performing a {{recomputeShares()}} during an 
> {{update()}} call, uses the parent queues policy to distribute shares to its 
> children.
> If the parent queues policy is 'fair', it only computes weight for memory and 
> sets the vcores fair share of its children to 0.
> Assuming a situation where we have 1 parent queue with policy 'fair' and 
> multiple leaf queues with policy 'drf', any app submitted to the child queues 
> with a vcore requirement > 1 will always be above its fair share, since during 
> the recomputeShares process the child queues were all assigned a fair share of 
> 0 vcores.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)

2016-07-13 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5373:
-
Description: 
YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{code:java}
for (File wildLink : directory.listFiles()) {
sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
}
{code}
When directory.listFiles returns null (only happens in a secure cluster), NPE 
will cause the container fail to launch.

  was:
YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{code:java}
for (File wildLink : directory.listFiles()) {
sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
}
{code}
When directory.listFiles returns null, NPE will cause the container fail to 
launch.
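
A minimal defensive fix, sketched here as a fragment in the context of the 
quoted loop (the actual patch for this JIRA may differ), is to guard against 
the null return from listFiles():
{code:java}
// File.listFiles() returns null when the directory cannot be read (for
// example, a permission problem in a secure cluster), so check before
// iterating instead of letting the for-each throw an NPE.
File[] wildLinks = directory.listFiles();
if (wildLinks == null) {
  throw new IOException("Could not list contents of " + directory);
}
for (File wildLink : wildLinks) {
  sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
}
{code}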


> NPE introduced by YARN-4958 (The file localization process should allow...)
> ---
>
> Key: YARN-5373
> URL: https://issues.apache.org/jira/browse/YARN-5373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> YARN-4958 added support for wildcards in file localization. It introduces a 
> NPE 
> at 
> {code:java}
> for (File wildLink : directory.listFiles()) {
> sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
> }
> {code}
> When directory.listFiles returns null (only happens in a secure cluster), NPE 
> will cause the container fail to launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5304) Ship single node HBase config option with single startup command

2016-07-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375807#comment-15375807
 ] 

Karthik Kambatla commented on YARN-5304:


I spoke to [~esteban] about this. In his opinion, the minicluster approach 
(master, RS etc. in a single process) is discouraged. I am assuming the goal is 
to do a pseudo-distributed setup of HBase - Master and RegionServer in 
different processes. 

> Ship single node HBase config option with single startup command
> 
>
> Key: YARN-5304
> URL: https://issues.apache.org/jira/browse/YARN-5304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Joep Rottinghuis
>Assignee: Joep Rottinghuis
>  Labels: YARN-5355
>
> For small to medium Hadoop deployments we should make it dead-simple to use 
> the timeline service v2. We should have a single command to launch and stop 
> the timelineservice back-end for the default HBase implementation.
> A default config with all the values should be packaged that launches all the 
> needed daemons (on the RM node) with a single command with all the 
> recommended settings.
> Having a timeline admin command, perhaps an init command might be needed, or 
> perhaps the timeline service can even auto-detect that and create tables, 
> deploy needed coprocessors etc.
> The overall purpose is to ensure nobody needs to be an HBase expert to get 
> this going. For those cluster operators with HBase experience, they can 
> choose their own more sophisticated deployment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5343) TestContinuousScheduling#testSortedNodes fail intermittently

2016-07-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375800#comment-15375800
 ] 

Karthik Kambatla commented on YARN-5343:


I remember [~yufeigu] was looking into this. [~yufeigu] - does [~sandflee]'s 
analysis help? 

> TestContinuousScheduling#testSortedNodes fail intermittently
> 
>
> Key: YARN-5343
> URL: https://issues.apache.org/jira/browse/YARN-5343
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: sandflee
>Priority: Minor
>
> {noformat}
> java.lang.AssertionError: expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling.testSortedNodes(TestContinuousScheduling.java:167)
> {noformat}
> https://builds.apache.org/job/PreCommit-YARN-Build/12250/testReport/org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair/TestContinuousScheduling/testSortedNodes/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5342) Improve non-exclusive node partition resource allocation in Capacity Scheduler

2016-07-13 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375771#comment-15375771
 ] 

Naganarasimha G R commented on YARN-5342:
-

Thanks for the patch, [~wangda].
Given that the approach discussed in YARN-4225 (fallback-policy based) is going 
to take some time, as it would require significant modifications, I would agree 
to go with an interim modification to optimize non-exclusive mode scheduling.
My only concern is that if the size of the default partition is greater than 
that of the non-exclusive partition, we reset the counter on a single 
allocation in the default partition. Would that be productive?
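
The reset rule proposed in the quoted description below boils down to something 
like the following sketch (illustrative only; the method and parameter names 
are made up and this is not the actual CapacityScheduler code):
{code:java}
// Old behaviour: reset missed-opportunity whenever any container is allocated,
// which throttles non-exclusive allocations to roughly one per node-heartbeat
// interval across the whole cluster.
// Proposed behaviour: reset only when doing so can actually help progress.
boolean shouldResetMissedOpportunity(boolean allocatedFromDefaultPartition,
                                     long pendingOnNonExclusivePartition) {
  return allocatedFromDefaultPartition || pendingOnNonExclusivePartition > 0;
}
{code}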

> Improve non-exclusive node partition resource allocation in Capacity Scheduler
> --
>
> Key: YARN-5342
> URL: https://issues.apache.org/jira/browse/YARN-5342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-5342.1.patch
>
>
> In the previous implementation, one non-exclusive container allocation is 
> possible only when missed-opportunity >= #cluster-nodes, and 
> missed-opportunity is reset when a container is allocated to any node.
> This slows down the frequency of container allocation on a non-exclusive 
> node partition: *when a non-exclusive partition=x has idle resources, we can 
> only allocate one container for this app every 
> X=nodemanagers.heartbeat-interval secs for the whole cluster.*
> In this JIRA, I propose a fix to reset missed-opportunity only if we have >0 
> pending resource for the non-exclusive partition OR we get an allocation from 
> the default partition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-5374) Preemption causing communication loop

2016-07-13 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan resolved YARN-5374.
--
Resolution: Invalid

Closing as invalid.

> Preemption causing communication loop
> -
>
> Key: YARN-5374
> URL: https://issues.apache.org/jira/browse/YARN-5374
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, nodemanager, resourcemanager, yarn
>Affects Versions: 2.7.1
> Environment: Yarn version: Hadoop 2.7.1-amzn-0
> AWS EMR Cluster running:
> 1 x r3.8xlarge (Master)
> 52 x r3.8xlarge (Core)
> Spark version : 1.6.0
> Scala version: 2.10.5
> Java version: 1.8.0_51
> Input size: ~10 tb
> Input coming from S3
> Queue Configuration:
> Dynamic allocation: enabled
> Preemption: enabled
> Q1: 70% capacity with max of 100%
> Q2: 30% capacity with max of 100%
> Job Configuration:
> Driver memory = 10g
> Executor cores = 6
> Executor memory = 10g
> Deploy mode = cluster
> Master = yarn
> maxResultSize = 4g
> Shuffle manager = hash
>Reporter: Lucas Winkelmann
>Priority: Blocker
>
> Here is the scenario:
> I launch job 1 into Q1 and allow it to grow to 100% cluster utilization.
> I wait between 15-30 mins ( for this job to complete with 100% of the cluster 
> available takes about 1hr so job 1 is between 25-50% complete). Note that if 
> I wait less time then the issue sometimes does not occur, it appears to be 
> only after the job 1 is at least 25% complete.
> I launch job 2 into Q2 and preemption occurs on the Q1 shrinking the job to 
> allow 70% of cluster utilization.
> At this point job 1 basically halts progress while job 2 continues to execute 
> as normal and finishes. Job 2 either:
> - Fails its attempt and restarts. By the time this attempt fails the other 
> job is already complete meaning the second attempt has full cluster 
> availability and finishes.
> - The job remains at its current progress and simply does not finish ( I have 
> waited ~6 hrs until finally killing the application ).
>  
> Looking into the error log there is this constant error message:
> WARN NettyRpcEndpointRef: Error sending message [message = 
> RemoveExecutor(454,Container container_1468422920649_0001_01_000594 on host: 
> ip-NUMBERS.ec2.internal was preempted.)] in X attempts
>  
> My observations have led me to believe that the application master does not 
> know about this container being killed and continuously asks the container to 
> remove the executor until eventually failing the attempt or continue trying 
> to remove the executor.
>  
> I have done much digging online for anyone else experiencing this issue but 
> have come up with nothing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5374) Preemption causing communication loop

2016-07-13 Thread Lucas Winkelmann (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375761#comment-15375761
 ] 

Lucas Winkelmann commented on YARN-5374:


I will go ahead and file a Spark JIRA ticket now.

> Preemption causing communication loop
> -
>
> Key: YARN-5374
> URL: https://issues.apache.org/jira/browse/YARN-5374
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, nodemanager, resourcemanager, yarn
>Affects Versions: 2.7.1
> Environment: Yarn version: Hadoop 2.7.1-amzn-0
> AWS EMR Cluster running:
> 1 x r3.8xlarge (Master)
> 52 x r3.8xlarge (Core)
> Spark version : 1.6.0
> Scala version: 2.10.5
> Java version: 1.8.0_51
> Input size: ~10 tb
> Input coming from S3
> Queue Configuration:
> Dynamic allocation: enabled
> Preemption: enabled
> Q1: 70% capacity with max of 100%
> Q2: 30% capacity with max of 100%
> Job Configuration:
> Driver memory = 10g
> Executor cores = 6
> Executor memory = 10g
> Deploy mode = cluster
> Master = yarn
> maxResultSize = 4g
> Shuffle manager = hash
>Reporter: Lucas Winkelmann
>Priority: Blocker
>
> Here is the scenario:
> I launch job 1 into Q1 and allow it to grow to 100% cluster utilization.
> I wait between 15-30 mins ( for this job to complete with 100% of the cluster 
> available takes about 1hr so job 1 is between 25-50% complete). Note that if 
> I wait less time then the issue sometimes does not occur, it appears to be 
> only after the job 1 is at least 25% complete.
> I launch job 2 into Q2 and preemption occurs on the Q1 shrinking the job to 
> allow 70% of cluster utilization.
> At this point job 1 basically halts progress while job 2 continues to execute 
> as normal and finishes. Job 2 either:
> - Fails its attempt and restarts. By the time this attempt fails the other 
> job is already complete meaning the second attempt has full cluster 
> availability and finishes.
> - The job remains at its current progress and simply does not finish ( I have 
> waited ~6 hrs until finally killing the application ).
>  
> Looking into the error log there is this constant error message:
> WARN NettyRpcEndpointRef: Error sending message [message = 
> RemoveExecutor(454,Container container_1468422920649_0001_01_000594 on host: 
> ip-NUMBERS.ec2.internal was preempted.)] in X attempts
>  
> My observations have led me to believe that the application master does not 
> know about this container being killed and continuously asks the container to 
> remove the executor until eventually failing the attempt or continue trying 
> to remove the executor.
>  
> I have done much digging online for anyone else experiencing this issue but 
> have come up with nothing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5374) Preemption causing communication loop

2016-07-13 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375759#comment-15375759
 ] 

Wangda Tan commented on YARN-5374:
--

[~LucasW], it seems to me that the issue is caused by the Spark application not 
handling the container preemption message well. If so, I suggest you drop a 
mail to the Spark mailing list or file a Spark JIRA instead.

> Preemption causing communication loop
> -
>
> Key: YARN-5374
> URL: https://issues.apache.org/jira/browse/YARN-5374
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, nodemanager, resourcemanager, yarn
>Affects Versions: 2.7.1
> Environment: Yarn version: Hadoop 2.7.1-amzn-0
> AWS EMR Cluster running:
> 1 x r3.8xlarge (Master)
> 52 x r3.8xlarge (Core)
> Spark version : 1.6.0
> Scala version: 2.10.5
> Java version: 1.8.0_51
> Input size: ~10 tb
> Input coming from S3
> Queue Configuration:
> Dynamic allocation: enabled
> Preemption: enabled
> Q1: 70% capacity with max of 100%
> Q2: 30% capacity with max of 100%
> Job Configuration:
> Driver memory = 10g
> Executor cores = 6
> Executor memory = 10g
> Deploy mode = cluster
> Master = yarn
> maxResultSize = 4g
> Shuffle manager = hash
>Reporter: Lucas Winkelmann
>Priority: Blocker
>
> Here is the scenario:
> I launch job 1 into Q1 and allow it to grow to 100% cluster utilization.
> I wait between 15-30 mins ( for this job to complete with 100% of the cluster 
> available takes about 1hr so job 1 is between 25-50% complete). Note that if 
> I wait less time then the issue sometimes does not occur, it appears to be 
> only after the job 1 is at least 25% complete.
> I launch job 2 into Q2 and preemption occurs on the Q1 shrinking the job to 
> allow 70% of cluster utilization.
> At this point job 1 basically halts progress while job 2 continues to execute 
> as normal and finishes. Job 2 either:
> - Fails its attempt and restarts. By the time this attempt fails the other 
> job is already complete meaning the second attempt has full cluster 
> availability and finishes.
> - The job remains at its current progress and simply does not finish ( I have 
> waited ~6 hrs until finally killing the application ).
>  
> Looking into the error log there is this constant error message:
> WARN NettyRpcEndpointRef: Error sending message [message = 
> RemoveExecutor(454,Container container_1468422920649_0001_01_000594 on host: 
> ip-NUMBERS.ec2.internal was preempted.)] in X attempts
>  
> My observations have led me to believe that the application master does not 
> know about this container being killed and continuously asks the container to 
> remove the executor until eventually failing the attempt or continue trying 
> to remove the executor.
>  
> I have done much digging online for anyone else experiencing this issue but 
> have come up with nothing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5364) timelineservice modules have indirect dependencies on mapreduce artifacts

2016-07-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375711#comment-15375711
 ] 

Hudson commented on YARN-5364:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #10092 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10092/])
YARN-5364. timelineservice modules have indirect dependencies on 
(naganarasimha_gr: rev af8f480c2482b40e9f5a2d29fb5bc7069979fa2e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/pom.xml


> timelineservice modules have indirect dependencies on mapreduce artifacts
> -
>
> Key: YARN-5364
> URL: https://issues.apache.org/jira/browse/YARN-5364
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.0.0-alpha1
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Minor
> Fix For: 3.0.0-alpha1
>
> Attachments: YARN-5364.01.patch
>
>
> The new timelineservice and timelineservice-hbase-tests modules have indirect 
> dependencies to mapreduce artifacts through HBase and phoenix. Although it's 
> not causing builds to fail, it's not good hygiene.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5374) Preemption causing communication loop

2016-07-13 Thread Lucas Winkelmann (JIRA)
Lucas Winkelmann created YARN-5374:
--

 Summary: Preemption causing communication loop
 Key: YARN-5374
 URL: https://issues.apache.org/jira/browse/YARN-5374
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler, nodemanager, resourcemanager, yarn
Affects Versions: 2.7.1
 Environment: Yarn version: Hadoop 2.7.1-amzn-0

AWS EMR Cluster running:
1 x r3.8xlarge (Master)
52 x r3.8xlarge (Core)

Spark version : 1.6.0
Scala version: 2.10.5
Java version: 1.8.0_51

Input size: ~10 tb
Input coming from S3

Queue Configuration:
Dynamic allocation: enabled
Preemption: enabled
Q1: 70% capacity with max of 100%
Q2: 30% capacity with max of 100%

Job Configuration:
Driver memory = 10g
Executor cores = 6
Executor memory = 10g
Deploy mode = cluster
Master = yarn
maxResultSize = 4g
Shuffle manager = hash
Reporter: Lucas Winkelmann
Priority: Blocker


Here is the scenario:
I launch job 1 into Q1 and allow it to grow to 100% cluster utilization.
I wait between 15-30 mins ( for this job to complete with 100% of the cluster 
available takes about 1hr so job 1 is between 25-50% complete). Note that if I 
wait less time then the issue sometimes does not occur, it appears to be only 
after the job 1 is at least 25% complete.
I launch job 2 into Q2 and preemption occurs on the Q1 shrinking the job to 
allow 70% of cluster utilization.
At this point job 1 basically halts progress while job 2 continues to execute 
as normal and finishes. Job 2 either:
- Fails its attempt and restarts. By the time this attempt fails the other job 
is already complete meaning the second attempt has full cluster availability 
and finishes.
- The job remains at its current progress and simply does not finish ( I have 
waited ~6 hrs until finally killing the application ).
 
Looking into the error log there is this constant error message:
WARN NettyRpcEndpointRef: Error sending message [message = 
RemoveExecutor(454,Container container_1468422920649_0001_01_000594 on host: 
ip-NUMBERS.ec2.internal was preempted.)] in X attempts
 
My observations have led me to believe that the application master does not 
know about this container being killed and continuously asks the container to 
remove the executor until eventually failing the attempt or continue trying to 
remove the executor.
 
I have done much digging online for anyone else experiencing this issue but 
have come up with nothing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)

2016-07-13 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375683#comment-15375683
 ] 

Daniel Templeton commented on YARN-5373:


It looks like the issue only appears when running with a secure cluster.

> NPE introduced by YARN-4958 (The file localization process should allow...)
> ---
>
> Key: YARN-5373
> URL: https://issues.apache.org/jira/browse/YARN-5373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> YARN-4958 added support for wildcards in file localization. It introduces a 
> NPE 
> at 
> {code:java}
> for (File wildLink : directory.listFiles()) {
> sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
> }
> {code}
> When directory.listFiles returns null, NPE will cause the container fail to 
> launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5364) timelineservice modules have indirect dependencies on mapreduce artifacts

2016-07-13 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375648#comment-15375648
 ] 

Naganarasimha G R commented on YARN-5364:
-

Strangely, the dependency tree was also not showing it as a required jar earlier.

> timelineservice modules have indirect dependencies on mapreduce artifacts
> -
>
> Key: YARN-5364
> URL: https://issues.apache.org/jira/browse/YARN-5364
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.0.0-alpha1
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Minor
> Attachments: YARN-5364.01.patch
>
>
> The new timelineservice and timelineservice-hbase-tests modules have indirect 
> dependencies to mapreduce artifacts through HBase and phoenix. Although it's 
> not causing builds to fail, it's not good hygiene.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5364) timelineservice modules have indirect dependencies on mapreduce artifacts

2016-07-13 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375647#comment-15375647
 ] 

Naganarasimha G R commented on YARN-5364:
-

Not sure why it was failing earlier (with/without the patch). Once I changed the 
repo location, I was able to start running the test cases. I will go ahead and 
commit the patch.

> timelineservice modules have indirect dependencies on mapreduce artifacts
> -
>
> Key: YARN-5364
> URL: https://issues.apache.org/jira/browse/YARN-5364
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.0.0-alpha1
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Minor
> Attachments: YARN-5364.01.patch
>
>
> The new timelineservice and timelineservice-hbase-tests modules have indirect 
> dependencies to mapreduce artifacts through HBase and phoenix. Although it's 
> not causing builds to fail, it's not good hygiene.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5364) timelineservice modules have indirect dependencies on mapreduce artifacts

2016-07-13 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-5364:

Attachment: (was: screenshot-1.png)

> timelineservice modules have indirect dependencies on mapreduce artifacts
> -
>
> Key: YARN-5364
> URL: https://issues.apache.org/jira/browse/YARN-5364
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.0.0-alpha1
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Minor
> Attachments: YARN-5364.01.patch
>
>
> The new timelineservice and timelineservice-hbase-tests modules have indirect 
> dependencies to mapreduce artifacts through HBase and phoenix. Although it's 
> not causing builds to fail, it's not good hygiene.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)

2016-07-13 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5373:
-
Description: 
YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{code:java}
for (File wildLink : directory.listFiles()) {
sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
}
{code}
When directory.listFiles returns null, NPE will cause the container fail to 
launch.

  was:
YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{code:java}
  for (File wildLink : directory.listFiles()) {
  sb.symlink(new Path(wildLink.toString()), new 
Path(wildLink.getName()));
  }
{code}
When directory.listFiles returns null, NPE will cause the container fail to 
launch.


> NPE introduced by YARN-4958 (The file localization process should allow...)
> ---
>
> Key: YARN-5373
> URL: https://issues.apache.org/jira/browse/YARN-5373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> YARN-4958 added support for wildcards in file localization. It introduces a 
> NPE 
> at 
> {code:java}
> for (File wildLink : directory.listFiles()) {
> sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
> }
> {code}
> When directory.listFiles returns null, NPE will cause the container fail to 
> launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)

2016-07-13 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5373:
-
Description: 
YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{code:java}
  for (File wildLink : directory.listFiles()) {
  sb.symlink(new Path(wildLink.toString()),
  new Path(wildLink.getName()));
}
{code}
When directory.listFiles returns null, NPE will cause the container fail to 
launch.

  was:
YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{{code}}
  for (File wildLink : directory.listFiles()) {
  sb.symlink(new Path(wildLink.toString()),
  new Path(wildLink.getName()));
}
{{code}}
When directory.listFiles returns null, NPE will cause the container fail to 
launch.


> NPE introduced by YARN-4958 (The file localization process should allow...)
> ---
>
> Key: YARN-5373
> URL: https://issues.apache.org/jira/browse/YARN-5373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> YARN-4958 added support for wildcards in file localization. It introduces a 
> NPE 
> at 
> {code:java}
>   for (File wildLink : directory.listFiles()) {
>   sb.symlink(new Path(wildLink.toString()),
>   new Path(wildLink.getName()));
> }
> {code}
> When directory.listFiles returns null, NPE will cause the container fail to 
> launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)

2016-07-13 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5373:
-
Description: 
YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{code:java}
  for (File wildLink : directory.listFiles()) {
  sb.symlink(new Path(wildLink.toString()),
  new Path(wildLink.getName()));
  }
{code}
When directory.listFiles returns null, NPE will cause the container fail to 
launch.

  was:
YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{code:java}
  for (File wildLink : directory.listFiles()) {
  sb.symlink(new Path(wildLink.toString()),
  new Path(wildLink.getName()));
}
{code}
When directory.listFiles returns null, NPE will cause the container fail to 
launch.


> NPE introduced by YARN-4958 (The file localization process should allow...)
> ---
>
> Key: YARN-5373
> URL: https://issues.apache.org/jira/browse/YARN-5373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> YARN-4958 added support for wildcards in file localization. It introduces a 
> NPE 
> at 
> {code:java}
>   for (File wildLink : directory.listFiles()) {
>   sb.symlink(new Path(wildLink.toString()),
>   new Path(wildLink.getName()));
>   }
> {code}
> When directory.listFiles returns null, NPE will cause the container fail to 
> launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)

2016-07-13 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5373:
-
Description: 
YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{code:java}
  for (File wildLink : directory.listFiles()) {
  sb.symlink(new Path(wildLink.toString()), new 
Path(wildLink.getName()));
  }
{code}
When directory.listFiles returns null, NPE will cause the container fail to 
launch.

  was:
YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{code:java}
  for (File wildLink : directory.listFiles()) {
  sb.symlink(new Path(wildLink.toString()),
  new Path(wildLink.getName()));
  }
{code}
When directory.listFiles returns null, NPE will cause the container fail to 
launch.


> NPE introduced by YARN-4958 (The file localization process should allow...)
> ---
>
> Key: YARN-5373
> URL: https://issues.apache.org/jira/browse/YARN-5373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> YARN-4958 added support for wildcards in file localization. It introduces a 
> NPE 
> at 
> {code:java}
>   for (File wildLink : directory.listFiles()) {
>   sb.symlink(new Path(wildLink.toString()), new 
> Path(wildLink.getName()));
>   }
> {code}
> When directory.listFiles returns null, NPE will cause the container fail to 
> launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)

2016-07-13 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-5373:


 Summary: NPE introduced by YARN-4958 (The file localization 
process should allow...)
 Key: YARN-5373
 URL: https://issues.apache.org/jira/browse/YARN-5373
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.9.0
Reporter: Haibo Chen
Assignee: Haibo Chen


YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{{code}}
  for (File wildLink : directory.listFiles()) {
  sb.symlink(new Path(wildLink.toString()),
  new Path(wildLink.getName()));
}
{{code}}
When directory.listFiles returns null, NPE will cause the container fail to 
launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5303) Clean up ContainerExecutor JavaDoc

2016-07-13 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375544#comment-15375544
 ] 

Varun Vasudev commented on YARN-5303:
-

Thanks for the patch [~templedf]! +1. I'll commit this tomorrow if no one 
objects.

> Clean up ContainerExecutor JavaDoc
> --
>
> Key: YARN-5303
> URL: https://issues.apache.org/jira/browse/YARN-5303
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Minor
> Attachments: YARN-5303.001.patch
>
>
> The {{ContainerExecutor}} class needs a lot of JavaDoc cleanup and could use 
> some other TLC as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5007) MiniYarnCluster contains deprecated constructor which is called by the other constructors

2016-07-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375528#comment-15375528
 ] 

Hadoop QA commented on YARN-5007:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 32s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 54s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 20s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
52s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 52s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 52s 
{color} | {color:green} root generated 0 new + 706 unchanged - 4 fixed = 706 
total (was 710) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 26s 
{color} | {color:red} root: The patch generated 1 new + 61 unchanged - 2 fixed 
= 62 total (was 63) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
51s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 4m 27s {color} 
| {color:red} hadoop-yarn-server-tests in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 22s {color} 
| {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 113m 53s 
{color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
31s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 162m 26s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.TestContainerManagerSecurity |
|   | hadoop.yarn.server.TestMiniYarnClusterNodeUtilization |
|   | hadoop.yarn.client.api.impl.TestYarnClient |
|   | hadoop.yarn.client.cli.TestLogsCLI |
|   | hadoop.mapred.TestMRCJCFileOutputCommitter |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12817705/YARN-5007.02.patch |
| JIRA Issue | YARN-5007 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux e5fa8444e734 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 5614217 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| checkstyle 

[jira] [Commented] (YARN-5339) passing file to -out for YARN log CLI doesnt give warning or error code

2016-07-13 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375522#comment-15375522
 ] 

Xuan Gong commented on YARN-5339:
-

The test case failures and the checkstyle issue are not related.

> passing file to -out for YARN log CLI doesnt give warning or error code
> ---
>
> Key: YARN-5339
> URL: https://issues.apache.org/jira/browse/YARN-5339
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Sumana Sathish
>Assignee: Xuan Gong
> Attachments: YARN-5339.1.patch, YARN-5339.2.patch
>
>
> passing file to -out for YARN log CLI doesnt give warning or error code
> {code}
> yarn  logs -applicationId application_1467117709224_0003 -out 
> /grid/0/hadoopqe/artifacts/file.txt
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5339) passing file to -out for YARN log CLI doesnt give warning or error code

2016-07-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375503#comment-15375503
 ] 

Hadoop QA commented on YARN-5339:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
8s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: The 
patch generated 1 new + 87 unchanged - 1 fixed = 88 total (was 88) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 2s {color} | 
{color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 23m 45s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.client.cli.TestLogsCLI |
|   | hadoop.yarn.client.api.impl.TestAMRMProxy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12817574/YARN-5339.2.patch |
| JIRA Issue | YARN-5339 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux ab084b82c3e8 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / eb47163 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/12314/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/12314/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-YARN-Build/12314/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/12314/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/12314/console |
| 

[jira] [Commented] (YARN-5363) For AM containers, or for containers of running-apps, "yarn logs" incorrectly only (tries to) shows syslog file-type by default

2016-07-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375496#comment-15375496
 ] 

Hadoop QA commented on YARN-5363:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 35s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 0s 
{color} | {color:blue} The patch file was not named according to hadoop's 
naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute 
for instructions. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: The 
patch generated 7 new + 80 unchanged - 8 fixed = 87 total (was 88) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 31s {color} 
| {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 21m 37s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.client.api.impl.TestYarnClient |
|   | hadoop.yarn.client.cli.TestLogsCLI |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12817767/YARN-5363-2016-07-13.txt
 |
| JIRA Issue | YARN-5363 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 2c97dfcde450 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / eb47163 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/12313/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/12313/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-YARN-Build/12313/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
 |
|  Test Results | 

[jira] [Commented] (YARN-5298) Mount usercache and NM filecache directories into Docker container

2016-07-13 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375488#comment-15375488
 ] 

Daniel Templeton commented on YARN-5298:


Looks good to me as well.

> Mount usercache and NM filecache directories into Docker container
> --
>
> Key: YARN-5298
> URL: https://issues.apache.org/jira/browse/YARN-5298
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Varun Vasudev
>Assignee: Sidharta Seethana
> Attachments: YARN-5298.001.patch, YARN-5298.002.patch
>
>
> Currently, we don't mount the usercache and the NM filecache directories into 
> the Docker container. This can lead to issues with containers that rely on 
> public and application scope resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5200) Improve yarn logs to get Container List

2016-07-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375487#comment-15375487
 ] 

Hudson commented on YARN-5200:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #10091 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10091/])
YARN-5200. Enhanced "yarn logs" to be able to get a list of containers 
(vinodkv: rev eb471632349deac4b62f8dec853c8ceb64c9617a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java


> Improve yarn logs to get Container List
> ---
>
> Key: YARN-5200
> URL: https://issues.apache.org/jira/browse/YARN-5200
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.9.0
>
> Attachments: YARN-5200.1.patch, YARN-5200.10.patch, 
> YARN-5200.11.patch, YARN-5200.12.patch, YARN-5200.2.patch, YARN-5200.3.patch, 
> YARN-5200.4.patch, YARN-5200.5.patch, YARN-5200.6.patch, YARN-5200.7.patch, 
> YARN-5200.8.patch, YARN-5200.9.patch, YARN-5200.9.rebase.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5298) Mount usercache and NM filecache directories into Docker container

2016-07-13 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375481#comment-15375481
 ] 

Varun Vasudev commented on YARN-5298:
-

Forgot to mention - the question should not hold up the patch. +1 for the 
patch. I'll commit it tomorrow if no one objects.

> Mount usercache and NM filecache directories into Docker container
> --
>
> Key: YARN-5298
> URL: https://issues.apache.org/jira/browse/YARN-5298
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Varun Vasudev
>Assignee: Sidharta Seethana
> Attachments: YARN-5298.001.patch, YARN-5298.002.patch
>
>
> Currently, we don't mount the usercache and the NM filecache directories into 
> the Docker container. This can lead to issues with containers that rely on 
> public and application scope resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5363) For AM containers, or for containers of running-apps, "yarn logs" incorrectly only (tries to) shows syslog file-type by default

2016-07-13 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-5363:
--
Attachment: YARN-5363-2016-07-13.txt

Updated patch against the latest trunk.

> For AM containers, or for containers of running-apps, "yarn logs" incorrectly 
> only (tries to) shows syslog file-type by default
> ---
>
> Key: YARN-5363
> URL: https://issues.apache.org/jira/browse/YARN-5363
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: YARN-5363-2016-07-12.txt, YARN-5363-2016-07-13.txt
>
>
> For e.g, for a running application, the following happens:
> {code}
> # yarn logs -applicationId application_1467838922593_0001
> 16/07/06 22:07:05 INFO impl.TimelineClientImpl: Timeline service address: 
> http://:8188/ws/v1/timeline/
> 16/07/06 22:07:06 INFO client.RMProxy: Connecting to ResourceManager at 
> /:8050
> 16/07/06 22:07:07 INFO impl.TimelineClientImpl: Timeline service address: 
> http://l:8188/ws/v1/timeline/
> 16/07/06 22:07:07 INFO client.RMProxy: Connecting to ResourceManager at 
> /:8050
> Can not find any log file matching the pattern: [syslog] for the container: 
> container_e03_1467838922593_0001_01_01 within the application: 
> application_1467838922593_0001
> Can not find any log file matching the pattern: [syslog] for the container: 
> container_e03_1467838922593_0001_01_02 within the application: 
> application_1467838922593_0001
> Can not find any log file matching the pattern: [syslog] for the container: 
> container_e03_1467838922593_0001_01_03 within the application: 
> application_1467838922593_0001
> Can not find any log file matching the pattern: [syslog] for the container: 
> container_e03_1467838922593_0001_01_04 within the application: 
> application_1467838922593_0001
> Can not find any log file matching the pattern: [syslog] for the container: 
> container_e03_1467838922593_0001_01_05 within the application: 
> application_1467838922593_0001
> Can not find any log file matching the pattern: [syslog] for the container: 
> container_e03_1467838922593_0001_01_06 within the application: 
> application_1467838922593_0001
> Can not find any log file matching the pattern: [syslog] for the container: 
> container_e03_1467838922593_0001_01_07 within the application: 
> application_1467838922593_0001
> Can not find the logs for the application: application_1467838922593_0001 
> with the appOwner: 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4759) Revisit signalContainer() for docker containers

2016-07-13 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375435#comment-15375435
 ] 

Varun Vasudev commented on YARN-4759:
-

Thanks for the patch [~shaneku...@gmail.com]. Patch looks mostly good. One 
minor change -
{code}
+  // always change back
+  if (change_effective_user(user, group) != 0) {
+return -1;
+  }
{code}
Can you please log an error message?

> Revisit signalContainer() for docker containers
> ---
>
> Key: YARN-4759
> URL: https://issues.apache.org/jira/browse/YARN-4759
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Shane Kumpf
> Attachments: YARN-4759.001.patch, YARN-4759.002.patch
>
>
> The current signal handling (in the DockerContainerRuntime) needs to be 
> revisited for docker containers. For example, container reacquisition on NM 
> restart might not work, depending on which user the process in the container 
> runs as. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5200) Improve yarn logs to get Container List

2016-07-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375411#comment-15375411
 ] 

Vinod Kumar Vavilapalli commented on YARN-5200:
---

I'll dig up the test-case tickets.

The latest patch looks good to me. +1, checking this in.

> Improve yarn logs to get Container List
> ---
>
> Key: YARN-5200
> URL: https://issues.apache.org/jira/browse/YARN-5200
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-5200.1.patch, YARN-5200.10.patch, 
> YARN-5200.11.patch, YARN-5200.12.patch, YARN-5200.2.patch, YARN-5200.3.patch, 
> YARN-5200.4.patch, YARN-5200.5.patch, YARN-5200.6.patch, YARN-5200.7.patch, 
> YARN-5200.8.patch, YARN-5200.9.patch, YARN-5200.9.rebase.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5370) Setting yarn.nodemanager.delete.debug-delay-sec to high number crashes NM because of OOM

2016-07-13 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375339#comment-15375339
 ] 

Manikandan R commented on YARN-5370:


To work around this issue, we tried setting yarn.nodemanager.delete.debug-delay-sec 
to a very low value (zero seconds), assuming it might clear the existing scheduled 
deletion tasks. It didn't help - the new value is not applied to tasks that have 
already been scheduled. We then found that the canRecover() method is called during 
service start; it pulls the info from the NM recovery directory (on the local 
filesystem) and rebuilds all of it in memory, which is what was preventing the 
services from starting and consuming so much memory. After moving the contents of 
the NM recovery directory elsewhere, the NM started smoothly and works as expected. 
I think logging a warning when this value is very high (for example, 100+ days), 
indicating that it can cause a crash, could save a significant amount of time in 
troubleshooting this issue.
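
As a concrete version of the warning suggested above, a minimal sketch of such a 
sanity check (illustrative only: the property name is the real one, but the 
threshold, message, and where it would run are assumptions, not NM code):

{code}
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class DeleteDelaySanityCheck {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // Same property discussed above; 0 (delete immediately) is the default.
    long delaySec = conf.getLong("yarn.nodemanager.delete.debug-delay-sec", 0);
    if (delaySec > TimeUnit.DAYS.toSeconds(7)) {
      System.err.println("WARNING: yarn.nodemanager.delete.debug-delay-sec="
          + delaySec + "s; every deletion task is kept in memory (and in the NM"
          + " recovery store) until the delay expires, which can exhaust the NM heap.");
    }
  }
}
{code}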

> Setting yarn.nodemanager.delete.debug-delay-sec to high number crashes NM 
> because of OOM
> 
>
> Key: YARN-5370
> URL: https://issues.apache.org/jira/browse/YARN-5370
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>
> I set yarn.nodemanager.delete.debug-delay-sec to 100+ days in my dev cluster 
> about 3-4 weeks ago. Since then the NM occasionally crashes with OOM, so as a 
> temporary fix I have been gradually increasing the heap from 512MB to 6GB 
> whenever a crash occurs. Sometimes it won't start smoothly and only starts 
> functioning after multiple tries. Analyzing a heap dump of the corresponding 
> JVM shows that DeletionService is occupying almost 99% of the total allocated 
> memory (-Xmx), something like this:
> org.apache.hadoop.yarn.server.nodemanager.DeletionService$DelServiceSchedThreadPoolExecutor
>  @ 0x6c1d09068| 80 | 3,544,094,696 | 99.13%
> Basically, there is a huge number of the above tasks scheduled for deletion. 
> Usually I see NM memory requirements of 2-4GB for large clusters; in my case 
> the cluster is very small and OOM still occurs.
> Is this expected behaviour? Or is there any limit we could enforce on 
> yarn.nodemanager.delete.debug-delay-sec to avoid this kind of issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5364) timelineservice modules have indirect dependencies on mapreduce artifacts

2016-07-13 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375332#comment-15375332
 ] 

Varun Saxena commented on YARN-5364:


Passes for me.
I tried changing the repository path (so that all jars are downloaded) and even 
then it works.

Probably, [~naganarasimha...@apache.org], the repository the jar was to be 
downloaded from was down at the time.

> timelineservice modules have indirect dependencies on mapreduce artifacts
> -
>
> Key: YARN-5364
> URL: https://issues.apache.org/jira/browse/YARN-5364
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.0.0-alpha1
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Minor
> Attachments: YARN-5364.01.patch, screenshot-1.png
>
>
> The new timelineservice and timelineservice-hbase-tests modules have indirect 
> dependencies to mapreduce artifacts through HBase and phoenix. Although it's 
> not causing builds to fail, it's not good hygiene.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5364) timelineservice modules have indirect dependencies on mapreduce artifacts

2016-07-13 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375332#comment-15375332
 ] 

Varun Saxena edited comment on YARN-5364 at 7/13/16 4:47 PM:
-

Passes for me.
I tried changing the repository path (so that all jars are downloaded again) 
and even then it works.

Probably, [~naganarasimha...@apache.org], the repository the jar was to be 
downloaded from was down at the time.


was (Author: varun_saxena):
Passes for me.
I tried changing the repository path (so that all jars are downloaded) and even 
then it works.

Probably, [~naganarasimha...@apache.org], the repository the jar was to be 
downloaded from was down at the time.

> timelineservice modules have indirect dependencies on mapreduce artifacts
> -
>
> Key: YARN-5364
> URL: https://issues.apache.org/jira/browse/YARN-5364
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.0.0-alpha1
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Minor
> Attachments: YARN-5364.01.patch, screenshot-1.png
>
>
> The new timelineservice and timelineservice-hbase-tests modules have indirect 
> dependencies to mapreduce artifacts through HBase and phoenix. Although it's 
> not causing builds to fail, it's not good hygiene.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5371) FairScheduler ContinuousScheduling thread throws Exception

2016-07-13 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375287#comment-15375287
 ] 

Rohith Sharma K S commented on YARN-5371:
-

Is this a duplicate of YARN-4743? The stack traces of the two issues differ, 
but I think the root cause is the same.
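
For background, TimSort throws "Comparison method violates its general contract!" 
when the comparator produces an inconsistent ordering, which can happen if the 
values being compared are mutated by another thread mid-sort. A standalone sketch 
of that failure mode (made-up classes, not FairScheduler code):

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

public class TimSortContractDemo {
  // Stand-in for a schedulable node whose available resources change under load.
  static class FakeNode {
    final AtomicLong availableMB = new AtomicLong();
    FakeNode(long mb) { availableMB.set(mb); }
  }

  public static void main(String[] args) throws InterruptedException {
    List<FakeNode> nodes = new ArrayList<>();
    for (int i = 0; i < 50_000; i++) {
      nodes.add(new FakeNode(i));
    }
    // The comparator reads mutable state; if that state changes while the sort
    // is running, TimSort may detect an inconsistent ordering and throw
    // IllegalArgumentException.
    Comparator<FakeNode> byAvailable = Comparator.comparingLong(n -> n.availableMB.get());

    Thread mutator = new Thread(() -> {
      for (int pass = 0; pass < 200; pass++) {
        for (FakeNode n : nodes) {
          n.availableMB.set(ThreadLocalRandom.current().nextLong(1_000_000));
        }
      }
    });
    mutator.start();
    try {
      Collections.sort(nodes, byAvailable);  // fails intermittently with the error above
    } finally {
      mutator.join();
    }
  }
}
{code}

In FairScheduler the analogous mutable state is presumably the per-node available 
resource that heartbeats update while {{continuousSchedulingAttempt()}} is sorting 
the node list.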

> FairScheduler ContinuousScheduling thread throws Exception
> -
>
> Key: YARN-5371
> URL: https://issues.apache.org/jira/browse/YARN-5371
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: sandflee
>Assignee: sandflee
>
> {noformat}
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
> at java.util.TimSort.mergeLo(TimSort.java:777)
> at java.util.TimSort.mergeAt(TimSort.java:514)
> at java.util.TimSort.mergeCollapse(TimSort.java:441)
> at java.util.TimSort.sort(TimSort.java:245)
> at java.util.Arrays.sort(Arrays.java:1512)
> at java.util.ArrayList.sort(ArrayList.java:1454)
> at java.util.Collections.sort(Collections.java:175)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1002)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4876) [Phase 1] Decoupled Init / Destroy of Containers from Start / Stop

2016-07-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375276#comment-15375276
 ] 

Karthik Kambatla commented on YARN-4876:


I understand there were more discussions on the design and implementation since 
the design doc was posted here. Does the posted design doc still apply? 

If yes, I have one quick clarification question: both 'StartContainerRequest' and 
'StopContainerRequest' seem to be adding a field to capture when to actually 
destroy the container. I understand why stop needs it. Does start need it for 
the case where the container (process) completes on its own without an explicit 
stop request from the AM? 

> [Phase 1] Decoupled Init / Destroy of Containers from Start / Stop
> --
>
> Key: YARN-4876
> URL: https://issues.apache.org/jira/browse/YARN-4876
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Marco Rabozzi
> Attachments: YARN-4876-design-doc.pdf, YARN-4876.002.patch, 
> YARN-4876.01.patch
>
>
> Introduce *initialize* and *destroy* container API into the 
> *ContainerManagementProtocol* and decouple the actual start of a container 
> from the initialization. This will allow AMs to re-start a container without 
> having to lose the allocation.
> Additionally, if the localization of the container is associated to the 
> initialize (and the cleanup with the destroy), This can also be used by 
> applications to upgrade a Container by *re-initializing* with a new 
> *ContainerLaunchContext*



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5007) MiniYarnCluster contains deprecated constructor which is called by the other constructors

2016-07-13 Thread Andras Bokor (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Bokor updated YARN-5007:
---
Attachment: YARN-5007.02.patch

I am uploading the second patch because the first one is no longer applicable 
due to another code change.

> MiniYarnCluster contains deprecated constructor which is called by the other 
> constructors
> -
>
> Key: YARN-5007
> URL: https://issues.apache.org/jira/browse/YARN-5007
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: timelineserver
>Reporter: Andras Bokor
>Assignee: Andras Bokor
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-5007.01.patch, YARN-5007.02.patch
>
>
> MiniYarnCluster has a deprecated constructor which is called by the other 
> constructors and it causes javac warnings during the build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5298) Mount usercache and NM filecache directories into Docker container

2016-07-13 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375257#comment-15375257
 ] 

Varun Vasudev commented on YARN-5298:
-

Patch looks mostly good to me [~sidharta-s]. One question - since we're 
mounting the usercache directories, do we need to mount the individual 
container directories?

> Mount usercache and NM filecache directories into Docker container
> --
>
> Key: YARN-5298
> URL: https://issues.apache.org/jira/browse/YARN-5298
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Varun Vasudev
>Assignee: Sidharta Seethana
> Attachments: YARN-5298.001.patch, YARN-5298.002.patch
>
>
> Currently, we don't mount the usercache and the NM filecache directories into 
> the Docker container. This can lead to issues with containers that rely on 
> public and application scope resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5359) FileSystemTimelineReader/Writer uses unix-specific default storage path

2016-07-13 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375246#comment-15375246
 ] 

Varun Saxena commented on YARN-5359:


Committed the latest patch to trunk.
Thanks [~sjlee0] for your contribution and [~jrottinghuis] for the reviews.

> FileSystemTimelineReader/Writer uses unix-specific default storage path
> ---
>
> Key: YARN-5359
> URL: https://issues.apache.org/jira/browse/YARN-5359
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha1
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Minor
> Fix For: 3.0.0-alpha1
>
> Attachments: YARN-5359.01.patch, YARN-5359.02.patch, 
> YARN-5359.03.patch
>
>
> {{FileSystemTimelineReaderImpl}} and {{FileSystemTimelineWriterImpl}} use a 
> unix-specific default. It won't work on Windows.
> Also, {{TestFileSystemTimelineReaderImpl}} uses this default directly, which 
> is also brittle against concurrent tests.
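
For context, a minimal sketch of a platform-neutral default along the lines 
described above (an illustration only; the committed patch may differ):

{code}
import java.io.File;

public class TimelineStorageDefault {
  public static void main(String[] args) {
    // Build the default storage root from java.io.tmpdir instead of hard-coding
    // a unix path such as "/tmp/...", so the same default resolves on Windows too.
    String defaultRoot =
        new File(System.getProperty("java.io.tmpdir"), "timeline_service_data")
            .getAbsolutePath();
    System.out.println("Default timeline storage root: " + defaultRoot);
  }
}
{code}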



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5364) timelineservice modules have indirect dependencies on mapreduce artifacts

2016-07-13 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375209#comment-15375209
 ] 

Sangjin Lee edited comment on YARN-5364 at 7/13/16 3:38 PM:


Hmm, I haven't seen that error before. I see the {{twill-zookeeper}} dependency 
with or without this patch:

{noformat}
[INFO]org.apache.twill:twill-zookeeper:jar:0.6.0-incubating:test
{noformat}

I also see it in the classpath of the unit tests (via {{mvn test -X}} )

{panel}
\[DEBUG\] boot(compact) classpath:  surefire-booter-2.17.jar  
surefire-api-2.17.jar  test-classes  classes  junit-4.11.jar  
hamcrest-core-1.3.jar  
hadoop-yarn-server-timelineservice-3.0.0-alpha1-SNAPSHOT.jar  
hadoop-annotations-3.0.0-alpha1-SNAPSHOT.jar  tools.jar  guice-4.0.jar  
javax.inject-1.jar  aopalliance-1.0.jar  commons-io-2.4.jar  
servlet-api-2.5.jar  jaxb-api-2.2.2.jar  stax-api-1.0-2.jar  activation-1.1.jar 
 commons-cli-1.2.jar  commons-lang-2.6.jar  commons-logging-1.1.3.jar  
commons-csv-1.0.jar  jackson-core-asl-1.9.13.jar  jackson-mapper-asl-1.9.13.jar 
 hadoop-common-2.5.1.jar  commons-math3-3.1.1.jar  xmlenc-0.52.jar  
commons-httpclient-3.1.jar  commons-codec-1.4.jar  commons-net-3.1.jar  
commons-collections-3.2.2.jar  jetty-6.1.26.jar  jetty-util-6.1.26.jar  
jersey-json-1.19.jar  jaxb-impl-2.2.3-1.jar  jersey-server-1.19.jar  
jasper-compiler-5.5.23.jar  jasper-runtime-5.5.23.jar  jsp-api-2.1.jar  
commons-el-1.0.jar  log4j-1.2.17.jar  jets3t-0.9.0.jar  httpcore-4.4.4.jar  
java-xmlbuilder-0.4.jar  commons-configuration-1.6.jar  
commons-digester-1.8.jar  commons-beanutils-1.7.0.jar  
commons-beanutils-core-1.8.0.jar  slf4j-api-1.7.10.jar  
slf4j-log4j12-1.7.10.jar  avro-1.7.4.jar  paranamer-2.3.jar  
snappy-java-1.0.4.1.jar  protobuf-java-2.5.0.jar  jsch-0.1.51.jar  
jsr305-3.0.0.jar  zookeeper-3.4.6.jar  commons-compress-1.4.1.jar  xz-1.0.jar  
hadoop-auth-2.5.1.jar  httpclient-4.5.2.jar  
apacheds-kerberos-codec-2.0.0-M15.jar  apacheds-i18n-2.0.0-M15.jar  
api-asn1-api-1.0.0-M20.jar  api-util-1.0.0-M20.jar  
hadoop-yarn-api-3.0.0-alpha1-SNAPSHOT.jar  
hadoop-yarn-common-3.0.0-alpha1-SNAPSHOT.jar  jackson-jaxrs-1.9.13.jar  
jackson-xc-1.9.13.jar  guice-servlet-4.0.jar  jersey-guice-1.19.jar  
jersey-servlet-1.19.jar  hadoop-yarn-server-common-3.0.0-alpha1-SNAPSHOT.jar  
leveldbjni-all-1.8.jar  
hadoop-yarn-server-applicationhistoryservice-3.0.0-alpha1-SNAPSHOT.jar  
jettison-1.1.jar  fst-2.24.jar  javassist-3.18.1-GA.jar  objenesis-2.1.jar  
guava-11.0.2.jar  jersey-core-1.19.jar  jsr311-api-1.1.1.jar  
jersey-client-1.19.jar  hbase-common-1.1.3.jar  hbase-protocol-1.1.3.jar  
hbase-annotations-1.1.3.jar  htrace-core-3.1.0-incubating.jar  
findbugs-annotations-1.3.9-1.jar  hbase-client-1.1.3.jar  
netty-all-4.1.0.Beta5.jar  jcodings-1.0.8.jar  joni-2.1.2.jar  
hbase-server-1.1.3.jar  hbase-procedure-1.1.3.jar  hbase-prefix-tree-1.1.3.jar  
hbase-hadoop-compat-1.1.3.jar  hbase-hadoop2-compat-1.1.3.jar  
metrics-core-2.2.0.jar  commons-math-2.2.jar  jetty-sslengine-6.1.26.jar  
jsp-2.1-6.1.14.jar  jsp-api-2.1-6.1.14.jar  servlet-api-2.5-6.1.14.jar  
jamon-runtime-2.3.1.jar  disruptor-3.3.0.jar  hbase-common-1.1.3-tests.jar  
hbase-server-1.1.3-tests.jar  hbase-it-1.1.3-tests.jar  hbase-shell-1.1.3.jar  
jruby-complete-1.6.8.jar  netty-3.2.4.Final.jar  
phoenix-core-4.7.0-HBase-1.1.jar  tephra-api-0.7.0.jar  tephra-core-0.7.0.jar  
gson-2.2.4.jar  guice-assistedinject-3.0.jar  libthrift-0.9.0.jar  
fastutil-6.5.6.jar  twill-common-0.6.0-incubating.jar  
twill-core-0.6.0-incubating.jar  twill-api-0.6.0-incubating.jar  
asm-all-5.0.2.jar  twill-discovery-api-0.6.0-incubating.jar  
twill-discovery-core-0.6.0-incubating.jar  twill-zookeeper-0.6.0-incubating.jar 
 metrics-core-3.1.0.jar  tephra-hbase-compat-1.1-0.7.0.jar  antlr-3.5.jar  
ST4-4.0.7.jar  antlr-runtime-3.5.jar  stringtemplate-3.2.1.jar  antlr-2.7.7.jar 
 sqlline-1.1.8.jar  annotations-1.3.2.jar  snappy-0.3.jar  
phoenix-core-4.7.0-HBase-1.1-tests.jar  jline-2.11.jar  joda-time-1.6.jar  
mockito-all-1.8.5.jar  hadoop-common-2.5.1-tests.jar  hadoop-hdfs-2.5.1.jar  
commons-daemon-1.0.13.jar  netty-3.6.2.Final.jar  hadoop-hdfs-2.5.1-tests.jar  
hbase-testing-util-1.1.3.jar  hbase-annotations-1.1.3-tests.jar  
hbase-hadoop-compat-1.1.3-tests.jar  hbase-hadoop2-compat-1.1.3-tests.jar  
surefire-junit4-2.17.jar
{panel}

Do you see that as well? Does your maven local cache ( {{~/.m2/repository}} ) 
have that jar?


was (Author: sjlee0):
Hmm, I haven't seen that error before. I see the {{twill-zookeeper}} dependency 
with or without this patch:

{noformat}
[INFO]org.apache.twill:twill-zookeeper:jar:0.6.0-incubating:test
{noformat}

Do you see that as well? Does your maven local cache ( {{~/.m2/repository}} ) 
have that jar?

> timelineservice modules have indirect dependencies on mapreduce artifacts
> 

[jira] [Commented] (YARN-5364) timelineservice modules have indirect dependencies on mapreduce artifacts

2016-07-13 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375209#comment-15375209
 ] 

Sangjin Lee commented on YARN-5364:
---

Hmm, I haven't seen that error before. I see the {{twill-zookeeper}} dependency 
with or without this patch:

{noformat}
[INFO]org.apache.twill:twill-zookeeper:jar:0.6.0-incubating:test
{noformat}

Do you see that as well? Does your maven local cache ( {{~/.m2/repository}} ) 
have that jar?

> timelineservice modules have indirect dependencies on mapreduce artifacts
> -
>
> Key: YARN-5364
> URL: https://issues.apache.org/jira/browse/YARN-5364
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.0.0-alpha1
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Minor
> Attachments: YARN-5364.01.patch, screenshot-1.png
>
>
> The new timelineservice and timelineservice-hbase-tests modules have indirect 
> dependencies to mapreduce artifacts through HBase and phoenix. Although it's 
> not causing builds to fail, it's not good hygiene.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5359) FileSystemTimelineReader/Writer uses unix-specific default storage path

2016-07-13 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-5359:
---
Summary: FileSystemTimelineReader/Writer uses unix-specific default storage 
path  (was: FileSystemTimelineReader/Writer uses unix-specific default)

> FileSystemTimelineReader/Writer uses unix-specific default storage path
> ---
>
> Key: YARN-5359
> URL: https://issues.apache.org/jira/browse/YARN-5359
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha1
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Minor
> Attachments: YARN-5359.01.patch, YARN-5359.02.patch, 
> YARN-5359.03.patch
>
>
> {{FileSystemTimelineReaderImpl}} and {{FileSystemTimelineWriterImpl}} use a 
> unix-specific default. It won't work on Windows.
> Also, {{TestFileSystemTimelineReaderImpl}} uses this default directly, which 
> is also brittle against concurrent tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-5372) TestRMWebServicesAppsModification fails in trunk

2016-07-13 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong resolved YARN-5372.

Resolution: Not A Problem

> TestRMWebServicesAppsModification fails in trunk
> 
>
> Key: YARN-5372
> URL: https://issues.apache.org/jira/browse/YARN-5372
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Jun Gong
>
> Some test cases in TestRMWebServicesAppsModification fail in trunk:
> {code}
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testAppMove[0]
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testUpdateAppPriority[0]
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testAppMove[1]
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized[1]
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testUpdateAppPriority[1]
>
> {code}
> The test case errors are at 
> https://builds.apache.org/job/PreCommit-YARN-Build/12310/testReport/.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5321) [YARN-3368] Add resource usage for application by node managers

2016-07-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375165#comment-15375165
 ] 

Hadoop QA commented on YARN-5321:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 36s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
16s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 1m 16s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:6d3a5f5 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12817648/YARN-5321-YARN-3368.004.patch
 |
| JIRA Issue | YARN-5321 |
| Optional Tests |  asflicense  |
| uname | Linux cc3a33998123 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | YARN-3368 / 8ec70d3 |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/12311/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> [YARN-3368] Add resource usage for application by node managers
> ---
>
> Key: YARN-5321
> URL: https://issues.apache.org/jira/browse/YARN-5321
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-5321-YARN-3368-0001.patch, 
> YARN-5321-YARN-3368.0002.patch, YARN-5321-YARN-3368.003.patch, 
> YARN-5321-YARN-3368.004.patch, sample-1.png
>
>
> With this, user can understand distribution of resources allocated to this 
> application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5372) TestRMWebServicesAppsModification fails in trunk

2016-07-13 Thread Jun Gong (JIRA)
Jun Gong created YARN-5372:
--

 Summary: TestRMWebServicesAppsModification fails in trunk
 Key: YARN-5372
 URL: https://issues.apache.org/jira/browse/YARN-5372
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Jun Gong


Some test cases in TestRMWebServicesAppsModification fail in trunk:

{code}
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testAppMove[0]
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testUpdateAppPriority[0]
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testAppMove[1]
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized[1]
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testUpdateAppPriority[1]
 
{code}

The test case errors are at 
https://builds.apache.org/job/PreCommit-YARN-Build/12310/testReport/.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5333) Some recovered apps are put into default queue when RM HA

2016-07-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375143#comment-15375143
 ] 

Hadoop QA commented on YARN-5333:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 40s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 42m 8s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
20s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 59m 11s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12817678/YARN-5333.02.patch |
| JIRA Issue | YARN-5333 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux fc5c75cacfbd 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / d6d41e8 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/12310/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-YARN-Build/12310/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/12310/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/12310/console |
| Powered by | Apache Yetus 0.3.0   

[jira] [Commented] (YARN-4091) Improvement: Introduce more debug/diagnostics information to detail out scheduler activity

2016-07-13 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375131#comment-15375131
 ] 

Sunil G commented on YARN-4091:
---

Thanks [~ChenGe] for the patch and the detailed doc.

A few initial comments; I will share more feedback soon.

*REST API comments:*
1. For a REST query ending with {{activities?nodeId=node-87}}, I think it may scan 
all NMs on that host if there are multiple NMs running on the same node. Correct?
2. If we support the above option, could we pass node names in comma-separated 
form to {{nodeId}}, like {{activities?nodeId=node-87,node-88}}? We may also want 
to define a scope for how many node managers can be queried, since the response 
output needs to stay simple to understand.
3. For {{app-activities?appId=application_1468198570845_0022}}, I think the output 
is different from the node case? Could you also attach the REST output for both 
the app and node scenarios?
4. Sometimes we may look for relaxed scheduling by considering missed 
opportunities, so a full round of node heartbeats is needed before an allocation 
happens in a few cases (rack-local, default partition from a shared label, etc.). 
It would be better to add an option to collect scheduler activity for an app 
until the missed-opportunity count reaches 0. Thoughts?
5. 


*General comments:*
1. ActivityManager is the class that holds all the information for the 
scheduling-activities tracker. Over time, I think we will need to handle cases 
like cleaning up outstanding requests and internally aggregating the collected 
data to compact and re-order it across heartbeats. For these cases, it would be 
better to make ActivityManager an extended service of the scheduler, so it can 
start a thread associated with the service to do all the monitoring and cleanup. 
This is just a thought; please feel free to share your opinion, as it is a 
nice-to-have option.
2. I am in favor of keeping the current direct, simple calls to 
start/update/stop a scheduling activity. But would it be better to define a 
read-write interface and clearly separate who reads the data and who writes to 
the activity manager? On second thought, could we raise events to ActivityManager 
from the scheduler and make the writes asynchronous? That may be clearer and 
simpler. Thoughts? (See the sketch below for one possible shape of such an 
interface.)
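
One possible shape of the read-write split mentioned in general comment 2 (purely 
illustrative; every name below is invented and nothing here comes from the patch):

{code}
import java.util.List;

// Hypothetical split: the scheduler depends only on the writer side, while the
// REST layer depends only on the reader side. Writes could be backed by an
// asynchronous event queue without changing either interface.
interface SchedulerActivityWriter {
  void startNodeActivity(String nodeId);
  void recordSkipOrAllocation(String nodeId, String appId, String diagnostic);
  void finishNodeActivity(String nodeId);
}

interface SchedulerActivityReader {
  List<String> getNodeActivities(String nodeId);
  List<String> getAppActivities(String appId);
}
{code}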


> Improvement: Introduce more debug/diagnostics information to detail out 
> scheduler activity
> --
>
> Key: YARN-4091
> URL: https://issues.apache.org/jira/browse/YARN-4091
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Chen Ge
> Attachments: Improvement on debugdiagnostic information - YARN.pdf, 
> YARN-4091-design-doc-v1.pdf, YARN-4091.preliminary.1.patch
>
>
> As schedulers are improved with various new capabilities, more configurations 
> which tunes the schedulers starts to take actions such as limit assigning 
> containers to an application, or introduce delay to allocate container etc. 
> There are no clear information passed down from scheduler to outerworld under 
> these various scenarios. This makes debugging very tougher.
> This ticket is an effort to introduce more defined states on various parts in 
> scheduler where it skips/rejects container assignment, activate application 
> etc. Such information will help user to know whats happening in scheduler.
> Attaching a short proposal for initial discussion. We would like to improve 
> on this as we discuss.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5370) Setting yarn.nodemanager.delete.debug-delay-sec to high number crashes NM because of OOM

2016-07-13 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375089#comment-15375089
 ] 

Jason Lowe commented on YARN-5370:
--

It's expected behavior in the sense that the debug delay setting causes the NM 
to buffer every deletion task up to the specified amount of time.  100 days is 
a lot of time, so if there are many deletions within that period it will have 
to buffer a lot of tasks as you saw in the heap dump.

The debug delay is, as the name implies, for debugging.  If you set it to a 
very large value then, depending upon the amount of container churn on the 
cluster, a correspondingly large heap will be required given the way it works 
today.  It's not typical to set this to a very large value since it only needs 
to be large enough to give someone a chance to examine/copy off the requisite 
files after reproducing the issue.  Normally it doesn't take someone 100 days 
to get around to examining the files after a problem occurs. ;-)

Theoretically we could extend the functionality to spill tasks to disk or do 
something more clever with how they are stored to reduce the memory pressure, 
but I question the cost/benefit tradeoff.  Again this is a feature intended 
just for debugging.  I'm also not a big fan of putting in an arbitrary limit on 
the value.  If someone wants to store files for a few years and has the heap 
size and disk space to hold all that, who are we to stop them from trying?
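
A standalone sketch of the buffering behaviour described above (not the actual 
DeletionService code; the numbers are arbitrary). Every scheduled task stays 
referenced in the executor's delay queue until its delay expires, so a 100-day 
delay means the queue only grows under normal container churn:

{code}
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class DebugDelayBufferingDemo {
  public static void main(String[] args) {
    ScheduledThreadPoolExecutor sched = new ScheduledThreadPoolExecutor(4);
    long delaySec = TimeUnit.DAYS.toSeconds(100);  // mirrors the 100+ day setting

    // Each finished container would enqueue deletion work roughly like this; with
    // a huge delay none of it runs, so the queue (and heap usage) keeps growing.
    for (int i = 0; i < 1_000_000; i++) {
      final String dir = "usercache/user/appcache/application_" + i;
      sched.schedule(() -> System.out.println("delete " + dir), delaySec, TimeUnit.SECONDS);
    }
    System.out.println("Tasks buffered in memory: " + sched.getQueue().size());
    sched.shutdownNow();  // drop the pending tasks so the demo JVM can exit
  }
}
{code}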


> Setting yarn.nodemanager.delete.debug-delay-sec to high number crashes NM 
> because of OOM
> 
>
> Key: YARN-5370
> URL: https://issues.apache.org/jira/browse/YARN-5370
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>
> I set yarn.nodemanager.delete.debug-delay-sec to 100 + days in my dev  
> cluster for some reasons. It has been done before 3-4 weeks. After setting 
> this up, at times, NM crashes because of OOM. So, I kept on increasing from 
> 512MB to 6 GB over the past few weeks gradually as and when this crash occurs 
> as temp fix. Sometimes, It won't start smoothly and after multiple tries, it 
> starts functioning. While analyzing heap dump of corresponding JVM, come to 
> know that DeletionService.Java is occupying almost 99% of total allocated 
> memory (-xmx) something like this
> org.apache.hadoop.yarn.server.nodemanager.DeletionService$DelServiceSchedThreadPoolExecutor
>  @ 0x6c1d09068| 80 | 3,544,094,696 | 99.13%
> Basically, there are huge no. of above mentioned tasks scheduled for 
> deletion. Usually, I see NM memory requirements as 2-4GB for large clusters. 
> In my case, cluster is very small and OOM occurs.
> Is it expected behaviour? (or) Is there any limit we can expose on 
> yarn.nodemanager.delete.debug-delay-sec to avoid these kind of issues?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5356) NodeManager should communicate physical resource capability to ResourceManager

2016-07-13 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-5356:
-
Description: 
Currently ResourceUtilization contains absolute quantities of resource used 
(e.g. 4096MB memory used). It would be good if the NM also communicated the 
actual physical resource capabilities of the node so that the RM can use this 
data to schedule more effectively (overcommit, etc)

Currently the only available information is the Resource the node registered 
with (or later updated using updateNodeResource). However, these aren't really 
sufficient to get a good view of how utilized a resource is. For example, if a 
node reports 400% CPU utilization, does that mean it's completely full, or 
barely utilized? Today there is no reliable way to figure this out.

[~elgoiri] - Lots of good work is happening in YARN-2965 so curious if you have 
thoughts/opinions on this?

  was:
Currently ResourceUtilization contains absolute quantities of resource used 
(e.g. 4096MB memory used). It would be good if it also included how much of 
that resource is actually available on the node so that the RM can use this 
data to schedule more effectively (overcommit, etc)

Currently the only available information is the Resource the node registered 
with (or later updated using updateNodeResource). However, these aren't really 
sufficient to get a good view of how utilized a resource is. For example, if a 
node reports 400% CPU utilization, does that mean it's completely full, or 
barely utilized? Today there is no reliable way to figure this out.

[~elgoiri] - Lots of good work is happening in YARN-2965 so curious if you have 
thoughts/opinions on this?


> NodeManager should communicate physical resource capability to ResourceManager
> --
>
> Key: YARN-5356
> URL: https://issues.apache.org/jira/browse/YARN-5356
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Nathan Roberts
>Assignee: Inigo Goiri
> Attachments: YARN-5356.000.patch
>
>
> Currently ResourceUtilization contains absolute quantities of resource used 
> (e.g. 4096MB memory used). It would be good if the NM also communicated the 
> actual physical resource capabilities of the node so that the RM can use this 
> data to schedule more effectively (overcommit, etc)
> Currently the only available information is the Resource the node registered 
> with (or later updated using updateNodeResource). However, these aren't 
> really sufficient to get a good view of how utilized a resource is. For 
> example, if a node reports 400% CPU utilization, does that mean it's 
> completely full, or barely utilized? Today there is no reliable way to figure 
> this out.
> [~elgoiri] - Lots of good work is happening in YARN-2965 so curious if you 
> have thoughts/opinions on this?
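
A quick worked example of the ambiguity called out above (hypothetical numbers): 
the same 400% CPU report is saturation on a 4-core node but light load on a 
32-core one, and without the physical capability the RM cannot tell which it is.

{code}
public class CpuUtilizationAmbiguity {
  public static void main(String[] args) {
    double reportedCpuPercent = 400.0;  // absolute utilization reported by the NM
    for (int cores : new int[] {4, 32}) {
      double fractionOfCapacity = reportedCpuPercent / (cores * 100.0);
      System.out.printf("%2d physical cores -> %.1f%% of capacity%n",
          cores, fractionOfCapacity * 100.0);
    }
    // Prints 100.0% for 4 cores and 12.5% for 32 cores: same report, very
    // different meaning for the scheduler.
  }
}
{code}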



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


