[jira] [Commented] (YARN-7770) Support for setting application priority for Distributed Shell jobs

2018-02-04 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352040#comment-16352040
 ] 

Sunil G commented on YARN-7770:
---

Yes, it's been in the DS help message.
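
For reference, a minimal sketch (using the standard YarnClient / ApplicationSubmissionContext API; this is not the DS client source itself) of how an application priority is attached at submission time, which is what a DS priority option would map to:
{code}
// Minimal sketch: attach an application-level priority to a submission context.
// Analogous to mapred.job.priority for MapReduce jobs.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;

public class PrioritySubmitSketch {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new Configuration());
    yarnClient.start();

    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
    // Application-level priority for the whole job.
    appContext.setPriority(Priority.newInstance(10));
    // ... set the AM container spec, queue and resources, then:
    // yarnClient.submitApplication(appContext);
    yarnClient.stop();
  }
}
{code}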

> Support for setting application priority for Distributed Shell jobs
> ---
>
> Key: YARN-7770
> URL: https://issues.apache.org/jira/browse/YARN-7770
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
>Reporter: Charan Hebri
>Assignee: Sunil G
>Priority: Major
>
> Currently there isn't a way to submit a Distributed Shell job with an 
> application priority, the way it is done via the property
> {noformat}
> mapred.job.priority{noformat}
> for MapReduce jobs. Creating this issue to track support for the same.






[jira] [Resolved] (YARN-7770) Support for setting application priority for Distributed Shell jobs

2018-02-04 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G resolved YARN-7770.
---
Resolution: Not A Bug

> Support for setting application priority for Distributed Shell jobs
> ---
>
> Key: YARN-7770
> URL: https://issues.apache.org/jira/browse/YARN-7770
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
>Reporter: Charan Hebri
>Assignee: Sunil G
>Priority: Major
>
> Currently there isn't a way to submit a Distributed Shell job with an 
> application priority, the way it is done via the property
> {noformat}
> mapred.job.priority{noformat}
> for MapReduce jobs. Creating this issue to track support for the same.






[jira] [Commented] (YARN-7292) Revisit Resource Profile Behavior

2018-02-04 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352038#comment-16352038
 ] 

Sunil G commented on YARN-7292:
---

Hi [~leftnoteasy]. Thanks for the patch.

Generally the approach seems fine. I will help with clearing Jenkins etc. today.

> Revisit Resource Profile Behavior
> -
>
> Key: YARN-7292
> URL: https://issues.apache.org/jira/browse/YARN-7292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-7292.002.patch, YARN-7292.003.patch, 
> YARN-7292.wip.001.patch
>
>
> Had discussions with [~templedf], [~vvasudev], [~sunilg] offline. There are a 
> couple of resource-profile-related behaviors that might need to be updated:
> 1) Configure resource profiles on the server side or the client side: 
> Currently resource profiles can only be configured centrally:
> - Advantages:
> A given resource profile has the same meaning across the cluster. It won’t 
> change when we run different apps in different configurations. A job that can 
> run under Amazon’s G2.8X can also run on YARN with the G2.8X profile. A side 
> benefit is that the YARN scheduler can potentially do better bin packing.
> - Disadvantages: 
> Hard for applications to add their own resource profiles. 
> 2) Do we really need mandatory resource profiles such as 
> minimum/maximum/default? 
> 3) Should we send the resource profile name inside the ResourceRequest, or 
> should the client/AM translate it to a resource and set it in the existing 
> resource fields? 
> 4) Related to the above, should we allow resource overrides, or should the 
> client/AM send the final resource to the RM?
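
To make point 3 above concrete, a hypothetical sketch of the client/AM-side translation option; the profile map and its values are invented for illustration and are not YARN's actual resource-profiles handling:
{code}
// Hypothetical client/AM-side translation: resolve a profile name into concrete
// values on the existing Resource fields before the request is sent, so the RM
// only ever sees plain resources. The profile map is an illustrative assumption.
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class ProfileTranslationSketch {
  private static final Map<String, Resource> PROFILES = new HashMap<>();
  static {
    PROFILES.put("minimum", Resource.newInstance(1024, 1));
    PROFILES.put("default", Resource.newInstance(2048, 2));
    PROFILES.put("maximum", Resource.newInstance(8192, 8));
  }

  public static void applyProfile(ResourceRequest request, String profileName) {
    Resource profile = PROFILES.get(profileName);
    if (profile == null) {
      throw new IllegalArgumentException("Unknown resource profile: " + profileName);
    }
    // Copy the profile into the existing capability field of the request.
    request.setCapability(
        Resource.newInstance(profile.getMemorySize(), profile.getVirtualCores()));
  }
}
{code}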






[jira] [Comment Edited] (YARN-6510) Fix profs stat file warning caused by process names that includes parenthesis

2018-02-04 Thread Cholju Paek (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352002#comment-16352002
 ] 

Cholju Paek edited comment on YARN-6510 at 2/5/18 4:06 AM:
---

Hello.

One of my customers is experiencing the same issue and needs this patch.

I have one question: does this bug cause any functional problems? For example,
could it lead to job failures or data loss?

We are being cautious about applying the patch because the site is very
sensitive to patches.

Could you please answer?

 


was (Author: cholju73):
Hello.

One of my customers is experiencing the same issue. They are currently using
Cloudera CDH, and this bug means a patch is required.

I would like to know whether this bug causes any functional problems. We are
being cautious about applying the patch because the site is very sensitive to
patches.

Could you please answer?

 

> Fix profs stat file warning caused by process names that includes parenthesis
> -
>
> Key: YARN-6510
> URL: https://issues.apache.org/jira/browse/YARN-6510
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Fix For: 2.9.0, 3.0.0-alpha4
>
> Attachments: YARN-6510.01.patch
>
>
> Even with the fix for YARN-3344 we still have issues with the procfs format.
> This is the case that is causing issues:
> {code}
> [user@nm1 ~]$ cat /proc/2406/stat
> 2406 (ib_fmr(mlx4_0)) S 2 0 0 0 -1 2149613632 0 0 0 0 166 126908 0 0 20 0 1 0 
> 4284 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 
> 0 0 17 6 0 0 0 0 0
> {code}
> We do not handle the parentheses in the name, which causes the pattern 
> matching to fail.
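
For illustration, a minimal parsing sketch (not the committed YARN patch) of the usual way to handle such process names: the command field is bounded by the first '(' and the last ')', so only the text after the last ')' is split on whitespace:
{code}
// Illustrative parser for /proc/<pid>/stat lines whose command name contains
// parentheses, e.g. "2406 (ib_fmr(mlx4_0)) S 2 0 ...". Not the YARN patch itself.
public class ProcStatParseSketch {
  /** Returns {pid, comm, state}; the remaining fields follow the state token. */
  public static String[] parse(String statLine) {
    int open = statLine.indexOf('(');
    int close = statLine.lastIndexOf(')');   // the last ')' ends the command name
    if (open < 0 || close <= open) {
      throw new IllegalArgumentException("Malformed stat line: " + statLine);
    }
    String pid = statLine.substring(0, open).trim();
    String comm = statLine.substring(open + 1, close);           // ib_fmr(mlx4_0)
    String[] rest = statLine.substring(close + 1).trim().split("\\s+");
    return new String[] {pid, comm, rest[0]};
  }

  public static void main(String[] args) {
    // Truncated sample taken from the description above.
    String line = "2406 (ib_fmr(mlx4_0)) S 2 0 0 0 -1 2149613632 0 0 0 0 166 126908";
    String[] f = parse(line);
    System.out.println(f[0] + " | " + f[1] + " | " + f[2]);  // 2406 | ib_fmr(mlx4_0) | S
  }
}
{code}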






[jira] [Commented] (YARN-7892) NodeAttributePBImpl does not implement hashcode and Equals properly

2018-02-04 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352009#comment-16352009
 ] 

Sunil G commented on YARN-7892:
---

[~Naganarasimha] This looks fine. For NodeAttributeType we are using an enum, so 
I think a separate compare may not be needed for each type.
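
A minimal sketch of that idea (field names are assumptions for illustration, not the actual PBImpl code): because the type is an enum, a plain identity comparison is enough in equals, and the enum can go straight into the hash:
{code}
// Illustrative equals/hashCode over a node attribute's fields. Field names are
// assumptions; the point is that the enum type needs no per-type compare logic.
import java.util.Objects;

public class NodeAttributeSketch {
  enum NodeAttributeType { STRING }

  private final String prefix;
  private final String name;
  private final NodeAttributeType type;
  private final String value;

  NodeAttributeSketch(String prefix, String name, NodeAttributeType type, String value) {
    this.prefix = prefix;
    this.name = name;
    this.type = type;
    this.value = value;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof NodeAttributeSketch)) {
      return false;
    }
    NodeAttributeSketch other = (NodeAttributeSketch) o;
    return Objects.equals(prefix, other.prefix)
        && Objects.equals(name, other.name)
        && type == other.type                  // enum: identity comparison is safe
        && Objects.equals(value, other.value);
  }

  @Override
  public int hashCode() {
    return Objects.hash(prefix, name, type, value);
  }
}
{code}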

> NodeAttributePBImpl does not implement hashcode and Equals properly
> ---
>
> Key: YARN-7892
> URL: https://issues.apache.org/jira/browse/YARN-7892
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Major
> Attachments: YARN-7892-YARN-3409.001.patch
>
>







[jira] [Commented] (YARN-6510) Fix profs stat file warning caused by process names that includes parenthesis

2018-02-04 Thread Cholju Paek (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352002#comment-16352002
 ] 

Cholju Paek commented on YARN-6510:
---

Hello.

One of my customers is experiencing the same issue. They are currently using
Cloudera CDH, and this bug means a patch is required.

I would like to know whether this bug causes any functional problems. We are
being cautious about applying the patch because the site is very sensitive to
patches.

Could you please answer?

 

> Fix profs stat file warning caused by process names that includes parenthesis
> -
>
> Key: YARN-6510
> URL: https://issues.apache.org/jira/browse/YARN-6510
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Fix For: 2.9.0, 3.0.0-alpha4
>
> Attachments: YARN-6510.01.patch
>
>
> Even with the fix for YARN-3344 we still have issues with the procfs format.
> This is the case that is causing issues:
> {code}
> [user@nm1 ~]$ cat /proc/2406/stat
> 2406 (ib_fmr(mlx4_0)) S 2 0 0 0 -1 2149613632 0 0 0 0 166 126908 0 0 20 0 1 0 
> 4284 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 
> 0 0 17 6 0 0 0 0 0
> {code}
> We do not handle the parentheses in the name, which causes the pattern 
> matching to fail.






[jira] [Resolved] (YARN-7880) CapacityScheduler$ResourceCommitterService throws NPE when running sls

2018-02-04 Thread Jiandan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiandan Yang  resolved YARN-7880.
-
   Resolution: Duplicate
 Assignee: Jiandan Yang 
Fix Version/s: 3.0.0

> CapacityScheduler$ResourceCommitterService throws NPE when running sls
> --
>
> Key: YARN-7880
> URL: https://issues.apache.org/jira/browse/YARN-7880
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Major
> Fix For: 3.0.0
>
>
> sls test case: node count = 9000, job count = 10k, tasks per job = 500, task 
> run time = 100s; the NPE does not occur when node count = 500 or 2000.
> {code}
> 18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: 
> container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED 
> to RUNNING
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541)
> {code}
> some CapacityScheduler$AsyncScheduleThread also throws NPE
> {code}
> 18/02/02 20:40:34 INFO resourcemanager.DefaultAMSProcessor: AM registration 
> appattempt_1517575125794_4564_01
> 18/02/02 20:40:34 INFO resourcemanager.RMAuditLogger: USER=default  
> OPERATION=Register App Master   TARGET=ApplicationMasterService 
> RESULT=SUCCESS  APPID=application_1517575125794_4564
> APPATTEMPTID=appattempt_1517575125794_4564_01
> Exception in thread "Thread-43" 18/02/02 20:40:34 INFO appmaster.AMSimulator: 
> Register the application master for application application_1517575125794_4564
> 18/02/02 20:40:34 INFO resourcemanager.MockAMLauncher: Notify AM launcher 
> launched:container_1517575125794_4564_01_01
> 18/02/02 20:40:34 INFO rmcontainer.RMContainerImpl: 
> container_1517575125794_2703_01_01 Container Transitioned from ACQUIRED 
> to RUNNING
> 18/02/02 20:40:34 INFO attempt.RMAppAttemptImpl: 
> appattempt_1517575125794_4564_01 State change from ALLOCATED to LAUNCHED 
> on event = LAUNCHED
> 18/02/02 20:40:34 INFO attempt.RMAppAttemptImpl: 
> appattempt_1517575125794_4564_01 State change from LAUNCHED to RUNNING on 
> event = REGISTERED
> 18/02/02 20:40:34 INFO rmapp.RMAppImpl: application_1517575125794_4564 State 
> change from ACCEPTED to RUNNING on event = ATTEMPT_REGISTERED
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceRequests(SchedulerApplicationAttempt.java:1341)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.canAssign(RegularContainerAllocator.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignOffSwitchContainers(RegularContainerAllocator.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainersOnNode(RegularContainerAllocator.java:470)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.tryAllocateOnNode(RegularContainerAllocator.java:252)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.allocate(RegularContainerAllocator.java:816)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(RegularContainerAllocator.java:854)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(ContainerAllocator.java:54)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(FiCaSchedulerApp.java:856)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:735)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
> at 
> 

[jira] [Commented] (YARN-7757) Refactor NodeLabelsProvider to be more generic and reusable for node attributes providers

2018-02-04 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351991#comment-16351991
 ] 

Weiwei Yang commented on YARN-7757:
---

Thanks [~Naganarasimha], [~sunilg] for all the reviews and comments!

> Refactor NodeLabelsProvider to be more generic and reusable for node 
> attributes providers
> -
>
> Key: YARN-7757
> URL: https://issues.apache.org/jira/browse/YARN-7757
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Blocker
> Fix For: yarn-3409
>
> Attachments: YARN-7757-YARN-3409.001.patch, 
> YARN-7757-YARN-3409.002.patch, YARN-7757-YARN-3409.003.patch, 
> YARN-7757-YARN-3409.004.patch, YARN-7757-YARN-3409.005.patch, 
> YARN-7757-YARN-3409.006.patch, 
> nodeLabelsProvider_refactor_class_hierarchy.pdf, 
> nodeLabelsProvider_refactor_v2.pdf, nodeLabelsProvider_refactor_v3.pdf
>
>
> Propose to do refactor on {{NodeLabelsProvider}}, 
> {{AbstractNodeLabelsProvider}} to be more generic, so node attributes 
> providers can reuse these interface/abstract classes.
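
As a rough illustration of the proposed generalization (class and method names here are assumptions, not the committed patch), the refresh plumbing can be shared by parameterizing the provider over the descriptor type it supplies:
{code}
// Illustrative generic provider: both node-label and node-attribute providers
// could extend this and only implement fetchDescriptors(). Names are assumptions.
import java.util.Set;
import java.util.Timer;
import java.util.TimerTask;

public abstract class NodeDescriptorsProviderSketch<T> {
  private volatile Set<T> descriptors;
  private final Timer refreshTimer = new Timer("node-descriptors-refresh", true);

  /** Fetch the latest descriptors (labels or attributes) from the source. */
  protected abstract Set<T> fetchDescriptors();

  /** Start periodic refresh; the first fetch runs immediately. */
  public void start(long intervalMs) {
    refreshTimer.scheduleAtFixedRate(new TimerTask() {
      @Override
      public void run() {
        descriptors = fetchDescriptors();
      }
    }, 0, intervalMs);
  }

  /** Latest descriptors reported by the provider (null until the first fetch). */
  public Set<T> getDescriptors() {
    return descriptors;
  }

  public void stop() {
    refreshTimer.cancel();
  }
}
{code}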






[jira] [Commented] (YARN-7880) CapacityScheduler$ResourceCommitterService throws NPE when running sls

2018-02-04 Thread Jiandan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351990#comment-16351990
 ] 

Jiandan Yang  commented on YARN-7880:
-

Duplicate of YARN-7591.
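
For context, a hypothetical illustration (names invented; this is not the YARN-7591 fix itself) of the defensive pattern the traces below point at: the asynchronous commit path can race with app/container removal, so a lookup that a synchronous path could assume non-null has to be treated as a stale proposal when it returns null:
{code}
// Hypothetical guard for a commit path that races with container removal.
// All names are illustrative assumptions, not CapacityScheduler internals.
public class CommitGuardSketch {
  interface ContainerLookup {
    Object find(String containerId);   // may return null if already removed
  }

  static boolean acceptProposal(ContainerLookup lookup, String containerId) {
    Object container = lookup.find(containerId);
    if (container == null) {
      // The proposal refers to a container that no longer exists:
      // reject it instead of dereferencing null.
      return false;
    }
    // ... the normal allocation checks would follow here ...
    return true;
  }
}
{code}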

> CapacityScheduler$ResourceCommitterService throws NPE when running sls
> --
>
> Key: YARN-7880
> URL: https://issues.apache.org/jira/browse/YARN-7880
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Jiandan Yang 
>Priority: Major
>
> sls test case: node count = 9000, job count = 10k, tasks per job = 500, task 
> run time = 100s; the NPE does not occur when node count = 500 or 2000.
> {code}
> 18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: 
> container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED 
> to RUNNING
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541)
> {code}
> some CapacityScheduler$AsyncScheduleThread also throws NPE
> {code}
> 18/02/02 20:40:34 INFO resourcemanager.DefaultAMSProcessor: AM registration 
> appattempt_1517575125794_4564_01
> 18/02/02 20:40:34 INFO resourcemanager.RMAuditLogger: USER=default  
> OPERATION=Register App Master   TARGET=ApplicationMasterService 
> RESULT=SUCCESS  APPID=application_1517575125794_4564
> APPATTEMPTID=appattempt_1517575125794_4564_01
> Exception in thread "Thread-43" 18/02/02 20:40:34 INFO appmaster.AMSimulator: 
> Register the application master for application application_1517575125794_4564
> 18/02/02 20:40:34 INFO resourcemanager.MockAMLauncher: Notify AM launcher 
> launched:container_1517575125794_4564_01_01
> 18/02/02 20:40:34 INFO rmcontainer.RMContainerImpl: 
> container_1517575125794_2703_01_01 Container Transitioned from ACQUIRED 
> to RUNNING
> 18/02/02 20:40:34 INFO attempt.RMAppAttemptImpl: 
> appattempt_1517575125794_4564_01 State change from ALLOCATED to LAUNCHED 
> on event = LAUNCHED
> 18/02/02 20:40:34 INFO attempt.RMAppAttemptImpl: 
> appattempt_1517575125794_4564_01 State change from LAUNCHED to RUNNING on 
> event = REGISTERED
> 18/02/02 20:40:34 INFO rmapp.RMAppImpl: application_1517575125794_4564 State 
> change from ACCEPTED to RUNNING on event = ATTEMPT_REGISTERED
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceRequests(SchedulerApplicationAttempt.java:1341)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.canAssign(RegularContainerAllocator.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignOffSwitchContainers(RegularContainerAllocator.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainersOnNode(RegularContainerAllocator.java:470)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.tryAllocateOnNode(RegularContainerAllocator.java:252)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.allocate(RegularContainerAllocator.java:816)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(RegularContainerAllocator.java:854)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(ContainerAllocator.java:54)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(FiCaSchedulerApp.java:856)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:735)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
> at 
> 

[jira] [Updated] (YARN-7880) CapacityScheduler$ResourceCommitterService throws NPE when running sls

2018-02-04 Thread Jiandan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiandan Yang  updated YARN-7880:

Description: 
sls test case: node count = 9000, job count = 10k, tasks per job = 500, task run 
time = 100s; the NPE does not occur when node count = 500 or 2000.
{code}
18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: 
container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED to 
RUNNING

java.lang.NullPointerException
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541)
{code}
some CapacityScheduler$AsyncScheduleThread also throws NPE
{code}
18/02/02 20:40:34 INFO resourcemanager.DefaultAMSProcessor: AM registration 
appattempt_1517575125794_4564_01
18/02/02 20:40:34 INFO resourcemanager.RMAuditLogger: USER=default  
OPERATION=Register App Master   TARGET=ApplicationMasterService RESULT=SUCCESS  
APPID=application_1517575125794_4564
APPATTEMPTID=appattempt_1517575125794_4564_01
Exception in thread "Thread-43" 18/02/02 20:40:34 INFO appmaster.AMSimulator: 
Register the application master for application application_1517575125794_4564
18/02/02 20:40:34 INFO resourcemanager.MockAMLauncher: Notify AM launcher 
launched:container_1517575125794_4564_01_01
18/02/02 20:40:34 INFO rmcontainer.RMContainerImpl: 
container_1517575125794_2703_01_01 Container Transitioned from ACQUIRED to 
RUNNING
18/02/02 20:40:34 INFO attempt.RMAppAttemptImpl: 
appattempt_1517575125794_4564_01 State change from ALLOCATED to LAUNCHED on 
event = LAUNCHED
18/02/02 20:40:34 INFO attempt.RMAppAttemptImpl: 
appattempt_1517575125794_4564_01 State change from LAUNCHED to RUNNING on 
event = REGISTERED
18/02/02 20:40:34 INFO rmapp.RMAppImpl: application_1517575125794_4564 State 
change from ACCEPTED to RUNNING on event = ATTEMPT_REGISTERED
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceRequests(SchedulerApplicationAttempt.java:1341)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.canAssign(RegularContainerAllocator.java:302)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignOffSwitchContainers(RegularContainerAllocator.java:389)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainersOnNode(RegularContainerAllocator.java:470)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.tryAllocateOnNode(RegularContainerAllocator.java:252)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.allocate(RegularContainerAllocator.java:816)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(RegularContainerAllocator.java:854)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(ContainerAllocator.java:54)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(FiCaSchedulerApp.java:856)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:735)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1343)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1337)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1434)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1199)
at 

[jira] [Updated] (YARN-7880) CapacityScheduler$ResourceCommitterService throws NPE when running sls

2018-02-04 Thread Jiandan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiandan Yang  updated YARN-7880:

Description: 
sls test case: node count = 9000, job count = 10k, tasks per job = 500, task run 
time = 100s; the NPE does not occur when node count = 500 or 2000.
{code}
18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: 
container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED to 
RUNNING

java.lang.NullPointerException
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541)
{code}
some CapacityScheduler$AsyncScheduleThread also throws NPE
{code}
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceRequests(SchedulerApplicationAttempt.java:1341)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.canAssign(RegularContainerAllocator.java:302)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignOffSwitchContainers(RegularContainerAllocator.java:389)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainersOnNode(RegularContainerAllocator.java:470)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.tryAllocateOnNode(RegularContainerAllocator.java:252)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.allocate(RegularContainerAllocator.java:816)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(RegularContainerAllocator.java:854)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(ContainerAllocator.java:54)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(FiCaSchedulerApp.java:856)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:735)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1343)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1337)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1434)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1199)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:474)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:501)
{code}

  was:

sls test case: node count = 9000, job count = 10k, tasks per job = 500, task run 
time = 100s
{code}
18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: 
container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED to 
RUNNING

java.lang.NullPointerException
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541)
{code}
some CapacityScheduler$AsyncScheduleThread also throws NPE
{code}
java.lang.NullPointerException
at 

[jira] [Updated] (YARN-7880) CapacityScheduler$ResourceCommitterService throws NPE when running sls

2018-02-04 Thread Jiandan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiandan Yang  updated YARN-7880:

Description: 

sls test case: node count = 9000, job count = 10k, tasks per job = 500, task run 
time = 100s
{code}
18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: 
container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED to 
RUNNING

java.lang.NullPointerException
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541)
{code}
some CapacityScheduler$AsyncScheduleThread also throws NPE
{code}
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceRequests(SchedulerApplicationAttempt.java:1341)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.canAssign(RegularContainerAllocator.java:302)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignOffSwitchContainers(RegularContainerAllocator.java:389)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainersOnNode(RegularContainerAllocator.java:470)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.tryAllocateOnNode(RegularContainerAllocator.java:252)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.allocate(RegularContainerAllocator.java:816)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(RegularContainerAllocator.java:854)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(ContainerAllocator.java:54)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(FiCaSchedulerApp.java:856)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:735)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1343)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1337)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1434)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1199)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:474)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:501)
{code}

  was:
sls test case: node count = 9000, job count = 10k, tasks per job = 500, task run 
time = 100s

{code}
18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: 
container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED to 
RUNNING

java.lang.NullPointerException
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541)
{code}


> CapacityScheduler$ResourceCommitterService throws NPE when running sls
> --
>
> Key: YARN-7880
> URL: 

[jira] [Updated] (YARN-7880) CapacityScheduler$ResourceCommitterService throws NPE when running sls

2018-02-04 Thread Jiandan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiandan Yang  updated YARN-7880:

Component/s: yarn

> CapacityScheduler$ResourceCommitterService throws NPE when running sls
> --
>
> Key: YARN-7880
> URL: https://issues.apache.org/jira/browse/YARN-7880
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Jiandan Yang 
>Priority: Major
>
> sls test case: node count = 9000, job count = 10k, tasks per job = 500, task 
> run time = 100s
> {code}
> 18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: 
> container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED 
> to RUNNING
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541)
> {code}






[jira] [Updated] (YARN-7880) CapacityScheduler$ResourceCommitterService throws NPE when running sls

2018-02-04 Thread Jiandan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiandan Yang  updated YARN-7880:

Affects Version/s: 3.0.0

> CapacityScheduler$ResourceCommitterService throws NPE when running sls
> --
>
> Key: YARN-7880
> URL: https://issues.apache.org/jira/browse/YARN-7880
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jiandan Yang 
>Priority: Major
>
> sls test case: node count = 9000, job count = 10k, tasks per job = 500, task 
> run time = 100s
> {code}
> 18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: 
> container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED 
> to RUNNING
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541)
> {code}






[jira] [Updated] (YARN-7880) CapacityScheduler$ResourceCommitterService throws NPE when running sls

2018-02-04 Thread Jiandan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiandan Yang  updated YARN-7880:

Description: 
sls test case: node count = 9000, job count = 10k, tasks per job = 500, task run 
time = 100s

{code}
18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: 
container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED to 
RUNNING

java.lang.NullPointerException
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541)
{code}

  was:
{code}
18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: 
container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED to 
RUNNING

java.lang.NullPointerException
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506)
        at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541)
{code}


> CapacityScheduler$ResourceCommitterService throws NPE when running sls
> --
>
> Key: YARN-7880
> URL: https://issues.apache.org/jira/browse/YARN-7880
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jiandan Yang 
>Priority: Major
>
> sls test case: node count = 9000, job count = 10k, tasks per job = 500, task 
> run time = 100s
> {code}
> 18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: 
> container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED 
> to RUNNING
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541)
> {code}






[jira] [Updated] (YARN-7880) CapacityScheduler$ResourceCommitterService throws NPE when running sls

2018-02-04 Thread Jiandan Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiandan Yang  updated YARN-7880:

Summary: CapacityScheduler$ResourceCommitterService throws NPE when running 
sls  (was: FiCaSchedulerApp.commonCheckContainerAllocation throws NPE when 
running sls)

> CapacityScheduler$ResourceCommitterService throws NPE when running sls
> --
>
> Key: YARN-7880
> URL: https://issues.apache.org/jira/browse/YARN-7880
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jiandan Yang 
>Priority: Major
>
> {code}
> 18/02/02 20:54:28 INFO rmcontainer.RMContainerImpl: 
> container_1517575125794_5707_01_86 Container Transitioned from ACQUIRED 
> to RUNNING
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.commonCheckContainerAllocation(FiCaSchedulerApp.java:324)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.accept(FiCaSchedulerApp.java:420)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2506)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:541)
> {code}






[jira] [Commented] (YARN-7892) NodeAttributePBImpl does not implement hashcode and Equals properly

2018-02-04 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351910#comment-16351910
 ] 

Naganarasimha G R commented on YARN-7892:
-

[~sunil.gov...@gmail.com], I hope this modification is fine?

> NodeAttributePBImpl does not implement hashcode and Equals properly
> ---
>
> Key: YARN-7892
> URL: https://issues.apache.org/jira/browse/YARN-7892
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Major
> Attachments: YARN-7892-YARN-3409.001.patch
>
>







[jira] [Updated] (YARN-7892) NodeAttributePBImpl does not implement hashcode and Equals properly

2018-02-04 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-7892:

Attachment: YARN-7892-YARN-3409.001.patch

> NodeAttributePBImpl does not implement hashcode and Equals properly
> ---
>
> Key: YARN-7892
> URL: https://issues.apache.org/jira/browse/YARN-7892
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Major
> Attachments: YARN-7892-YARN-3409.001.patch
>
>







[jira] [Created] (YARN-7892) NodeAttributePBImpl does not implement hashcode and Equals properly

2018-02-04 Thread Naganarasimha G R (JIRA)
Naganarasimha G R created YARN-7892:
---

 Summary: NodeAttributePBImpl does not implement hashcode and 
Equals properly
 Key: YARN-7892
 URL: https://issues.apache.org/jira/browse/YARN-7892
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R









[jira] [Commented] (YARN-7757) Refactor NodeLabelsProvider to be more generic and reusable for node attributes providers

2018-02-04 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351899#comment-16351899
 ] 

Naganarasimha G R commented on YARN-7757:
-

Thanks [~cheersyang]. It seems many of the things are planned to be tracked in 
other jiras; let's make sure they are tracked and not missed. I am +1 on it. 
[~sunilg], as they were getting blocked on this, I have gone ahead and committed it.

> Refactor NodeLabelsProvider to be more generic and reusable for node 
> attributes providers
> -
>
> Key: YARN-7757
> URL: https://issues.apache.org/jira/browse/YARN-7757
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Blocker
> Attachments: YARN-7757-YARN-3409.001.patch, 
> YARN-7757-YARN-3409.002.patch, YARN-7757-YARN-3409.003.patch, 
> YARN-7757-YARN-3409.004.patch, YARN-7757-YARN-3409.005.patch, 
> YARN-7757-YARN-3409.006.patch, 
> nodeLabelsProvider_refactor_class_hierarchy.pdf, 
> nodeLabelsProvider_refactor_v2.pdf, nodeLabelsProvider_refactor_v3.pdf
>
>
> Propose to do refactor on {{NodeLabelsProvider}}, 
> {{AbstractNodeLabelsProvider}} to be more generic, so node attributes 
> providers can reuse these interface/abstract classes.






[jira] [Commented] (YARN-7292) Revisit Resource Profile Behavior

2018-02-04 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351826#comment-16351826
 ] 

genericqa commented on YARN-7292:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
48s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 40s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m  
6s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  6m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
40s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 49s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 5 new + 298 unchanged - 11 fixed = 303 total (was 309) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 21s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
42s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
38s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
8s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m  
8s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 29s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 28m 24s{color} 
| {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 12m 50s{color} 
| {color:red} hadoop-yarn-applications-distributedshell in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 

[jira] [Commented] (YARN-7891) LogAggregationIndexedFileController should support HAR file

2018-02-04 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351795#comment-16351795
 ] 

genericqa commented on YARN-7891:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 28s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 5 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
8s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 63m 24s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7891 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909138/YARN-7891.1.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  xml  findbugs  checkstyle  |
| uname | Linux 2f188757c339 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4e9a59c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/19599/artifact/out/whitespace-eol.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/19599/testReport/ |
| Max. process+thread count | 410 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console 

[jira] [Updated] (YARN-7891) LogAggregationIndexedFileController should support HAR file

2018-02-04 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-7891:

Attachment: YARN-7891.1.patch

> LogAggregationIndexedFileController should support HAR file
> ---
>
> Key: YARN-7891
> URL: https://issues.apache.org/jira/browse/YARN-7891
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: YARN-7891.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7292) Revisit Resource Profile Behavior

2018-02-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351770#comment-16351770
 ] 

Wangda Tan commented on YARN-7292:
--

Rebased to latest trunk (003)

> Revisit Resource Profile Behavior
> -
>
> Key: YARN-7292
> URL: https://issues.apache.org/jira/browse/YARN-7292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-7292.002.patch, YARN-7292.003.patch, 
> YARN-7292.wip.001.patch
>
>
> Had discussions with [~templedf], [~vvasudev], [~sunilg] offline. There are 
> a couple of resource-profile-related behaviors that might need to be updated:
> 1) Configure resource profiles on the server side or the client side: 
> Currently, resource profiles can only be configured centrally:
> - Advantages:
> A given resource profile has the same meaning across the cluster. It won’t 
> change when we run different apps in different configurations. A job that 
> can run under Amazon’s G2.8X can also run on YARN with a G2.8X profile. A 
> side benefit is that the YARN scheduler can potentially do better bin packing.
> - Disadvantages: 
> It is hard for applications to add their own resource profiles. 
> 2) Do we really need mandatory resource profiles such as 
> minimum/maximum/default? 
> 3) Should we send the resource profile name inside the ResourceRequest, or 
> should the client/AM translate it to a resource and set it on the existing 
> resource fields? 
> 4) Related to the above, should we allow resource overrides, or should the 
> client/AM send the final resource to the RM?
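
To make option 3 above concrete, here is a minimal client-side sketch (an 
illustration only, not part of any posted patch): a hypothetical profile 
table is resolved to a plain Resource before the ResourceRequest is built, so 
the RM never sees a profile name. The profile names and sizes are assumptions.

{code:java}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class ProfileTranslationSketch {

  // Hypothetical client-side profile table; under option 1 this mapping
  // would instead be configured centrally on the RM.
  private static final Map<String, Resource> PROFILES = new HashMap<>();
  static {
    PROFILES.put("minimum", Resource.newInstance(1024, 1));
    PROFILES.put("default", Resource.newInstance(2048, 2));
    PROFILES.put("g2.8x", Resource.newInstance(61440, 32));
  }

  /** Translate a profile name into a concrete ResourceRequest (option 3). */
  public static ResourceRequest toRequest(String profileName, int priority,
      int numContainers) {
    Resource resolved = PROFILES.containsKey(profileName)
        ? PROFILES.get(profileName) : PROFILES.get("default");
    // The RM only ever sees plain resource fields; no profile name is sent.
    return ResourceRequest.newInstance(Priority.newInstance(priority),
        ResourceRequest.ANY, resolved, numContainers);
  }
}
{code}

Under option 1, the same table would live on the RM side and the client would 
send only the profile name.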



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7292) Revisit Resource Profile Behavior

2018-02-04 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-7292:
-
Attachment: YARN-7292.003.patch

> Revisit Resource Profile Behavior
> -
>
> Key: YARN-7292
> URL: https://issues.apache.org/jira/browse/YARN-7292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-7292.002.patch, YARN-7292.003.patch, 
> YARN-7292.wip.001.patch
>
>
> Had discussions with [~templedf], [~vvasudev], [~sunilg] offline. There are 
> a couple of resource-profile-related behaviors that might need to be updated:
> 1) Configure resource profiles on the server side or the client side: 
> Currently, resource profiles can only be configured centrally:
> - Advantages:
> A given resource profile has the same meaning across the cluster. It won’t 
> change when we run different apps in different configurations. A job that 
> can run under Amazon’s G2.8X can also run on YARN with a G2.8X profile. A 
> side benefit is that the YARN scheduler can potentially do better bin packing.
> - Disadvantages: 
> It is hard for applications to add their own resource profiles. 
> 2) Do we really need mandatory resource profiles such as 
> minimum/maximum/default? 
> 3) Should we send the resource profile name inside the ResourceRequest, or 
> should the client/AM translate it to a resource and set it on the existing 
> resource fields? 
> 4) Related to the above, should we allow resource overrides, or should the 
> client/AM send the final resource to the RM?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7655) avoid AM preemption caused by RRs for specific nodes or racks

2018-02-04 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351744#comment-16351744
 ] 

genericqa commented on YARN-7655:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 29s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 37s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 26s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}112m 19s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServiceAppsNodelabel |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7655 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909131/YARN-7655-002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux eb1f0bd357ce 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4e9a59c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/19596/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/19596/testReport/ |
| Max. process+thread count | 881 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Commented] (YARN-7292) Revisit Resource Profile Behavior

2018-02-04 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351712#comment-16351712
 ] 

genericqa commented on YARN-7292:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  4s{color} 
| {color:red} YARN-7292 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-7292 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909132/YARN-7292.002.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/19597/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Revisit Resource Profile Behavior
> -
>
> Key: YARN-7292
> URL: https://issues.apache.org/jira/browse/YARN-7292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-7292.002.patch, YARN-7292.wip.001.patch
>
>
> Had discussions with [~templedf], [~vvasudev], [~sunilg] offline. There are 
> a couple of resource-profile-related behaviors that might need to be updated:
> 1) Configure resource profiles on the server side or the client side: 
> Currently, resource profiles can only be configured centrally:
> - Advantages:
> A given resource profile has the same meaning across the cluster. It won’t 
> change when we run different apps in different configurations. A job that 
> can run under Amazon’s G2.8X can also run on YARN with a G2.8X profile. A 
> side benefit is that the YARN scheduler can potentially do better bin packing.
> - Disadvantages: 
> It is hard for applications to add their own resource profiles. 
> 2) Do we really need mandatory resource profiles such as 
> minimum/maximum/default? 
> 3) Should we send the resource profile name inside the ResourceRequest, or 
> should the client/AM translate it to a resource and set it on the existing 
> resource fields? 
> 4) Related to the above, should we allow resource overrides, or should the 
> client/AM send the final resource to the RM?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7292) Revisit Resource Profile Behavior

2018-02-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351707#comment-16351707
 ] 

Wangda Tan commented on YARN-7292:
--

Attached ver.2 patch; fixed the reported issues and completed all the changes 
(I think so).

> Revisit Resource Profile Behavior
> -
>
> Key: YARN-7292
> URL: https://issues.apache.org/jira/browse/YARN-7292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-7292.002.patch, YARN-7292.wip.001.patch
>
>
> Had discussions with [~templedf], [~vvasudev], [~sunilg] offline. There are 
> a couple of resource-profile-related behaviors that might need to be updated:
> 1) Configure resource profiles on the server side or the client side: 
> Currently, resource profiles can only be configured centrally:
> - Advantages:
> A given resource profile has the same meaning across the cluster. It won’t 
> change when we run different apps in different configurations. A job that 
> can run under Amazon’s G2.8X can also run on YARN with a G2.8X profile. A 
> side benefit is that the YARN scheduler can potentially do better bin packing.
> - Disadvantages: 
> It is hard for applications to add their own resource profiles. 
> 2) Do we really need mandatory resource profiles such as 
> minimum/maximum/default? 
> 3) Should we send the resource profile name inside the ResourceRequest, or 
> should the client/AM translate it to a resource and set it on the existing 
> resource fields? 
> 4) Related to the above, should we allow resource overrides, or should the 
> client/AM send the final resource to the RM?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7292) Revisit Resource Profile Behavior

2018-02-04 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-7292:
-
Attachment: YARN-7292.002.patch

> Revisit Resource Profile Behavior
> -
>
> Key: YARN-7292
> URL: https://issues.apache.org/jira/browse/YARN-7292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-7292.002.patch, YARN-7292.wip.001.patch
>
>
> Had discussions with [~templedf], [~vvasudev], [~sunilg] offline. There are 
> a couple of resource-profile-related behaviors that might need to be updated:
> 1) Configure resource profiles on the server side or the client side: 
> Currently, resource profiles can only be configured centrally:
> - Advantages:
> A given resource profile has the same meaning across the cluster. It won’t 
> change when we run different apps in different configurations. A job that 
> can run under Amazon’s G2.8X can also run on YARN with a G2.8X profile. A 
> side benefit is that the YARN scheduler can potentially do better bin packing.
> - Disadvantages: 
> It is hard for applications to add their own resource profiles. 
> 2) Do we really need mandatory resource profiles such as 
> minimum/maximum/default? 
> 3) Should we send the resource profile name inside the ResourceRequest, or 
> should the client/AM translate it to a resource and set it on the existing 
> resource fields? 
> 4) Related to the above, should we allow resource overrides, or should the 
> client/AM send the final resource to the RM?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7655) avoid AM preemption caused by RRs for specific nodes or racks

2018-02-04 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351704#comment-16351704
 ] 

Steven Rand commented on YARN-7655:
---

Thanks [~yufeigu], new patch is attached. 

Unfortunately I'm still struggling to have the starved app be allocated the 
right number of containers in the test (though the preemption part happens 
correctly). The details of that are in my first comment above. It seems like 
the options are:

* What the current patch does, which is to just leave a TODO above where we 
check for allocation.
* Only test that the preemption went as expected, and don't test allocation, 
i.e., don't call {{verifyPreemption}}.
* Find a way to have the allocation work out while still guaranteeing that the 
RR we consider for preemption is the {{NODE_LOCAL}} one. I thought I'd be able 
to figure this out, but have to admit I've been unsuccessful.

> avoid AM preemption caused by RRs for specific nodes or racks
> -
>
> Key: YARN-7655
> URL: https://issues.apache.org/jira/browse/YARN-7655
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.0.0
>Reporter: Steven Rand
>Assignee: Steven Rand
>Priority: Major
> Attachments: YARN-7655-001.patch, YARN-7655-002.patch
>
>
> We frequently see AM preemptions when 
> {{starvedApp.getStarvedResourceRequests()}} in 
> {{FSPreemptionThread#identifyContainersToPreempt}} includes one or more RRs 
> that request containers on a specific node. Since this causes us to only 
> consider one node to preempt containers on, the really good work that was 
> done in YARN-5830 doesn't save us from AM preemption. Even though there might 
> be multiple nodes on which we could preempt enough non-AM containers to 
> satisfy the app's starvation, we often wind up preempting one or more AM 
> containers on the single node that we're considering.
> A proposed solution is that if we're going to preempt one or more AM 
> containers for an RR that specifies a node or rack, then we should instead 
> expand the search space to consider all nodes. That way we take advantage of 
> YARN-5830, and only preempt AMs if there's no alternative. I've attached a 
> patch with an initial implementation of this. We've been running it on a few 
> clusters, and have seen AM preemptions drop from double-digit occurrences on 
> many days to zero.
> Of course, the tradeoff is some loss of locality, since the starved app is 
> less likely to be allocated resources at the most specific locality level 
> that it asked for. My opinion is that this tradeoff is worth it, but 
> interested to hear what others think as well.
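
A self-contained toy model of the fallback proposed above (not the attached 
YARN-7655 patch; Node and Container here are simplified stand-ins for the 
scheduler's real classes, and rack handling is omitted): containers are first 
picked from the nodes named by the RR, and the search is widened to all nodes 
only if that narrow pick would preempt an AM or cannot satisfy the starvation.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class PreemptionFallbackSketch {

  static class Container {
    final boolean isAM;
    final int memoryMB;
    Container(boolean isAM, int memoryMB) {
      this.isAM = isAM;
      this.memoryMB = memoryMB;
    }
  }

  static class Node {
    final String host;
    final List<Container> containers;
    Node(String host, List<Container> containers) {
      this.host = host;
      this.containers = containers;
    }
  }

  /** Pick containers on the given nodes to free neededMB, non-AM first. */
  static List<Container> pick(List<Node> nodes, int neededMB) {
    List<Container> chosen = new ArrayList<>();
    int freed = 0;
    // Two passes: non-AM containers first, AM containers only as a last resort.
    for (boolean allowAM : new boolean[] {false, true}) {
      for (Node n : nodes) {
        for (Container c : n.containers) {
          if (freed >= neededMB) {
            return chosen;
          }
          if (c.isAM == allowAM && !chosen.contains(c)) {
            chosen.add(c);
            freed += c.memoryMB;
          }
        }
      }
    }
    return freed >= neededMB ? chosen : null;
  }

  /** Try the RR's named nodes first; widen to all nodes if an AM would be hit. */
  static List<Container> identify(List<Node> requestedNodes,
      List<Node> allNodes, int neededMB) {
    List<Container> narrow = pick(requestedNodes, neededMB);
    boolean hitsAM = narrow != null && narrow.stream().anyMatch(c -> c.isAM);
    if (narrow == null || hitsAM) {
      // Expanding the search space trades locality for avoiding AM preemption.
      return pick(allNodes, neededMB);
    }
    return narrow;
  }
}
{code}

The tradeoff called out in the description shows up directly here: the widened 
pick may land on nodes the RR did not ask for, giving up some locality in 
exchange for fewer AM preemptions.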



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7655) avoid AM preemption caused by RRs for specific nodes or racks

2018-02-04 Thread Steven Rand (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rand updated YARN-7655:
--
Attachment: YARN-7655-002.patch

> avoid AM preemption caused by RRs for specific nodes or racks
> -
>
> Key: YARN-7655
> URL: https://issues.apache.org/jira/browse/YARN-7655
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.0.0
>Reporter: Steven Rand
>Assignee: Steven Rand
>Priority: Major
> Attachments: YARN-7655-001.patch, YARN-7655-002.patch
>
>
> We frequently see AM preemptions when 
> {{starvedApp.getStarvedResourceRequests()}} in 
> {{FSPreemptionThread#identifyContainersToPreempt}} includes one or more RRs 
> that request containers on a specific node. Since this causes us to only 
> consider one node to preempt containers on, the really good work that was 
> done in YARN-5830 doesn't save us from AM preemption. Even though there might 
> be multiple nodes on which we could preempt enough non-AM containers to 
> satisfy the app's starvation, we often wind up preempting one or more AM 
> containers on the single node that we're considering.
> A proposed solution is that if we're going to preempt one or more AM 
> containers for an RR that specifies a node or rack, then we should instead 
> expand the search space to consider all nodes. That way we take advantage of 
> YARN-5830, and only preempt AMs if there's no alternative. I've attached a 
> patch with an initial implementation of this. We've been running it on a few 
> clusters, and have seen AM preemptions drop from double-digit occurrences on 
> many days to zero.
> Of course, the tradeoff is some loss of locality, since the starved app is 
> less likely to be allocated resources at the most specific locality level 
> that it asked for. My opinion is that this tradeoff is worth it, but 
> interested to hear what others think as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7872) labeled node cannot be used to satisfy locality specified request

2018-02-04 Thread Yuqi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351675#comment-16351675
 ] 

Yuqi Wang commented on YARN-7872:
-

[~leftnoteasy], could you please take a look at this? :)

Appreciate your insights!

> labeled node cannot be used to satisfy locality specified request
> -
>
> Key: YARN-7872
> URL: https://issues.apache.org/jira/browse/YARN-7872
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>Priority: Blocker
> Fix For: 2.7.2
>
> Attachments: YARN-7872-branch-2.7.2.001.patch
>
>
> *Issue summary:*
> A labeled node (i.e. a node with a non-empty node label) cannot be used to 
> satisfy a locality-specified request (i.e. a container request with a 
> non-ANY resource name and relax locality set to false).
>  
> *For example:*
> The node with available resource:
> [Resource: [MemoryMB: [100] CpuNumber: [12]] {color:#14892c}NodeLabel: 
> [persistent]{color} {color:#f79232}HostName: \{SRG}{color} RackName: 
> \{/default-rack}]
> The container request:
>  [Priority: [1] Resource: [MemoryMB: [1] CpuNumber: [1]] 
> {color:#14892c}NodeLabel: [null]{color} {color:#f79232}HostNames: 
> \{SRG}{color} RackNames: {} {color:#59afe1}RelaxLocality: [false]{color}]
> The current RM capacity scheduler behavior (at least for versions 2.7 and 
> 2.8) is that the node cannot allocate a container for the request, because 
> the node label does not match when the leaf queue assigns a container.
>  
> *Possible solution:*
> However, node locality and node label should be two orthogonal dimensions 
> for selecting candidate nodes for a container request. Node label matching 
> should only be performed for container requests with the ANY resource name, 
> since only that kind of request is allowed to carry a non-empty node label.
> So, for a container request with a non-ANY resource name (which therefore 
> cannot have a node label), we should match the node against the requested 
> resource name instead of the requested node label. This resource name 
> matching should be safe, because a node whose label is not accessible to 
> the queue will never be offered to the leaf queue in the first place.
>  
> *Discussion:*
> The attached patch implements this principle; please help review it.
> Without it, we cannot use locality to request containers on these labeled 
> nodes.
> If the fix is acceptable, we should also recheck whether the same issue 
> exists in trunk and other Hadoop versions.
> If it is not acceptable (i.e. the current behavior is by design), then how 
> can we use locality to request containers on these labeled nodes?
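
A toy illustration of the matching rule proposed above (this is not the 
CapacityScheduler code and not the attached patch; rack matching is omitted 
and all names are simplified stand-ins): requests with a non-ANY resource 
name are matched by host, while only ANY requests go through node-label 
matching.

{code:java}
public class LabelLocalityMatchSketch {

  static final String ANY = "*";

  /** Decide whether a node can serve a request under the proposed rule. */
  static boolean canAssign(String requestResourceName, String requestLabel,
      String nodeHost, String nodeLabel) {
    if (!ANY.equals(requestResourceName)) {
      // Locality-specified request: only ANY requests may carry a label,
      // so match on the requested resource name (host) instead of the label.
      return requestResourceName.equals(nodeHost);
    }
    // ANY request: fall back to the usual node-label matching.
    String wanted = requestLabel == null ? "" : requestLabel;
    String actual = nodeLabel == null ? "" : nodeLabel;
    return wanted.equals(actual);
  }

  public static void main(String[] args) {
    // Example from the description: labeled node SRG, request pinned to SRG
    // with no label and relaxLocality=false becomes assignable.
    System.out.println(canAssign("SRG", null, "SRG", "persistent")); // true
    System.out.println(canAssign(ANY, null, "SRG", "persistent"));   // false
  }
}
{code}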



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org