[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-09-04 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603634#comment-16603634
 ] 

Konstantin Shvachko commented on YARN-8200:
---

I was trying to build the YARN-8200 branch with this build:
https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-branch2-java7-linux-x86-jhung/8/console
It is failing similarly to HADOOP-15644. I think the YARN-8200 branch needs to be
rebased onto the latest branch-2.

> Backport resource types/GPU features to branch-2
> 
>
> Key: YARN-8200
> URL: https://issues.apache.org/jira/browse/YARN-8200
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: 
> counter.scheduler.operation.allocate.csv.defaultResources, 
> counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json
>
>
> Currently we have a need for GPU scheduling on our YARN clusters to support 
> deep learning workloads. However, our main production clusters are running 
> older versions of branch-2 (2.7 in our case). To prevent supporting too many 
> very different hadoop versions across multiple clusters, we would like to 
> backport the resource types/resource profiles feature to branch-2, as well as 
> the GPU specific support.
>  
> We have done a trial backport of YARN-3926 and some miscellaneous patches in 
> YARN-7069 based on issues we uncovered, and the backport was fairly smooth. 
> We also did a trial backport of most of YARN-6223 (sans docker support).
>  
> Regarding the backports, perhaps we can do the development in a feature 
> branch and then merge to branch-2 when ready.






[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-05-02 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461622#comment-16461622
 ] 

Konstantin Shvachko commented on YARN-8200:
---

Hey guys, I rebased branch YARN-8200 onto branch-2 and pushed [~jhung]'s commits
into it.
Please take a look. Testing is in progress, as I hear.

> Backport resource types/GPU features to branch-2
> 
>
> Key: YARN-8200
> URL: https://issues.apache.org/jira/browse/YARN-8200
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>
> Currently we have a need for GPU scheduling on our YARN clusters to support 
> deep learning workloads. However, our main production clusters are running 
> older versions of branch-2 (2.7 in our case). To prevent supporting too many 
> very different hadoop versions across multiple clusters, we would like to 
> backport the resource types/resource profiles feature to branch-2, as well as 
> the GPU specific support.
>  
> We have done a trial backport of YARN-3926 and some miscellaneous patches in 
> YARN-7069 based on issues we uncovered, and the backport was fairly smooth. 
> We also did a trial backport of most of YARN-6223 (sans docker support).
>  
> Regarding the backports, perhaps we can do the development in a feature 
> branch and then merge to branch-2 when ready.






[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-04-30 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459363#comment-16459363
 ] 

Konstantin Shvachko commented on YARN-8200:
---

[~sunilg], thanks for the hints on the benchmarks.
Also I agree we should branch off of branch-2 rather than 2.9. Will re-branch.

> Backport resource types/GPU features to branch-2
> 
>
> Key: YARN-8200
> URL: https://issues.apache.org/jira/browse/YARN-8200
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>
> Currently we have a need for GPU scheduling on our YARN clusters to support 
> deep learning workloads. However, our main production clusters are running 
> older versions of branch-2 (2.7 in our case). To prevent supporting too many 
> very different hadoop versions across multiple clusters, we would like to 
> backport the resource types/resource profiles feature to branch-2, as well as 
> the GPU specific support.
>  
> We have done a trial backport of YARN-3926 and some miscellaneous patches in 
> YARN-7069 based on issues we uncovered, and the backport was fairly smooth. 
> We also did a trial backport of most of YARN-6223 (sans docker support).
>  
> Regarding the backports, perhaps we can do the development in a feature 
> branch and then merge to branch-2 when ready.






[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-04-25 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453121#comment-16453121
 ] 

Konstantin Shvachko commented on YARN-8200:
---

Cut a branch for this jira out of branch-2.9. [~jhung], could you please merge
your backports there?
[~sunilg], [~templedf], could you please advise on the tools for measuring the
performance impact on the Capacity Scheduler and the Resource Manager?

> Backport resource types/GPU features to branch-2
> 
>
> Key: YARN-8200
> URL: https://issues.apache.org/jira/browse/YARN-8200
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>
> Currently we have a need for GPU scheduling on our YARN clusters to support 
> deep learning workloads. However, our main production clusters are running 
> older versions of branch-2 (2.7 in our case). To prevent supporting too many 
> very different hadoop versions across multiple clusters, we would like to 
> backport the resource types/resource profiles feature to branch-2, as well as 
> the GPU specific support.
>  
> We have done a trial backport of YARN-3926 and some miscellaneous patches in 
> YARN-7069 based on issues we uncovered, and the backport was fairly smooth. 
> We also did a trial backport of most of YARN-6223 (sans docker support).
>  
> Regarding the backports, perhaps we can do the development in a feature 
> branch and then merge to branch-2 when ready.






[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-04-24 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451384#comment-16451384
 ] 

Konstantin Shvachko commented on YARN-8200:
---

What do people think about creating a branch so that Jonathan could apply his
backporting work there?
That way we can make this discussion more concrete.
Also, you guys will be able to try it and see if it fits your requirements.

> Backport resource types/GPU features to branch-2
> 
>
> Key: YARN-8200
> URL: https://issues.apache.org/jira/browse/YARN-8200
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>
> Currently we have a need for GPU scheduling on our YARN clusters to support 
> deep learning workloads. However, our main production clusters are running 
> older versions of branch-2 (2.7 in our case). To prevent supporting too many 
> very different hadoop versions across multiple clusters, we would like to 
> backport the resource types/resource profiles feature to branch-2, as well as 
> the GPU specific support.
>  
> We have done a trial backport of YARN-3926 and some miscellaneous patches in 
> YARN-7069 based on issues we uncovered, and the backport was fairly smooth. 
> We also did a trial backport of most of YARN-6223 (sans docker support).
>  
> Regarding the backports, perhaps we can do the development in a feature 
> branch and then merge to branch-2 when ready.






[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-04-23 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449000#comment-16449000
 ] 

Konstantin Shvachko commented on YARN-8200:
---

Hey [~leftnoteasy], we discussed it in [this 
thread|https://lists.apache.org/thread.html/6e200891756aefbfd8b36cd1d9f22f99626284b656671ab719ee1496@%3Chdfs-dev.hadoop.apache.org%3E]
 some time ago.
Clearly we want everybody on the same code base, but it's a challenge to get
there. So the thread proposed building a bridge release, to help cross over to
Hadoop 3.

> Backport resource types/GPU features to branch-2
> 
>
> Key: YARN-8200
> URL: https://issues.apache.org/jira/browse/YARN-8200
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>
> Currently we have a need for GPU scheduling on our YARN clusters to support 
> deep learning workloads. However, our main production clusters are running 
> older versions of branch-2 (2.7 in our case). To prevent supporting too many 
> very different hadoop versions across multiple clusters, we would like to 
> backport the resource types/resource profiles feature to branch-2, as well as 
> the GPU specific support.
>  
> We have done a trial backport of YARN-3926 and some miscellaneous patches in 
> YARN-7069 based on issues we uncovered, and the backport was fairly smooth. 
> We also did a trial backport of most of YARN-6223 (sans docker support).
>  
> Regarding the backports, perhaps we can do the development in a feature 
> branch and then merge to branch-2 when ready.






[jira] [Closed] (YARN-7249) Fix CapacityScheduler NPE issue when a container preempted while the node is being removed

2018-04-19 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko closed YARN-7249.
-

> Fix CapacityScheduler NPE issue when a container preempted while the node is 
> being removed
> --
>
> Key: YARN-7249
> URL: https://issues.apache.org/jira/browse/YARN-7249
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.1, 2.7.5
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 2.8.2, 2.7.6
>
> Attachments: YARN-7249.branch-2.8.001.patch
>
>
> This issue can happen when 3 conditions are satisfied:
> 1) A node is being removed from the scheduler.
> 2) A container running on the node is being preempted.
> 3) A rare race condition causes the scheduler to pass a null node to the leaf queue.
> The fix is to add a null-node check inside CapacityScheduler.
> Stack trace:
> {code}
> 2017-08-31 02:51:24,748 FATAL resourcemanager.ResourceManager 
> (ResourceManager.java:run(714)) - Error in handling event type 
> KILL_RESERVED_CONTAINER to the scheduler 
> java.lang.NullPointerException 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1308)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1469)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:497)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.killReservedContainer(CapacityScheduler.java:1505)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1341)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:127)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:705)
>  
> {code}
> This is an issue that only existed in 2.8.x.
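The fix the description refers to is a null-node guard on the container-completion path. A minimal sketch of that pattern (names are illustrative, not the actual patch):

{code}
// Sketch only: the node may have been removed from the scheduler while
// one of its containers was being preempted.
FiCaSchedulerNode node = getNode(rmContainer.getAllocatedNode());
if (node == null) {
  // Log and skip instead of passing a null node down to LeafQueue.
  LOG.info("Skipping completedContainer for " + rmContainer.getContainerId()
      + ": node already removed from scheduler");
  return;
}
queue.completedContainer(clusterResource, application, node, rmContainer,
    containerStatus, event, null, sortQueues);
{code}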






[jira] [Resolved] (YARN-7249) Fix CapacityScheduler NPE issue when a container preempted while the node is being removed

2018-03-30 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved YARN-7249.
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.7.6

Just committed this to branch-2.7. Resolving.

> Fix CapacityScheduler NPE issue when a container preempted while the node is 
> being removed
> --
>
> Key: YARN-7249
> URL: https://issues.apache.org/jira/browse/YARN-7249
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.1, 2.7.5
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 2.7.6, 2.8.2
>
> Attachments: YARN-7249.branch-2.8.001.patch
>
>
> This issue can happen when 3 conditions are satisfied:
> 1) A node is being removed from the scheduler.
> 2) A container running on the node is being preempted.
> 3) A rare race condition causes the scheduler to pass a null node to the leaf queue.
> The fix is to add a null-node check inside CapacityScheduler.
> Stack trace:
> {code}
> 2017-08-31 02:51:24,748 FATAL resourcemanager.ResourceManager 
> (ResourceManager.java:run(714)) - Error in handling event type 
> KILL_RESERVED_CONTAINER to the scheduler 
> java.lang.NullPointerException 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1308)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1469)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:497)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.killReservedContainer(CapacityScheduler.java:1505)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1341)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:127)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:705)
>  
> {code}
> This is an issue that only existed in 2.8.x.






[jira] [Updated] (YARN-4167) NPE on RMActiveServices#serviceStop when store is null

2017-12-20 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-4167:
--
Fix Version/s: 2.7.6

> NPE on RMActiveServices#serviceStop when store is null
> --
>
> Key: YARN-4167
> URL: https://issues.apache.org/jira/browse/YARN-4167
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha1, 2.7.6
>
> Attachments: 0001-YARN-4167.patch, 0001-YARN-4167.patch, 
> 0002-YARN-4167.patch
>
>
> Configure
> {{yarn.resourcemanager.container-tokens.master-key-rolling-interval-secs}}
> to mismatch {{yarn.nm.liveness-monitor.expiry-interval-ms}}.
> On startup an NPE is thrown in {{RMActiveServices#serviceStop}}:
> {noformat}
> 2015-09-16 12:23:29,504 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state INITED; cause: 
> java.lang.IllegalArgumentException: 
> yarn.resourcemanager.container-tokens.master-key-rolling-interval-secs should 
> be more than 3 X yarn.nm.liveness-monitor.expiry-interval-ms
> java.lang.IllegalArgumentException: 
> yarn.resourcemanager.container-tokens.master-key-rolling-interval-secs should 
> be more than 3 X yarn.nm.liveness-monitor.expiry-interval-ms
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.(RMContainerTokenSecretManager.java:82)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMSecretManagerService.createContainerTokenSecretManager(RMSecretManagerService.java:109)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.RMSecretManagerService.(RMSecretManagerService.java:57)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createRMSecretManagerService(ResourceManager.java:)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:423)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:963)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:256)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1193)
> 2015-09-16 12:23:29,507 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error closing 
> store.
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:608)
>  at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>  at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>  at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:963)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:256)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1193
> {noformat}
> *Impact Area*: RM failover with wrong configuration
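The NPE arises because {{serviceStop}} runs even when {{serviceInit}} failed before the store was created. A minimal sketch of the kind of guard that addresses this (the {{rmStore}} field name is an assumption):

{code}
@Override
protected void serviceStop() throws Exception {
  // Sketch only: serviceInit may have failed before creating the store,
  // so it can legitimately still be null here.
  if (rmStore != null) {
    rmStore.close();
  }
  super.serviceStop();
}
{code}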






[jira] [Updated] (YARN-3425) NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit failed

2017-12-20 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-3425:
--
Fix Version/s: 2.7.6

> NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit 
> failed
> --
>
> Key: YARN-3425
> URL: https://issues.apache.org/jira/browse/YARN-3425
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
> Environment: 1 RM, 1 NM , 1 NN , I DN
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha1, 2.7.6
>
> Attachments: YARN-3425.001.patch
>
>
> Set yarn.node-labels.enabled to true
> and yarn.node-labels.fs-store.root-dir to /node-labels.
> Start the resource manager without starting the DN/NM.
> {quote}
> 2015-03-31 16:44:13,782 WARN org.apache.hadoop.service.AbstractService: When 
> stopping the service 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager : 
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:261)
>   at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:267)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:556)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:984)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:251)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1207)
> {quote}
> {code}
>  protected void stopDispatcher() {
> AsyncDispatcher asyncDispatcher = (AsyncDispatcher) dispatcher;
>asyncDispatcher.stop(); 
>   }
> {code}
> A null check is missing during stop.
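A minimal sketch of the missing guard, matching the method quoted above (this is the obvious shape of the fix, not necessarily the committed patch verbatim):

{code}
protected void stopDispatcher() {
  AsyncDispatcher asyncDispatcher = (AsyncDispatcher) dispatcher;
  // Sketch only: the dispatcher is created in serviceInit, so it can
  // still be null if init failed early; stop it only when it exists.
  if (asyncDispatcher != null) {
    asyncDispatcher.stop();
  }
}
{code}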






[jira] [Commented] (YARN-6632) Backport YARN-3425 to branch 2.7

2017-12-18 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295727#comment-16295727
 ] 

Konstantin Shvachko commented on YARN-6632:
---

Hey [~elgoiri] can you change the status to "Patch Available"?
Otherwise Jenkins [refuses to 
run|https://builds.apache.org/job/PreCommit-YARN-Build/18957/console].

> Backport YARN-3425 to branch 2.7
> 
>
> Key: YARN-6632
> URL: https://issues.apache.org/jira/browse/YARN-6632
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: YARN-3425-branch-2.7.patch
>
>
> NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit 
> failed






[jira] [Updated] (YARN-6632) Backport YARN-3425 to branch 2.7

2017-12-15 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-6632:
--
Issue Type: Bug  (was: Task)

> Backport YARN-3425 to branch 2.7
> 
>
> Key: YARN-6632
> URL: https://issues.apache.org/jira/browse/YARN-6632
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: YARN-3425-branch-2.7.patch
>
>







[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-12-07 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-6959:
--
Fix Version/s: (was: 2.7.1)
   2.7.5

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 3.0.0-beta1, 2.7.5
>
> Attachments: YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.7.006.patch, YARN-6959-branch-2.8.001.patch, 
> YARN-6959-branch-2.8.002.patch, YARN-6959.005.patch, 
> YARN-6959.yarn_nm.log.zip, YARN-6959.yarn_rm.log.zip
>
>
> *Issue Summary:*
> Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests. These mis-recorded ResourceRequests may confuse AM 
> Container Request and Allocation for current attempt.
> *Issue Pipeline:*
> {code:java}
> // Executing precondition check for the incoming attempt id.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // Previous precondition check for the attempt id may be outdated here,
> // i.e. the currentAttempt may not be the corresponding attempt of the
> // attemptId, e.g. the attempt id corresponds to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
> // Previous attempt ResourceRequests may be recorded into current attempt
> // ResourceRequests.
> currentAttempt.updateResourceRequests(ask) ->
> // RM may allocate the wrong AM Container for the current attempt, because
> // its ResourceRequests may come from the previous attempt (which can be any
> // ResourceRequests the previous AM asked for), and there is no matching
> // logic between the original AM Container ResourceRequest and the returned
> // amContainerAllocation below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
> *Patch Correctness:*
> After this patch, the RM will record ResourceRequests from different attempts
> into different SchedulerApplicationAttempt.AppSchedulingInfo objects.
> So even if the RM still records ResourceRequests from an old attempt at any
> time, those ResourceRequests will be recorded in the old AppSchedulingInfo
> object, which will not impact the current attempt's resource requests and
> allocation.
> *Concerns:*
> The getApplicationAttempt function in AbstractYarnScheduler is confusing; we
> should rename it to getCurrentApplicationAttempt, and reconsider whether there
> are any other bugs related to getApplicationAttempt.
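A minimal sketch of the precondition the pipeline above is missing (names such as {{EMPTY_ALLOCATION}} are illustrative): the scheduler should only apply an {{allocate()}} call to the attempt it was addressed to.

{code}
// Sketch only: drop asks addressed to an attempt that is no longer the
// application's current attempt, instead of recording them into the
// current attempt's AppSchedulingInfo.
SchedulerApplicationAttempt current = getApplicationAttempt(appAttemptId);
if (current == null
    || !current.getApplicationAttemptId().equals(appAttemptId)) {
  LOG.warn("Dropping allocate() from stale attempt " + appAttemptId);
  return EMPTY_ALLOCATION;
}
current.updateResourceRequests(ask);
{code}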






[jira] [Updated] (YARN-3687) We should be able to remove node-label if there's no queue can use it.

2017-12-04 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-3687:
--
Target Version/s:   (was: 2.7.5)

> We should be able to remove node-label if there's no queue can use it.
> --
>
> Key: YARN-3687
> URL: https://issues.apache.org/jira/browse/YARN-3687
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>
> Currently, we cannot remove a node label from the cluster if no queue is
> configured to use it, but we should be able to remove it if the capacity on
> the node label in the root queue is 0. This would avoid pain when a user
> wants to reconfigure node labels.






[jira] [Updated] (YARN-6002) Range support when serving a container log

2017-08-06 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-6002:
--
Fix Version/s: (was: 2.7.4)

> Range support when serving a container log
> --
>
> Key: YARN-6002
> URL: https://issues.apache.org/jira/browse/YARN-6002
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.3
>Reporter: Raul Gutierrez Segales
>Priority: Minor
> Attachments: YARN-6002-branch-2.7.001.patch
>
>
> Currently, when we access a container's logfile (via
> /ws/v1/node/containerlogs/) we can only get the full content, not a specified
> range.
> Support for the Range: header would improve this, especially when dealing with
> big files that need to be tailed or streamed, e.g.:
> {code}
> curl -H 'Range: bytes=10240-' \
>   http://node:8042/ws/v1/node/containerlogs/container_XXX/stdout
> {code}






[jira] [Updated] (YARN-6728) Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable.

2017-08-06 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-6728:
--
Fix Version/s: (was: 2.7.4)
   (was: 2.9.0)

> Job will run slow when the performance of defaultFs degrades and the 
> log-aggregation is enable. 
> 
>
> Key: YARN-6728
> URL: https://issues.apache.org/jira/browse/YARN-6728
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Affects Versions: 2.7.1
> Environment: CentOS 7.1 hadoop-2.7.1
>Reporter: zhengchenyu
> Attachments: YARN-6728.patch.00_branch-2.7
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> In our cluster, I found that many map tasks keep the "NEW" state for several
> minutes. Here is the container log:
> {code}
> [2017-06-13T18:21:23.068+08:00] [INFO] 
> containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 
> 304) [AsyncDispatcher event handler] : Adding 
> container_1495632926847_2459604_01_11 to application 
> application_1495632926847_2459604
> [2017-06-13T18:23:08.715+08:00] [INFO] 
> containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) 
> [AsyncDispatcher event handler] : Container 
> container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING
> {code}
> Then I searched the log from 18:21:23.068 to 18:23:08.715. I found that some
> dispatches of the AsyncDispatcher run slow because they access the defaultFs.
> Our cluster has grown to 4k nodes, so the pressure on the defaultFs has
> increased. (Note: log-aggregation is enabled.)
> A container running in the nodemanager will invoke initApp(), then invoke
> verifyAndCreateRemoteLogDir and mkdir the remote log dir; these operations
> access the defaultFs. So the container gets stuck here, and then the
> application runs slow.
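One possible mitigation this analysis points at, sketched under the assumption that the remote-log-dir setup can be taken off the dispatcher thread ({{createAppDir}} and the executor are illustrative, not existing NodeManager code):

{code}
// Sketch only: run the defaultFs-touching setup outside the
// AsyncDispatcher event handler so a slow defaultFs cannot stall
// container state transitions (NEW -> LOCALIZING).
ExecutorService logDirExecutor = Executors.newSingleThreadExecutor();
logDirExecutor.submit(() -> {
  verifyAndCreateRemoteLogDir(conf);   // touches defaultFs
  createAppDir(user, appId, userUgi);  // touches defaultFs
});
{code}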






[jira] [Updated] (YARN-6568) A queue which runs a long time job couldn't acquire any container for long time.

2017-08-06 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-6568:
--
Fix Version/s: (was: 2.7.4)

> A queue which runs a long time job couldn't acquire any container for long 
> time.
> 
>
> Key: YARN-6568
> URL: https://issues.apache.org/jira/browse/YARN-6568
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
> Environment: CentOS 7.1
>Reporter: zhengchenyu
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> In our cluster, we found that some applications couldn't acquire any container
> for a long time. (Note: we use FairSharePolicy and FairScheduler.)
> First, I found some unreasonable configuration: we had set minRes=maxRes, so
> some applications kept pending for a long time, and we killed some large
> applications to work around it. Then we changed this configuration, and the
> problem was relieved.
> But the problem was not completely solved. In our cluster, I found that
> applications in some queues which request few containers keep pending for a
> long time.
> I simulated this in a test cluster. I submitted a DistributedShell application
> which runs many loop applications to queueA, then I submitted my own YARN
> application, which requests and releases containers constantly, to queueB. At
> this point, any applications submitted to queueA keep pending!
> We know this is a problem of FairSharePolicy: it considers the requests of
> each queue, so after sorting the queues, queues which have few requests are
> ordered last all the time.
> We know that once the AM container is launched the request will increase, but
> FairSharePolicy can't distinguish which request is the AM request. I think if
> the AM container is assigned, the problem is solved.
> Our colleagues discussed this problem. We recommend setting a timeout for a
> queue, i.e. the length of time a queue has gone without an assignment. On
> timeout, we move that queue to the first place of the queue list, as sketched
> below.
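A minimal sketch of that proposal (the {{getLastAssignmentTime}} accessor and {{starvationTimeoutMs}} are assumptions, not existing FairScheduler API):

{code}
// Sketch only: queues with no assignment within the timeout sort first;
// everything else keeps the usual fair-share order.
Comparator<FSQueue> starvationAware = (q1, q2) -> {
  long now = System.currentTimeMillis();
  boolean starved1 = now - q1.getLastAssignmentTime() > starvationTimeoutMs;
  boolean starved2 = now - q2.getLastAssignmentTime() > starvationTimeoutMs;
  if (starved1 != starved2) {
    return starved1 ? -1 : 1;
  }
  return fairSharePolicy.compare(q1, q2);
};
{code}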






[jira] [Updated] (YARN-6698) Backport YARN-5121 to branch-2.7

2017-07-27 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-6698:
--
Fix Version/s: (was: 2.7.4)

> Backport YARN-5121 to branch-2.7
> 
>
> Key: YARN-6698
> URL: https://issues.apache.org/jira/browse/YARN-6698
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Blocker
> Attachments: YARN-6698-branch-2.7-01.patch, 
> YARN-6698-branch-2.7-test.patch
>
>







[jira] [Updated] (YARN-6818) User limit per partition is not honored in branch-2.7 >=

2017-07-17 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-6818:
--
Labels:   (was: release-blocker)

> User limit per partition is not honored in branch-2.7 >=
> 
>
> Key: YARN-6818
> URL: https://issues.apache.org/jira/browse/YARN-6818
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
> Fix For: 2.7.4
>
> Attachments: YARN-6818-branch-2.7.001.patch, 
> YARN-6818-branch-2.7.002.patch
>
>
> We are seeing an issue where user limit factor does not cap the amount of 
> resources a user can consume in a queue in a partition. Suppose you have a 
> queue with access to partition X, used resources in default partition is 0, 
> and used resources in partition X is at the partition's user limit. This is 
> the problematic code as far as I can tell (in LeafQueue.java):
> {noformat}
> if (Resources
>     .greaterThan(resourceCalculator, clusterResource,
>         user.getUsed(label),
>         limit)) {
>   // if enabled, check to see if could we potentially use this node instead
>   // of a reserved node if the application has reserved containers
>   if (this.reservationsContinueLooking) {
>     if (Resources.lessThanOrEqual(
>         resourceCalculator,
>         clusterResource,
>         Resources.subtract(user.getUsed(),
>             application.getCurrentReservation()),
>         limit)) {
>       if (LOG.isDebugEnabled()) {
>         LOG.debug("User " + userName + " in queue " + getQueueName()
>             + " will exceed limit based on reservations - " + " consumed: "
>             + user.getUsed() + " reserved: "
>             + application.getCurrentReservation() + " limit: " + limit);
>       }
>       Resource amountNeededToUnreserve =
>           Resources.subtract(user.getUsed(label), limit);
>       // we can only acquire a new container if we unreserve first since
>       // we ignored the user limit. Choose the max of user limit or what
>       // was previously set by max capacity.
>       currentResoureLimits.setAmountNeededUnreserve(
>           Resources.max(resourceCalculator, clusterResource,
>               currentResoureLimits.getAmountNeededUnreserve(),
>               amountNeededToUnreserve));
>       return true;
>     }
>   }
>   if (LOG.isDebugEnabled()) {
>     LOG.debug("User " + userName + " in queue " + getQueueName()
>         + " will exceed limit - " + " consumed: "
>         + user.getUsed() + " limit: " + limit);
>   }
>   return false;
> }
> {noformat}
> First it sees that the used resources in partition X are greater than the
> partition's user limit. Then the reservation check also succeeds, because it
> is checking {{user.getUsed() - application.getCurrentReservation() <= limit}},
> and it returns true.
> One fix is to just change {{Resources.subtract(user.getUsed(),
> application.getCurrentReservation())}} to
> {{Resources.subtract(user.getUsed(label),
> application.getCurrentReservation())}}.
> This doesn't seem to be a problem in branch-2.8 and higher, since YARN-3356
> introduces this check: {noformat}  if (this.reservationsContinueLooking
> && checkReservations
>   && label.equals(CommonNodeLabelsManager.NO_LABEL)) {{noformat}
> so in this case getting the used resources in the default partition seems to
> be correct.
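A minimal sketch of the one-line change described above, subtracting reservations from the usage in the same partition the limit was computed for:

{code}
// Sketch only: compare per-partition usage (user.getUsed(label)) rather
// than default-partition usage (user.getUsed()) against the limit.
boolean fitsAfterUnreserving = Resources.lessThanOrEqual(
    resourceCalculator, clusterResource,
    Resources.subtract(user.getUsed(label),   // was: user.getUsed()
        application.getCurrentReservation()),
    limit);
{code}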






[jira] [Updated] (YARN-5988) RM unable to start in secure setup

2017-06-15 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-5988:
--
Attachment: (was: YARN-5988-branch-2.8.05.patch)

> RM unable to start in secure setup
> --
>
> Key: YARN-5988
> URL: https://issues.apache.org/jira/browse/YARN-5988
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Blocker
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha2, 2.8.2
>
> Attachments: hadoop-secureuser-resourcemanager-vm1.log, 
> YARN-5988.01.patch, YARN-5988.02.patch, YARN-5988.03.patch, 
> YARN-5988.04.patch, YARN-5988.05.patch, YARN-5988-branch-2.7.05.patch, 
> YARN-5988-branch-2.8.0001.patch
>
>
> When CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION=true
> RM is unable to start






[jira] [Updated] (YARN-5988) RM unable to start in secure setup

2017-06-15 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-5988:
--
Attachment: YARN-5988-branch-2.8.0001.patch

> RM unable to start in secure setup
> --
>
> Key: YARN-5988
> URL: https://issues.apache.org/jira/browse/YARN-5988
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Blocker
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha2, 2.8.2
>
> Attachments: hadoop-secureuser-resourcemanager-vm1.log, 
> YARN-5988.01.patch, YARN-5988.02.patch, YARN-5988.03.patch, 
> YARN-5988.04.patch, YARN-5988.05.patch, YARN-5988-branch-2.7.05.patch, 
> YARN-5988-branch-2.8.0001.patch, YARN-5988-branch-2.8.05.patch
>
>
> When CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION=true
> RM is unable to start






[jira] [Updated] (YARN-5121) fix some container-executor portability issues

2017-06-09 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-5121:
--
Target Version/s: 3.0.0-alpha2, 2.7.4  (was: 2.7.4, 3.0.0-alpha2)
  Labels: security  (was: release-blocker security)
   Fix Version/s: 2.7.4

Committed this to branch-2.7. Thank you [~ajisakaa] for the backport.
Could you please attach the final patch for branch-2.7 from YARN-6698 here?

> fix some container-executor portability issues
> --
>
> Key: YARN-5121
> URL: https://issues.apache.org/jira/browse/YARN-5121
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, security
>Affects Versions: 3.0.0-alpha1
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Blocker
>  Labels: security
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha1
>
> Attachments: YARN-5121.00.patch, YARN-5121.01.patch, 
> YARN-5121.02.patch, YARN-5121.03.patch, YARN-5121.04.patch, 
> YARN-5121.06.patch, YARN-5121.07.patch, YARN-5121.08.patch
>
>
> container-executor has some issues that are preventing it from even compiling 
> on the OS X jenkins instance.  Let's fix those.  While we're there, let's 
> also try to take care of some of the other portability problems that have 
> crept in over the years, since it used to work great on Solaris but now 
> doesn't.






[jira] [Updated] (YARN-1471) The SLS simulator is not running the preemption policy for CapacityScheduler

2017-06-05 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-1471:
--
   Labels:   (was: release-blocker)
Fix Version/s: 2.8.2
   2.7.4
   2.9.0

I double-checked that all SLS tests pass.
Just committed this to branch-2, -2.8, and -2.7. Thank you [~zhouyejoe] for the
backports.

> The SLS simulator is not running the preemption policy for CapacityScheduler
> 
>
> Key: YARN-1471
> URL: https://issues.apache.org/jira/browse/YARN-1471
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Minor
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha1, 2.8.2
>
> Attachments: SLSCapacityScheduler.java, YARN-1471.2.patch, 
> YARN-1471-branch-2.7.4.patch, YARN-1471-branch-2.8.patch, 
> YARN-1471-branch-2.patch, YARN-1471.patch, YARN-1471.patch
>
>
> The simulator does not run the ProportionalCapacityPreemptionPolicy monitor.  
> This is because the policy needs to interact with a CapacityScheduler, and 
> the wrapping done by the simulator breaks this. 






[jira] [Updated] (YARN-5333) Some recovered apps are put into default queue when RM HA

2017-06-05 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-5333:
--
Labels:   (was: release-blocker)

> Some recovered apps are put into default queue when RM HA
> -
>
> Key: YARN-5333
> URL: https://issues.apache.org/jira/browse/YARN-5333
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha1, 2.8.2
>
> Attachments: YARN-5333.01.patch, YARN-5333.02.patch, 
> YARN-5333.03.patch, YARN-5333.04.patch, YARN-5333.05.patch, 
> YARN-5333.06.patch, YARN-5333.07.patch, YARN-5333.08.patch, 
> YARN-5333.09.patch, YARN-5333.10.patch
>
>
> Enable RM HA and use FairScheduler, 
> {{yarn.scheduler.fair.allow-undeclared-pools}} is set to false, 
> {{yarn.scheduler.fair.user-as-default-queue}} is set to false.
> Reproduce steps:
> 1. Start two RMs.
> 2. After RMs are running, change both RM's file 
> {{etc/hadoop/fair-scheduler.xml}}, then add some queues.
> 3. Submit some apps to the new added queues.
> 4. Stop the active RM, then the standby RM will transit to active and recover 
> apps.
> However, the new active RM will put recovered apps into the default queue
> because it might not have loaded the new {{fair-scheduler.xml}} yet. We need
> to call {{initScheduler}} before starting active services, or bring
> {{refreshAll()}} in front of {{rm.transitionToActive()}}. *It seems this is
> also important for other schedulers*.
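A minimal sketch of the second option mentioned above, reordering only the two calls named in the description (surrounding code omitted):

{code}
// Sketch only: reload configuration before activating, so recovered apps
// are placed against the queues from the new fair-scheduler.xml.
refreshAll();             // load the updated scheduler configuration first
rm.transitionToActive();  // then recover and place applications
{code}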






[jira] [Updated] (YARN-5988) RM unable to start in secure setup

2017-06-04 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-5988:
--
Labels:   (was: release-blocker)

> RM unable to start in secure setup
> --
>
> Key: YARN-5988
> URL: https://issues.apache.org/jira/browse/YARN-5988
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Blocker
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha2, 2.8.2
>
> Attachments: hadoop-secureuser-resourcemanager-vm1.log, 
> YARN-5988.01.patch, YARN-5988.02.patch, YARN-5988.03.patch, 
> YARN-5988.04.patch, YARN-5988.05.patch, YARN-5988-branch-2.7.05.patch, 
> YARN-5988-branch-2.8.05.patch
>
>
> When CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION=true
> RM is unable to start






[jira] [Commented] (YARN-5988) RM unable to start in secure setup

2017-06-04 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036377#comment-16036377
 ] 

Konstantin Shvachko commented on YARN-5988:
---

OK, great! Could you please replace my patches with the ones committed under
YARN-6664?

> RM unable to start in secure setup
> --
>
> Key: YARN-5988
> URL: https://issues.apache.org/jira/browse/YARN-5988
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Blocker
>  Labels: release-blocker
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha2, 2.8.2
>
> Attachments: hadoop-secureuser-resourcemanager-vm1.log, 
> YARN-5988.01.patch, YARN-5988.02.patch, YARN-5988.03.patch, 
> YARN-5988.04.patch, YARN-5988.05.patch, YARN-5988-branch-2.7.05.patch, 
> YARN-5988-branch-2.8.05.patch
>
>
> When CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION=true
> RM is unable to start






[jira] [Updated] (YARN-5988) RM unable to start in secure setup

2017-06-03 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-5988:
--
Attachment: YARN-5988-branch-2.8.05.patch
YARN-5988-branch-2.7.05.patch

Attaching backport patches for branch-2.8 and branch-2.7.
Please review.

> RM unable to start in secure setup
> --
>
> Key: YARN-5988
> URL: https://issues.apache.org/jira/browse/YARN-5988
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Blocker
>  Labels: release-blocker
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: hadoop-secureuser-resourcemanager-vm1.log, 
> YARN-5988.01.patch, YARN-5988.02.patch, YARN-5988.03.patch, 
> YARN-5988.04.patch, YARN-5988.05.patch, YARN-5988-branch-2.7.05.patch, 
> YARN-5988-branch-2.8.05.patch
>
>
> When CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION=true
> RM is unable to start






[jira] [Updated] (YARN-4250) NPE in AppSchedulingInfo#isRequestLabelChanged

2017-06-03 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-4250:
--
   Labels:   (was: release-blocker)
Fix Version/s: 2.7.4

I just committed this to branch-2.7.

> NPE in AppSchedulingInfo#isRequestLabelChanged
> --
>
> Key: YARN-4250
> URL: https://issues.apache.org/jira/browse/YARN-4250
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.8.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha1
>
> Attachments: YARN-4250-002.patch, YARN-4250-003.patch, 
> YARN-4250-004.patch, YARN-4250.patch
>
>
>  *Trace* 
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.isRequestLabelChanged(AppSchedulingInfo.java:420)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:342)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:300)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.allocate(FifoScheduler.java:350)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyFifoScheduler.allocate(TestAMRMClientOnRMRestart.java:544)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:507)
>   at 
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:277)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientResendsRequestsOnRMRestart(TestAMRMClientOnRMRestart.java:187)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}
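The trace points at a label comparison that dereferences a possibly-null node-label expression. A minimal sketch of a null-safe comparison of that shape (illustrative, not necessarily the committed patch):

{code}
// Sketch only: normalize a missing node-label expression to the default
// label before comparing, so neither side can throw an NPE.
private boolean isRequestLabelChanged(ResourceRequest r1, ResourceRequest r2) {
  String l1 = r1.getNodeLabelExpression() == null
      ? CommonNodeLabelsManager.NO_LABEL : r1.getNodeLabelExpression();
  String l2 = r2.getNodeLabelExpression() == null
      ? CommonNodeLabelsManager.NO_LABEL : r2.getNodeLabelExpression();
  return !l1.equals(l2);
}
{code}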






[jira] [Updated] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2017-06-03 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-4140:
--
Labels:   (was: release-blocker)

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch, 
> YARN-4140-branch-2.7.001.patch, YARN-4140-branch-2.7.002.patch, 
> YARN-4140-branch-2.7.002-YARN-4250.patch
>
>
> Trying to run an application on a Nodelabel partition, I found that the
> application execution time was delayed by 5 – 10 min for 500 containers.
> Total 3 machines; 2 machines were in the same partition and the app was
> submitted to it.
> After enabling debug I was able to find the below:
> # From the AM the container ask is for OFF-SWITCH
> # RM is allocating all containers to NODE_LOCAL as shown in the logs below.
> # Since I was having about 500 containers, it took about 6 minutes to
> allocate the 1st map after AM allocation.
> # Tested with about 1K maps using a PI job; it took 17 minutes to allocate
> the next container after AM allocation.
> Once 500 container allocations on NODE_LOCAL are done, the next container
> allocation is done on OFF_SWITCH
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> {code}
> {code}
> dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1>
>  cat hadoop-dsperf-resourcemanager-host-127.log | grep "NODE_LOCAL" | grep 
> "root.b.b1" | wc -l
> 500
> {code}

[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2017-06-02 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035723#comment-16035723
 ] 

Konstantin Shvachko commented on YARN-4140:
---

The backport patch looks good. Minor nits in 
{{TestNodeLabelContainerAllocation}}:
# You can remove 3 unused imports: Arrays, HashSet, RMNode.
# Remove the unused variable nm1 in testResourceRequestUpdateNodePartitions.

Otherwise LGTM.

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>  Labels: release-blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch, 
> YARN-4140-branch-2.7.001.patch
>
>
> Trying to run an application on a Nodelabel partition, I found that the 
> application execution time is delayed by 5 – 10 minutes for 500 containers. 
> Total 3 machines; 2 machines were in the same partition, and the app was 
> submitted to that same partition.
> After enabling debug logging I was able to find the following:
> # From the AM the container ask is for OFF_SWITCH.
> # The RM allocates all containers as NODE_LOCAL, as shown in the logs below.
> # Since I had about 500 containers, it took about 6 minutes to allocate the 
> 1st map after AM allocation.
> # Tested with about 1K maps using a PI job; it took 17 minutes to allocate 
> the next container after AM allocation.
> Only once 500 container allocations have been made as NODE_LOCAL is the next 
> container allocated as OFF_SWITCH.
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> {code}

[jira] [Updated] (YARN-4925) ContainerRequest in AMRMClient, application should be able to specify nodes/racks together with nodeLabelExpression

2017-06-02 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-4925:
--
   Labels:   (was: release-blocker)
Fix Version/s: 2.7.4

Just committed this to branch-2.7. Thank you [~jhung] for the backport.

> ContainerRequest in AMRMClient, application should be able to specify 
> nodes/racks together with nodeLabelExpression
> ---
>
> Key: YARN-4925
> URL: https://issues.apache.org/jira/browse/YARN-4925
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4925.patch, 0002-YARN-4925.patch, 
> YARN-4925-branch-2.7.001.patch
>
>
> Currently, with node labels, AMRMClient is not able to specify node labels 
> together with Node/Rack requests. For applications like Spark, NODE_LOCAL 
> requests cannot be made with a label expression.
> This is due to the following check in {{AMRMClientImpl#checkNodeLabelExpression}}:
> {noformat}
> // Don't allow specify node label against ANY request
> if ((containerRequest.getRacks() != null && 
>     (!containerRequest.getRacks().isEmpty()))
>     || 
>     (containerRequest.getNodes() != null && 
>     (!containerRequest.getNodes().isEmpty()))) {
>   throw new InvalidContainerRequestException(
>       "Cannot specify node label with rack and node");
> }
> {noformat}
> In {{AppSchedulingInfo#updateResourceRequests}} we reset the labels to those 
> of the OFF_SWITCH request. 
> The above check is not required for the ContainerRequest ask. /cc [~wangda], 
> thank you for confirming.
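
For illustration, a minimal sketch of the kind of request this change permits, 
using Hadoop's {{AMRMClient.ContainerRequest}} API; the host name mirrors the 
logs above, while the label value and class name are hypothetical:

{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class LabeledNodeLocalRequest {
  public static ContainerRequest make() {
    Resource capability = Resource.newInstance(1024, 1); // 1 GB, 1 vcore
    String[] nodes = {"host-10-19-92-143"};              // node-local ask
    String[] racks = null;
    // Before this fix, passing a label expression together with a node
    // list made AMRMClientImpl#checkNodeLabelExpression throw
    // InvalidContainerRequestException; with the check removed it is accepted.
    return new ContainerRequest(capability, nodes, racks,
        Priority.newInstance(20), true /* relaxLocality */, "labelX");
  }
}
{code}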



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4925) ContainerRequest in AMRMClient, application should be able to specify nodes/racks together with nodeLabelExpression

2017-06-02 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035651#comment-16035651
 ] 

Konstantin Shvachko commented on YARN-4925:
---

Turned out the *Client tests were failing due to an unusual local URI 
configuration, which should be addressed in YARN-6684. I was able to run the 
tests successfully locally with the workaround.

> ContainerRequest in AMRMClient, application should be able to specify 
> nodes/racks together with nodeLabelExpression
> ---
>
> Key: YARN-4925
> URL: https://issues.apache.org/jira/browse/YARN-4925
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>  Labels: release-blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4925.patch, 0002-YARN-4925.patch, 
> YARN-4925-branch-2.7.001.patch
>
>
> Currently, with node labels, AMRMClient is not able to specify node labels 
> together with Node/Rack requests. For applications like Spark, NODE_LOCAL 
> requests cannot be made with a label expression.
> This is due to the following check in {{AMRMClientImpl#checkNodeLabelExpression}}:
> {noformat}
> // Don't allow specify node label against ANY request
> if ((containerRequest.getRacks() != null && 
>     (!containerRequest.getRacks().isEmpty()))
>     || 
>     (containerRequest.getNodes() != null && 
>     (!containerRequest.getNodes().isEmpty()))) {
>   throw new InvalidContainerRequestException(
>       "Cannot specify node label with rack and node");
> }
> {noformat}
> In {{AppSchedulingInfo#updateResourceRequests}} we reset the labels to those 
> of the OFF_SWITCH request. 
> The above check is not required for the ContainerRequest ask. /cc [~wangda], 
> thank you for confirming.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1471) The SLS simulator is not running the preemption policy for CapacityScheduler

2017-06-02 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035595#comment-16035595
 ] 

Konstantin Shvachko commented on YARN-1471:
---

I see a new patch in YARN-6608, so this is progressing. If this change is 
incorporated there, we just need to port it into branch-2.7. I will do just 
that if there are no objections.

> The SLS simulator is not running the preemption policy for CapacityScheduler
> 
>
> Key: YARN-1471
> URL: https://issues.apache.org/jira/browse/YARN-1471
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Minor
>  Labels: release-blocker
> Fix For: 3.0.0-alpha1
>
> Attachments: SLSCapacityScheduler.java, YARN-1471.2.patch, 
> YARN-1471-branch-2.7.4.patch, YARN-1471.patch, YARN-1471.patch
>
>
> The simulator does not run the ProportionalCapacityPreemptionPolicy monitor.  
> This is because the policy needs to interact with a CapacityScheduler, and 
> the wrapping done by the simulator breaks this. 
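
As a hedged illustration of the kind of coupling involved (hypothetical helper, 
not the actual SLS or policy code): a policy that insists on a concrete 
CapacityScheduler rejects any scheduler that merely wraps one, so the monitor 
never runs under the simulator.

{code}
import org.apache.hadoop.yarn.exceptions.YarnRuntimeException;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;

public class MonitorWiringSketch {
  // If the RM hands the preemption policy an SLS wrapper instead of the
  // CapacityScheduler itself, a type check like this fails.
  static CapacityScheduler requireCapacityScheduler(ResourceScheduler s) {
    if (!(s instanceof CapacityScheduler)) {
      throw new YarnRuntimeException(
          s.getClass().getName() + " is not a CapacityScheduler");
    }
    return (CapacityScheduler) s;
  }
}
{code}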



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1471) The SLS simulator is not running the preemption policy for CapacityScheduler

2017-05-31 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032448#comment-16032448
 ] 

Konstantin Shvachko commented on YARN-1471:
---

Hey guys. Let's target the full SLS feature for the next release; it would be 
too big a change for a dot release.
Here I just need a rebase of the branch-2 patch, which was attached but has 
gone stale, and which is sufficient for our needs in 2.7. I am pretty 
confident about this one in particular, as we have been running it internally 
in production. Also, since this is in trunk, it may ease the backport effort 
of YARN-6608.

> The SLS simulator is not running the preemption policy for CapacityScheduler
> 
>
> Key: YARN-1471
> URL: https://issues.apache.org/jira/browse/YARN-1471
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Minor
>  Labels: release-blocker
> Fix For: 3.0.0-alpha1
>
> Attachments: SLSCapacityScheduler.java, YARN-1471.2.patch, 
> YARN-1471-branch-2.7.4.patch, YARN-1471.patch, YARN-1471.patch
>
>
> The simulator does not run the ProportionalCapacityPreemptionPolicy monitor.  
> This is because the policy needs to interact with a CapacityScheduler, and 
> the wrapping done by the simulator breaks this. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2017-05-31 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032114#comment-16032114
 ] 

Konstantin Shvachko commented on YARN-4140:
---

Hey [~bibinchundatt], did you want to work on porting this to branch-2.7?
I would appreciate your help.

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>  Labels: release-blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch
>
>
> Trying to run an application on a Nodelabel partition, I found that the 
> application execution time is delayed by 5 – 10 minutes for 500 containers. 
> Total 3 machines; 2 machines were in the same partition, and the app was 
> submitted to that same partition.
> After enabling debug logging I was able to find the following:
> # From the AM the container ask is for OFF_SWITCH.
> # The RM allocates all containers as NODE_LOCAL, as shown in the logs below.
> # Since I had about 500 containers, it took about 6 minutes to allocate the 
> 1st map after AM allocation.
> # Tested with about 1K maps using a PI job; it took 17 minutes to allocate 
> the next container after AM allocation.
> Only once 500 container allocations have been made as NODE_LOCAL is the next 
> container allocated as OFF_SWITCH.
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> {code}
> {code}
> dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1>
>  cat hadoop-dsperf-resourcemanager-host-127.log | grep "NODE_LOCAL" | grep 
> "root.b.b1" | wc -l
> 500
> {code}

[jira] [Commented] (YARN-4925) ContainerRequest in AMRMClient, application should be able to specify nodes/racks together with nodeLabelExpression

2017-05-31 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031876#comment-16031876
 ] 

Konstantin Shvachko commented on YARN-4925:
---

I've done quite a few runs of the client tests.
I see {{TestAMRMClient}} and other client tests failing consistently on my 
Linux box. On Mac they succeed most of the time, but also fail once in a while. 
I don't think it is related to this particular patch, but I don't know for sure.
So I think we should get some understanding of why the tests are failing, which 
will tell us whether we can commit this jira and focus on fixing the tests for 
branch-2.7.

> ContainerRequest in AMRMClient, application should be able to specify 
> nodes/racks together with nodeLabelExpression
> ---
>
> Key: YARN-4925
> URL: https://issues.apache.org/jira/browse/YARN-4925
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>  Labels: release-blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4925.patch, 0002-YARN-4925.patch, 
> YARN-4925-branch-2.7.001.patch
>
>
> Currently, with node labels, AMRMClient is not able to specify node labels 
> together with Node/Rack requests. For applications like Spark, NODE_LOCAL 
> requests cannot be made with a label expression.
> This is due to the following check in {{AMRMClientImpl#checkNodeLabelExpression}}:
> {noformat}
> // Don't allow specify node label against ANY request
> if ((containerRequest.getRacks() != null && 
>     (!containerRequest.getRacks().isEmpty()))
>     || 
>     (containerRequest.getNodes() != null && 
>     (!containerRequest.getNodes().isEmpty()))) {
>   throw new InvalidContainerRequestException(
>       "Cannot specify node label with rack and node");
> }
> {noformat}
> In {{AppSchedulingInfo#updateResourceRequests}} we reset the labels to those 
> of the OFF_SWITCH request. 
> The above check is not required for the ContainerRequest ask. /cc [~wangda], 
> thank you for confirming.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-4925) ContainerRequest in AMRMClient, application should be able to specify nodes/racks together with nodeLabelExpression

2017-05-21 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018974#comment-16018974
 ] 

Konstantin Shvachko edited comment on YARN-4925 at 5/21/17 8:27 PM:


Hey [~jhung] looks like there are problems with a few tests, including 
TestAMRMClient on branch-2.7.
Could you please check.


was (Author: shv):
Hey [~jhung] looks like there are problems with a few tests, including 
TestAMRClient on branch-2.7.
Could you please check.

> ContainerRequest in AMRMClient, application should be able to specify 
> nodes/racks together with nodeLabelExpression
> ---
>
> Key: YARN-4925
> URL: https://issues.apache.org/jira/browse/YARN-4925
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>  Labels: release-blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4925.patch, 0002-YARN-4925.patch, 
> YARN-4925-branch-2.7.001.patch
>
>
> Currently, with node labels, AMRMClient is not able to specify node labels 
> together with Node/Rack requests. For applications like Spark, NODE_LOCAL 
> requests cannot be made with a label expression.
> This is due to the following check in {{AMRMClientImpl#checkNodeLabelExpression}}:
> {noformat}
> // Don't allow specify node label against ANY request
> if ((containerRequest.getRacks() != null && 
>     (!containerRequest.getRacks().isEmpty()))
>     || 
>     (containerRequest.getNodes() != null && 
>     (!containerRequest.getNodes().isEmpty()))) {
>   throw new InvalidContainerRequestException(
>       "Cannot specify node label with rack and node");
> }
> {noformat}
> In {{AppSchedulingInfo#updateResourceRequests}} we reset the labels to those 
> of the OFF_SWITCH request. 
> The above check is not required for the ContainerRequest ask. /cc [~wangda], 
> thank you for confirming.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4925) ContainerRequest in AMRMClient, application should be able to specify nodes/racks together with nodeLabelExpression

2017-05-21 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018974#comment-16018974
 ] 

Konstantin Shvachko commented on YARN-4925:
---

Hey [~jhung] looks like there are problems with a few tests, including 
TestAMRClient on branch-2.7.
Could you please check.

> ContainerRequest in AMRMClient, application should be able to specify 
> nodes/racks together with nodeLabelExpression
> ---
>
> Key: YARN-4925
> URL: https://issues.apache.org/jira/browse/YARN-4925
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>  Labels: release-blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4925.patch, 0002-YARN-4925.patch, 
> YARN-4925-branch-2.7.001.patch
>
>
> Currently, with node labels, AMRMClient is not able to specify node labels 
> together with Node/Rack requests. For applications like Spark, NODE_LOCAL 
> requests cannot be made with a label expression.
> This is due to the following check in {{AMRMClientImpl#checkNodeLabelExpression}}:
> {noformat}
> // Don't allow specify node label against ANY request
> if ((containerRequest.getRacks() != null && 
>     (!containerRequest.getRacks().isEmpty()))
>     || 
>     (containerRequest.getNodes() != null && 
>     (!containerRequest.getNodes().isEmpty()))) {
>   throw new InvalidContainerRequestException(
>       "Cannot specify node label with rack and node");
> }
> {noformat}
> In {{AppSchedulingInfo#updateResourceRequests}} we reset the labels to those 
> of the OFF_SWITCH request. 
> The above check is not required for the ContainerRequest ask. /cc [~wangda], 
> thank you for confirming.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4612) Fix rumen and scheduler load simulator handle killed tasks properly

2017-05-21 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-4612:
--
   Labels:   (was: release-blocker)
Fix Version/s: 2.8.2
   2.7.4

I just committed this to branch-2.8 and branch-2.7.
Thank you [~zhouyejoe] for backporting.

> Fix rumen and scheduler load simulator handle killed tasks properly
> ---
>
> Key: YARN-4612
> URL: https://issues.apache.org/jira/browse/YARN-4612
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha1, 2.8.2
>
> Attachments: YARN-4612-2.patch, YARN-4612.patch
>
>
> Killed tasks might not have any attempts. Rumen and SLS throw exceptions 
> when processing such data.
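
A minimal sketch of the kind of guard implied here, with stand-in types rather 
than the real rumen classes:

{code}
import java.util.List;

public class KilledTaskGuard {
  // Stand-ins for the rumen task/attempt types; hypothetical names.
  interface AttemptInfo { }
  interface TaskInfo { List<AttemptInfo> getAttempts(); }

  // A task killed before any attempt launched has no attempts; callers
  // should skip it rather than dereference a missing attempt.
  static boolean hasRunnableAttempt(TaskInfo task) {
    List<AttemptInfo> attempts = task.getAttempts();
    return attempts != null && !attempts.isEmpty();
  }
}
{code}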



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1471) The SLS simulator is not running the preemption policy for CapacityScheduler

2017-05-21 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018960#comment-16018960
 ] 

Konstantin Shvachko commented on YARN-1471:
---

Hey [~zhouyejoe] or [~chris.douglas], do you mind updating the patch for 
branch-2 as well? I tried to merge, but it didn't work.
I cannot commit to branch-2.7 without committing to branch-2.

> The SLS simulator is not running the preemption policy for CapacityScheduler
> 
>
> Key: YARN-1471
> URL: https://issues.apache.org/jira/browse/YARN-1471
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Minor
>  Labels: release-blocker
> Fix For: 3.0.0-alpha1
>
> Attachments: SLSCapacityScheduler.java, YARN-1471.2.patch, 
> YARN-1471-branch-2.7.4.patch, YARN-1471.patch, YARN-1471.patch
>
>
> The simulator does not run the ProportionalCapacityPreemptionPolicy monitor.  
> This is because the policy needs to interact with a CapacityScheduler, and 
> the wrapping done by the simulator breaks this. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4367) SLS webapp doesn't load

2017-05-21 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-4367:
--
   Labels:   (was: release-blocker)
Fix Version/s: 2.7.4

I just committed this to branch-2.7.
Thank you [~zhouyejoe] for backporting.

> SLS webapp doesn't load
> ---
>
> Key: YARN-4367
> URL: https://issues.apache.org/jira/browse/YARN-4367
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Fix For: 2.8.0, 2.7.4
>
> Attachments: YARN-4367-branch-2.1.patch, YARN-4367-branch-2.2.patch, 
> YARN-4367-branch-2.patch
>
>
> When I run the SLS, the webapp doesn't load and I see the following error:
> {noformat}
> 15/11/17 15:33:30 INFO resourcemanager.ResourceManager: Using Scheduler: 
> org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.sls.web.SLSWebApp.<init>(SLSWebApp.java:87)
> at 
> org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.initMetrics(ResourceSchedulerWrapper.java:483)
> at 
> org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.setConf(ResourceSchedulerWrapper.java:181)
> at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createScheduler(ResourceManager.java:299)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4302) SLS not able start due to NPE in SchedulerApplicationAttempt#getResourceUsageReport

2017-05-21 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-4302:
--
   Labels:   (was: release-blocker)
Fix Version/s: 2.7.4

I just committed this to branch-2.7.
Thank you [~zhouyejoe] for backporting.

> SLS not able start due to NPE in 
> SchedulerApplicationAttempt#getResourceUsageReport
> ---
>
> Key: YARN-4302
> URL: https://issues.apache.org/jira/browse/YARN-4302
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4302.patch, 0001-YARN-4302.patch
>
>
> Configure the samples from tools/sls (yarn-site.xml, capacityscheduler.xml, 
> sls-runner.xml) into /etc/hadoop.
> Start SLS using:
>  
> bin/slsrun.sh --input-rumen=sample-data/2jobs2min-rumen-jh.json 
> --output-dir=out
> {noformat}
> 15/10/27 14:43:36 ERROR resourcemanager.ResourceManager: Error in handling 
> event type ATTEMPT_ADDED for applicationAttempt application_1445937212593_0001
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.util.resource.Resources.clone(Resources.java:117)
> at org.apache.hadoop.yarn.util.resource.Resources.multiply(Resources.java:151)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceUsageReport(SchedulerApplicationAttempt.java:692)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getAppResourceUsageReport(AbstractYarnScheduler.java:326)
> at 
> org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.getAppResourceUsageReport(ResourceSchedulerWrapper.java:912)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics.getAggregateAppResourceUsage(RMAppAttemptMetrics.java:121)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.storeNewApplicationAttempt(RMStateStore.java:819)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.storeAttempt(RMAppAttemptImpl.java:2011)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$2700(RMAppAttemptImpl.java:109)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ScheduleTransition.transition(RMAppAttemptImpl.java:1021)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ScheduleTransition.transition(RMAppAttemptImpl.java:974)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:839)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:820)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:801)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
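
A hedged sketch of the sort of null guard the trace above calls for 
(hypothetical helper, not the committed fix): if the wrapped scheduler has no 
usage report for the attempt yet, report zero usage instead of letting 
{{Resources.clone}} dereference null.

{code}
import org.apache.hadoop.yarn.api.records.ApplicationResourceUsageReport;
import org.apache.hadoop.yarn.api.records.Resource;

public class UsageReportGuard {
  // Returns the attempt's used resources, or an all-zero Resource when
  // the scheduler wrapper has not produced a report yet.
  static Resource usedOrZero(ApplicationResourceUsageReport report) {
    return report == null
        ? Resource.newInstance(0, 0)
        : report.getUsedResources();
  }
}
{code}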



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4109) Exception on RM scheduler page loading with labels

2017-05-18 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-4109:
--
   Labels:   (was: release-blocker)
Fix Version/s: 2.7.4

Just committed this to branch-2.7.

> Exception on RM scheduler page loading with labels
> --
>
> Key: YARN-4109
> URL: https://issues.apache.org/jira/browse/YARN-4109
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Mohammad Shahid Khan
>Priority: Minor
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha1
>
> Attachments: YARN-4109_1.patch
>
>
> Configure node labels and load the scheduler page.
> On each reload of the page, the below exception gets thrown in the logs:
> {code}
> 2015-09-03 11:27:08,544 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
> handling URI: /cluster/scheduler
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:139)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>   at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:663)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:291)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:615)
>   at 
> org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:82)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1211)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>   at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>   at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>   at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>   at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>   at org.mortbay.jetty.Server.handle(Server.java:326)
>   at 
> org.mortbay.jetty.HttpConnec

[jira] [Updated] (YARN-5543) ResourceManager SchedulingMonitor could potentially terminate the preemption checker thread

2017-05-11 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-5543:
--
Labels: oct16-medium  (was: oct16-medium release-blocker)

> ResourceManager SchedulingMonitor could potentially terminate the preemption 
> checker thread
> ---
>
> Key: YARN-5543
> URL: https://issues.apache.org/jira/browse/YARN-5543
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager
>Affects Versions: 2.7.0, 2.6.1
>Reporter: Min Shen
>Assignee: Min Shen
>  Labels: oct16-medium
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha3, 2.8.2
>
> Attachments: YARN-5543.001.patch, YARN-5543.002.patch, 
> YARN-5543.003.patch, YARN-5543.004.patch, YARN-5543-branch-2.7.001.patch, 
> YARN-5543-branch-2.7.002.patch
>
>
> In SchedulingMonitor.java, when the service starts, it starts a checker 
> thread to perform Capacity Scheduler's preemption. However, the 
> implementation of this checker thread has the following issue:
> {code}
> while (!stopped && !Thread.currentThread().isInterrupted()) {
>   ...
>   try {
>     Thread.sleep(monitorInterval);
>   } catch (InterruptedException e) {
>     ...
>     break;
>   }
> }
> {code}
> The above code snippet terminates the checker thread whenever it is 
> interrupted.
> We noticed in our cluster that this can lead to CapacityScheduler's 
> preemption being disabled unexpectedly when the checker thread gets 
> terminated.
> We propose to use ScheduledExecutorService to improve the robustness of this 
> part of the code and ensure the liveness of CapacityScheduler's preemption 
> functionality.
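
A minimal sketch of the proposed ScheduledExecutorService approach, with 
hypothetical names rather than the actual SchedulingMonitor internals:

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PreemptionChecker {
  private final ScheduledExecutorService ses =
      Executors.newSingleThreadScheduledExecutor();

  public void start(long monitorIntervalMs) {
    ses.scheduleAtFixedRate(() -> {
      try {
        invokePolicy(); // stand-in for the preemption policy invocation
      } catch (Throwable t) {
        // Catch everything: a periodic task that throws would otherwise
        // cancel its own schedule, silently disabling preemption.
        System.err.println("Preemption check failed: " + t);
      }
    }, 0, monitorIntervalMs, TimeUnit.MILLISECONDS);
  }

  private void invokePolicy() {
    // Placeholder for editPolicy work.
  }

  public void stop() {
    ses.shutdownNow();
  }
}
{code}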



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5543) ResourceManager SchedulingMonitor could potentially terminate the preemption checker thread

2017-05-11 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007117#comment-16007117
 ] 

Konstantin Shvachko commented on YARN-5543:
---

In the new test, could you guys make sure that the {{rm}} and {{monitor}} 
objects are closed, to avoid the warning "Resource leak: 'monitor' is never 
closed".
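
A minimal sketch of one way to do that, assuming the fixtures implement 
{{Closeable}} (Hadoop services do, and close() stops the service), so 
try-with-resources guarantees cleanup:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.server.resourcemanager.MockRM;

public class ClosingFixturesSketch {
  // MockRM extends Hadoop's AbstractService, which implements Closeable.
  static void run() throws Exception {
    try (MockRM rm = new MockRM(new Configuration())) {
      rm.start();
      // ... drive the SchedulingMonitor against rm ...
    }
  }
}
{code}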

> ResourceManager SchedulingMonitor could potentially terminate the preemption 
> checker thread
> ---
>
> Key: YARN-5543
> URL: https://issues.apache.org/jira/browse/YARN-5543
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager
>Affects Versions: 2.7.0, 2.6.1
>Reporter: Min Shen
>Assignee: Min Shen
>  Labels: oct16-medium, release-blocker
> Attachments: YARN-5543.001.patch, YARN-5543.002.patch, 
> YARN-5543.003.patch, YARN-5543-branch-2.7.001.patch
>
>
> In SchedulingMonitor.java, when the service starts, it starts a checker 
> thread to perform Capacity Scheduler's preemption. However, the 
> implementation of this checker thread has the following issue:
> {code}
> while (!stopped && !Thread.currentThread().isInterrupted()) {
>   ...
>   try {
>     Thread.sleep(monitorInterval);
>   } catch (InterruptedException e) {
>     ...
>     break;
>   }
> }
> {code}
> The above code snippet terminates the checker thread whenever it is 
> interrupted.
> We noticed in our cluster that this can lead to CapacityScheduler's 
> preemption being disabled unexpectedly when the checker thread gets 
> terminated.
> We propose to use ScheduledExecutorService to improve the robustness of this 
> part of the code and ensure the liveness of CapacityScheduler's preemption 
> functionality.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4925) ContainerRequest in AMRMClient, application should be able to specify nodes/racks together with nodeLabelExpression

2017-05-04 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-4925:
--
Target Version/s: 2.7.4
  Labels: release-blocker  (was: )

> ContainerRequest in AMRMClient, application should be able to specify 
> nodes/racks together with nodeLabelExpression
> ---
>
> Key: YARN-4925
> URL: https://issues.apache.org/jira/browse/YARN-4925
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>  Labels: release-blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4925.patch, 0002-YARN-4925.patch
>
>
> Currently, with node labels, AMRMClient is not able to specify node labels 
> together with Node/Rack requests. For applications like Spark, NODE_LOCAL 
> requests cannot be made with a label expression.
> This is due to the following check in {{AMRMClientImpl#checkNodeLabelExpression}}:
> {noformat}
> // Don't allow specify node label against ANY request
> if ((containerRequest.getRacks() != null && 
>     (!containerRequest.getRacks().isEmpty()))
>     || 
>     (containerRequest.getNodes() != null && 
>     (!containerRequest.getNodes().isEmpty()))) {
>   throw new InvalidContainerRequestException(
>       "Cannot specify node label with rack and node");
> }
> {noformat}
> In {{AppSchedulingInfo#updateResourceRequests}} we reset the labels to those 
> of the OFF_SWITCH request. 
> The above check is not required for the ContainerRequest ask. /cc [~wangda], 
> thank you for confirming.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4250) NPE in AppSchedulingInfo#isRequestLabelChanged

2017-05-04 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-4250:
--
Target Version/s: 2.7.4
  Labels: release-blocker  (was: )

> NPE in AppSchedulingInfo#isRequestLabelChanged
> --
>
> Key: YARN-4250
> URL: https://issues.apache.org/jira/browse/YARN-4250
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.8.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>  Labels: release-blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: YARN-4250-002.patch, YARN-4250-003.patch, 
> YARN-4250-004.patch, YARN-4250.patch
>
>
>  *Trace* 
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.isRequestLabelChanged(AppSchedulingInfo.java:420)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:342)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:300)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.allocate(FifoScheduler.java:350)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart$MyFifoScheduler.allocate(TestAMRMClientOnRMRestart.java:544)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:507)
>   at 
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:277)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart.testAMRMClientResendsRequestsOnRMRestart(TestAMRMClientOnRMRestart.java:187)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2017-05-04 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-4140:
--
Target Version/s: 2.7.4  (was: 2.7.3)
  Labels: release-blocker  (was: )

> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>  Labels: release-blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch, 0014-YARN-4140.patch
>
>
> Trying to run an application on a Nodelabel partition, I found that the 
> application execution time is delayed by 5 – 10 minutes for 500 containers. 
> Total 3 machines; 2 machines were in the same partition, and the app was 
> submitted to that same partition.
> After enabling debug logging I was able to find the following:
> # From the AM the container ask is for OFF_SWITCH.
> # The RM allocates all containers as NODE_LOCAL, as shown in the logs below.
> # Since I had about 500 containers, it took about 6 minutes to allocate the 
> 1st map after AM allocation.
> # Tested with about 1K maps using a PI job; it took 17 minutes to allocate 
> the next container after AM allocation.
> Only once 500 container allocations have been made as NODE_LOCAL is the next 
> container allocated as OFF_SWITCH.
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,832 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=<memory:0, vCores:0>, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> <memory:0, vCores:0>, NODE_LOCAL
> {code}
> {code}
> dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1>
>  cat hadoop-dsperf-resourcemanager-host-127.log | grep "NODE_LOCAL" | grep 
> "root.b.b1" | wc -l
> 500
> {code}
>  
> (Consumes about 6 minutes)

[jira] [Updated] (YARN-4109) Exception on RM scheduler page loading with labels

2017-05-04 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-4109:
--
Target Version/s: 2.7.4
  Labels: release-blocker  (was: )

> Exception on RM scheduler page loading with labels
> --
>
> Key: YARN-4109
> URL: https://issues.apache.org/jira/browse/YARN-4109
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Mohammad Shahid Khan
>Priority: Minor
>  Labels: release-blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: YARN-4109_1.patch
>
>
> Configure node labels and load the scheduler page.
> On each reload of the page, the below exception gets thrown in the logs:
> {code}
> 2015-09-03 11:27:08,544 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
> handling URI: /cluster/scheduler
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:139)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>   at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:663)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:291)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:615)
>   at 
> org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:82)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1211)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>   at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>   at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>   at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>   at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>   at org.mortbay.jetty.Server.handle(Server.java:326)
>   at 
> org.mortbay.jetty.HttpConn

[jira] [Updated] (YARN-4612) Fix rumen and scheduler load simulator handle killed tasks properly

2017-05-04 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-4612:
--
Target Version/s: 2.7.4
  Labels: release-blocker  (was: )

> Fix rumen and scheduler load simulator handle killed tasks properly
> ---
>
> Key: YARN-4612
> URL: https://issues.apache.org/jira/browse/YARN-4612
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
>  Labels: release-blocker
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: YARN-4612-2.patch, YARN-4612.patch
>
>
> Killed tasks might not have any attempts. Rumen and SLS throw exceptions 
> when processing such data.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4367) SLS webapp doesn't load

2017-05-04 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-4367:
--
Target Version/s: 2.8.0, 2.7.4  (was: 2.8.0)
   Fix Version/s: (was: 2.7.4)

> SLS webapp doesn't load
> ---
>
> Key: YARN-4367
> URL: https://issues.apache.org/jira/browse/YARN-4367
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>  Labels: release-blocker
> Fix For: 2.8.0
>
> Attachments: YARN-4367-branch-2.1.patch, YARN-4367-branch-2.2.patch, 
> YARN-4367-branch-2.patch
>
>
> When I run the SLS, the webapp doesn't load and I see the following error:
> {noformat}
> 15/11/17 15:33:30 INFO resourcemanager.ResourceManager: Using Scheduler: 
> org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.sls.web.SLSWebApp.<init>(SLSWebApp.java:87)
> at 
> org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.initMetrics(ResourceSchedulerWrapper.java:483)
> at 
> org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.setConf(ResourceSchedulerWrapper.java:181)
> at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createScheduler(ResourceManager.java:299)
> {noformat}
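
The NPE is thrown from the SLSWebApp constructor; a classpath-resource lookup 
returning null would surface exactly this way. An illustrative guard follows 
(the resource name is hypothetical, and this is not the committed fix):

{code:java}
import java.net.URL;

// Illustration of the failure mode: a missing classpath resource comes
// back as null and, unguarded, surfaces as an NPE in the constructor.
// The resource name below is hypothetical.
static URL requireWebResource() {
  URL template = Thread.currentThread().getContextClassLoader()
      .getResource("html/simulate.html.template");
  if (template == null) {
    throw new IllegalStateException(
        "SLS web resources not found on the classpath");
  }
  return template;
}
{code}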



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4367) SLS webapp doesn't load

2017-05-04 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-4367:
--
   Labels: release-blocker  (was: )
Fix Version/s: 2.7.4

> SLS webapp doesn't load
> ---
>
> Key: YARN-4367
> URL: https://issues.apache.org/jira/browse/YARN-4367
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>  Labels: release-blocker
> Fix For: 2.8.0, 2.7.4
>
> Attachments: YARN-4367-branch-2.1.patch, YARN-4367-branch-2.2.patch, 
> YARN-4367-branch-2.patch
>
>
> When I run the SLS, the webapp doesn't load and I see the following error:
> {noformat}
> 15/11/17 15:33:30 INFO resourcemanager.ResourceManager: Using Scheduler: 
> org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.sls.web.SLSWebApp.<init>(SLSWebApp.java:87)
> at 
> org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.initMetrics(ResourceSchedulerWrapper.java:483)
> at 
> org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.setConf(ResourceSchedulerWrapper.java:181)
> at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createScheduler(ResourceManager.java:299)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4302) SLS not able to start due to NPE in SchedulerApplicationAttempt#getResourceUsageReport

2017-05-04 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-4302:
--
Target Version/s: 2.7.4
  Labels: release-blocker  (was: )

> SLS not able to start due to NPE in 
> SchedulerApplicationAttempt#getResourceUsageReport
> ---
>
> Key: YARN-4302
> URL: https://issues.apache.org/jira/browse/YARN-4302
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>  Labels: release-blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: 0001-YARN-4302.patch, 0001-YARN-4302.patch
>
>
> Configure the samples from tools/sls (yarn-site.xml, capacityscheduler.xml, 
> sls-runner.xml) by copying them to /etc/hadoop.
> Start SLS using:
>  
> bin/slsrun.sh --input-rumen=sample-data/2jobs2min-rumen-jh.json 
> --output-dir=out
> {noformat}
> 15/10/27 14:43:36 ERROR resourcemanager.ResourceManager: Error in handling 
> event type ATTEMPT_ADDED for applicationAttempt application_1445937212593_0001
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.util.resource.Resources.clone(Resources.java:117)
> at org.apache.hadoop.yarn.util.resource.Resources.multiply(Resources.java:151)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceUsageReport(SchedulerApplicationAttempt.java:692)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getAppResourceUsageReport(AbstractYarnScheduler.java:326)
> at 
> org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.getAppResourceUsageReport(ResourceSchedulerWrapper.java:912)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics.getAggregateAppResourceUsage(RMAppAttemptMetrics.java:121)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.storeNewApplicationAttempt(RMStateStore.java:819)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.storeAttempt(RMAppAttemptImpl.java:2011)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$2700(RMAppAttemptImpl.java:109)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ScheduleTransition.transition(RMAppAttemptImpl.java:1021)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ScheduleTransition.transition(RMAppAttemptImpl.java:974)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:839)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:820)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:801)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
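
The crash is in Resources.clone(), reached via Resources.multiply() on a 
resource usage that has not been recorded yet. A null-safe fallback of this 
shape avoids it (illustrative only, not the attached patch):

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// If the wrapped scheduler has not recorded any consumption for the
// attempt yet, report zero usage instead of letting clone() hit null.
static Resource safeUsage(Resource used) {
  if (used == null) {
    return Resources.none();  // zero usage; clone()/multiply() are safe
  }
  return used;
}
{code}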



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1471) The SLS simulator is not running the preemption policy for CapacityScheduler

2017-05-04 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-1471:
--
Target Version/s: 2.7.4
  Labels: release-blocker  (was: )

> The SLS simulator is not running the preemption policy for CapacityScheduler
> 
>
> Key: YARN-1471
> URL: https://issues.apache.org/jira/browse/YARN-1471
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Carlo Curino
>Assignee: Carlo Curino
>Priority: Minor
>  Labels: release-blocker
> Fix For: 3.0.0-alpha1
>
> Attachments: SLSCapacityScheduler.java, YARN-1471.2.patch, 
> YARN-1471.patch, YARN-1471.patch
>
>
> The simulator does not run the ProportionalCapacityPreemptionPolicy monitor.  
> This is because the policy needs to interact with a CapacityScheduler, and 
> the wrapping done by the simulator breaks this. 
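
Judging by the attached SLSCapacityScheduler.java, the natural shape of a fix 
is to subclass rather than wrap, so that monitors which downcast to 
CapacityScheduler keep working. A sketch of that shape (not the attached 
implementation):

{code:java}
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;

// Sketch only: extending CapacityScheduler preserves the concrete type
// that ProportionalCapacityPreemptionPolicy expects, while the SLS
// metrics hooks can be layered into overridden methods.
public class SLSCapacityScheduler extends CapacityScheduler {
  // override allocate()/handle() to record simulator metrics,
  // delegating to super for the actual scheduling work
}
{code}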



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4771) Some containers can be skipped during log aggregation after NM restart

2017-05-04 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-4771:
--
Priority: Major  (was: Critical)

Changing the priority to Major to remove this from the scope of the 2.7.4 
release. Feel free to move it back if you plan to fix it.

> Some containers can be skipped during log aggregation after NM restart
> --
>
> Key: YARN-4771
> URL: https://issues.apache.org/jira/browse/YARN-4771
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.2
>Reporter: Jason Lowe
> Attachments: YARN-4771.001.patch, YARN-4771.002.patch
>
>
> A container can be skipped during log aggregation after a work-preserving 
> nodemanager restart if the following events occur:
> # Container completes more than 
> yarn.nodemanager.duration-to-track-stopped-containers milliseconds before the 
> restart
> # At least one other container completes after the above container and before 
> the restart
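
For reference, the tracking window named in step 1 is an NM setting; reading 
it programmatically looks like the following (the fallback value shown is 
illustrative, not necessarily the NM default):

{code:java}
import org.apache.hadoop.conf.Configuration;

// The NM only remembers stopped containers for this window (ms);
// containers that finished earlier than this before the restart are
// the ones at risk of being skipped by log aggregation.
static long stoppedContainerTrackingWindowMs(Configuration conf) {
  return conf.getLong(
      "yarn.nodemanager.duration-to-track-stopped-containers",
      10 * 60 * 1000L);  // illustrative fallback: 10 minutes
}
{code}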



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-3255) RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support generic options

2015-02-26 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved YARN-3255.
---
   Resolution: Fixed
Fix Version/s: 2.7.0

I just committed this.
Thank you, guys, for the prompt reviews.

> RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support 
> generic options
> ---
>
> Key: YARN-3255
> URL: https://issues.apache.org/jira/browse/YARN-3255
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 2.7.0
>
> Attachments: YARN-3255-01.patch, YARN-3255-02.patch, 
> YARN-3255-branch-2.patch
>
>
> Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore 
> generic options, like {{-conf}} and {{-fs}}. It would be good to have the 
> ability to pass generic options in order to specify configuration files or 
> the NameNode location, when the services start through {{main()}}.
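
The conventional way to honor these flags is to route main()'s arguments 
through {{GenericOptionsParser}}; a minimal sketch of that shape (not the 
exact patch) is:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class Main {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();
    // Consumes -conf/-fs/-D options and applies them to conf.
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    args = parser.getRemainingArgs();
    // ... proceed with the normal service startup using conf ...
  }
}
{code}

With that in place, something like {{yarn resourcemanager -conf 
/path/to/alt/yarn-site.xml}} would pick up an alternate configuration.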



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3255) RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support generic options

2015-02-26 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-3255:
--
Attachment: YARN-3255-branch-2.patch

Patch for branch-2. There is a minor difference from trunk in the import 
section.

> RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support 
> generic options
> ---
>
> Key: YARN-3255
> URL: https://issues.apache.org/jira/browse/YARN-3255
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: YARN-3255-01.patch, YARN-3255-02.patch, 
> YARN-3255-branch-2.patch
>
>
> Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore 
> generic options, like {{-conf}} and {{-fs}}. It would be good to have the 
> ability to pass generic options in order to specify configuration files or 
> the NameNode location, when the services start through {{main()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3255) RM and NM main() should support generic options

2015-02-25 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-3255:
--
Attachment: YARN-3255-02.patch

Added {{GenericOptionsParser}} to {{WebAppProxyServer}} and 
{{JobHistoryServer}}.
Checked other builds regarding the findbugs issue: all of them report the 
same 5 new findbugs warnings, so this looks like a problem with the build 
itself rather than with the patch.
TestAllocationFileLoaderService passes locally.


> RM and NM main() should support generic options
> ---
>
> Key: YARN-3255
> URL: https://issues.apache.org/jira/browse/YARN-3255
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: YARN-3255-01.patch, YARN-3255-02.patch
>
>
> Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore 
> generic options, like {{-conf}} and {{-fs}}. It would be good to have the 
> ability to pass generic options in order to specify configuration files or 
> the NameNode location, when the services start through {{main()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3255) RM and NM main() should support generic options

2015-02-24 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-3255:
--
Attachment: YARN-3255-01.patch

A simple patch, which in particular lets me run a YARN cluster in Eclipse.

> RM and NM main() should support generic options
> ---
>
> Key: YARN-3255
> URL: https://issues.apache.org/jira/browse/YARN-3255
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.5.0
>Reporter: Konstantin Shvachko
> Attachments: YARN-3255-01.patch
>
>
> Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore 
> generic options, like {{-conf}} and {{-fs}}. It would be good to have the 
> ability to pass generic options in order to specify configuration files or 
> the NameNode location, when the services start through {{main()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3255) RM and NM main() should support generic options

2015-02-24 Thread Konstantin Shvachko (JIRA)
Konstantin Shvachko created YARN-3255:
-

 Summary: RM and NM main() should support generic options
 Key: YARN-3255
 URL: https://issues.apache.org/jira/browse/YARN-3255
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager
Affects Versions: 2.5.0
Reporter: Konstantin Shvachko


Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore generic 
options, like {{-conf}} and {{-fs}}. It would be good to have the ability to 
pass generic options in order to specify configuration files or the NameNode 
location, when the services start through {{main()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-854) App submission fails on secure deploy

2013-07-30 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724214#comment-13724214
 ] 

Konstantin Shvachko commented on YARN-854:
--

Committed to branch 2.0.6.

> App submission fails on secure deploy
> -
>
> Key: YARN-854
> URL: https://issues.apache.org/jira/browse/YARN-854
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Ramya Sunil
>Assignee: Omkar Vinit Joshi
>Priority: Blocker
> Fix For: 2.1.0-beta
>
> Attachments: YARN-854.20130619.1.patch, YARN-854.20130619.2.patch, 
> YARN-854.20130619.patch, YARN-854-branch-2.0.6.patch
>
>
> App submission on secure cluster fails with the following exception:
> {noformat}
> INFO mapreduce.Job: Job jobID failed with state FAILED due to: Application 
> applicationID failed 2 times due to AM Container for appattemptID exited with 
>  exitCode: -1000 due to: App initialization failed (255) with output: main : 
> command provided 0
> main : user is qa_user
> javax.security.sasl.SaslException: DIGEST-MD5: digest response format 
> violation. Mismatched response. [Caused by 
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): 
> DIGEST-MD5: digest response format violation. Mismatched response.]
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:65)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:235)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:348)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): 
> DIGEST-MD5: digest response format violation. Mismatched response.
>   at org.apache.hadoop.ipc.Client.call(Client.java:1298)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1250)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:204)
>   at $Proxy7.heartbeat(Unknown Source)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
>   ... 3 more
> .Failing this attempt.. Failing the application.
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-854) App submission fails on secure deploy

2013-07-29 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723232#comment-13723232
 ] 

Konstantin Shvachko commented on YARN-854:
--

Aaron, thanks for the review.
The changes to RegisterApplicationMasterResponsePBImpl in this patch for trunk 
and branch 2.1 fix a bug introduced by YARN-610.
YARN-610 is not in the 2.0.5 branch, so the changes are not applicable to 
2.0.x and therefore not reflected in my patch.
Here is the [diff that 
visualises|https://github.com/apache/hadoop-common/blame/branch-2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/RegisterApplicationMasterResponsePBImpl.java]
 the changes.
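
For readers not steeped in the PB record classes: bugs of this kind typically 
involve the proto/builder state handling inside a PBImpl. A generic sketch of 
the pattern follows (the field and method names are illustrative, and this is 
not the YARN-610 change itself):

{code:java}
// Generic PBImpl setter pattern: switch to builder state first, or the
// write is silently lost when the record is re-serialized.
public void setFoo(String foo) {
  maybeInitBuilder();          // move from read-only proto to builder
  if (foo == null) {
    builder.clearFoo();
    return;
  }
  builder.setFoo(foo);
}
{code}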

> App submission fails on secure deploy
> -
>
> Key: YARN-854
> URL: https://issues.apache.org/jira/browse/YARN-854
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Ramya Sunil
>Assignee: Omkar Vinit Joshi
>Priority: Blocker
> Fix For: 2.1.0-beta
>
> Attachments: YARN-854.20130619.1.patch, YARN-854.20130619.2.patch, 
> YARN-854.20130619.patch, YARN-854-branch-2.0.6.patch
>
>
> App submission on secure cluster fails with the following exception:
> {noformat}
> INFO mapreduce.Job: Job jobID failed with state FAILED due to: Application 
> applicationID failed 2 times due to AM Container for appattemptID exited with 
>  exitCode: -1000 due to: App initialization failed (255) with output: main : 
> command provided 0
> main : user is qa_user
> javax.security.sasl.SaslException: DIGEST-MD5: digest response format 
> violation. Mismatched response. [Caused by 
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): 
> DIGEST-MD5: digest response format violation. Mismatched response.]
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:65)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:235)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:348)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): 
> DIGEST-MD5: digest response format violation. Mismatched response.
>   at org.apache.hadoop.ipc.Client.call(Client.java:1298)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1250)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:204)
>   at $Proxy7.heartbeat(Unknown Source)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
>   ... 3 more
> .Failing this attempt.. Failing the application.
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-854) App submission fails on secure deploy

2013-06-29 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-854:
-

Attachment: YARN-854-branch-2.0.6.patch

Attaching a fix for branch 2.0. Could somebody please review it?
It would also be good to reopen this issue for the merge; I don't have access 
to the reopen button.

> App submission fails on secure deploy
> -
>
> Key: YARN-854
> URL: https://issues.apache.org/jira/browse/YARN-854
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Ramya Sunil
>Assignee: Omkar Vinit Joshi
>Priority: Blocker
> Fix For: 2.1.0-beta
>
> Attachments: YARN-854.20130619.1.patch, YARN-854.20130619.2.patch, 
> YARN-854.20130619.patch, YARN-854-branch-2.0.6.patch
>
>
> App submission on secure cluster fails with the following exception:
> {noformat}
> INFO mapreduce.Job: Job jobID failed with state FAILED due to: Application 
> applicationID failed 2 times due to AM Container for appattemptID exited with 
>  exitCode: -1000 due to: App initialization failed (255) with output: main : 
> command provided 0
> main : user is qa_user
> javax.security.sasl.SaslException: DIGEST-MD5: digest response format 
> violation. Mismatched response. [Caused by 
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): 
> DIGEST-MD5: digest response format violation. Mismatched response.]
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:65)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:235)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:348)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): 
> DIGEST-MD5: digest response format violation. Mismatched response.
>   at org.apache.hadoop.ipc.Client.call(Client.java:1298)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1250)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:204)
>   at $Proxy7.heartbeat(Unknown Source)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
>   ... 3 more
> .Failing this attempt.. Failing the application.
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira