[jira] [Commented] (YARN-7535) We should display origin value of demand in fair scheduler page

2017-11-26 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266313#comment-16266313
 ] 

Wilfred Spiegelenburg commented on YARN-7535:
-

The code has changed in recent versions; there is no updateDemandForApp any 
more after YARN-6172.

Demand for a queue, as [~yufeigu] explained, should be limited to the maximum 
the queue can use, so the existing code should be left as is. Changing the 
calculation would affect the minimum share starvation and the other 
calculations that use the demand. Having the extra detail on how high the 
demand in a queue really is could provide more information for tuning. The 
{{FSAppAttempt}} does not cap it, so we already have the info.

Some considerations:
- We could store the extra detail in the {{leafQueue}} (see the sketch below). 
There would not really be any overhead besides some extra local storage.
- Adding it to the {{parentQueue}} to get it for the whole hierarchy would be 
possible, but it does involve overhead. We would then also need to choose 
whether we want the unlimited demand from the child queue or the limited version.
- The scheduler state dump is easily changed.
- Do we want to display this in the web UI? Always showing the two numbers 
might be confusing, and the state dump would be a much better place because it 
can be followed over time instead of as a single snapshot.
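
As an illustration of the first option, a minimal sketch of keeping the 
uncapped value next to the capped one ({{rawDemand}} and the surrounding method 
are assumptions for illustration, not the current code):
{code:java}
// Sketch only: "rawDemand" is a hypothetical extra field on the leaf queue.
private Resource rawDemand = Resources.none();

private void updateDemand(Collection<FSAppAttempt> apps, Resource maxRes) {
  Resource demand = Resources.createResource(0, 0);
  for (FSAppAttempt app : apps) {
    app.updateDemand();
    demand = Resources.add(demand, app.getDemand());
  }
  // Keep the uncapped value for display or the state dump...
  rawDemand = demand;
  // ...and keep the capped value for the existing scheduling calculations.
  this.demand = Resources.componentwiseMin(demand, maxRes);
}
{code}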


> We should display origin value of demand in fair scheduler page
> ---
>
> Key: YARN-7535
> URL: https://issues.apache.org/jira/browse/YARN-7535
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: YunFan Zhou
>Assignee: YunFan Zhou
>
> The *demand* value of a leaf queue that we see on the fair scheduler page 
> shows only the value of *maxResources* when the real demand is greater than 
> *maxResources*. This doesn't reflect the real situation. Most of the time, 
> when we expand a queue, we rely on seeing the real current demand value.
> {code:java}
>   private void updateDemandForApp(FSAppAttempt sched, Resource maxRes) {
>     sched.updateDemand();
>     Resource toAdd = sched.getDemand();
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Counting resource from " + sched.getName() + " " + toAdd
>           + "; Total resource consumption for " + getName() + " now "
>           + demand);
>     }
>     demand = Resources.add(demand, toAdd);
>     demand = Resources.componentwiseMin(demand, maxRes);
>   }
> {code}






[jira] [Commented] (YARN-7534) Fair scheduler assign resources may exceed maxResources

2017-11-26 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266302#comment-16266302
 ] 

Wilfred Spiegelenburg commented on YARN-7534:
-

Based on the current analysis I do not think we have a problem.
[~daemon], if you have logs that show this is not working please attach them; 
otherwise I will close this as not a problem.

> Fair scheduler assign resources may exceed maxResources
> ---
>
> Key: YARN-7534
> URL: https://issues.apache.org/jira/browse/YARN-7534
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: YunFan Zhou
>Assignee: Wilfred Spiegelenburg
>
> The current scheduling logic checks whether the resources used by the 
> queue have exceeded *maxResources* before assigning the container. This can 
> lead to the queue using more resources than *maxResources* after this 
> container has been assigned.






[jira] [Commented] (YARN-7560) Resourcemanager hangs when resourceUsedWithWeightToResourceRatio return a overflow value

2017-11-26 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266300#comment-16266300
 ] 

Wilfred Spiegelenburg commented on YARN-7560:
-

Looks good to me, +1 (non-binding)


> Resourcemanager hangs when  resourceUsedWithWeightToResourceRatio return a 
> overflow value 
> --
>
> Key: YARN-7560
> URL: https://issues.apache.org/jira/browse/YARN-7560
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Affects Versions: 3.0.0
>Reporter: zhengchenyu
>Assignee: zhengchenyu
> Fix For: 3.0.0
>
> Attachments: YARN-7560.000.patch, YARN-7560.001.patch
>
>
> In our cluster we changed the configuration and then ran refreshQueues, and 
> found that the ResourceManager hangs. The ResourceManager also can't restart 
> successfully. The jstack information always shows the following:
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x7f98e8017000 nid=0x2f5 runnable 
> [0x7f98eed9a000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:182)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeSharesInternal(ComputeFairShares.java:140)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeSteadyShares(ComputeFairShares.java:66)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeSteadyShares(FairSharePolicy.java:148)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeSteadyShares(FSParentQueue.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:148)
> - locked <0x7f8c4a8177a0> (a java.util.HashMap)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:101)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.updateAllocationConfiguration(QueueManager.java:387)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$AllocationReloadListener.onReload(FairScheduler.java:1728)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:422)
> - locked <0x7f8c4a7eb2e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1597)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1621)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> - locked <0x7f8c4a76ac48> (a java.lang.Object)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:569)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> - locked <0x7f8c49254268> (a java.lang.Object)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:997)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:257)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> - locked <0x7f8c467495e0> (a java.lang.Object)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1220)
> {code}
> When we debugged the cluster, we found that resourceUsedWithWeightToResourceRatio 
> returned a negative value, so the loop could not return. In our cluster the 
> sum of all minRes is over Integer.MAX_VALUE, which is why 
> resourceUsedWithWeightToResourceRatio returns a negative value.
> Below is the loop. Because totalResource is a long it is always positive, but 
> resourceUsedWithWeightToResourceRatio returns an int. Our cluster is so big 
> that resourceUsedWithWeightToResourceRatio returns an overflowed, negative 
> value, so the loop never breaks.
> {code}
> while (resourceUsedWithWeightToResourceRatio(rMax, schedulables, type)
>     < totalResource) {
>   rMax *= 2.0;
> }
> {code}
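
As a stand-alone illustration of the overflow (not YARN code, just the 
arithmetic): once the int sum passes {{Integer.MAX_VALUE}} it wraps to a 
negative number, and a negative int is always smaller than the positive long 
it is compared against, so the loop can never exit:
{code:java}
// Stand-alone arithmetic demo, not YARN code.
public class OverflowDemo {
  public static void main(String[] args) {
    long totalResource = 3_000_000_000L;   // the long side of the comparison: positive
    int resourcesTaken = 0;
    for (int i = 0; i < 3; i++) {
      resourcesTaken += 1_000_000_000;     // int accumulator wraps past Integer.MAX_VALUE
    }
    System.out.println(resourcesTaken);                 // -1294967296
    System.out.println(resourcesTaken < totalResource); // true, so the doubling loop never breaks
  }
}
{code}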





[jira] [Commented] (YARN-7560) Resourcemanager hangs when resourceUsedWithWeightToResourceRatio return a overflow value

2017-11-26 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266273#comment-16266273
 ] 

Wilfred Spiegelenburg commented on YARN-7560:
-

Thank you [~zhengchenyu] for the patch.
Some comments on the patch:
* Can you please remove the unneeded casts to long that are left in 
computeSharesInternal and handleFixedFairShares:
{code}
127  totalMaxShare = Math.min(maxShare + (long)totalMaxShare,
128  Long.MAX_VALUE);
...
169  target.setResourceValue(type, (long)computeShare(sched, right, type));
{code}
and
{code}
224  totalResource = Math.min((long)totalResource + (long)fixedShare,
225      Long.MAX_VALUE);
{code}
* In resourceUsedWithWeightToResourceRatio we should not have to create a 
temporary variable share and could do:
{code}
  resourcesTaken += computeShare(sched, w2rRatio, type);
{code}
* In {{computeShare}} we should move the cast from double to long to the point 
where we calculate the share, instead of leaving it until after we do the min 
and max checks, and remove the cast at the end of the call. That will speed up 
the calculation slightly and won't change the outcome:
{code}
192  long share = (long)(sched.getWeight() * w2rRatio);
{code}
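
In other words, the end of {{computeShare}} could look roughly like this (a 
sketch of the suggested shape; the min/max accessors are paraphrased, not 
copied from the patch):
{code}
// Sketch: cast once where the share is computed, then clamp in long arithmetic.
long share = (long) (sched.getWeight() * w2rRatio);
share = Math.max(share, minShareFor(sched, type));  // minShareFor/maxShareFor are
share = Math.min(share, maxShareFor(sched, type));  // placeholders for the real accessors
return share;
{code}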

> Resourcemanager hangs when  resourceUsedWithWeightToResourceRatio return a 
> overflow value 
> --
>
> Key: YARN-7560
> URL: https://issues.apache.org/jira/browse/YARN-7560
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Affects Versions: 3.0.0
>Reporter: zhengchenyu
>Assignee: zhengchenyu
> Fix For: 3.0.0
>
> Attachments: YARN-7560.000.patch
>
>
> In our cluster we changed the configuration and then ran refreshQueues, and 
> found that the ResourceManager hangs. The ResourceManager also can't restart 
> successfully. The jstack information always shows the following:
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x7f98e8017000 nid=0x2f5 runnable 
> [0x7f98eed9a000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:182)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeSharesInternal(ComputeFairShares.java:140)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeSteadyShares(ComputeFairShares.java:66)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeSteadyShares(FairSharePolicy.java:148)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeSteadyShares(FSParentQueue.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:148)
> - locked <0x7f8c4a8177a0> (a java.util.HashMap)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:101)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.updateAllocationConfiguration(QueueManager.java:387)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$AllocationReloadListener.onReload(FairScheduler.java:1728)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:422)
> - locked <0x7f8c4a7eb2e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1597)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1621)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> - locked <0x7f8c4a76ac48> (a java.lang.Object)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:569)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> - locked <0x7f8c49254268> (a java.lang.Object)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:997)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:257)
> at 
> 

[jira] [Commented] (YARN-7524) Remove unused FairSchedulerEventLog

2017-11-22 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263587#comment-16263587
 ] 

Wilfred Spiegelenburg commented on YARN-7524:
-

thank you for the commit [~yufeigu]

> Remove unused FairSchedulerEventLog
> ---
>
> Key: YARN-7524
> URL: https://issues.apache.org/jira/browse/YARN-7524
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Fix For: 3.0.0, 3.1.0
>
> Attachments: YARN-7524.001.patch, YARN-7524.002.patch
>
>
> The FairSchedulerEventLog is no longer used. It is only being written to in 
> one location in the FS (see YARN-1383) and the functionality requested in 
> that jira has been implemented using the normal OOTB logging in the 
> AbstractYarnScheduler.
> The functionality the scheduler event log used to provide has been replaced 
> with normal logging and the scheduler state dump in YARN-6042






[jira] [Commented] (YARN-7513) Remove scheduler lock in FSAppAttempt.getWeight()

2017-11-21 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261805#comment-16261805
 ] 

Wilfred Spiegelenburg commented on YARN-7513:
-

Thank you [~yufeigu] and [~templedf] for the reviews and commit

> Remove scheduler lock in FSAppAttempt.getWeight()
> -
>
> Key: YARN-7513
> URL: https://issues.apache.org/jira/browse/YARN-7513
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Minor
> Fix For: 3.0.0, 3.1.0
>
> Attachments: YARN-7513.001.patch
>
>
> With the change from YARN-7414 a new FindBugs warning was introduced.
> The code that was moved from the FairScheduler to the FSAppAttempt can also 
> be simplified by removing the unneeded locking.






[jira] [Assigned] (YARN-7534) Fair scheduler assign resources may exceed maxResources

2017-11-20 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg reassigned YARN-7534:
---

Assignee: Wilfred Spiegelenburg

> Fair scheduler assign resources may exceed maxResources
> ---
>
> Key: YARN-7534
> URL: https://issues.apache.org/jira/browse/YARN-7534
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: YunFan Zhou
>Assignee: Wilfred Spiegelenburg
>
> The current scheduling logic checks whether the resources used by the 
> queue have exceeded *maxResources* before assigning the container. This can 
> lead to the queue using more resources than *maxResources* after this 
> container has been assigned.






[jira] [Commented] (YARN-7534) Fair scheduler assign resources may exceed maxResources

2017-11-20 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258977#comment-16258977
 ] 

Wilfred Spiegelenburg commented on YARN-7534:
-

I would like to work on this one if you don't mind

I think two things are getting mixed up: the queue's used resources are not 
linked to the node. They are the sum of the resources of all containers from 
applications that run in the queue. A node heartbeat with a changed usage does 
not mean that the usage changed because an application in this queue changed 
it; it could have changed because a different queue/application added a 
container.

We're also not allocating anything just yet and have thus not gone over the 
limit. That check is done later, when the application is updated. Here we just 
have a preliminary check to see if we can offer this node to the queue. Another 
point to take into account: we are not checking what the application asked for 
here. That is the next step, which follows just below when we run over all the 
applications that have a demand:

{code}
for (FSAppAttempt sched : fetchAppsWithDemand(true)) {
  if (SchedulerAppUtils.isPlaceBlacklisted(sched, node, LOG)) {
continue;
  }
  assigned = sched.assignContainer(node);
{code}

This is the earliest point at which we can find out what the ask is. If there 
are more applications with a demand for the queue we walk over the list. We call 
[assignContainer|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java#L830]
and that is where the checks happen.
One of the checks we perform is hasContainerForNode in the FSAppAttempt:
{code}
} else if (!getQueue().fitsInMaxShare(resource)) {
  // The requested container must fit in queue maximum share
  updateAMDiagnosticMsg(resource,
  " exceeds current queue or its parents maximum resource allowed).");

  ret = false;
{code}

That makes the allocation fail, so we drop out and check the next request for 
the application; if that all fails we check the next application in the list 
of apps with demand.
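
For reference, {{fitsInMaxShare}} does roughly the following (a simplified 
sketch that walks up the queue hierarchy, not the exact code):
{code}
// Simplified sketch: current usage plus the requested container must fit under
// maxShare for this queue and for every parent queue.
boolean fitsInMaxShare(Resource additionalResource) {
  Resource usagePlusAddition =
      Resources.add(getResourceUsage(), additionalResource);
  if (!Resources.fitsIn(usagePlusAddition, getMaxShare())) {
    return false;
  }
  FSQueue parent = getParent();
  return parent == null || parent.fitsInMaxShare(additionalResource);
}
{code}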

Do you have any logs that show that this is not working as it should?


> Fair scheduler assign resources may exceed maxResources
> ---
>
> Key: YARN-7534
> URL: https://issues.apache.org/jira/browse/YARN-7534
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: YunFan Zhou
>
> The current scheduling logic checks whether the resources used by the 
> queue have exceeded *maxResources* before assigning the container. This can 
> lead to the queue using more resources than *maxResources* after this 
> container has been assigned.






[jira] [Updated] (YARN-7524) Remove unused FairSchedulerEventLog

2017-11-18 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-7524:

Attachment: YARN-7524.002.patch

update to remove the unused import in FairSchedulerConfiguration

> Remove unused FairSchedulerEventLog
> ---
>
> Key: YARN-7524
> URL: https://issues.apache.org/jira/browse/YARN-7524
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-7524.001.patch, YARN-7524.002.patch
>
>
> The FairSchedulerEventLog is no longer used. It is only being written to in 
> one location in the FS (see YARN-1383) and the functionality requested in 
> that jira has been implemented using the normal OOTB logging in the 
> AbstractYarnScheduler.
> The functionality the scheduler event log used to provide has been replaced 
> with normal logging and the scheduler state dump in YARN-6042






[jira] [Updated] (YARN-7524) Remove unused FairSchedulerEventLog

2017-11-16 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-7524:

Attachment: YARN-7524.001.patch

Patch to remove:
- log init and writing the log
- log class
- test class
- configuration

Ran all FS tests locally and they passed

> Remove unused FairSchedulerEventLog
> ---
>
> Key: YARN-7524
> URL: https://issues.apache.org/jira/browse/YARN-7524
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-7524.001.patch
>
>
> The FairSchedulerEventLog is no longer used. It is only being written to in 
> one location in the FS (see YARN-1383) and the functionality requested in 
> that jira has been implemented using the normal OOTB logging in the 
> AbstractYarnScheduler.
> The functionality the scheduler event log used to provide has been replaced 
> with normal logging and the scheduler state dump in YARN-6042






[jira] [Created] (YARN-7524) Remove unused FairSchedulerEventLog

2017-11-16 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-7524:
---

 Summary: Remove unused FairSchedulerEventLog
 Key: YARN-7524
 URL: https://issues.apache.org/jira/browse/YARN-7524
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


The FairSchedulerEventLog is no longer used. It is only being written to in one 
location in the FS (see YARN-1383) and the functionality requested in that jira 
has been implemented using the normal OOTB logging in the AbstractYarnScheduler.

The functionality the scheduler event log used to provide has been replaced 
with normal logging and the scheduler state dump in YARN-6042






[jira] [Updated] (YARN-6486) FairScheduler: Deprecate continuous scheduling

2017-11-16 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-6486:

Attachment: YARN-6486.003.patch

Oops, forgot one annotation to fix the javac messages.

The unit and asflicense failures are not related to the patch; the build runs 
out of memory.

> FairScheduler: Deprecate continuous scheduling
> --
>
> Key: YARN-6486
> URL: https://issues.apache.org/jira/browse/YARN-6486
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-6486.001.patch, YARN-6486.002.patch, 
> YARN-6486.003.patch
>
>
> Mark continuous scheduling as deprecated in 2.9 and remove the code in 3.0. 
> Removing continuous scheduling from the code will be logged as a separate jira






[jira] [Commented] (YARN-7513) FindBugs in FSAppAttempt.getWeight()

2017-11-16 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256238#comment-16256238
 ] 

Wilfred Spiegelenburg commented on YARN-7513:
-

Holding the scheduler lock to get the demand for an application attempt does 
not seem correct. The call used to come from the FSAppAttempt back to the 
scheduler, which then used the FSAppAttempt for the details; that could have 
been the reason. However, there was no locking at all on this call before 
YARN-3139 was implemented. Current releases of CDH do not have YARN-3139 and 
we do not have any issues in those releases.
Based on all that, and the fact that we use the private FSAppAttempt variables 
{{demand}} and {{appPriority}}, I doubt that there is a need for the locking 
now or that we ever had a reason for it.
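
For reference, the unlocked version boils down to something like this (a 
sketch; the size-based-weight flag and the exact scaling are paraphrased from 
memory, the real change is in the attached patch):
{code}
// Sketch only: reads private FSAppAttempt state (demand, appPriority), no scheduler lock.
float getWeight() {
  float weight = 1.0F;
  if (sizeBasedWeight) {  // placeholder for the scheduler's size-based-weight setting
    // Scale the weight with the log of the current memory demand.
    weight = (float) (Math.log1p(demand.getMemorySize()) / Math.log(2));
  }
  return weight * appPriority.getPriority();
}
{code}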

Test failures are not related to this jira.

> FindBugs in FSAppAttempt.getWeight()
> 
>
> Key: YARN-7513
> URL: https://issues.apache.org/jira/browse/YARN-7513
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Minor
> Attachments: YARN-7513.001.patch
>
>
> With the change from YARN-7414 a new FindBugs warning was introduced.
> The code that was moved from the FairScheduler to the FSAppAttempt can also 
> be simplified by removing the unneeded locking.






[jira] [Updated] (YARN-7513) FindBugs in FSAppAttempt.getWeight()

2017-11-16 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-7513:

Attachment: YARN-7513.001.patch

patch to remove the locking and clean up the ported code

> FindBugs in FSAppAttempt.getWeight()
> 
>
> Key: YARN-7513
> URL: https://issues.apache.org/jira/browse/YARN-7513
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Minor
> Attachments: YARN-7513.001.patch
>
>
> With the change from YARN-7414 a new FindBugs warning was introduced.
> The code that was moved from the FairScheduler to the FSAppAttempt can also 
> be simplified by removing the unneeded locking.






[jira] [Updated] (YARN-6486) FairScheduler: Deprecate continuous scheduling

2017-11-16 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-6486:

Attachment: YARN-6486.002.patch

Updated patch: fix for the javac warning about deprecated API use.

The findbugs warning is logged as YARN-7513.

The test failures are not related to the code and are logged as YARN-7507.

> FairScheduler: Deprecate continuous scheduling
> --
>
> Key: YARN-6486
> URL: https://issues.apache.org/jira/browse/YARN-6486
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-6486.001.patch, YARN-6486.002.patch
>
>
> Mark continuous scheduling as deprecated in 2.9 and remove the code in 3.0. 
> Removing continuous scheduling from the code will be logged as a separate jira






[jira] [Created] (YARN-7513) FindBugs in FSAppAttempt.getWeight()

2017-11-16 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-7513:
---

 Summary: FindBugs in FSAppAttempt.getWeight()
 Key: YARN-7513
 URL: https://issues.apache.org/jira/browse/YARN-7513
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 3.1.0
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg
Priority: Minor


With the change from YARN-7414 a new FindBugs warning was introduced.
The code that was moved from the FairScheduler to the FSAppAttempt can also be 
simplified by removing the unneeded locking.






[jira] [Updated] (YARN-6486) FairScheduler: Deprecate continuous scheduling

2017-11-15 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-6486:

Attachment: YARN-6486.001.patch

Patch marking all continuous scheduling public methods deprecated.
There is no documentation for continuous scheduling or its settings, so there 
are no documentation updates. Since there is no code change, no new tests are 
added.

The FairScheduler is marked @LimitedPrivate("yarn") @Unstable and 
FairSchedulerConfig is marked @Private @Evolving. Based on the compatibility 
guidelines we should be able to remove the code in a later Hadoop 3.x release.

Since we might have clusters using this, we should allow enough time for users 
to reconfigure their clusters and move away from continuous scheduling.
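
For reference, the marking itself is just the standard Java deprecation 
pattern, roughly as below (the method shown is one example, not the full list 
the patch touches):
{code}
/**
 * Continuous scheduling is deprecated and will be removed in a later 3.x release.
 * @deprecated use the heartbeat-driven assignment instead.
 */
@Deprecated
public boolean isContinuousSchedulingEnabled() {
  return continuousSchedulingEnabled;
}
{code}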

> FairScheduler: Deprecate continuous scheduling
> --
>
> Key: YARN-6486
> URL: https://issues.apache.org/jira/browse/YARN-6486
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-6486.001.patch
>
>
> Mark continuous scheduling as deprecated in 2.9 and remove the code in 3.0. 
> Removing continuous scheduling from the code will be logged as a separate jira






[jira] [Updated] (YARN-6486) FairScheduler: Deprecate continuous scheduling

2017-11-15 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-6486:

Summary: FairScheduler: Deprecate continuous scheduling  (was: 
FairScheduler: Deprecate continuous scheduling in 2.9)

> FairScheduler: Deprecate continuous scheduling
> --
>
> Key: YARN-6486
> URL: https://issues.apache.org/jira/browse/YARN-6486
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>
> Mark continuous scheduling as deprecated in 2.9 and remove the code in 3.0. 
> Removing continuous scheduling from the code will be logged as a separate jira






[jira] [Commented] (YARN-7139) FairScheduler: finished applications are always restored to default queue

2017-11-02 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236970#comment-16236970
 ] 

Wilfred Spiegelenburg commented on YARN-7139:
-

findbugs warning is fixed via YARN-7432

> FairScheduler: finished applications are always restored to default queue
> -
>
> Key: YARN-7139
> URL: https://issues.apache.org/jira/browse/YARN-7139
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.1
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-7139.01.patch, YARN-7139.02.patch, 
> YARN-7139.03.patch, YARN-7139.04.patch
>
>
> The queue an application gets submitted to is defined by the placement policy 
> in the FS. The placement policy returns the queue and the application object 
> is updated. When an application is stored in the state store the application 
> submission context is used which has not been updated after the placement 
> rules have run. 
> This means that the original queue from the submission is still stored which 
> is the incorrect queue. On restore we then read back the wrong queue and 
> display the wrong queue in the RM web UI.
> We should update the submission context after we have run the placement 
> policies to make sure that we store the correct queue for the application.






[jira] [Commented] (YARN-7139) FairScheduler: finished applications are always restored to default queue

2017-11-02 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235312#comment-16235312
 ] 

Wilfred Spiegelenburg commented on YARN-7139:
-

The failed tests pass locally; the findbugs warning is not related to the patch 
for this jira.

> FairScheduler: finished applications are always restored to default queue
> -
>
> Key: YARN-7139
> URL: https://issues.apache.org/jira/browse/YARN-7139
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.1
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-7139.01.patch, YARN-7139.02.patch, 
> YARN-7139.03.patch, YARN-7139.04.patch
>
>
> The queue an application gets submitted to is defined by the placement policy 
> in the FS. The placement policy returns the queue and the application object 
> is updated. When an application is stored in the state store the application 
> submission context is used which has not been updated after the placement 
> rules have run. 
> This means that the original queue from the submission is still stored which 
> is the incorrect queue. On restore we then read back the wrong queue and 
> display the wrong queue in the RM web UI.
> We should update the submission context after we have run the placement 
> policies to make sure that we store the correct queue for the application.






[jira] [Updated] (YARN-7139) FairScheduler: finished applications are always restored to default queue

2017-11-01 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-7139:

Attachment: YARN-7139.04.patch

Patch that fixes the test failures:
- reverted some of the order changes which seemed to affect the majority of the 
tests
- added an extra null check to fix the 2 NPEs
- used the same app attempt object since we mock the isWaitingForAMContainer() 
call

The timed out tests all pass locally.


> FairScheduler: finished applications are always restored to default queue
> -
>
> Key: YARN-7139
> URL: https://issues.apache.org/jira/browse/YARN-7139
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.1
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-7139.01.patch, YARN-7139.02.patch, 
> YARN-7139.03.patch, YARN-7139.04.patch
>
>
> The queue an application gets submitted to is defined by the placement policy 
> in the FS. The placement policy returns the queue and the application object 
> is updated. When an application is stored in the state store the application 
> submission context is used which has not been updated after the placement 
> rules have run. 
> This means that the original queue from the submission is still stored which 
> is the incorrect queue. On restore we then read back the wrong queue and 
> display the wrong queue in the RM web UI.
> We should update the submission context after we have run the placement 
> policies to make sure that we store the correct queue for the application.






[jira] [Updated] (YARN-7139) FairScheduler: finished applications are always restored to default queue

2017-11-01 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-7139:

Attachment: YARN-7139.03.patch

Updated patch: rebased to trunk and fixed some checkstyle warnings in and 
around the code that has changed.

> FairScheduler: finished applications are always restored to default queue
> -
>
> Key: YARN-7139
> URL: https://issues.apache.org/jira/browse/YARN-7139
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.1
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-7139.01.patch, YARN-7139.02.patch, 
> YARN-7139.03.patch
>
>
> The queue an application gets submitted to is defined by the placement policy 
> in the FS. The placement policy returns the queue and the application object 
> is updated. When an application is stored in the state store the application 
> submission context is used which has not been updated after the placement 
> rules have run. 
> This means that the original queue from the submission is still stored which 
> is the incorrect queue. On restore we then read back the wrong queue and 
> display the wrong queue in the RM web UI.
> We should update the submission context after we have run the placement 
> policies to make sure that we store the correct queue for the application.






[jira] [Updated] (YARN-7139) FairScheduler: finished applications are always restored to default queue

2017-09-01 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-7139:

Attachment: YARN-7139.02.patch

Some of the tests were failing due to the sequencing of the RMApp creation and 
the call into the scheduler. These have been fixed by correcting the sequencing.

The leftover failing tests run without creating an RMApp and just test the 
handling of the events. This means that the RMApp cannot be updated as is 
needed for the restore. I added a check in the scheduler, but it would be 
better to update all the tests to create an RMApp and add it.

> FairScheduler: finished applications are always restored to default queue
> -
>
> Key: YARN-7139
> URL: https://issues.apache.org/jira/browse/YARN-7139
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.1
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-7139.01.patch, YARN-7139.02.patch
>
>
> The queue an application gets submitted to is defined by the placement policy 
> in the FS. The placement policy returns the queue and the application object 
> is updated. When an application is stored in the state store the application 
> submission context is used which has not been updated after the placement 
> rules have run. 
> This means that the original queue from the submission is still stored which 
> is the incorrect queue. On restore we then read back the wrong queue and 
> display the wrong queue in the RM web UI.
> We should update the submission context after we have run the placement 
> policies to make sure that we store the correct queue for the application.






[jira] [Updated] (YARN-7139) FairScheduler: finished applications are always restored to default queue

2017-08-30 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-7139:

Attachment: YARN-7139.01.patch

> FairScheduler: finished applications are always restored to default queue
> -
>
> Key: YARN-7139
> URL: https://issues.apache.org/jira/browse/YARN-7139
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.1
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-7139.01.patch
>
>
> The queue an application gets submitted to is defined by the placement policy 
> in the FS. The placement policy returns the queue and the application object 
> is updated. When an application is stored in the state store the application 
> submission context is used which has not been updated after the placement 
> rules have run. 
> This means that the original queue from the submission is still stored which 
> is the incorrect queue. On restore we then read back the wrong queue and 
> display the wrong queue in the RM web UI.
> We should update the submission context after we have run the placement 
> policies to make sure that we store the correct queue for the application.






[jira] [Created] (YARN-7139) FairScheduler: finished applications are always restored to default queue

2017-08-30 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-7139:
---

 Summary: FairScheduler: finished applications are always restored 
to default queue
 Key: YARN-7139
 URL: https://issues.apache.org/jira/browse/YARN-7139
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.8.1
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


The queue an application gets submitted to is defined by the placement policy 
in the FS. The placement policy returns the queue and the application object is 
updated. When an application is stored in the state store the application 
submission context is used which has not been updated after the placement rules 
have run. 

This means that the original queue from the submission is still stored which is 
the incorrect queue. On restore we then read back the wrong queue and display 
the wrong queue in the RM web UI.

We should update the submission context after we have run the placement 
policies to make sure that we store the correct queue for the application.
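
A minimal sketch of the proposed fix (identifiers paraphrased; the real change 
is in the attached patches): once the placement rules have resolved the queue, 
write it back into the context that gets persisted.
{code}
// Sketch: push the resolved queue back into the submission context before it is
// written to the state store; identifiers are paraphrased.
String resolvedQueue = placementPolicy.assignAppToQueue(
    submissionContext.getQueue(), user);
submissionContext.setQueue(resolvedQueue);  // the state store now records the real queue
app.setQueue(resolvedQueue);                // the RM web UI shows the real queue on restore
{code}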






[jira] [Reopened] (YARN-4227) FairScheduler: RM quits processing expired container from a removed node

2017-08-23 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg reopened YARN-4227:
-

[~Steven Rand] it looks like you are correct, let's re-open this issue.
The {{containerCompleted}} call we do just before we release the container on 
the node is synchronised, which means that we could have been waiting there 
just long enough for the node to become {{null}}.
The patch will need a rebase due to all the pre-emption changes that have gone 
in; I'll upload a new patch soon.
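
The shape of the fix is a defensive check in {{completedContainer}} along these 
lines (sketch only, names paraphrased from memory; the actual change will be in 
the rebased patch):
{code}
// Sketch: bail out when the node was removed while we waited on the synchronized
// containerCompleted call, instead of dereferencing a null node.
FSSchedulerNode node = getFSSchedulerNode(rmContainer.getAllocatedNode());
if (node == null) {
  LOG.info("Container " + rmContainer.getContainerId()
      + " completed on a node that has already been removed, skipping");
  return;
}
// ... existing release and queue accounting logic continues here ...
{code}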

> FairScheduler: RM quits processing expired container from a removed node
> 
>
> Key: YARN-4227
> URL: https://issues.apache.org/jira/browse/YARN-4227
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.3.0, 2.5.0, 2.7.1
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Critical
> Attachments: YARN-4227.2.patch, YARN-4227.3.patch, YARN-4227.4.patch, 
> YARN-4227.patch
>
>
> Under some circumstances the node is removed before an expired container 
> event is processed causing the RM to exit:
> {code}
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
> Expired:container_1436927988321_1307950_01_12 Timed out after 600 secs
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_1436927988321_1307950_01_12 Container Transitioned from 
> ACQUIRED to EXPIRED
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp: 
> Completed container: container_1436927988321_1307950_01_12 in state: 
> EXPIRED event:EXPIRE
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=system_op   
>OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS  
> APPID=application_1436927988321_1307950 
> CONTAINERID=container_1436927988321_1307950_01_12
> 2015-10-04 21:14:01,063 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type CONTAINER_EXPIRED to the scheduler
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:849)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1273)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:585)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {code}
> The stack trace is from 2.3.0 but the same issue has been observed in 2.5.0 
> and 2.6.0 by different customers.






[jira] [Commented] (YARN-1558) After apps are moved across queues, store new queue info in the RM state store

2017-06-20 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056731#comment-16056731
 ] 

Wilfred Spiegelenburg commented on YARN-1558:
-

This issue is fixed as part of YARN-5932, in which the application move was 
changed.
Can we close this as a duplicate?

> After apps are moved across queues, store new queue info in the RM state store
> --
>
> Key: YARN-1558
> URL: https://issues.apache.org/jira/browse/YARN-1558
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Sandy Ryza
>Assignee: Varun Saxena
>
> The result of moving an app to a new queue should persist across RM restarts. 
>  This will require updating the ApplicationSubmissionContext, the single 
> source of truth upon state recovery, with the new queue info.
> There will be a brief window after the move completes before the move is 
> stored.  If the RM dies during this window, the recovered RM will include the 
> old queue info.  Schedulers should be resilient to this situation.






[jira] [Commented] (YARN-6615) AmIpFilter drops query parameters on redirect

2017-05-24 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024167#comment-16024167
 ] 

Wilfred Spiegelenburg commented on YARN-6615:
-

Thank you [~jlowe] for the reviews and the commit

> AmIpFilter drops query parameters on redirect
> -
>
> Key: YARN-6615
> URL: https://issues.apache.org/jira/browse/YARN-6615
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha2
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Fix For: 2.9.0, 2.7.4, 2.8.1, 2.6.6, 3.0.0-alpha3
>
> Attachments: YARN-6615.1.patch, YARN-6615-branch-2.6.1.patch, 
> YARN-6615-branch-2.6.2.patch, YARN-6615-branch-2.6.3.patch, 
> YARN-6615-branch-2.8.1.patch
>
>
> When an AM web request is redirected to the RM the query parameters are 
> dropped from the web request.
> This happens for Spark as described in SPARK-20772.
> The repro steps are:
> - Start up the spark-shell in yarn mode and run a job
> - Try to access the job details through http://:4040/jobs/job?id=0
> - A HTTP ERROR 400 is thrown (requirement failed: missing id parameter)
> This works fine in local or standalone mode, but does not work on Yarn, where 
> the query parameter is dropped. It does work if the UI filter 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter is removed from 
> the config, which shows that the problem is in the filter.






[jira] [Updated] (YARN-6615) AmIpFilter drops query parameters on redirect

2017-05-24 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-6615:

Attachment: YARN-6615-branch-2.6.3.patch

OK, found a simple way to fix the findbugs warning: parse and then re-add the 
parameters. This is done under the covers in buildTrackingUrl and its use of 
the UriBuilder in later versions.

Ran findbugs before and after.
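
Roughly, the filter has to carry the query string along when it builds the 
redirect target, along these lines (a sketch; {{findRedirectUrl()}} stands in 
for the filter's existing proxy-host lookup):
{code}
// Sketch: keep the query string instead of dropping it on redirect.
String target = findRedirectUrl() + httpReq.getRequestURI();
String query = httpReq.getQueryString();  // this is what was being lost
if (query != null && !query.isEmpty()) {
  target += "?" + query;
}
httpResp.sendRedirect(target);
{code}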

> AmIpFilter drops query parameters on redirect
> -
>
> Key: YARN-6615
> URL: https://issues.apache.org/jira/browse/YARN-6615
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha2
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-6615.1.patch, YARN-6615-branch-2.6.1.patch, 
> YARN-6615-branch-2.6.2.patch, YARN-6615-branch-2.6.3.patch, 
> YARN-6615-branch-2.8.1.patch
>
>
> When an AM web request is redirected to the RM the query parameters are 
> dropped from the web request.
> This happens for Spark as described in SPARK-20772.
> The repro steps are:
> - Start up the spark-shell in yarn mode and run a job
> - Try to access the job details through http://:4040/jobs/job?id=0
> - A HTTP ERROR 400 is thrown (requirement failed: missing id parameter)
> This works fine in local or standalone mode, but does not work on Yarn, where 
> the query parameter is dropped. It does work if the UI filter 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter is removed from 
> the config, which shows that the problem is in the filter.






[jira] [Updated] (YARN-6615) AmIpFilter drops query parameters on redirect

2017-05-18 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-6615:

Attachment: YARN-6615-branch-2.6.2.patch

Attached the wrong version of the patch; this one has the fixed junit test and 
passes the redirect URL correctly through the encoding.

The findbugs warning would need more work to fix and would require most of what 
we do in ProxyUtils in later releases. Let me know if that needs fixing for 
branch-2.6 or not.

> AmIpFilter drops query parameters on redirect
> -
>
> Key: YARN-6615
> URL: https://issues.apache.org/jira/browse/YARN-6615
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha2
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-6615.1.patch, YARN-6615-branch-2.6.1.patch, 
> YARN-6615-branch-2.6.2.patch, YARN-6615-branch-2.8.1.patch
>
>
> When an AM web request is redirected to the RM the query parameters are 
> dropped from the web request.
> This happens for Spark as described in SPARK-20772.
> The repro steps are:
> - Start up the spark-shell in yarn mode and run a job
> - Try to access the job details through http://:4040/jobs/job?id=0
> - A HTTP ERROR 400 is thrown (requirement failed: missing id parameter)
> This works fine in local or standalone mode, but does not work on Yarn, where 
> the query parameter is dropped. It does work if the UI filter 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter is removed from 
> the config, which shows that the problem is in the filter.






[jira] [Updated] (YARN-6615) AmIpFilter drops query parameters on redirect

2017-05-17 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-6615:

Attachment: YARN-6615-branch-2.6.1.patch

Patch for branch-2.6, slightly different from the branch-2.8 patch because 
there is no ProxyUtils yet.

The trunk patch also applies to branch-2, which covers all open branches.

> AmIpFilter drops query parameters on redirect
> -
>
> Key: YARN-6615
> URL: https://issues.apache.org/jira/browse/YARN-6615
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha2
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-6615.1.patch, YARN-6615-branch-2.6.1.patch, 
> YARN-6615-branch-2.8.1.patch
>
>
> When an AM web request is redirected to the RM the query parameters are 
> dropped from the web request.
> This happens for Spark as described in SPARK-20772.
> The repro steps are:
> - Start up the spark-shell in yarn mode and run a job
> - Try to access the job details through http://:4040/jobs/job?id=0
> - A HTTP ERROR 400 is thrown (requirement failed: missing id parameter)
> This works fine in local or standalone mode, but does not work on Yarn, where 
> the query parameter is dropped. It does work if the UI filter 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter is removed from 
> the config, which shows that the problem is in the filter.






[jira] [Updated] (YARN-6615) AmIpFilter drops query parameters on redirect

2017-05-17 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-6615:

Attachment: YARN-6615-branch-2.8.1.patch

As requested, a patch for branch-2.8. This patch also applies to branch-2.7.

> AmIpFilter drops query parameters on redirect
> -
>
> Key: YARN-6615
> URL: https://issues.apache.org/jira/browse/YARN-6615
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha2
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-6615.1.patch, YARN-6615-branch-2.8.1.patch
>
>
> When an AM web request is redirected to the RM the query parameters are 
> dropped from the web request.
> This happens for Spark as described in SPARK-20772.
> The repro steps are:
> - Start up the spark-shell in yarn mode and run a job
> - Try to access the job details through http://:4040/jobs/job?id=0
> - A HTTP ERROR 400 is thrown (requirement failed: missing id parameter)
> This works fine in local or standalone mode, but does not work on Yarn where 
> the query parameter is dropped. If the UI filter 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter is removed from 
> the config, the request works, which shows that the problem is in the filter



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6615) AmIpFilter drops query parameters on redirect

2017-05-17 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-6615:

Attachment: YARN-6615.1.patch

Adding the query parameters, if set, before we trigger the redirect; a new test
is included.
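
For illustration only, a minimal sketch of the general approach (not the
committed patch): append the original query string, if present, to the redirect
target before sending the redirect. The class and parameter names here are
assumptions.
{code}
import java.io.IOException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hedged sketch only: keep the query string when building the redirect target.
public final class QueryPreservingRedirect {
  // proxyUriBase stands in for the configured proxy base URL (assumption).
  static void redirectToProxy(HttpServletRequest req, HttpServletResponse resp,
      String proxyUriBase) throws IOException {
    StringBuilder target = new StringBuilder(proxyUriBase);
    target.append(req.getRequestURI());
    String query = req.getQueryString();
    if (query != null && !query.isEmpty()) {
      // Preserve parameters such as ?id=0 that were previously dropped.
      target.append('?').append(query);
    }
    resp.sendRedirect(resp.encodeRedirectURL(target.toString()));
  }
}
{code}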

> AmIpFilter drops query parameters on redirect
> -
>
> Key: YARN-6615
> URL: https://issues.apache.org/jira/browse/YARN-6615
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: amrmproxy
>Affects Versions: 3.0.0-alpha2
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-6615.1.patch
>
>
> When an AM web request is redirected to the RM the query parameters are 
> dropped from the web request.
> This happens for Spark as described in SPARK-20772.
> The repro steps are:
> - Start up the spark-shell in yarn mode and run a job
> - Try to access the job details through http://:4040/jobs/job?id=0
> - A HTTP ERROR 400 is thrown (requirement failed: missing id parameter)
> This works fine in local or standalone mode, but does not work on Yarn where 
> the query parameter is dropped. If the UI filter 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter is removed from 
> the config, the request works, which shows that the problem is in the filter



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6615) AmIpFilter drops query parameters on redirect

2017-05-17 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-6615:
---

 Summary: AmIpFilter drops query parameters on redirect
 Key: YARN-6615
 URL: https://issues.apache.org/jira/browse/YARN-6615
 Project: Hadoop YARN
  Issue Type: Bug
  Components: amrmproxy
Affects Versions: 3.0.0-alpha2
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


When an AM web request is redirected to the RM the query parameters are dropped 
from the web request.

This happens for Spark as described in SPARK-20772.
The repro steps are:
- Start up the spark-shell in yarn mode and run a job
- Try to access the job details through http://:4040/jobs/job?id=0
- A HTTP ERROR 400 is thrown (requirement failed: missing id parameter)

This works fine in local or standalone mode, but does not work on Yarn where 
the query parameter is dropped. If the UI filter 
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter is removed from the 
config, the request works, which shows that the problem is in the filter



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6566) add a property for a hadoop job to identified the full hive HQL script text in hadoop web view in multi-users environment

2017-05-10 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16004175#comment-16004175
 ] 

Wilfred Spiegelenburg commented on YARN-6566:
-

Looking at the feature I think we have two possible problems:
- From a security point of view I do not think that this is a good idea. The
Hive SQL query could have sensitive information contained in it. There could be
names or details in multiple parts of the query that you do not want to expose.
This feature could thus lead to information disclosure that you cannot control.
- The second problem I see here is that the query text could be large, which has
a performance impact, not just for the UI but also for the state store. Queries
can be hundreds of kilobytes and that detail would need to be persisted in the
state store. A number of jiras have been implemented, and more are currently
being worked on, to decrease the size of the object that is stored in the state
store; this feature would negate a lot of that work.

> add a property for a hadoop job to identified the full hive HQL script text 
> in hadoop web view in multi-users environment
> -
>
> Key: YARN-6566
> URL: https://issues.apache.org/jira/browse/YARN-6566
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client, resourcemanager, webapp, yarn
>Affects Versions: 2.6.0
> Environment: centos 6.4 64bit
>Reporter: liuzhenhua
>  Labels: patch
> Fix For: 2.6.0
>
> Attachments: application-page.bmp, applications-page.bmp, 
> YARN-6566.1.patch, YARN-6566.2.patch, YARN-6566.3.patch, YARN-6566.4.patch, 
> YARN-6566.5.patch, YARN-6566.6.patch, YARN-6566.7.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When I tune hive HQL in a multi-user environment, I cannot get the full SQL
> text in the hadoop web view, so it is difficult to tune the SQL. When I try to
> set the SQL text in the hadoop job's jobname property, I realize it is going
> to damage the structure of the hadoop applications web view, so I add a
> property to the hadoop job named "jobdescription". When a hive HQL is
> submitted, the full HQL text is assigned to the property, so I can identify
> the HQL conveniently.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6517) Fix warnings from Spotbugs in hadoop-yarn-common

2017-04-30 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990546#comment-15990546
 ] 

Wilfred Spiegelenburg commented on YARN-6517:
-

Looks good now, +1 from me (non-binding)

> Fix warnings from Spotbugs in hadoop-yarn-common
> 
>
> Key: YARN-6517
> URL: https://issues.apache.org/jira/browse/YARN-6517
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>  Labels: findbugs
> Attachments: YARN-6517.001.patch, YARN-6517.002.patch
>
>
> There are 2 findbugs warnings in hadoop-yarn-common project since switched to 
> spotbugs,
> # Possible null pointer dereference in 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogValue.getPendingLogFilesToUpload(File)
>  due to return value of called method
> # Possible null pointer dereference in 
> org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.getProcessList() due to 
> return value of called method
> see more in 
> [https://builds.apache.org/job/PreCommit-HADOOP-Build/12157/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common-warnings.html]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6510) Fix warning - procfs stat file is not in the expected format: YARN-3344 is not enough

2017-04-26 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984304#comment-15984304
 ] 

Wilfred Spiegelenburg commented on YARN-6510:
-

There only seems to be a limit on the length of the name (a maximum of 16
characters, including the terminating null) in the underlying structures of the
OS. Nothing stops an empty name at the OS level. We should thus be able to
handle an empty name even if it is not a really useful case to support.
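
To illustrate the parsing idea only (this is not the actual
ProcfsBasedProcessTree pattern): splitting on the first '(' and the last ')'
handles a name that contains parentheses, as in the quoted stat line below, and
also tolerates an empty name. All names in the sketch are illustrative.
{code}
// Hedged sketch of a /proc/[pid]/stat split; error handling is illustrative.
public final class ProcStatNameSplit {
  /** Returns { pid, command name, remaining fields } from a stat line. */
  static String[] split(String statLine) {
    int open = statLine.indexOf('(');
    int close = statLine.lastIndexOf(')');
    if (open < 0 || close < open) {
      throw new IllegalArgumentException("unexpected stat format: " + statLine);
    }
    String pid = statLine.substring(0, open).trim();
    String comm = statLine.substring(open + 1, close); // may contain '(' or be empty
    String rest = statLine.substring(close + 1).trim(); // state and the counters
    return new String[] { pid, comm, rest };
  }
}
{code}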

> Fix warning - procfs stat file is not in the expected format: YARN-3344 is 
> not enough
> -
>
> Key: YARN-6510
> URL: https://issues.apache.org/jira/browse/YARN-6510
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-6510.01.patch
>
>
> Even with the fix for YARN-3344 we still have issues with the procfs format.
> This is the case that is causing issues:
> {code}
> [user@nm1 ~]$ cat /proc/2406/stat
> 2406 (ib_fmr(mlx4_0)) S 2 0 0 0 -1 2149613632 0 0 0 0 166 126908 0 0 20 0 1 0 
> 4284 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 
> 0 0 17 6 0 0 0 0 0
> {code}
> We do not handle the parenthesis in the name which causes the pattern 
> matching to fail



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6517) Fix warnings from Spotbugs in hadoop-yarn-common

2017-04-25 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984145#comment-15984145
 ] 

Wilfred Spiegelenburg commented on YARN-6517:
-

[~cheersyang] I have closed the two jiras I opened as duplicates of this one. A 
couple of points, one already made by [~haibochen] in YARN-6513:
- Both changes contain a lot of indentation changes just to re-use a return
statement. Far simpler changes are attached to YARN-6512 and YARN-6513.
- If we simplify the change to return an empty set, we could pass 0 to the
HashSet constructor given that we don't reuse it (see the sketch below).
- Checking procfsDir for null will hide the case of an incorrect configuration.
This currently could only happen in test cases, but it could hide a broken test,
so I would be careful doing this.

If we really want to handle the null or empty procfsDir case we should do that
in a new jira.
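
As a sketch of the pattern referred to above (illustrative only, not the actual
Hadoop method or its signature): {{File.listFiles()}} can return null, which is
what triggers the findbugs warning, and returning an empty, zero-sized
collection avoids the dereference.
{code}
import java.io.File;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hedged sketch: method and parameter names are illustrative.
public final class ListFilesNullCheck {
  static Set<File> listOrEmpty(File dir) {
    File[] children = dir.listFiles(); // may be null on I/O error or non-directory
    if (children == null) {
      return new HashSet<>(0);         // empty result, sized 0 since it is never grown
    }
    return new HashSet<>(Arrays.asList(children));
  }
}
{code}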


> Fix warnings from Spotbugs in hadoop-yarn-common
> 
>
> Key: YARN-6517
> URL: https://issues.apache.org/jira/browse/YARN-6517
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>  Labels: findbugs
> Attachments: YARN-6517.001.patch
>
>
> There are 2 findbugs warnings in hadoop-yarn-common project since switched to 
> spotbugs,
> # Possible null pointer dereference in 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogValue.getPendingLogFilesToUpload(File)
>  due to return value of called method
> # Possible null pointer dereference in 
> org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.getProcessList() due to 
> return value of called method
> see more in 
> [https://builds.apache.org/job/PreCommit-HADOOP-Build/12157/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common-warnings.html]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6513) Fix for FindBugs getPendingLogFilesToUpload() possible NPE

2017-04-25 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984011#comment-15984011
 ] 

Wilfred Spiegelenburg commented on YARN-6513:
-

Yep, I am happy with making it a duplicate, I'll comment on the change there.

> Fix for FindBugs getPendingLogFilesToUpload() possible NPE
> --
>
> Key: YARN-6513
> URL: https://issues.apache.org/jira/browse/YARN-6513
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-6513.01.patch
>
>
> {code}
> Possible null pointer dereference in 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogValue.getPendingLogFilesToUpload(File)
>  due to return value of called method
> Bug type NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE (click for details) 
> In class org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogValue
> In method 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogValue.getPendingLogFilesToUpload(File)
> Local variable stored in JVM register ?
> Method invoked at AggregatedLogFormat.java:[line 314]
> Known null at AggregatedLogFormat.java:[line 314]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6512) Fix for FindBugs getProcessList() possible NPE

2017-04-25 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984010#comment-15984010
 ] 

Wilfred Spiegelenburg commented on YARN-6512:
-

I am happy to make this a duplicate of YARN-6517. I'll comment on the changes 
in that jira.

> Fix for FindBugs getProcessList() possible NPE
> --
>
> Key: YARN-6512
> URL: https://issues.apache.org/jira/browse/YARN-6512
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-6512.01.patch
>
>
> Findbugs output:
> {code}
> Possible null pointer dereference in 
> org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.getProcessList() due to 
> return value of called method
> Bug type NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE
> In class org.apache.hadoop.yarn.util.ProcfsBasedProcessTree
> In method org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.getProcessList()
> Value loaded from processDirs
> Dereferenced at ProcfsBasedProcessTree.java:[line 487]
> Known null at ProcfsBasedProcessTree.java:[line 484]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6513) Fix for FindBugs getPendingLogFilesToUpload() possible NPE

2017-04-22 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-6513:

Attachment: YARN-6513.01.patch

Simple patch for the findbugs warning; no tests added.

> Fix for FindBugs getPendingLogFilesToUpload() possible NPE
> --
>
> Key: YARN-6513
> URL: https://issues.apache.org/jira/browse/YARN-6513
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-6513.01.patch
>
>
> {code}
> Possible null pointer dereference in 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogValue.getPendingLogFilesToUpload(File)
>  due to return value of called method
> Bug type NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE (click for details) 
> In class org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogValue
> In method 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogValue.getPendingLogFilesToUpload(File)
> Local variable stored in JVM register ?
> Method invoked at AggregatedLogFormat.java:[line 314]
> Known null at AggregatedLogFormat.java:[line 314]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6513) Fix for FindBugs getPendingLogFilesToUpload() possible NPE

2017-04-22 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-6513:
---

 Summary: Fix for FindBugs getPendingLogFilesToUpload() possible NPE
 Key: YARN-6513
 URL: https://issues.apache.org/jira/browse/YARN-6513
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wilfred Spiegelenburg


{code}
Possible null pointer dereference in 
org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogValue.getPendingLogFilesToUpload(File)
 due to return value of called method
Bug type NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE (click for details) 
In class org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogValue
In method 
org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogValue.getPendingLogFilesToUpload(File)
Local variable stored in JVM register ?
Method invoked at AggregatedLogFormat.java:[line 314]
Known null at AggregatedLogFormat.java:[line 314]
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6510) Fix warning - procfs stat file is not in the expected format: YARN-3344 is not enough

2017-04-22 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980109#comment-15980109
 ] 

Wilfred Spiegelenburg commented on YARN-6510:
-

FindBugs warnings are not related to this patch:
YARN-6512 for the getProcessList()
YARN-6513 for the getPendingLogFilesToUpload()

> Fix warning - procfs stat file is not in the expected format: YARN-3344 is 
> not enough
> -
>
> Key: YARN-6510
> URL: https://issues.apache.org/jira/browse/YARN-6510
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-6510.01.patch
>
>
> Even with the fix for YARN-3344 we still have issues with the procfs format.
> This is the case that is causing issues:
> {code}
> [user@nm1 ~]$ cat /proc/2406/stat
> 2406 (ib_fmr(mlx4_0)) S 2 0 0 0 -1 2149613632 0 0 0 0 166 126908 0 0 20 0 1 0 
> 4284 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 
> 0 0 17 6 0 0 0 0 0
> {code}
> We do not handle the parenthesis in the name which causes the pattern 
> matching to fail



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6513) Fix for FindBugs getPendingLogFilesToUpload() possible NPE

2017-04-22 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg reassigned YARN-6513:
---

Assignee: Wilfred Spiegelenburg

> Fix for FindBugs getPendingLogFilesToUpload() possible NPE
> --
>
> Key: YARN-6513
> URL: https://issues.apache.org/jira/browse/YARN-6513
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>
> {code}
> Possible null pointer dereference in 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogValue.getPendingLogFilesToUpload(File)
>  due to return value of called method
> Bug type NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE (click for details) 
> In class org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogValue
> In method 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogValue.getPendingLogFilesToUpload(File)
> Local variable stored in JVM register ?
> Method invoked at AggregatedLogFormat.java:[line 314]
> Known null at AggregatedLogFormat.java:[line 314]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6512) Fix for FindBugs getProcessList() possible NPE

2017-04-22 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-6512:

Attachment: YARN-6512.01.patch

> Fix for FindBugs getProcessList() possible NPE
> --
>
> Key: YARN-6512
> URL: https://issues.apache.org/jira/browse/YARN-6512
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-6512.01.patch
>
>
> Findbugs output:
> {code}
> Possible null pointer dereference in 
> org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.getProcessList() due to 
> return value of called method
> Bug type NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE
> In class org.apache.hadoop.yarn.util.ProcfsBasedProcessTree
> In method org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.getProcessList()
> Value loaded from processDirs
> Dereferenced at ProcfsBasedProcessTree.java:[line 487]
> Known null at ProcfsBasedProcessTree.java:[line 484]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6512) Fix for FindBugs getProcessList() possible NPE

2017-04-22 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-6512:
---

 Summary: Fix for FindBugs getProcessList() possible NPE
 Key: YARN-6512
 URL: https://issues.apache.org/jira/browse/YARN-6512
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


Findbugs output:
{code}
Possible null pointer dereference in 
org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.getProcessList() due to 
return value of called method
Bug type NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE
In class org.apache.hadoop.yarn.util.ProcfsBasedProcessTree
In method org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.getProcessList()
Value loaded from processDirs
Dereferenced at ProcfsBasedProcessTree.java:[line 487]
Known null at ProcfsBasedProcessTree.java:[line 484]
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6510) Fix warning - procfs stat file is not in the expected format: YARN-3344 is not enough

2017-04-21 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-6510:

Attachment: YARN-6510.01.patch

Patch including test

> Fix warning - procfs stat file is not in the expected format: YARN-3344 is 
> not enough
> -
>
> Key: YARN-6510
> URL: https://issues.apache.org/jira/browse/YARN-6510
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-6510.01.patch
>
>
> Even with the fix for YARN-3344 we still have issues with the procfs format.
> This is the case that is causing issues:
> {code}
> [user@nm1 ~]$ cat /proc/2406/stat
> 2406 (ib_fmr(mlx4_0)) S 2 0 0 0 -1 2149613632 0 0 0 0 166 126908 0 0 20 0 1 0 
> 4284 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 
> 0 0 17 6 0 0 0 0 0
> {code}
> We do not handle the parenthesis in the name which causes the pattern 
> matching to fail



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6510) Fix warning - procfs stat file is not in the expected format: YARN-3344 is not enough

2017-04-21 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-6510:
---

 Summary: Fix warning - procfs stat file is not in the expected 
format: YARN-3344 is not enough
 Key: YARN-6510
 URL: https://issues.apache.org/jira/browse/YARN-6510
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0-alpha2, 2.8.0
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


Even with the fix for YARN-3344 we still have issues with the procfs format.

This is the case that is causing issues:
{code}
[user@nm1 ~]$ cat /proc/2406/stat
2406 (ib_fmr(mlx4_0)) S 2 0 0 0 -1 2149613632 0 0 0 0 166 126908 0 0 20 0 1 0 
4284 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744073709551615 0 
0 17 6 0 0 0 0 0
{code}

We do not handle parentheses in the name, which causes the pattern matching
to fail.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6490) Turn on assign multiple after removing continuous scheduling

2017-04-17 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-6490:
---

 Summary: Turn on assign multiple after removing continuous 
scheduling
 Key: YARN-6490
 URL: https://issues.apache.org/jira/browse/YARN-6490
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


To help load up a cluster when not using continuous scheduling, change the
default for {{yarn.scheduler.fair.assignmultiple}} from {{false}} to {{true}}.

This requires the change from YARN-5035 to make sure that we leverage assigning 
more than one container to a node per heartbeat.
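
As a small illustration (an assumption about usage, not part of the change
itself): the property can still be set explicitly, for example in a test
configuration, to override whichever default ships.
{code}
import org.apache.hadoop.conf.Configuration;

// Hedged sketch: explicitly setting the property discussed above.
public final class AssignMultipleExample {
  static Configuration withAssignMultiple(boolean enabled) {
    Configuration conf = new Configuration();
    conf.setBoolean("yarn.scheduler.fair.assignmultiple", enabled);
    return conf;
  }
}
{code}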



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6489) Remove continuous scheduling code

2017-04-17 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-6489:
---

 Summary: Remove continuous scheduling code
 Key: YARN-6489
 URL: https://issues.apache.org/jira/browse/YARN-6489
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6488) Remove continuous scheduling tests

2017-04-17 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-6488:
---

 Summary: Remove continuous scheduling tests
 Key: YARN-6488
 URL: https://issues.apache.org/jira/browse/YARN-6488
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


Remove all continuous scheduling tests from the code



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6487) FairScheduler: remove continuous scheduling (YARN-1010)

2017-04-17 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-6487:

Summary: FairScheduler: remove continuous scheduling (YARN-1010)  (was: 
FairScheduler: remove continuous scheduling (YARN-1010)

> FairScheduler: remove continuous scheduling (YARN-1010)
> ---
>
> Key: YARN-6487
> URL: https://issues.apache.org/jira/browse/YARN-6487
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>
> Remove deprecated FairScheduler continuous scheduler code



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6487) FairScheduler: remove continuous scheduling (YARN-1010

2017-04-17 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-6487:
---

 Summary: FairScheduler: remove continuous scheduling (YARN-1010
 Key: YARN-6487
 URL: https://issues.apache.org/jira/browse/YARN-6487
 Project: Hadoop YARN
  Issue Type: Task
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


Remove deprecated FairScheduler continuous scheduler code



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6486) FairScheduler: Deprecate continuous scheduling in 2.9

2017-04-17 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-6486:
---

 Summary: FairScheduler: Deprecate continuous scheduling in 2.9
 Key: YARN-6486
 URL: https://issues.apache.org/jira/browse/YARN-6486
 Project: Hadoop YARN
  Issue Type: Task
  Components: fairscheduler
Affects Versions: 2.9.0
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


Mark continuous scheduling as deprecated in 2.9 and remove the code in 3.0.
Removing continuous scheduling from the code will be tracked in a separate jira.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6175) Negative vcore for resource needed to preempt

2017-02-14 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867208#comment-15867208
 ] 

Wilfred Spiegelenburg commented on YARN-6175:
-

The change overall looks good. The junit test failures are not related to the 
change.

+1 (non binding)

> Negative vcore for resource needed to preempt
> -
>
> Key: YARN-6175
> URL: https://issues.apache.org/jira/browse/YARN-6175
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6175.001.patch
>
>
> Both old preemption code (2.8 and before) and new preemption code could have 
> negative vcores while calculating resources needed to preempt.
> For old preemption, you can find following messages in RM logs:
> {code}
> Should preempt  
> {code}
> The related code is in method {{resourceDeficit()}}. 
> For new preemption code, there are no messages in RM logs, the related code 
> is in method {{fairShareStarvation()}}. 
> The negative value isn't only a display issue, but also may cause missing 
> necessary preemption. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6042) Fairscheduler: Dump scheduler state in log

2017-02-14 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867112#comment-15867112
 ] 

Wilfred Spiegelenburg commented on YARN-6042:
-

Thank you for the update; we discussed it offline and the change looks good now.

I have one minor nit in the debug text added in FSAppAttempt: _creating_ should
be _create_ in
_LOG.debug("Couldn't creating reservation for app:  " + getName()_

> Fairscheduler: Dump scheduler state in log
> --
>
> Key: YARN-6042
> URL: https://issues.apache.org/jira/browse/YARN-6042
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6042.001.patch, YARN-6042.002.patch, 
> YARN-6042.003.patch, YARN-6042.004.patch, YARN-6042.005.patch
>
>
> To improve the debugging of scheduler issues it would be a big improvement to 
> be able to dump the scheduler state into a log on request. 
> The Dump the scheduler state at a point in time would allow debugging of a 
> scheduler that is not hung (deadlocked) but also not assigning containers. 
> Currently we do not have a proper overview of what state the scheduler and 
> the queues are in and we have to make assumptions or guess
> The scheduler and queue state needed would include (not exhaustive):
> - instantaneous and steady fair share (app / queue)
> - AM share and resources
> - weight
> - app demand
> - application run state (runnable/non runnable)
> - last time at fair/min share



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6042) Fairscheduler: Dump scheduler state in log

2017-02-07 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857340#comment-15857340
 ] 

Wilfred Spiegelenburg commented on YARN-6042:
-

I looked at the changes and it will help debugging the FS a lot when we get
this into a release.

A couple of things:
# In the FairScheduler change you add a new method {{dumpSchedulerState()}}; why
are you not passing the rootQueue into the method? It saves getting it again
since you already have it in the update method.
# I am missing one number for the applications in the {{dumpStateInternal()}}
for the FSLeafQueue: {{getNumPendingApps()}} or {{getNumActiveApps()}}. We need
to have one of those to get a full view of what the application state in the
queue is.
# We add the LastTimeAtMinShare but not the LastTimeAtFairShare for the leaf
queue, as per {{getLastTimeAtFairShareThreshold()}}.

I am also a bit worried about the test: in the output we build the debug string
and get the time in milliseconds for the LastTimeAtMinShare. What if the
{{updateStarvationStats()}} call was run 1 millisecond earlier than the debug
string was built? The comparison would fail and the test would fail because of
that. I don't think we can guarantee that those two calls will happen in the
same millisecond.
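
To illustrate the timing concern (all names and the dump format below are
assumptions, not the real test code): comparing the exact millisecond embedded
in the dump string against a separately sampled value is racy, while a range
check tolerates the clock ticking between the two calls.
{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hedged sketch: check that the timestamp in a dump string falls inside a window.
public final class DumpTimestampCheck {
  private static final Pattern LAST_AT_MIN_SHARE =
      Pattern.compile("LastTimeAtMinShare:\\s*(\\d+)"); // assumed dump format

  static boolean withinWindow(String dump, long before, long after) {
    Matcher m = LAST_AT_MIN_SHARE.matcher(dump);
    if (!m.find()) {
      return false;
    }
    long reported = Long.parseLong(m.group(1));
    return reported >= before && reported <= after; // tolerate millisecond skew
  }
}
{code}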

> Fairscheduler: Dump scheduler state in log
> --
>
> Key: YARN-6042
> URL: https://issues.apache.org/jira/browse/YARN-6042
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6042.001.patch, YARN-6042.002.patch
>
>
> To improve the debugging of scheduler issues it would be a big improvement to 
> be able to dump the scheduler state into a log on request. 
> The Dump the scheduler state at a point in time would allow debugging of a 
> scheduler that is not hung (deadlocked) but also not assigning containers. 
> Currently we do not have a proper overview of what state the scheduler and 
> the queues are in and we have to make assumptions or guess
> The scheduler and queue state needed would include (not exhaustive):
> - instantaneous and steady fair share (app / queue)
> - AM share and resources
> - weight
> - app demand
> - application run state (runnable/non runnable)
> - last time at fair/min share



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue

2017-01-05 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803665#comment-15803665
 ] 

Wilfred Spiegelenburg commented on YARN-5554:
-

bq. In testMoveApplicationSubmitTargetQueue() and 
testMoveApplicationAdminTargetQueue(), would it make sense to test that the 
moves that are supposed to work do actually work?

The application object itself is hidden and mocked up as an object. The testing
from this side is purely for the ACL checks and their enforcement. The
application object that would be changed is not reachable from the
{{ClientRMService}} at all. It would require a lot of changes that test the
underlying code base more than the client service.
Now that I think about it: we might even be able to make this far simpler if we
move things around. Now that we have the move in the {{RMAppManager}} we could
even think about moving all these ACL checks etc. into a pre-validate check,
or a security check, performed in the app manager. It does make more sense to
have it there.

bq. Why a ConcurrentHashMap in createClientRMServiceForMoveApplicationRequest() 
instead of Collections.singletonMap()?
I used that because the {{thenReturn}} expects a {{ConcurrentHashMap}}. The
{{apps}} variable must be declared like it is. To use the singletonMap I would
then have to cast in the code, which does not make it any more readable or
maintainable. The code would then look like this:
{code}
ConcurrentHashMap apps = (ConcurrentHashMap) 
Collections.singletonMap(applicationId, app);
when(rmContext.getRMApps()).thenReturn(apps);
{code}
That does not look any nicer than what we have now, does it?
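
For comparison, a rough sketch of the cast-free variant being weighed here
(illustrative only, not the actual test code from the patch):
{code}
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.server.resourcemanager.RMContext;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;

// Hedged sketch: build the single-entry map without a cast.
public final class MoveAppMockSketch {
  static RMContext contextWithApp(ApplicationId applicationId, RMApp app) {
    RMContext rmContext = mock(RMContext.class);
    ConcurrentHashMap<ApplicationId, RMApp> apps = new ConcurrentHashMap<>();
    apps.put(applicationId, app);
    when(rmContext.getRMApps()).thenReturn(apps);
    return rmContext;
  }
}
{code}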

> MoveApplicationAcrossQueues does not check user permission on the target queue
> --
>
> Key: YARN-5554
> URL: https://issues.apache.org/jira/browse/YARN-5554
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Haibo Chen
>Assignee: Wilfred Spiegelenburg
>  Labels: oct16-medium
> Attachments: YARN-5554.10.patch, YARN-5554.11.patch, 
> YARN-5554.12.patch, YARN-5554.13.patch, YARN-5554.14.patch, 
> YARN-5554.2.patch, YARN-5554.3.patch, YARN-5554.4.patch, YARN-5554.5.patch, 
> YARN-5554.6.patch, YARN-5554.7.patch, YARN-5554.8.patch, YARN-5554.9.patch
>
>
> moveApplicationAcrossQueues operation currently does not check user 
> permission on the target queue. This incorrectly allows one user to move 
> his/her own applications to a queue that the user has no access to



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6056) Yarn NM using LCE shows a failure when trying to delete a non-existing dir

2017-01-05 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15801512#comment-15801512
 ] 

Wilfred Spiegelenburg commented on YARN-6056:
-

Correct, if you pass in multiple directories then a directory in that list
which does not exist on the file system should not be fatal. We should not stop
processing but just continue with the next in the list. In that way a directory
that does not exist is not a failed delete. The end result is correct: the
directory does not exist (any more) on the FS and that should thus not be a
failure.
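
As a Java-level illustration of that behaviour (the real change is in the
native container-executor code, so this is only a sketch of the semantics, not
the actual fix):
{code}
import java.io.File;

import org.apache.hadoop.fs.FileUtil;

// Hedged sketch: a path that is already gone counts as a successful delete.
public final class DeleteIfPresent {
  static boolean deleteIfPresent(File dir) {
    if (!dir.exists()) { // analogous to stat() returning ENOENT in native code
      return true;       // nothing to do, not a failure
    }
    return FileUtil.fullyDelete(dir);
  }
}
{code}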

I am not sure what is going on with the build but it looks like {{protoc}} 
failed which caused a cascading failure.


> Yarn NM using LCE shows a failure when trying to delete a non-existing dir
> --
>
> Key: YARN-6056
> URL: https://issues.apache.org/jira/browse/YARN-6056
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.6.5
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-6056-branch-2.6.1.patch
>
>
> As part of YARN-2902 the clean up of the local directories was changed to 
> ignore non existing directories and proceed with others in the list. This 
> part of the code change was not backported into branch-2.6, backporting just 
> that part now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue

2017-01-05 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-5554:

Attachment: YARN-5554.14.patch

Fixed one checkstyle issue that was introduced and addressed the one remark from
the review.

> MoveApplicationAcrossQueues does not check user permission on the target queue
> --
>
> Key: YARN-5554
> URL: https://issues.apache.org/jira/browse/YARN-5554
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Haibo Chen
>Assignee: Wilfred Spiegelenburg
>  Labels: oct16-medium
> Attachments: YARN-5554.10.patch, YARN-5554.11.patch, 
> YARN-5554.12.patch, YARN-5554.13.patch, YARN-5554.14.patch, 
> YARN-5554.2.patch, YARN-5554.3.patch, YARN-5554.4.patch, YARN-5554.5.patch, 
> YARN-5554.6.patch, YARN-5554.7.patch, YARN-5554.8.patch, YARN-5554.9.patch
>
>
> moveApplicationAcrossQueues operation currently does not check user 
> permission on the target queue. This incorrectly allows one user to move 
> his/her own applications to a queue that the user has no access to



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6056) Yarn NM using LCE shows a failure when trying to delete a non-existing dir

2017-01-04 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-6056:

Attachment: YARN-6056-branch-2.6.1.patch

Patch just for branch-2.6.

> Yarn NM using LCE shows a failure when trying to delete a non-existing dir
> --
>
> Key: YARN-6056
> URL: https://issues.apache.org/jira/browse/YARN-6056
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.6.5
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-6056-branch-2.6.1.patch
>
>
> As part of YARN-2902 the clean up of the local directories was changed to 
> ignore non existing directories and proceed with others in the list. This 
> part of the code change was not backported into branch-2.6, backporting just 
> that part now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2017-01-04 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15800488#comment-15800488
 ] 

Wilfred Spiegelenburg commented on YARN-2902:
-

Oops, yes, branch-2.6 was the broken one; it has a separate patch. I will port
only the ENOENT change to branch-2.6.
Thank you for confirming this.

Opened: YARN-6056 and will provide a patch there

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Fix For: 2.8.0, 2.7.2, 2.6.4, 3.0.0-alpha1
>
> Attachments: YARN-2902-branch-2.6.01.patch, YARN-2902.002.patch, 
> YARN-2902.03.patch, YARN-2902.04.patch, YARN-2902.05.patch, 
> YARN-2902.06.patch, YARN-2902.07.patch, YARN-2902.08.patch, 
> YARN-2902.09.patch, YARN-2902.10.patch, YARN-2902.11.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6056) Yarn NM using LCE shows a failure when trying to delete a non-existing dir

2017-01-04 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-6056:
---

 Summary: Yarn NM using LCE shows a failure when trying to delete a 
non-existing dir
 Key: YARN-6056
 URL: https://issues.apache.org/jira/browse/YARN-6056
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.6.5
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


As part of YARN-2902 the clean up of the local directories was changed to 
ignore non existing directories and proceed with others in the list. This part 
of the code change was not backported into branch-2.6, backporting just that 
part now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue

2017-01-04 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-5554:

Attachment: YARN-5554.13.patch

Fixed up the comments as per the feedback

> MoveApplicationAcrossQueues does not check user permission on the target queue
> --
>
> Key: YARN-5554
> URL: https://issues.apache.org/jira/browse/YARN-5554
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Haibo Chen
>Assignee: Wilfred Spiegelenburg
>  Labels: oct16-medium
> Attachments: YARN-5554.10.patch, YARN-5554.11.patch, 
> YARN-5554.12.patch, YARN-5554.13.patch, YARN-5554.2.patch, YARN-5554.3.patch, 
> YARN-5554.4.patch, YARN-5554.5.patch, YARN-5554.6.patch, YARN-5554.7.patch, 
> YARN-5554.8.patch, YARN-5554.9.patch
>
>
> moveApplicationAcrossQueues operation currently does not check user 
> permission on the target queue. This incorrectly allows one user to move 
> his/her own applications to a queue that the user has no access to



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2017-01-03 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796805#comment-15796805
 ] 

Wilfred Spiegelenburg commented on YARN-2902:
-

I was looking at a problem around removing non-existing directories in the LCE
(native code) and saw that there was a difference in behaviour between trunk
and branch-2.
In trunk we "ignore" a non-existing directory in {{delete_as_user()}} when the
stat returns {{ENOENT}}; we do not do that in branch-2. I tracked it back to
the backport of this jira into branch-2. The native code part of the change was
not ported back into branch-2.

The question is: was that on purpose or was it an oversight? If it was an
oversight, does this require a new jira to backport the native code, or does it
get handled as an addendum to this one? It has been a while since this jira was
closed and it has been included in a number of releases.

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Fix For: 2.8.0, 2.7.2, 2.6.4, 3.0.0-alpha1
>
> Attachments: YARN-2902-branch-2.6.01.patch, YARN-2902.002.patch, 
> YARN-2902.03.patch, YARN-2902.04.patch, YARN-2902.05.patch, 
> YARN-2902.06.patch, YARN-2902.07.patch, YARN-2902.08.patch, 
> YARN-2902.09.patch, YARN-2902.10.patch, YARN-2902.11.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue

2017-01-02 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-5554:

Attachment: YARN-5554.12.patch

Updated patch:
- rebased to make sure it still applies to trunk after YARN-5932
- In ClientRMService added the missing space before the pipes
- Added a comment to the QueueACLsManager.checkAccess() to explain the versions 
and the scheduler dependency
- Added \@Override in TestClientRMService.getQueueAclManager() and 
TestClientRMService.createClientRMServiceForMoveApplicationRequest()
- removed the suppress warnings annotations (not needed after rebase)

The other two remarks I have left for the backport to branch-2:
- problems with Java 7 and the non-final parameters being used inside the 
anonymous inner class (yes it exists)
- suppress warnings annotation should not be needed after YARN-5932

I will log a new jira to work on the change for the QueueACLsManager.

> MoveApplicationAcrossQueues does not check user permission on the target queue
> --
>
> Key: YARN-5554
> URL: https://issues.apache.org/jira/browse/YARN-5554
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Haibo Chen
>Assignee: Wilfred Spiegelenburg
>  Labels: oct16-medium
> Attachments: YARN-5554.10.patch, YARN-5554.11.patch, 
> YARN-5554.12.patch, YARN-5554.2.patch, YARN-5554.3.patch, YARN-5554.4.patch, 
> YARN-5554.5.patch, YARN-5554.6.patch, YARN-5554.7.patch, YARN-5554.8.patch, 
> YARN-5554.9.patch
>
>
> moveApplicationAcrossQueues operation currently does not check user 
> permission on the target queue. This incorrectly allows one user to move 
> his/her own applications to a queue that the user has no access to



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-220) NM should limit number of applications who's logs are being aggregated

2016-12-27 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15782062#comment-15782062
 ] 

Wilfred Spiegelenburg commented on YARN-220:


Should this be marked as fixed now that we have YARN-4697 (a limit on the
threadpool for uploads) and YARN-4766 (do not upload files older than the
retention policy)? It does not solve the case of falling behind, but at least we
have limits on what we upload now.

> NM should limit number of applications who's logs are being aggregated
> --
>
> Key: YARN-220
> URL: https://issues.apache.org/jira/browse/YARN-220
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 0.23.4
>Reporter: Robert Joseph Evans
>
> The NodeManager should limit the number of applications that have their logs 
> being aggregated in parallel.  This will reduce the load on the NN.  We need 
> to ensure that the RM will continue to renew the token while this is 
> happening.  We also should look if the NM starts to fall behind if it can 
> delete some of the logs or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5136) Error in handling event type APP_ATTEMPT_REMOVED to the scheduler

2016-12-07 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730837#comment-15730837
 ] 

Wilfred Spiegelenburg commented on YARN-5136:
-

Thank you [~templedf] for the review and commit

> Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
> -
>
> Key: YARN-5136
> URL: https://issues.apache.org/jira/browse/YARN-5136
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: Wilfred Spiegelenburg
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: YARN-5136.1.patch, YARN-5136.2.patch
>
>
> move app cause rm exit
> {noformat}
> 2016-05-24 23:20:47,202 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_REMOVED to the scheduler
> java.lang.IllegalStateException: Given app to remove 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt@ea94c3b
>  does not exist in queue [root.bdp_xx.bdp_mart_xx_formal, 
> demand=, running= vCores:13422>, share=, w= weight=1.0>]
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:779)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1231)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:114)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:680)
> at java.lang.Thread.run(Thread.java:745)
> 2016-05-24 23:20:47,202 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e04_1464073905025_15410_01_001759 Container Transitioned from 
> ACQUIRED to RELEASED
> 2016-05-24 23:20:47,202 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue

2016-12-07 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730829#comment-15730829
 ] 

Wilfred Spiegelenburg commented on YARN-5554:
-

I am all for it, but I think we should do that in a follow-up jira and not as
part of this one.

The reason I think that we should do it in a separate jira is that, when you dig
deeper within the FairScheduler, the access check performed in the queue is
exactly what is now done for the CapacityScheduler. {{FSQueue.hasAccess()}}
uses the same call to a {{YarnAuthorizationProvider}} as we now have in the
QueueACLsManager for the CS.

> MoveApplicationAcrossQueues does not check user permission on the target queue
> --
>
> Key: YARN-5554
> URL: https://issues.apache.org/jira/browse/YARN-5554
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Haibo Chen
>Assignee: Wilfred Spiegelenburg
>  Labels: oct16-medium
> Attachments: YARN-5554.10.patch, YARN-5554.11.patch, 
> YARN-5554.2.patch, YARN-5554.3.patch, YARN-5554.4.patch, YARN-5554.5.patch, 
> YARN-5554.6.patch, YARN-5554.7.patch, YARN-5554.8.patch, YARN-5554.9.patch
>
>
> moveApplicationAcrossQueues operation currently does not check user 
> permission on the target queue. This incorrectly allows one user to move 
> his/her own applications to a queue that the user has no access to



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue

2016-12-07 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730275#comment-15730275
 ] 

Wilfred Spiegelenburg commented on YARN-5554:
-

Correct, the {{checkAccess()}} method does not have a way to communicate back 
that the queue does not exist; it just says that access is denied. There is no 
way to distinguish the two, and we really want to leave some clue behind in the 
logs about which case we have seen.
In the normal {{checkAccess()}} case a queue that does not exist is not likely, 
maybe not even possible, since the queue is set on the application.

> MoveApplicationAcrossQueues does not check user permission on the target queue
> --
>
> Key: YARN-5554
> URL: https://issues.apache.org/jira/browse/YARN-5554
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Haibo Chen
>Assignee: Wilfred Spiegelenburg
>  Labels: oct16-medium
> Attachments: YARN-5554.10.patch, YARN-5554.11.patch, 
> YARN-5554.2.patch, YARN-5554.3.patch, YARN-5554.4.patch, YARN-5554.5.patch, 
> YARN-5554.6.patch, YARN-5554.7.patch, YARN-5554.8.patch, YARN-5554.9.patch
>
>
> moveApplicationAcrossQueues operation currently does not check user 
> permission on the target queue. This incorrectly allows one user to move 
> his/her own applications to a queue that the user has no access to



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue

2016-12-06 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727669#comment-15727669
 ] 

Wilfred Spiegelenburg commented on YARN-5554:
-

The main point is that the {{ClientRMService}} does not have direct access to 
the scheduler. All access checks run through the {{QueueACLsManager}} or the 
{{ApplicationACLsManager}}, so any change must go through those. In this case 
the new method was introduced because the current method does not have the 
destination queue available. We need to check the destination queue; the 
originating queue is already checked earlier by calling the existing method. 
The passed-in application has not been moved yet and thus still has the 
original queue. Updating the application is not possible because that would 
pre-empt the fact that the application can and will be moved.

The target queue check is performed because the queue comes out of the move 
request and has not been checked at the time the access check runs. To be able 
to distinguish between access being denied and a queue that does not exist, the 
log message was added when the queue returned is empty. Without that check, and 
the log entries, we would not be able to trace back that difference at that 
point.

I looked at folding the two methods into one to remove some code duplication 
but decided against it. The small but important differences between the two 
methods required a number of {{if ... else ...}} constructs, which made the 
code really difficult to read and understand.
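
As a rough illustration of that shape (hypothetical names, not the patch 
itself), the destination-queue variant can log the missing-queue case 
explicitly, which is the clue the plain boolean result cannot give:

{code:java}
// Illustrative only: hypothetical helper showing the destination queue check
// and the log line that separates "denied" from "queue does not exist".
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class MoveAclCheckExample {
  private static final Logger LOG =
      LoggerFactory.getLogger(MoveAclCheckExample.class);

  interface SimpleScheduler {
    // Returns the resolved queue name, or an empty string if it is unknown.
    String resolveQueue(String queueName);
    boolean hasAccess(String user, String queueName);
  }

  boolean checkAccessToTargetQueue(SimpleScheduler scheduler, String user,
      String targetQueue) {
    String resolved = scheduler.resolveQueue(targetQueue);
    if (resolved.isEmpty()) {
      // Without this log entry a denial and a missing queue look identical.
      LOG.warn("Target queue {} for user {} does not exist, denying the move",
          targetQueue, user);
      return false;
    }
    return scheduler.hasAccess(user, resolved);
  }
}
{code}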





> MoveApplicationAcrossQueues does not check user permission on the target queue
> --
>
> Key: YARN-5554
> URL: https://issues.apache.org/jira/browse/YARN-5554
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Haibo Chen
>Assignee: Wilfred Spiegelenburg
>  Labels: oct16-medium
> Attachments: YARN-5554.10.patch, YARN-5554.11.patch, 
> YARN-5554.2.patch, YARN-5554.3.patch, YARN-5554.4.patch, YARN-5554.5.patch, 
> YARN-5554.6.patch, YARN-5554.7.patch, YARN-5554.8.patch, YARN-5554.9.patch
>
>
> moveApplicationAcrossQueues operation currently does not check user 
> permission on the target queue. This incorrectly allows one user to move 
> his/her own applications to a queue that the user has no access to



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue

2016-12-05 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-5554:

Attachment: YARN-5554.11.patch

removing unused imports to fix checkstyle warnings

> MoveApplicationAcrossQueues does not check user permission on the target queue
> --
>
> Key: YARN-5554
> URL: https://issues.apache.org/jira/browse/YARN-5554
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Haibo Chen
>Assignee: Wilfred Spiegelenburg
>  Labels: oct16-medium
> Attachments: YARN-5554.10.patch, YARN-5554.11.patch, 
> YARN-5554.2.patch, YARN-5554.3.patch, YARN-5554.4.patch, YARN-5554.5.patch, 
> YARN-5554.6.patch, YARN-5554.7.patch, YARN-5554.8.patch, YARN-5554.9.patch
>
>
> moveApplicationAcrossQueues operation currently does not check user 
> permission on the target queue. This incorrectly allows one user to move 
> his/her own applications to a queue that the user has no access to



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue

2016-12-02 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-5554:

Attachment: YARN-5554.10.patch

New patch with the changes from the review

> MoveApplicationAcrossQueues does not check user permission on the target queue
> --
>
> Key: YARN-5554
> URL: https://issues.apache.org/jira/browse/YARN-5554
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Haibo Chen
>Assignee: Wilfred Spiegelenburg
>  Labels: oct16-medium
> Attachments: YARN-5554.10.patch, YARN-5554.2.patch, 
> YARN-5554.3.patch, YARN-5554.4.patch, YARN-5554.5.patch, YARN-5554.6.patch, 
> YARN-5554.7.patch, YARN-5554.8.patch, YARN-5554.9.patch
>
>
> moveApplicationAcrossQueues operation currently does not check user 
> permission on the target queue. This incorrectly allows one user to move 
> his/her own applications to a queue that the user has no access to



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue

2016-12-02 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15715560#comment-15715560
 ] 

Wilfred Spiegelenburg commented on YARN-5554:
-

bq." doesn't have permissions submit to target queue: " is missing a "to" 
before the "submit."

fixed the typo

bq. In QueueACLsManager.checkAccess(), I don't see why you need to do the 
scheduler-dependent if. Can't you just call checkAccess() in all cases?

The capacity scheduler part is a copy of the checkAccess() that is already 
there. The change to not use the checkAccess() of the scheduler for the 
capacity scheduler was made as part of YARN-4571. Bringing the FairScheduler 
and the CapacityScheduler in sync is more work than we can just push into this 
jira. I think it is better to open a follow up jira to refactor this and bring 
the two schedulers in sync again. Let me know if you agree with that approach.

bq. In your tests, I would feel better if you tested that the app is in the 
right queue after the successful moves.

Because of the way the tests are mocked up, the current tests cannot do that. 
We create a {{ClientRMService}} which does not have a scheduler or an 
application manager. The tests are focused on the ACL managers and on making 
sure that they stop the move in the service. We can extend the tests to do the 
app checks, but that would introduce scheduler-specific testing into the client 
service.

bq. Note that your use of a lambda in 
createClientRMServiceForMoveApplicationRequest() means this patch can only go 
into trunk.

Oops, I did not think about that. I have rewritten the tests to remove the 
lambda. I now really appreciate the simplicity of using a lambda ;-)
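
For reference, the change is just swapping the lambda for an anonymous inner 
class so the test also compiles on the Java 7 based branches; the interface 
below is a made-up stand-in for whatever the test factory uses:

{code:java}
// Illustrative only: replacing a lambda with an anonymous inner class so the
// same test code compiles on Java 7 branches. The interface is hypothetical.
interface AclCheckStub {
  boolean check(String user, String queue);
}

class LambdaRemovalExample {
  // Java 8 style (trunk only):
  AclCheckStub lambdaStub = (user, queue) -> true;

  // Java 7 compatible equivalent:
  AclCheckStub anonymousStub = new AclCheckStub() {
    @Override
    public boolean check(String user, String queue) {
      return true;
    }
  };
}
{code}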

> MoveApplicationAcrossQueues does not check user permission on the target queue
> --
>
> Key: YARN-5554
> URL: https://issues.apache.org/jira/browse/YARN-5554
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Haibo Chen
>Assignee: Wilfred Spiegelenburg
>  Labels: oct16-medium
> Attachments: YARN-5554.2.patch, YARN-5554.3.patch, YARN-5554.4.patch, 
> YARN-5554.5.patch, YARN-5554.6.patch, YARN-5554.7.patch, YARN-5554.8.patch, 
> YARN-5554.9.patch
>
>
> moveApplicationAcrossQueues operation currently does not check user 
> permission on the target queue. This incorrectly allows one user to move 
> his/her own applications to a queue that the user has no access to



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5136) Error in handling event type APP_ATTEMPT_REMOVED to the scheduler

2016-12-02 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-5136:

Attachment: YARN-5136.2.patch

Updated the patch with the review comments:
- added state checks in the tests
- changed the return to a throw if the app was stopped before the move (see the 
sketch below)
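
The shape of the second change, as a sketch only (simplified types, not the 
actual FairScheduler code): a silent return hides the problem, while throwing 
lets the caller report the failed move:

{code:java}
// Illustrative only: simplified move handling, not the actual patch.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.exceptions.YarnException;

class MoveGuardExample {
  private final Map<ApplicationId, String> appToQueue =
      new ConcurrentHashMap<>();

  synchronized String moveApplication(ApplicationId appId, String targetQueue)
      throws YarnException {
    String currentQueue = appToQueue.get(appId);
    if (currentQueue == null) {
      // The app was stopped/removed before the move was processed: throwing
      // here lets the caller fail the move request instead of silently
      // returning as if the move succeeded.
      throw new YarnException("App to be moved " + appId + " not found.");
    }
    appToQueue.put(appId, targetQueue);
    return targetQueue;
  }
}
{code}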

> Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
> -
>
> Key: YARN-5136
> URL: https://issues.apache.org/jira/browse/YARN-5136
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-5136.1.patch, YARN-5136.2.patch
>
>
> move app cause rm exit
> {noformat}
> 2016-05-24 23:20:47,202 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_REMOVED to the scheduler
> java.lang.IllegalStateException: Given app to remove 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt@ea94c3b
>  does not exist in queue [root.bdp_xx.bdp_mart_xx_formal, 
> demand=, running= vCores:13422>, share=, w= weight=1.0>]
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:779)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1231)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:114)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:680)
> at java.lang.Thread.run(Thread.java:745)
> 2016-05-24 23:20:47,202 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e04_1464073905025_15410_01_001759 Container Transitioned from 
> ACQUIRED to RELEASED
> 2016-05-24 23:20:47,202 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5136) Error in handling event type APP_ATTEMPT_REMOVED to the scheduler

2016-11-16 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15672202#comment-15672202
 ] 

Wilfred Spiegelenburg commented on YARN-5136:
-

Opened YARN-5895 to track the new failure in 
TestRMRestart#testFinishedAppRemovalAfterRMRestart

> Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
> -
>
> Key: YARN-5136
> URL: https://issues.apache.org/jira/browse/YARN-5136
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-5136.1.patch
>
>
> move app cause rm exit
> {noformat}
> 2016-05-24 23:20:47,202 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_REMOVED to the scheduler
> java.lang.IllegalStateException: Given app to remove 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt@ea94c3b
>  does not exist in queue [root.bdp_xx.bdp_mart_xx_formal, 
> demand=, running= vCores:13422>, share=, w= weight=1.0>]
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:779)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1231)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:114)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:680)
> at java.lang.Thread.run(Thread.java:745)
> 2016-05-24 23:20:47,202 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e04_1464073905025_15410_01_001759 Container Transitioned from 
> ACQUIRED to RELEASED
> 2016-05-24 23:20:47,202 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5895) TestRMRestart#testFinishedAppRemovalAfterRMRestart is still flakey

2016-11-16 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-5895:
---

 Summary: TestRMRestart#testFinishedAppRemovalAfterRMRestart is 
still flakey 
 Key: YARN-5895
 URL: https://issues.apache.org/jira/browse/YARN-5895
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0-alpha1
Reporter: Wilfred Spiegelenburg


Even after YARN-5362 the test is still flaky:
{code}
Tests run: 29, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 100.652 sec 
<<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
testFinishedAppRemovalAfterRMRestart(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart)
  Time elapsed: 0.338 sec  <<< FAILURE!
java.lang.AssertionError: expected null, but was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotNull(Assert.java:664)
at org.junit.Assert.assertNull(Assert.java:646)
at org.junit.Assert.assertNull(Assert.java:656)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1659)
{code}

The test finishes with two asserts. It is the second assert that fails; 
YARN-5362 looked at a failure on the first of the two asserts.
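
A sketch of the usual way to harden this kind of assert (not the YARN-5362 
change itself): poll for the expected state instead of asserting immediately 
after the restart, since the removal is asynchronous:

{code:java}
// Illustrative only: a small polling helper for the flaky assert. The probe
// below is whatever lookup the test uses to find the app after the restart.
import java.util.concurrent.Callable;

class WaitForRemovalExample {
  static <T> T waitForNull(Callable<T> probe, long timeoutMs, long intervalMs)
      throws Exception {
    long deadline = System.currentTimeMillis() + timeoutMs;
    T value = probe.call();
    while (value != null && System.currentTimeMillis() < deadline) {
      Thread.sleep(intervalMs);
      value = probe.call();
    }
    return value;
  }

  // In the test: assertNull(waitForNull(() -> lookUpApp(appId), 10000, 100));
  // where lookUpApp() stands for the existing lookup in TestRMRestart.
}
{code}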



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5136) Error in handling event type APP_ATTEMPT_REMOVED to the scheduler

2016-11-16 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15671627#comment-15671627
 ] 

Wilfred Spiegelenburg commented on YARN-5136:
-

The TestRMRestart#testFinishedAppRemovalAfterRMRestart failure is logged as 
YARN-5362, which was closed as resolved. It looks like the change has not fixed 
it completely; maybe a follow-up needs to be logged for that.
The TestTokenClientRMService#testCancelWithMultipleAppSubmissions failure is 
tracked in YARN-5816 and is not caused by this change.

Both tests pass in my local testing.

> Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
> -
>
> Key: YARN-5136
> URL: https://issues.apache.org/jira/browse/YARN-5136
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-5136.1.patch
>
>
> move app cause rm exit
> {noformat}
> 2016-05-24 23:20:47,202 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_REMOVED to the scheduler
> java.lang.IllegalStateException: Given app to remove 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt@ea94c3b
>  does not exist in queue [root.bdp_xx.bdp_mart_xx_formal, 
> demand=, running= vCores:13422>, share=, w= weight=1.0>]
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:779)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1231)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:114)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:680)
> at java.lang.Thread.run(Thread.java:745)
> 2016-05-24 23:20:47,202 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e04_1464073905025_15410_01_001759 Container Transitioned from 
> ACQUIRED to RELEASED
> 2016-05-24 23:20:47,202 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5136) Error in handling event type APP_ATTEMPT_REMOVED to the scheduler

2016-11-16 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-5136:

Attachment: YARN-5136.1.patch

> Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
> -
>
> Key: YARN-5136
> URL: https://issues.apache.org/jira/browse/YARN-5136
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-5136.1.patch
>
>
> move app cause rm exit
> {noformat}
> 2016-05-24 23:20:47,202 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_REMOVED to the scheduler
> java.lang.IllegalStateException: Given app to remove 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt@ea94c3b
>  does not exist in queue [root.bdp_xx.bdp_mart_xx_formal, 
> demand=, running= vCores:13422>, share=, w= weight=1.0>]
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:779)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1231)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:114)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:680)
> at java.lang.Thread.run(Thread.java:745)
> 2016-05-24 23:20:47,202 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e04_1464073905025_15410_01_001759 Container Transitioned from 
> ACQUIRED to RELEASED
> 2016-05-24 23:20:47,202 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5136) Error in handling event type APP_ATTEMPT_REMOVED to the scheduler

2016-11-15 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668678#comment-15668678
 ] 

Wilfred Spiegelenburg commented on YARN-5136:
-

I was thrown off track a bit by all the changes that were made to the locking 
in the scheduler in YARN-3139.

The analysis shows that the issue is not resolved yet and that we have two 
situations that can cause the above-mentioned problem (a toy model follows the 
list):
# if a {{removeApplicationAttempt}} and a {{moveApplication}} for the same 
attempt are processed in that order in short succession, the application 
attempt will still contain a queue reference but has already been removed from 
the list of applications for the queue
# if two calls to {{removeApplicationAttempt}} come in in short succession, the 
application attempt will still contain a queue reference but has already been 
removed from the list of applications for the queue

In both cases the second call must come in before the {{removeApplication}} 
call is made.
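
A toy model of the race (made-up classes, not the FairScheduler code): the 
first removal drops the attempt from the queue's list, the attempt keeps its 
queue reference, and the second event hits the {{IllegalStateException}}:

{code:java}
// Illustrative only: a toy model of the APP_ATTEMPT_REMOVED race.
import java.util.ArrayList;
import java.util.List;

class ToyQueue {
  final List<ToyAttempt> apps = new ArrayList<>();

  void removeApp(ToyAttempt attempt) {
    if (!apps.remove(attempt)) {
      // This is the path that matches the stack trace in the description.
      throw new IllegalStateException(
          "Given app to remove " + attempt + " does not exist in queue");
    }
  }
}

class ToyAttempt {
  ToyQueue queue; // Never cleared after the first removal.
}

class RaceExample {
  public static void main(String[] args) {
    ToyQueue q = new ToyQueue();
    ToyAttempt a = new ToyAttempt();
    a.queue = q;
    q.apps.add(a);

    q.removeApp(a); // first removeApplicationAttempt: succeeds
    // The attempt still references the queue, so a move or a duplicate
    // removal event reaches the same queue again and blows up:
    q.removeApp(a); // second event: throws IllegalStateException
  }
}
{code}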

> Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
> -
>
> Key: YARN-5136
> URL: https://issues.apache.org/jira/browse/YARN-5136
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: Wilfred Spiegelenburg
>
> move app cause rm exit
> {noformat}
> 2016-05-24 23:20:47,202 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_REMOVED to the scheduler
> java.lang.IllegalStateException: Given app to remove 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt@ea94c3b
>  does not exist in queue [root.bdp_xx.bdp_mart_xx_formal, 
> demand=, running= vCores:13422>, share=, w= weight=1.0>]
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:779)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1231)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:114)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:680)
> at java.lang.Thread.run(Thread.java:745)
> 2016-05-24 23:20:47,202 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e04_1464073905025_15410_01_001759 Container Transitioned from 
> ACQUIRED to RELEASED
> 2016-05-24 23:20:47,202 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5722) FairScheduler hides group resolution exceptions when assigning queue

2016-11-15 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15666594#comment-15666594
 ] 

Wilfred Spiegelenburg commented on YARN-5722:
-

TestTokenClientRMService failure is tracked in YARN-5875 and is not caused by 
this change.
As commented earlier: there is no new test because it only exposes the error 
message properly

> FairScheduler hides group resolution exceptions when assigning queue 
> -
>
> Key: YARN-5722
> URL: https://issues.apache.org/jira/browse/YARN-5722
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.6.5, 3.0.0-alpha1
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>  Labels: oct16-easy
> Attachments: YARN-5722.1.patch, YARN-5722.2.patch
>
>
> When a group based placement rule is used and the user does not have any 
> groups the reason for rejecting the application is hidden. An assignment will 
> fail as follows:
> {code}
>  
>  
> {code}
> The error logged on the client side:
> {code}
> 09/30 15:59:27 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /user/test_user/.staging/job_1475223610304_6043 
> 16/09/30 15:59:27 WARN security.UserGroupInformation: 
> PriviledgedActionException as:test_user (auth:SIMPLE) 
> cause:java.io.IOException: Failed to run job : Error assigning app to queue 
> default 
> java.io.IOException: Failed to run job : Error assigning app to queue default 
> at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301) 
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:244)
>  
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307) 
> {code}
> The {{default}} queue name is passed in as part of the application submission 
> and not really the queue that is tried.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5722) FairScheduler hides group resolution exceptions when assigning queue

2016-11-14 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-5722:

Attachment: YARN-5722.2.patch

After an off-line discussion with [~templedf] we came to the conclusion that a 
slight change of the message makes sense. Since we do not know the queue, we 
now say "Error assigning app to a queue: " to remove the ambiguity around the 
queue.

Updating the patch with the agreed change.

> FairScheduler hides group resolution exceptions when assigning queue 
> -
>
> Key: YARN-5722
> URL: https://issues.apache.org/jira/browse/YARN-5722
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.6.5, 3.0.0-alpha1
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>  Labels: oct16-easy
> Attachments: YARN-5722.1.patch, YARN-5722.2.patch
>
>
> When a group based placement rule is used and the user does not have any 
> groups the reason for rejecting the application is hidden. An assignment will 
> fail as follows:
> {code}
>  
>  
> {code}
> The error logged on the client side:
> {code}
> 09/30 15:59:27 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /user/test_user/.staging/job_1475223610304_6043 
> 16/09/30 15:59:27 WARN security.UserGroupInformation: 
> PriviledgedActionException as:test_user (auth:SIMPLE) 
> cause:java.io.IOException: Failed to run job : Error assigning app to queue 
> default 
> java.io.IOException: Failed to run job : Error assigning app to queue default 
> at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301) 
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:244)
>  
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307) 
> {code}
> The {{default}} queue name is passed in as part of the application submission 
> and not really the queue that is tried.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5722) FairScheduler hides group resolution exceptions when assigning queue

2016-11-13 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15662132#comment-15662132
 ] 

Wilfred Spiegelenburg commented on YARN-5722:
-

Sorry for the late reply. I was doing other internal work for a couple of days.

The queue has not been assigned yet because we have failed in one of the 
placement rules.

That means the queue as we know it at that point will be the queue passed in to 
the {{assignToQueue}} call. This will most likely be the "default" queue, 
unless the user passed in a queue name in the configuration when submitting. In 
neither case will the queue name add or explain anything, and it might even be 
confusing since the queue is irrelevant.

Let me know if you still want to add the queue name to the error message.

> FairScheduler hides group resolution exceptions when assigning queue 
> -
>
> Key: YARN-5722
> URL: https://issues.apache.org/jira/browse/YARN-5722
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.6.5, 3.0.0-alpha1
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>  Labels: oct16-easy
> Attachments: YARN-5722.1.patch
>
>
> When a group based placement rule is used and the user does not have any 
> groups the reason for rejecting the application is hidden. An assignment will 
> fail as follows:
> {code}
>  
>  
> {code}
> The error logged on the client side:
> {code}
> 09/30 15:59:27 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /user/test_user/.staging/job_1475223610304_6043 
> 16/09/30 15:59:27 WARN security.UserGroupInformation: 
> PriviledgedActionException as:test_user (auth:SIMPLE) 
> cause:java.io.IOException: Failed to run job : Error assigning app to queue 
> default 
> java.io.IOException: Failed to run job : Error assigning app to queue default 
> at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301) 
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:244)
>  
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307) 
> {code}
> The {{default}} queue name is passed in as part of the application submission 
> and not really the queue that is tried.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5722) FairScheduler hides group resolution exceptions when assigning queue

2016-10-11 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567666#comment-15567666
 ] 

Wilfred Spiegelenburg commented on YARN-5722:
-

There is no new test because it only exposes the error message properly in the 
log and responses

> FairScheduler hides group resolution exceptions when assigning queue 
> -
>
> Key: YARN-5722
> URL: https://issues.apache.org/jira/browse/YARN-5722
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.6.5, 3.0.0-alpha1
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-5722.1.patch
>
>
> When a group based placement rule is used and the user does not have any 
> groups the reason for rejecting the application is hidden. An assignment will 
> fail as follows:
> {code}
>  
>  
> {code}
> The error logged on the client side:
> {code}
> 09/30 15:59:27 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /user/test_user/.staging/job_1475223610304_6043 
> 16/09/30 15:59:27 WARN security.UserGroupInformation: 
> PriviledgedActionException as:test_user (auth:SIMPLE) 
> cause:java.io.IOException: Failed to run job : Error assigning app to queue 
> default 
> java.io.IOException: Failed to run job : Error assigning app to queue default 
> at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301) 
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:244)
>  
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307) 
> {code}
> The {{default}} queue name is passed in as part of the application submission 
> and not really the queue that is tried.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5722) FairScheduler hides group resolution exceptions when assigning queue

2016-10-11 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-5722:

Issue Type: Improvement  (was: Bug)

> FairScheduler hides group resolution exceptions when assigning queue 
> -
>
> Key: YARN-5722
> URL: https://issues.apache.org/jira/browse/YARN-5722
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.6.5, 3.0.0-alpha1
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-5722.1.patch
>
>
> When a group based placement rule is used and the user does not have any 
> groups the reason for rejecting the application is hidden. An assignment will 
> fail as follows:
> {code}
>  
>  
> {code}
> The error logged on the client side:
> {code}
> 09/30 15:59:27 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /user/test_user/.staging/job_1475223610304_6043 
> 16/09/30 15:59:27 WARN security.UserGroupInformation: 
> PriviledgedActionException as:test_user (auth:SIMPLE) 
> cause:java.io.IOException: Failed to run job : Error assigning app to queue 
> default 
> java.io.IOException: Failed to run job : Error assigning app to queue default 
> at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301) 
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:244)
>  
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307) 
> {code}
> The {{default}} queue name is passed in as part of the application submission 
> and not really the queue that is tried.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5722) FairScheduler hides group resolution exceptions when assigning queue

2016-10-11 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-5722:

Attachment: YARN-5722.1.patch

Simple change to pass back the message from the IOException and not the 
meaningless queue name.
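
In rough terms the change has this shape (hypothetical names, not the actual 
patch): catch the placement failure and surface its message instead of the 
queue name that was passed in:

{code:java}
// Illustrative only: reject with the real cause instead of the submitted
// queue name. The interface is a stand-in for the placement rule handling.
import java.io.IOException;

class RejectMessageExample {
  interface PlacementPolicy {
    String assignAppToQueue(String requestedQueue, String user)
        throws IOException;
  }

  String assignOrReject(PlacementPolicy policy, String requestedQueue,
      String user) {
    try {
      return policy.assignAppToQueue(requestedQueue, user);
    } catch (IOException ioe) {
      // Before: "Error assigning app to queue " + requestedQueue
      // After:  surface the underlying reason, e.g. the group resolution
      //         failure, since the queue name adds nothing here.
      throw new IllegalArgumentException(
          "Error assigning app to a queue: " + ioe.getMessage(), ioe);
    }
  }
}
{code}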

> FairScheduler hides group resolution exceptions when assigning queue 
> -
>
> Key: YARN-5722
> URL: https://issues.apache.org/jira/browse/YARN-5722
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.5, 3.0.0-alpha1
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-5722.1.patch
>
>
> When a group based placement rule is used and the user does not have any 
> groups the reason for rejecting the application is hidden. An assignment will 
> fail as follows:
> {code}
>  
>  
> {code}
> The error logged on the client side:
> {code}
> 09/30 15:59:27 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /user/test_user/.staging/job_1475223610304_6043 
> 16/09/30 15:59:27 WARN security.UserGroupInformation: 
> PriviledgedActionException as:test_user (auth:SIMPLE) 
> cause:java.io.IOException: Failed to run job : Error assigning app to queue 
> default 
> java.io.IOException: Failed to run job : Error assigning app to queue default 
> at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301) 
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:244)
>  
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307) 
> {code}
> The {{default}} queue name is passed in as part of the application submission 
> and not really the queue that is tried.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5722) FairScheduler hides group resolution exceptions when assigning queue

2016-10-11 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-5722:
---

 Summary: FairScheduler hides group resolution exceptions when 
assigning queue 
 Key: YARN-5722
 URL: https://issues.apache.org/jira/browse/YARN-5722
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 3.0.0-alpha1, 2.6.5
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


When a group based placement rule is used and the user does not have any groups 
the reason for rejecting the application is hidden. An assignment will fail as 
follows:

{code}
 
 
{code}

The error logged on the client side:
{code}
09/30 15:59:27 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
/user/test_user/.staging/job_1475223610304_6043 
16/09/30 15:59:27 WARN security.UserGroupInformation: 
PriviledgedActionException as:test_user (auth:SIMPLE) 
cause:java.io.IOException: Failed to run job : Error assigning app to queue 
default 
java.io.IOException: Failed to run job : Error assigning app to queue default 
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301) 
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:244)
 
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307) 
{code}

The {{default}} queue name is passed in as part of the application submission 
and not really the queue that is tried.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue

2016-10-04 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-5554:

Attachment: YARN-5554.9.patch

Updated the text in the messages; it does make sense to include it, not just in 
the message of the queue manager. Does the message look OK, [~bibinchundatt]?

> MoveApplicationAcrossQueues does not check user permission on the target queue
> --
>
> Key: YARN-5554
> URL: https://issues.apache.org/jira/browse/YARN-5554
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Haibo Chen
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-5554.2.patch, YARN-5554.3.patch, YARN-5554.4.patch, 
> YARN-5554.5.patch, YARN-5554.6.patch, YARN-5554.7.patch, YARN-5554.8.patch, 
> YARN-5554.9.patch
>
>
> moveApplicationAcrossQueues operation currently does not check user 
> permission on the target queue. This incorrectly allows one user to move 
> his/her own applications to a queue that the user has no access to



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue

2016-10-04 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-5554:

Attachment: YARN-5554.8.patch

Thanks [~kasha]. I changed the method to just return false instead of throwing 
and updated all the code that relies on it.

> MoveApplicationAcrossQueues does not check user permission on the target queue
> --
>
> Key: YARN-5554
> URL: https://issues.apache.org/jira/browse/YARN-5554
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Haibo Chen
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-5554.2.patch, YARN-5554.3.patch, YARN-5554.4.patch, 
> YARN-5554.5.patch, YARN-5554.6.patch, YARN-5554.7.patch, YARN-5554.8.patch
>
>
> moveApplicationAcrossQueues operation currently does not check user 
> permission on the target queue. This incorrectly allows one user to move 
> his/her own applications to a queue that the user has no access to



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-2093) Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP

2016-09-29 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YARN-2093.
-
Resolution: Duplicate

> Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP
> ---
>
> Key: YARN-2093
> URL: https://issues.apache.org/jira/browse/YARN-2093
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.1
>Reporter: Jon Bringhurst
>Assignee: Wilfred Spiegelenburg
>
> After upgrading from 2.2.0 to 2.4.1-SNAP, I ran into the following on startup:
> {noformat}
> 21:19:34,308  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0003_09 State change from SUBMITTED to SCHEDULED
> 21:19:34,309  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0004_08 State change from SUBMITTED to SCHEDULED
> 21:19:34,310  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0003_10 State change from SUBMITTED to SCHEDULED
> 21:19:34,310  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0003_11 State change from SUBMITTED to SCHEDULED
> 21:19:34,317  INFO FairScheduler:673 - Added Application Attempt 
> appattempt_1400092144371_0004_09 to scheduler from user: 
> samza-perf-playground
> 21:19:34,318  INFO FairScheduler:673 - Added Application Attempt 
> appattempt_1400092144371_0004_10 to scheduler from user: 
> samza-perf-playground
> 21:19:34,318  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0004_09 State change from SUBMITTED to SCHEDULED
> 21:19:34,318  INFO FairScheduler:733 - Application 
> appattempt_1400092144371_0003_05 is done. finalState=FAILED
> 21:19:34,319  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0004_10 State change from SUBMITTED to SCHEDULED
> 21:19:34,319  INFO AppSchedulingInfo:108 - Application 
> application_1400092144371_0003 requests cleared
> 21:19:34,319  INFO FairScheduler:673 - Added Application Attempt 
> appattempt_1400092144371_0004_11 to scheduler from user: 
> samza-perf-playground
> 21:19:34,320  INFO FairScheduler:733 - Application 
> appattempt_1400092144371_0003_06 is done. finalState=FAILED
> 21:19:34,320  INFO AppSchedulingInfo:108 - Application 
> application_1400092144371_0003 requests cleared
> 21:19:34,320  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0004_11 State change from SUBMITTED to SCHEDULED
> 21:19:34,323 FATAL ResourceManager:600 - Error in handling event type 
> APP_ATTEMPT_REMOVED to the scheduler
> java.lang.IllegalStateException: Given app to remove 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@429f809d
>  does not exist in queue [root.samza-perf-playground, demand= vCores:0>, running=, share=, 
> w=]
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:93)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:774)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1201)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>   at java.lang.Thread.run(Thread.java:744)
> 21:19:34,330  INFO ResourceManager:604 - Exiting, bbye..
> 21:19:34,335  INFO log:67 - Stopped SelectChannelConnector@:8088
> 21:19:34,437  INFO Server:2398 - Stopping server on 8033
> 21:19:34,438  INFO Server:694 - Stopping IPC Server listener on 8033
> {noformat}
> Last commit message for this build is (branch-2.4 on 
> github.com/apache/hadoop-common):
> {noformat}
> commit 09e24d5519187c0db67aacc1992be5d43829aa1e
> Author: Arpit Agarwal 
> Date:   Tue May 20 20:18:46 2014 +
> HADOOP-10562. Fix CHANGES.txt entry again
> 
> git-svn-id: 
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1596389 
> 13f79535-47bb-0310-9956-ffa450edef68
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2093) Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP

2016-09-29 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532508#comment-15532508
 ] 

Wilfred Spiegelenburg commented on YARN-2093:
-

This looks like a duplicate of YARN-5136.
I will provide a fix for this through that new jira.

> Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP
> ---
>
> Key: YARN-2093
> URL: https://issues.apache.org/jira/browse/YARN-2093
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.1
>Reporter: Jon Bringhurst
>Assignee: Wilfred Spiegelenburg
>
> After upgrading from 2.2.0 to 2.4.1-SNAP, I ran into the following on startup:
> {noformat}
> 21:19:34,308  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0003_09 State change from SUBMITTED to SCHEDULED
> 21:19:34,309  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0004_08 State change from SUBMITTED to SCHEDULED
> 21:19:34,310  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0003_10 State change from SUBMITTED to SCHEDULED
> 21:19:34,310  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0003_11 State change from SUBMITTED to SCHEDULED
> 21:19:34,317  INFO FairScheduler:673 - Added Application Attempt 
> appattempt_1400092144371_0004_09 to scheduler from user: 
> samza-perf-playground
> 21:19:34,318  INFO FairScheduler:673 - Added Application Attempt 
> appattempt_1400092144371_0004_10 to scheduler from user: 
> samza-perf-playground
> 21:19:34,318  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0004_09 State change from SUBMITTED to SCHEDULED
> 21:19:34,318  INFO FairScheduler:733 - Application 
> appattempt_1400092144371_0003_05 is done. finalState=FAILED
> 21:19:34,319  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0004_10 State change from SUBMITTED to SCHEDULED
> 21:19:34,319  INFO AppSchedulingInfo:108 - Application 
> application_1400092144371_0003 requests cleared
> 21:19:34,319  INFO FairScheduler:673 - Added Application Attempt 
> appattempt_1400092144371_0004_11 to scheduler from user: 
> samza-perf-playground
> 21:19:34,320  INFO FairScheduler:733 - Application 
> appattempt_1400092144371_0003_06 is done. finalState=FAILED
> 21:19:34,320  INFO AppSchedulingInfo:108 - Application 
> application_1400092144371_0003 requests cleared
> 21:19:34,320  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0004_11 State change from SUBMITTED to SCHEDULED
> 21:19:34,323 FATAL ResourceManager:600 - Error in handling event type 
> APP_ATTEMPT_REMOVED to the scheduler
> java.lang.IllegalStateException: Given app to remove 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@429f809d
>  does not exist in queue [root.samza-perf-playground, demand= vCores:0>, running=, share=, 
> w=]
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:93)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:774)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1201)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>   at java.lang.Thread.run(Thread.java:744)
> 21:19:34,330  INFO ResourceManager:604 - Exiting, bbye..
> 21:19:34,335  INFO log:67 - Stopped SelectChannelConnector@:8088
> 21:19:34,437  INFO Server:2398 - Stopping server on 8033
> 21:19:34,438  INFO Server:694 - Stopping IPC Server listener on 8033
> {noformat}
> Last commit message for this build is (branch-2.4 on 
> github.com/apache/hadoop-common):
> {noformat}
> commit 09e24d5519187c0db67aacc1992be5d43829aa1e
> Author: Arpit Agarwal 
> Date:   Tue May 20 20:18:46 2014 +
> HADOOP-10562. Fix CHANGES.txt entry again
> 
> git-svn-id: 
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1596389 
> 13f79535-47bb-0310-9956-ffa450edef68
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-2093) Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP

2016-09-29 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg reassigned YARN-2093:
---

Assignee: Wilfred Spiegelenburg

> Fair Scheduler IllegalStateException after upgrade from 2.2.0 to 2.4.1-SNAP
> ---
>
> Key: YARN-2093
> URL: https://issues.apache.org/jira/browse/YARN-2093
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.1
>Reporter: Jon Bringhurst
>Assignee: Wilfred Spiegelenburg
>
> After upgrading from 2.2.0 to 2.4.1-SNAP, I ran into the following on startup:
> {noformat}
> 21:19:34,308  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0003_09 State change from SUBMITTED to SCHEDULED
> 21:19:34,309  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0004_08 State change from SUBMITTED to SCHEDULED
> 21:19:34,310  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0003_10 State change from SUBMITTED to SCHEDULED
> 21:19:34,310  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0003_11 State change from SUBMITTED to SCHEDULED
> 21:19:34,317  INFO FairScheduler:673 - Added Application Attempt 
> appattempt_1400092144371_0004_09 to scheduler from user: 
> samza-perf-playground
> 21:19:34,318  INFO FairScheduler:673 - Added Application Attempt 
> appattempt_1400092144371_0004_10 to scheduler from user: 
> samza-perf-playground
> 21:19:34,318  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0004_09 State change from SUBMITTED to SCHEDULED
> 21:19:34,318  INFO FairScheduler:733 - Application 
> appattempt_1400092144371_0003_05 is done. finalState=FAILED
> 21:19:34,319  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0004_10 State change from SUBMITTED to SCHEDULED
> 21:19:34,319  INFO AppSchedulingInfo:108 - Application 
> application_1400092144371_0003 requests cleared
> 21:19:34,319  INFO FairScheduler:673 - Added Application Attempt 
> appattempt_1400092144371_0004_11 to scheduler from user: 
> samza-perf-playground
> 21:19:34,320  INFO FairScheduler:733 - Application 
> appattempt_1400092144371_0003_06 is done. finalState=FAILED
> 21:19:34,320  INFO AppSchedulingInfo:108 - Application 
> application_1400092144371_0003 requests cleared
> 21:19:34,320  INFO RMAppAttemptImpl:659 - 
> appattempt_1400092144371_0004_11 State change from SUBMITTED to SCHEDULED
> 21:19:34,323 FATAL ResourceManager:600 - Error in handling event type 
> APP_ATTEMPT_REMOVED to the scheduler
> java.lang.IllegalStateException: Given app to remove 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp@429f809d
>  does not exist in queue [root.samza-perf-playground, demand= vCores:0>, running=, share=, 
> w=]
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:93)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:774)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1201)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
>   at java.lang.Thread.run(Thread.java:744)
> 21:19:34,330  INFO ResourceManager:604 - Exiting, bbye..
> 21:19:34,335  INFO log:67 - Stopped SelectChannelConnector@:8088
> 21:19:34,437  INFO Server:2398 - Stopping server on 8033
> 21:19:34,438  INFO Server:694 - Stopping IPC Server listener on 8033
> {noformat}
> Last commit message for this build is (branch-2.4 on 
> github.com/apache/hadoop-common):
> {noformat}
> commit 09e24d5519187c0db67aacc1992be5d43829aa1e
> Author: Arpit Agarwal 
> Date:   Tue May 20 20:18:46 2014 +
> HADOOP-10562. Fix CHANGES.txt entry again
> 
> git-svn-id: 
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1596389 
> 13f79535-47bb-0310-9956-ffa450edef68
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5136) Error in handling event type APP_ATTEMPT_REMOVED to the scheduler

2016-09-29 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15531990#comment-15531990
 ] 

Wilfred Spiegelenburg commented on YARN-5136:
-

Hi [~tangshangwen] do you mind if I assign this to myself? I have just run into 
the same issue and would like to provide a fix for this.

> Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
> -
>
> Key: YARN-5136
> URL: https://issues.apache.org/jira/browse/YARN-5136
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: tangshangwen
>Assignee: tangshangwen
>
> move app cause rm exit
> {noformat}
> 2016-05-24 23:20:47,202 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_REMOVED to the scheduler
> java.lang.IllegalStateException: Given app to remove 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt@ea94c3b
>  does not exist in queue [root.bdp_xx.bdp_mart_xx_formal, 
> demand=, running= vCores:13422>, share=, w= weight=1.0>]
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.removeApp(FSLeafQueue.java:119)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:779)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1231)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:114)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:680)
> at java.lang.Thread.run(Thread.java:745)
> 2016-05-24 23:20:47,202 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e04_1464073905025_15410_01_001759 Container Transitioned from 
> ACQUIRED to RELEASED
> 2016-05-24 23:20:47,202 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {noformat}
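
For illustration only, a minimal defensive sketch of the removeApp path (the field 
names runnableApps, nonRunnableApps and LOG follow the FSLeafQueue code in the 
stack trace above but are assumptions here; this is not the committed YARN-5136 
fix): tolerate a stale APP_ATTEMPT_REMOVED event instead of throwing, so the 
ResourceManager keeps running.

{code:java}
// Hypothetical sketch, not the YARN-5136 patch: handle an attempt that is no
// longer tracked by this queue without throwing IllegalStateException.
public boolean removeApp(FSAppAttempt app) {
  boolean removed = runnableApps.remove(app) || nonRunnableApps.remove(app);
  if (!removed) {
    // The attempt may already have been detached from this queue, for example by
    // a queue move racing with the remove event; log it and keep the RM alive.
    LOG.warn("Given app to remove " + app + " does not exist in queue " + this);
    return false;
  }
  // Metrics and resource-usage bookkeeping would only run when the app was found.
  return true;
}
{code}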



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue

2016-09-27 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-5554:

Attachment: YARN-5554.7.patch

I forgot to check the newly created javadoc after addressing the feedback; the 
patch has been updated to fix it.

> MoveApplicationAcrossQueues does not check user permission on the target queue
> --
>
> Key: YARN-5554
> URL: https://issues.apache.org/jira/browse/YARN-5554
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Haibo Chen
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-5554.2.patch, YARN-5554.3.patch, YARN-5554.4.patch, 
> YARN-5554.5.patch, YARN-5554.6.patch, YARN-5554.7.patch
>
>
> moveApplicationAcrossQueues operation currently does not check user 
> permission on the target queue. This incorrectly allows one user to move 
> his/her own applications to a queue that the user has no access to
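
For illustration, a minimal sketch of the kind of check that is missing (the helper 
class, its name and its placement are assumptions for this sketch, not the attached 
patch): before moving, verify that the caller holds submit access on the *target* 
queue via the scheduler's ACLs.

{code:java}
import java.io.IOException;

import org.apache.hadoop.security.AccessControlException;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.QueueACL;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.YarnScheduler;

// Hypothetical helper, not the YARN-5554 patch: reject a move when the caller
// may not submit applications to the destination queue.
public final class MoveAclCheck {
  private MoveAclCheck() { }

  public static void checkTargetQueueAccess(YarnScheduler scheduler,
      ApplicationId appId, String targetQueue) throws IOException {
    UserGroupInformation callerUGI = UserGroupInformation.getCurrentUser();
    // YarnScheduler#checkAccess evaluates the configured queue ACLs for the user.
    if (!scheduler.checkAccess(callerUGI, QueueACL.SUBMIT_APPLICATIONS,
        targetQueue)) {
      throw new AccessControlException("User " + callerUGI.getShortUserName()
          + " cannot move " + appId + " to queue " + targetQueue);
    }
  }
}
{code}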



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue

2016-09-27 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-5554:

Attachment: YARN-5554.6.patch

[~yufeigu] I have updated the patch based on the feedback; all three points have 
been integrated in the new patch.

> MoveApplicationAcrossQueues does not check user permission on the target queue
> --
>
> Key: YARN-5554
> URL: https://issues.apache.org/jira/browse/YARN-5554
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Haibo Chen
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-5554.2.patch, YARN-5554.3.patch, YARN-5554.4.patch, 
> YARN-5554.5.patch, YARN-5554.6.patch
>
>
> moveApplicationAcrossQueues operation currently does not check user 
> permission on the target queue. This incorrectly allows one user to move 
> his/her own applications to a queue that the user has no access to



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue

2016-09-26 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523246#comment-15523246
 ] 

Wilfred Spiegelenburg commented on YARN-5554:
-

The test failure is logged as YARN-5043 and is not related to the changes made.

> MoveApplicationAcrossQueues does not check user permission on the target queue
> --
>
> Key: YARN-5554
> URL: https://issues.apache.org/jira/browse/YARN-5554
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Haibo Chen
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-5554.2.patch, YARN-5554.3.patch, YARN-5554.4.patch, 
> YARN-5554.5.patch
>
>
> moveApplicationAcrossQueues operation currently does not check user 
> permission on the target queue. This incorrectly allows one user to move 
> his/her own applications to a queue that the user has no access to



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5554) MoveApplicationAcrossQueues does not check user permission on the target queue

2016-09-26 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523246#comment-15523246
 ] 

Wilfred Spiegelenburg edited comment on YARN-5554 at 9/26/16 2:38 PM:
--

The test failure is logged as YARN-5043 and is not related to the changes made


was (Author: wilfreds):
The test failure is logged as YARN-5043 and i snot related to the changes made

> MoveApplicationAcrossQueues does not check user permission on the target queue
> --
>
> Key: YARN-5554
> URL: https://issues.apache.org/jira/browse/YARN-5554
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Haibo Chen
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-5554.2.patch, YARN-5554.3.patch, YARN-5554.4.patch, 
> YARN-5554.5.patch
>
>
> moveApplicationAcrossQueues operation currently does not check user 
> permission on the target queue. This incorrectly allows one user to move 
> his/her own applications to a queue that the user has no access to



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5674) FairScheduler handles "dots" in user names inconsistently in the config

2016-09-26 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-5674:
---

 Summary: FairScheduler handles "dots" in user names inconsistently 
in the config
 Key: YARN-5674
 URL: https://issues.apache.org/jira/browse/YARN-5674
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


A user name can contain a dot. Because the user name could be used as the queue 
name, we replace the dot with a defined separator: when defining queues in the 
configuration for users containing a dot, we expect the dot to be replaced by the 
"\_dot\_" string.
In the user limits we do not do that; the user limits need a plain dot in the user 
name. This is confusing: when you create a scheduler configuration, in some places 
you need to replace the dot and in others you do not. It can also cause issues 
where user limits are not enforced as expected.

We should use one way to specify the user. Since the queue naming can not be 
changed, we should also use the same "\_dot\_" replacement in the user limits and 
enforce them correctly.
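
As an illustration of the consistent rule proposed above (the helper name and its 
usage are assumptions, not part of a patch), the same "\_dot\_" normalization could 
be applied once and reused for both the queue name and the user-limit lookup:

{code:java}
// Hypothetical sketch: normalize the user name the same way everywhere.
public final class UserNameNormalizer {
  static final String DOT = ".";
  static final String DOT_REPLACEMENT = "_dot_";

  private UserNameNormalizer() { }

  /** "first.last" becomes "first_dot_last", matching the queue-name convention. */
  public static String normalize(String userName) {
    return userName.replace(DOT, DOT_REPLACEMENT);
  }

  public static void main(String[] args) {
    // With one rule, the <queue name="first_dot_last"> element and the per-user
    // limit entry would both be keyed by the same string.
    System.out.println(normalize("first.last")); // prints first_dot_last
  }
}
{code}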



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5672) FairScheduler: wrong queue name in log when adding application

2016-09-26 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-5672:

Attachment: YARN-5672.1.patch

> FairScheduler: wrong queue name in log when adding application
> --
>
> Key: YARN-5672
> URL: https://issues.apache.org/jira/browse/YARN-5672
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Minor
> Attachments: YARN-5672.1.patch
>
>
> The FairScheduler logs the passed-in queue name when adding an application 
> instead of the queue name returned by the placement policy. Later log entries 
> show the correct info:
> {code}
> INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Accepted application application_1471982804173_6181 from user: wilfred, in 
> queue: default, currently num of applications: 1
> ...
> INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary:
>  
> appId=application_1471982804173_6181,name=oozie:launcher:XXX,user=wilfred,queue=root.wilfred,state=FAILED,trackingUrl=https://10.10.10.10:8088/cluster/app/application_1471982804173_6181,appMasterHost=N/A,startTime=1473580802079,finishTime=1473580809148,finalStatus=FAILED
> {code}
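
For illustration, a minimal sketch of the kind of one-line change involved (the 
surrounding code is assumed to resemble FairScheduler#addApplication; the attached 
patch is authoritative): log the name of the queue the placement policy actually 
assigned rather than the submitted queue name.

{code:java}
// Hypothetical fragment inside FairScheduler#addApplication, not the patch itself.
FSLeafQueue queue = assignToQueue(rmApp, queueName, user);
if (queue == null) {
  return;
}
// Use queue.getName() so the log matches the queue the application really landed
// in (e.g. "root.wilfred" rather than the submitted "default").
LOG.info("Accepted application " + applicationId + " from user: " + user
    + ", in queue: " + queue.getName()
    + ", currently num of applications: " + applications.size());
{code}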



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5672) FairScheduler: wrong queue name in log when adding application

2016-09-26 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-5672:
---

 Summary: FairScheduler: wrong queue name in log when adding 
application
 Key: YARN-5672
 URL: https://issues.apache.org/jira/browse/YARN-5672
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg
Priority: Minor


The FairScheduler logs the passed-in queue name when adding an application 
instead of the queue name returned by the placement policy. Later log entries show 
the correct info:
{code}
INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Accepted application application_1471982804173_6181 from user: wilfred, in 
queue: default, currently num of applications: 1
...
INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: 
appId=application_1471982804173_6181,name=oozie:launcher:XXX,user=wilfred,queue=root.wilfred,state=FAILED,trackingUrl=https://10.10.10.10:8088/cluster/app/application_1471982804173_6181,appMasterHost=N/A,startTime=1473580802079,finishTime=1473580809148,finalStatus=FAILED
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


