[jira] [Commented] (YARN-4654) Yarn node label CLI should parse "=" correctly when trying to remove all labels on a node

2016-02-15 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148101#comment-15148101
 ] 

Rohith Sharma K S commented on YARN-4654:
-

+1 for the latest patch. I will wait a couple of days before committing it.

> Yarn node label CLI should parse "=" correctly when trying to remove all 
> labels on a node
> -
>
> Key: YARN-4654
> URL: https://issues.apache.org/jira/browse/YARN-4654
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-4654.v1.001.patch, YARN-4654.v1.002.patch, 
> YARN-4654.v1.003.patch
>
>
> Currently, when adding labels to nodes, user can run:
> {{yarn rmadmin -replaceLabelsOnNode "host1=x host2=y"}}
> However, when removing labels from a node, user has to run:
> {{yarn rmadmin -replaceLabelsOnNode "host1 host2"}}
> Instead of:
> {{yarn rmadmin -replaceLabelsOnNode "host1= host2="}}
> We should handle both the "=" present and absent cases when removing labels 
> on a node.
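
For illustration, a minimal sketch of the tolerant parsing described above (plain Java, hypothetical names, not the attached patch): split each token on the first "=" and treat a missing or trailing "=" as an empty label set.

{code}
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class LabelArgParserSketch {
  // Parse "host1=x host2=y", "host1 host2" and "host1= host2=" uniformly:
  // a missing '=' and a trailing '=' both mean "the empty label set".
  public static Map<String, Set<String>> parse(String arg) {
    Map<String, Set<String>> nodeToLabels = new HashMap<>();
    for (String token : arg.trim().split("\\s+")) {
      String[] parts = token.split("=", 2);  // split on the first '=' only
      Set<String> labels = new HashSet<>();
      if (parts.length == 2 && !parts[1].isEmpty()) {
        labels.addAll(Arrays.asList(parts[1].split(",")));
      }
      nodeToLabels.put(parts[0], labels);
    }
    return nodeToLabels;
  }

  public static void main(String[] args) {
    System.out.println(parse("host1=x host2=y"));  // {host1=[x], host2=[y]}
    System.out.println(parse("host1= host2="));    // {host1=[], host2=[]}
    System.out.println(parse("host1 host2"));      // {host1=[], host2=[]}
  }
}
{code}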



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4412) Create ClusterMonitor to compute ordered list of preferred NMs for OPPORTUNISTIC containers

2016-02-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148062#comment-15148062
 ] 

Hadoop QA commented on YARN-4412:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} 
| {color:red} YARN-4412 does not apply to yarn-2877. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12782131/YARN-4412-yarn-2877.v2.patch
 |
| JIRA Issue | YARN-4412 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10568/console |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Create ClusterMonitor to compute ordered list of preferred NMs for 
> OPPORTUNISTIC containers
> --
>
> Key: YARN-4412
> URL: https://issues.apache.org/jira/browse/YARN-4412
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-4412-yarn-2877.v1.patch, 
> YARN-4412-yarn-2877.v2.patch
>
>
> Introduce a Cluster Monitor that aggregates load information from individual 
> Node Managers and computes an ordered list of preferred Node Managers to be 
> used as target nodes for OPPORTUNISTIC container allocations. 
> This list can be pushed out to the Node Manager (specifically the AMRMProxy 
> running on the Node) via the Allocate Response. This will be used to make 
> local scheduling decisions.
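
For illustration, a minimal sketch of the idea (hypothetical names and load metric, not the attached patch): keep a per-node load figure updated from NM heartbeats and hand out the k least-loaded nodes.

{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

public class ClusterLoadMonitorSketch {
  // Hypothetical load metric: queued opportunistic containers per node.
  private final ConcurrentHashMap<String, Integer> queueLength =
      new ConcurrentHashMap<>();

  // Called when a Node Manager heartbeat reports its current queue length.
  public void onNodeHeartbeat(String nodeId, int queuedContainers) {
    queueLength.put(nodeId, queuedContainers);
  }

  // Least-loaded nodes first; in the design above, a list like this would be
  // pushed out to the AMRMProxy via the allocate response.
  public List<String> preferredNodes(int k) {
    List<String> nodes = new ArrayList<>(queueLength.keySet());
    nodes.sort(Comparator.comparingInt(queueLength::get));
    return nodes.subList(0, Math.min(k, nodes.size()));
  }
}
{code}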



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2887) AM policies for choosing type of containers

2016-02-15 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh resolved YARN-2887.
---
Resolution: Not A Problem

This is more of an AM/application-specific change and should not be part of 
core YARN.

> AM policies for choosing type of containers
> ---
>
> Key: YARN-2887
> URL: https://issues.apache.org/jira/browse/YARN-2887
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>
> Each AM can employ policies that determine what type of container 
> (guaranteed-start or queueable) should be requested for each task. 
> An example policy may be to use only guaranteed-start or only queueable 
> containers, or to randomly pick a percentage of the requests to be queueable, 
> or to choose the container type based on the characteristics of the tasks.
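
For illustration, the "random percentage" policy mentioned above might look like the following minimal sketch (hypothetical names, not a YARN API):

{code}
import java.util.Random;

public class RandomFractionPolicySketch {
  enum ContainerKind { GUARANTEED_START, QUEUEABLE }

  private final double queueableFraction;
  private final Random random = new Random();

  public RandomFractionPolicySketch(double queueableFraction) {
    this.queueableFraction = queueableFraction;
  }

  // Pick QUEUEABLE for roughly the configured fraction of requests.
  public ContainerKind chooseKind() {
    return random.nextDouble() < queueableFraction
        ? ContainerKind.QUEUEABLE
        : ContainerKind.GUARANTEED_START;
  }
}
{code}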



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4693) FSNamesystem class GetCorruptFiles function in the catch log description is not accurate.

2016-02-15 Thread zhutengyu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147985#comment-15147985
 ] 

zhutengyu commented on YARN-4693:
-

I have re-created this issue as HDFS-9811:
https://issues.apache.org/jira/browse/HDFS-9811

> FSNamesystem class GetCorruptFiles function in the catch log description is 
> not accurate.
> -
>
> Key: YARN-4693
> URL: https://issues.apache.org/jira/browse/YARN-4693
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Affects Versions: 2.7.0
>Reporter: zhutengyu
>
>  LOG.warn("Get corrupt file blocks returned error: " + e.getMessage());
> "Error" key to the location of the key staff to mislead, intends to replace 
> the "Resoult" keyword



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4648) Move preemption related tests from TestFairScheduler to TestFairSchedulerPreemption

2016-02-15 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147930#comment-15147930
 ] 

Tsuyoshi Ozawa commented on YARN-4648:
--

[~kaisasak] Instead of changing the sequence of initialization, how about 
renaming {{startResourceManagerWithoutThreshold}}? The name looks confusing, 
since the method's behaviour appears to be equal to 
{{startResourceManager(1.1f)}}. What do you think?

> Move preemption related tests from TestFairScheduler to 
> TestFairSchedulerPreemption
> ---
>
> Key: YARN-4648
> URL: https://issues.apache.org/jira/browse/YARN-4648
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Kai Sasaki
>  Labels: newbie++
> Attachments: YARN-4648.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4545) Allow YARN distributed shell to use ATS v1.5 APIs

2016-02-15 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147706#comment-15147706
 ] 

Steve Loughran commented on YARN-4545:
--

The new {{hadoop-yarn-server-timeline-pluginstorage}} JAR needs to be a 
dependency of yarn-client if it is required client-side. Why? So that projects 
which declare yarn-client as their dependency automatically pick it up. Once 
this is done, there's no need to add the new plugin dependency to dshell's 
POM. If it's only needed at test time, then miniyarncluster will probably need 
it instead.

We know a new ATS version is coming in 2.x. Are there already plans for the API 
to change? What about the probe 
{{YarnConfiguration.timelineServiceV1_5Enabled}}? As it stands, that probe 
returns true for any version >= 1.5. Is that an implicit guarantee that ATS v3 
will still support the 1.5 client APIs?

The predicate is telling you something the timeline client knows: currently 
via the config, possibly in the future after authenticating with the server.

I'd place the method into {{TimelineClient}} itself. 

Then you could write:

{code}
if (timelineClient.supportsApiV1_5()) {
  ...
}
{code}

* The code in {{TestDistributedShell.setupInternal()}} to set up ATS v1.5 looks 
like some boilerplate that every app testing against
ATS will need. This should be made re-usable, either in MiniYarnCluster or 
nearby.

* The {{TestDistributedShell.isTestWithTimelineV1_5()}} check is a bit of a 
hack, as it adds some magic behaviour to test classes depending on their 
name... this is precisely the thing that JUnit 4 moved away from. I can't 
suggest an alternative that isn't more complex (e.g. a private annotation)

Some general {{TestDistributedShell}} comments, which it's probably time to 
address:

* drop the per-test timeout and add a single timeout Rule; you may as well 
consolidate things
* the distributed shell should run in a thread with a name, for ease of log4j 
analysis
* if that thread fails to set things up, it will throw an exception, but that 
exception won't propagate into the test thread/test report. It needs to be 
written to a field of the test suite and then, after the {{t.join()}}, raised 
if non-null (see the sketch after this list)
* the {{Assert.assertTrue}} call above {{t.join()}} should then go after that 
check & rethrow, so if there is some dshell failure, the test runner waits for 
the thread to complete before failing (this change should guarantee that the 
thread of one test case completes before the next test case is started)
* the checks for text in exceptions should move to 
{{GenericTestUtils.assertExceptionContains}}, as that will not lose the 
original exception message or data
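
A minimal sketch of the thread/failure-propagation pattern from the list above (illustrative only, not the actual TestDistributedShell code):

{code}
import org.junit.Test;

public class DShellThreadPatternSketch {
  private volatile Throwable shellFailure;

  // Hypothetical stand-in for the real distributed shell invocation.
  private void runDistributedShell() throws Exception {
  }

  @Test
  public void testDSShell() throws Throwable {
    Thread t = new Thread("dshell-runner") {  // named, for log4j analysis
      @Override
      public void run() {
        try {
          runDistributedShell();
        } catch (Throwable e) {
          shellFailure = e;  // an exception here never reaches the runner
        }
      }
    };
    t.start();
    t.join();                // wait for the shell thread to finish
    if (shellFailure != null) {
      throw shellFailure;    // rethrow so the test report records it
    }
  }
}
{code}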


> Allow YARN distributed shell to use ATS v1.5 APIs
> -
>
> Key: YARN-4545
> URL: https://issues.apache.org/jira/browse/YARN-4545
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4545-YARN-4265.001.patch, 
> YARN-4545-trunk.001.patch, YARN-4545-trunk.002.patch, 
> YARN-4545-trunk.003.patch, YARN-4545-trunk.004.patch, 
> YARN-4545-trunk.005.patch
>
>
> We can use YARN distributed shell as a demo for the ATS v1.5 APIs. We need to 
> allow the distributed shell to post data with the ATS v1.5 API if 1.5 is 
> enabled in the system. We also need to provide a sample plugin to read that 
> data back out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request

2016-02-15 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147605#comment-15147605
 ] 

Sunil G commented on YARN-4108:
---

Hi [~leftnoteasy]
Thanks for sharing the full version of the updated patch; I really appreciate 
the effort. I have a few doubts/comments on this patch. Please help check 
them. :-)

1.
{{updateResToObtainAndKillableContainers}} raises a cancel-preemption event 
back to the scheduler if the container resource is empty. So could we also 
remove the entry from {{resToObtainByPartition}} here?
{code}
+if (!partitionsHasResToObtain.contains(partition)) {
+  // When we don't need to preempt any resources from the 
queue/partition
+  // , cancel all killable containers from the queue
+  rmContext.getDispatcher().getEventHandler().handle(
+  new ContainerPreemptEvent(
+  killableContainer.getApplicationAttemptId(),
+  killableContainer,
+  SchedulerEventType.MARK_CONTAINER_FOR_NONKILLABLE))
{code}

2. If {{conf.getLazyPreemptionEnabled()}} is disabled, do we need to handle 
events like MARK_CONTAINER_FOR_NONKILLABLE in the scheduler?

3. {{cleanupCompletedKillableContainers()}} in 
ProportionalCapacityPreemptionPolicy operates on all live containers in the 
cluster. This can be a very big number, so we may hold access to the 
concurrent map on each iteration of the preemption policy. Will this be a 
problem? Can we optimize this further? For example, could we set a boolean 
state within RMContainer to indicate it's not a live container?

4. {{killToPreemptContainers}} in LeafQueue invokes completedContainer on the 
application's leafQueue. I am thinking of a corner case where one 
{{LeafQueue#killToPreemptContainers}} invocation results in a call to 
completedContainer of another leafQueue. Do you see any chance of that, 
perhaps via a moveQueue operation?


> CapacityScheduler: Improve preemption to preempt only those containers that 
> would satisfy the incoming request
> --
>
> Key: YARN-4108
> URL: https://issues.apache.org/jira/browse/YARN-4108
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4108-design-doc-V3.pdf, 
> YARN-4108-design-doc-v1.pdf, YARN-4108-design-doc-v2.pdf, YARN-4108.1.patch, 
> YARN-4108.2.patch, YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch, 
> YARN-4108.poc.3-WIP.patch, YARN-4108.poc.4-WIP.patch
>
>
> This is sibling JIRA for YARN-2154. We should make sure container preemption 
> is more effective.
> *Requirements:*
> 1) Can handle case of user-limit preemption
> 2) Can handle case of resource placement requirements, such as: hard-locality 
> (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I 
> don't want to use rack1 and host\[1-3\])
> 3) Can handle preemption within a queue: cross-user preemption (YARN-2113), 
> cross-application preemption (such as priority-based (YARN-1963) / 
> fairness-based (YARN-3319)).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4694) Document ATS v1.5

2016-02-15 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-4694:

Affects Version/s: 2.8.0

> Document ATS v1.5
> -
>
> Key: YARN-4694
> URL: https://issues.apache.org/jira/browse/YARN-4694
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4501) Document new put APIs in TimelineClient for ATS 1.5

2016-02-15 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147594#comment-15147594
 ] 

Xuan Gong commented on YARN-4501:
-

Will document the new write APIs in 
https://issues.apache.org/jira/browse/YARN-4694.
Closing this as a duplicate.

> Document new put APIs in TimelineClient for ATS 1.5
> ---
>
> Key: YARN-4501
> URL: https://issues.apache.org/jira/browse/YARN-4501
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Junping Du
>Assignee: Xuan Gong
>
> In YARN-4234 we are adding new put APIs in TimelineClient; we should 
> document them properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4501) Document new put APIs in TimelineClient for ATS 1.5

2016-02-15 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong resolved YARN-4501.
-
Resolution: Duplicate

> Document new put APIs in TimelineClient for ATS 1.5
> ---
>
> Key: YARN-4501
> URL: https://issues.apache.org/jira/browse/YARN-4501
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Junping Du
>Assignee: Xuan Gong
>
> In YARN-4234 we are adding new put APIs in TimelineClient; we should 
> document them properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4694) Document ATS v1.5

2016-02-15 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-4694:
---

 Summary: Document ATS v1.5
 Key: YARN-4694
 URL: https://issues.apache.org/jira/browse/YARN-4694
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4654) Yarn node label CLI should parse "=" correctly when trying to remove all labels on a node

2016-02-15 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147455#comment-15147455
 ] 

Naganarasimha G R commented on YARN-4654:
-

YARN-4351 captures the test case failures for 
{{hadoop.yarn.client.TestGetGroups}}, and the other timed-out test cases are 
also related to the build machine, so I think the Jenkins report is fine. 
[~rohithsharma] / [~wangda], can one of you have a look at the latest patch?

> Yarn node label CLI should parse "=" correctly when trying to remove all 
> labels on a node
> -
>
> Key: YARN-4654
> URL: https://issues.apache.org/jira/browse/YARN-4654
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-4654.v1.001.patch, YARN-4654.v1.002.patch, 
> YARN-4654.v1.003.patch
>
>
> Currently, when adding labels to nodes, user can run:
> {{yarn rmadmin -replaceLabelsOnNode "host1=x host2=y"}}
> However, when removing labels from a node, user has to run:
> {{yarn rmadmin -replaceLabelsOnNode "host1 host2"}}
> Instead of:
> {{yarn rmadmin -replaceLabelsOnNode "host1= host2="}}
> We should handle both the "=" present and absent cases when removing labels 
> on a node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4680) TimerTasks leak in ATS V1.5 Writer

2016-02-15 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147385#comment-15147385
 ] 

Steve Loughran commented on YARN-4680:
--

LGTM, though it needs an entry in yarn-defaults.xml for the documentation.

> TimerTasks leak in ATS V1.5 Writer
> --
>
> Key: YARN-4680
> URL: https://issues.apache.org/jira/browse/YARN-4680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4680.1.patch, YARN-4680.20160108.patch, 
> YARN-4680.20160109.patch
>
>
> We have seen TimerTasks leak, which can cause the application server to go 
> down (such as the Oozie server going down due to too many active threads).
> Although we have fixed some potential leak situations at the upper 
> application level, such as
> https://issues.apache.org/jira/browse/MAPREDUCE-6618
> https://issues.apache.org/jira/browse/MAPREDUCE-6621, we still cannot 
> guarantee that the issue is fixed.
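
For context, the underlying pattern is generic Java rather than anything ATS-specific: a {{java.util.Timer}} owns a thread that lives until the timer is cancelled, so a writer that creates one per instance must cancel it on close. A minimal illustration (not the attached patch):

{code}
import java.io.Closeable;
import java.util.Timer;
import java.util.TimerTask;

class FlushingWriterSketch implements Closeable {
  private final Timer flushTimer = new Timer("ats-flush", /* isDaemon */ true);

  FlushingWriterSketch(long flushIntervalMs) {
    flushTimer.schedule(new TimerTask() {
      @Override
      public void run() {
        flush();
      }
    }, flushIntervalMs, flushIntervalMs);
  }

  void flush() {
    // write buffered entities out
  }

  @Override
  public void close() {
    flushTimer.cancel();  // without this, every writer leaks a live thread
  }
}
{code}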



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4545) Allow YARN distributed shell to use ATS v1.5 APIs

2016-02-15 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147374#comment-15147374
 ] 

Steve Loughran commented on YARN-4545:
--

Or to put it differently: why do you need to use UGI here at all? In an 
insecure cluster it is unimportant, and in a secure cluster the login user 
will have the relevant credentials.

> Allow YARN distributed shell to use ATS v1.5 APIs
> -
>
> Key: YARN-4545
> URL: https://issues.apache.org/jira/browse/YARN-4545
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4545-YARN-4265.001.patch, 
> YARN-4545-trunk.001.patch, YARN-4545-trunk.002.patch, 
> YARN-4545-trunk.003.patch, YARN-4545-trunk.004.patch, 
> YARN-4545-trunk.005.patch
>
>
> We can use YARN distributed shell as a demo for the ATS v1.5 APIs. We need to 
> allow the distributed shell to post data with the ATS v1.5 API if 1.5 is 
> enabled in the system. We also need to provide a sample plugin to read that 
> data back out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4545) Allow YARN distributed shell to use ATS v1.5 APIs

2016-02-15 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147356#comment-15147356
 ] 

Steve Loughran commented on YARN-4545:
--

{{publishContainerEndEvent}} doesn't put the event inside a {{ugi.doAs}} 
clause, even though the container start event is put this way. Why the 
difference?

> Allow YARN distributed shell to use ATS v1.5 APIs
> -
>
> Key: YARN-4545
> URL: https://issues.apache.org/jira/browse/YARN-4545
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4545-YARN-4265.001.patch, 
> YARN-4545-trunk.001.patch, YARN-4545-trunk.002.patch, 
> YARN-4545-trunk.003.patch, YARN-4545-trunk.004.patch, 
> YARN-4545-trunk.005.patch
>
>
> We can use YARN distributed shell as a demo for the ATS v1.5 APIs. We need to 
> allow the distributed shell to post data with the ATS v1.5 API if 1.5 is 
> enabled in the system. We also need to provide a sample plugin to read that 
> data back out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3344) procfs stat file is not in the expected format warning

2016-02-15 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147290#comment-15147290
 ] 

Akira AJISAKA commented on YARN-3344:
-

Hi [~ravindra.naik], how is this issue going? If you don't have time to update 
the patch, I'd like to take over your work.

> procfs stat file is not in the expected format warning
> --
>
> Key: YARN-3344
> URL: https://issues.apache.org/jira/browse/YARN-3344
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jon Bringhurst
>Assignee: Ravindra Kumar Naik
> Attachments: YARN-3344-trunk.005.patch
>
>
> Although this doesn't appear to be causing any functional issues, it is 
> spamming our log files quite a bit. :)
> It appears that the regex in ProcfsBasedProcessTree doesn't work for all 
> /proc/<pid>/stat files.
> Here's the error I'm seeing:
> {noformat}
> "source_host": "asdf",
> "method": "constructProcessInfo",
> "level": "WARN",
> "message": "Unexpected: procfs stat file is not in the expected format 
> for process with pid 6953"
> "file": "ProcfsBasedProcessTree.java",
> "line_number": "514",
> "class": "org.apache.hadoop.yarn.util.ProcfsBasedProcessTree",
> {noformat}
> And here's the basic info on process with pid 6953:
> {noformat}
> [asdf ~]$ cat /proc/6953/stat
> 6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364 1080 0 0 25 3 0 0 
> 20 0 1 0 144918696 205295616 5856 18446744073709551615 1 1 0 0 0 0 0 16781312 
> 2 18446744073709551615 0 0 17 13 0 0 0 0 0
> [asdf ~]$ ps aux|grep 6953
> root      6953  0.0  0.0 200484 23424 ?        S    21:44   0:00 python2.6 
> /export/apps/salt/minion-scripts/module-sync.py
> jbringhu 13481  0.0  0.0 105312   872 pts/0    S+   22:13   0:00 grep -i 6953
> [asdf ~]$ 
> {noformat}
> This is using 2.6.32-431.11.2.el6.x86_64 in RHEL 6.5.
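
The stat line above hints at the culprit: the command name "(python2.6 /expo)" contains a space, which a pattern assuming a space-free comm field won't match. A minimal sketch of a tolerant pattern (illustrative only, not the attached patch), matching the comm field greedily between the first '(' and the last ')':

{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StatLineParserSketch {
  // pid (comm) state ppid ... -- comm may contain spaces and parentheses,
  // so capture it greedily up to the last ')'.
  private static final Pattern STAT =
      Pattern.compile("^(\\d+)\\s+\\((.*)\\)\\s+(\\S)\\s+(\\d+)\\s+.*");

  public static void main(String[] args) {
    String line = "6953 (python2.6 /expo) S 1871 1871 1871 0 -1 4202496 9364";
    Matcher m = STAT.matcher(line);
    if (m.matches()) {
      System.out.println("pid=" + m.group(1) + " comm='" + m.group(2)
          + "' state=" + m.group(3) + " ppid=" + m.group(4));
    }
  }
}
{code}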



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)