[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader

2015-11-06 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994758#comment-14994758
 ] 

Sangjin Lee commented on YARN-3862:
---

[~jrottinghuis] and I went over the patch in some more detail, and we have a few 
high-level suggestions.

I don't think the qualifiers that are being created currently in 
TimelineEntityReader.constructFilterListBasedOnFields() are quite right. We 
already talked about breaking down and pushing down the logic into its 
appropriate specific entity reader implementations. In addition, instead of 
trying to compute the byte arrays using the raw ingredients like Separators, we 
should rely on the \*ColumnPrefix classes to give you the byte arrays. That 
would lead to more properly encapsulated (and correct) code.

ColumnPrefix classes already do something like the following (see 
ApplicationColumnPrefix.store() for example):
{code}
byte[] columnQualifier =
ColumnHelper.getColumnQualifier(this.columnPrefixBytes, qualifier);
{code}

We could expose a new method on ColumnPrefix like
{code}
public interface ColumnPrefix {
  ...
  byte[] getColumnQualifierBytes(String qualifier);
  ...
}
{code}
And specific implementations can implement that method. That way, all the 
proper column prefix handling is managed and encapsulated by ColumnPrefix 
classes.

When we move the logic of creating the filter list to its appropriate entity 
reader classes, those classes already know which column prefix they're dealing 
with, and they can simply call these methods to get the bytes back. That will 
make the implementation much cleaner.
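
To make this concrete, here is a minimal, self-contained sketch of the idea (not 
the actual timeline service classes; the enum, separator byte, and encoding below 
are simplified stand-ins for what ColumnHelper/Separator already do):
{code}
import java.nio.charset.StandardCharsets;

// Sketch only: the ColumnPrefix implementation owns its prefix bytes and the
// separator, so callers never assemble qualifiers from raw Separators.
interface ColumnPrefix {
  byte[] getColumnQualifierBytes(String qualifier);
}

enum ApplicationColumnPrefixSketch implements ColumnPrefix {
  METRIC("m");

  private static final byte SEPARATOR = (byte) '!';  // stand-in for the real separator
  private final byte[] columnPrefixBytes;

  ApplicationColumnPrefixSketch(String columnPrefix) {
    this.columnPrefixBytes = columnPrefix.getBytes(StandardCharsets.UTF_8);
  }

  // Same spirit as ColumnHelper.getColumnQualifier(columnPrefixBytes, qualifier):
  // prefix + separator + qualifier, encoded once, in one place.
  @Override
  public byte[] getColumnQualifierBytes(String qualifier) {
    byte[] qualifierBytes = qualifier.getBytes(StandardCharsets.UTF_8);
    byte[] result = new byte[columnPrefixBytes.length + 1 + qualifierBytes.length];
    System.arraycopy(columnPrefixBytes, 0, result, 0, columnPrefixBytes.length);
    result[columnPrefixBytes.length] = SEPARATOR;
    System.arraycopy(qualifierBytes, 0, result, columnPrefixBytes.length + 1,
        qualifierBytes.length);
    return result;
  }
}
{code}
An entity reader that knows it is dealing with, say, the metric prefix could then 
build its qualifier filters from something like 
ApplicationColumnPrefixSketch.METRIC.getColumnQualifierBytes(metricName) without 
touching Separators directly.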

Hope this helps. Let me know if you have any questions... Thanks!

> Decide which contents to retrieve and send back in response in TimelineReader
> -
>
> Key: YARN-3862
> URL: https://issues.apache.org/jira/browse/YARN-3862
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-3862-YARN-2928.wip.01.patch, 
> YARN-3862-YARN-2928.wip.02.patch
>
>
> Currently, we will retrieve all the contents of the field if that field is 
> specified in the query API. In case of configs and metrics, this can become a 
> lot of data even though the user doesn't need it. So we need to provide a way 
> to query only a set of configs or metrics.
> As a comma-separated list of configs/metrics to be returned would be quite 
> cumbersome to specify, we have to support one of the following options:
> # Prefix match
> # Regex
> # Group the configs/metrics and query that group.
> We also need a facility to specify a metric time window to return metrics in 
> that window. This may be useful in plotting graphs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage

2015-11-06 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994603#comment-14994603
 ] 

Vrushali C commented on YARN-4053:
--

Thanks [~varun_saxena] for the patch and [~djp] , [~gtCarrera], 
[~Naganarasimha], [~sjlee0] and [~jrottinghuis] for the discussion so far!

[~jrottinghuis] , [~sjlee0] and I had an offline discussion on this yesterday. 
We discussed at length along the following vectors:
- metric datatype: long, double, either or, both?
- metric type storage and retrieval for: single values vs timeseries
- metrics in the context of aggregation: how to indicate whether to aggregate 
or not.
- operations on metrics: sum vs average, min/max

To summarize the discussion:

- Our proposal is to proceed with supporting only longs for now. We went over 
several scenarios of how to store and query decimal numbers: as Doubles or as a 
numerator/denominator pair, how to use filters while scanning for such stored 
values, how aggregation would handle them, etc. We thought about which metrics 
would be stored as Doubles and how the precision might affect aggregation. We 
finally concluded that we should start with storing longs only and make the 
code strictly accept longs (not even ints or shorts).

- For single value vs time series, we suggest using a column prefix to 
distinguish them. For the read path, we can assume it is a single value unless 
the client specifically requests a time series (reading a time series should be 
an explicit intent on the client's part).

- Regarding indicating whether to aggregate or not, we suggest relying mostly 
on the flow run aggregation. For those use cases that need to access metrics 
off of tables other than the flow run table (e.g. time-based aggregation), we 
need to explore ways to specify this information as input (config, etc.).

- So, the current patch is along the lines of our proposal of using longs for 
metrics. But we are considering a different approach of creating a "converter" 
type and implementation. For the other, non-metric columns, a "generic" 
converter that uses the GenericObjectMapper can be created and used implicitly. 
For the numeric (long) columns, a long converter would be used explicitly. We 
also need to revisit how it's done in FlowScanner (the current patch misses one 
of the places there, for example). We need to get at the instances of 
ColumnPrefix, ColumnFamily, etc. and use them to obtain the converter in the 
flow scanner.
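
As a very rough illustration of the "converter" idea (the names below are made 
up for this sketch, not necessarily what the eventual patch will use), something 
along these lines is what we have in mind:
{code}
// Sketch only: a hypothetical converter abstraction. A "generic" converter
// would wrap GenericObjectMapper for non-metric columns, while numeric (long)
// columns use an explicit long converter that strictly accepts longs.
interface ValueConverter {
  byte[] encodeValue(Object value) throws java.io.IOException;
  Object decodeValue(byte[] bytes) throws java.io.IOException;
}

class LongConverter implements ValueConverter {
  @Override
  public byte[] encodeValue(Object value) throws java.io.IOException {
    if (!(value instanceof Long)) {
      // strictly longs: reject ints/shorts/doubles instead of converting
      throw new java.io.IOException("Metric values must be Long, got "
          + (value == null ? "null" : value.getClass().getName()));
    }
    long v = (Long) value;
    byte[] bytes = new byte[8];
    for (int i = 7; i >= 0; i--) {   // big-endian, like Bytes.toBytes(long)
      bytes[i] = (byte) (v & 0xFF);
      v >>>= 8;
    }
    return bytes;
  }

  @Override
  public Object decodeValue(byte[] bytes) throws java.io.IOException {
    if (bytes == null || bytes.length != 8) {
      throw new java.io.IOException("Expected 8 bytes for a long value");
    }
    long v = 0;
    for (byte b : bytes) {
      v = (v << 8) | (b & 0xFF);
    }
    return v;
  }
}
{code}
The flow scanner could then look up the converter from the ColumnPrefix/ColumnFamily 
instance it is dealing with, instead of assuming the GenericObjectMapper encoding 
everywhere.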

@Varun Would it be fine if I took over this jira to patch it with the above 
points?

thanks
Vrushali

> Change the way metric values are stored in HBase Storage
> 
>
> Key: YARN-4053
> URL: https://issues.apache.org/jira/browse/YARN-4053
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4053-YARN-2928.01.patch, 
> YARN-4053-YARN-2928.02.patch
>
>
> Currently HBase implementation uses GenericObjectMapper to convert and store 
> values in backend HBase storage. This converts everything into a string 
> representation(ASCII/UTF-8 encoded byte array).
> While this is fine in most cases, it does not quite serve our use case for 
> metrics. 
> So we need to decide how are we going to encode and decode metric values and 
> store them in HBase.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler

2015-11-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994799#comment-14994799
 ] 

Wangda Tan commented on YARN-3980:
--

Just noticed this ticket. [~sunilg], since this patch is almost ready to go, if 
you think it's fine, could you update YARN-4292 to be based on this one? And 
could you take a look at this patch to see if there are any changes that need to 
be merged?

[~goiri], thanks for working on this. Two comments:
{code}
352   public void setContainersUtilization(
353   ResourceUtilization containersUtilization) {
354 if (containersUtilization != null) {
355   this.containersUtilization = containersUtilization;
356 }
357   }
{code}
It seems SchedulerNode cannot properly update containersUtilization, since the 
setter only assigns when the new value is not null. I think you should directly 
assign the utilization in the setter.
And I suggest to update SchedulerNode.containersUtilization/nodeUtilization to 
use violate.

And for naming, I suggest changing getContainersUtilization in 
RMNode/SchedulerNode/RMNodeStatusEvent to get*Aggregated*ContainersUtilization, 
which is more straightforward to me.
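
A rough sketch of what I mean by both points (the setter name follows the patch 
and ResourceUtilization is the existing record type; the rest is illustrative, 
not a final implementation):
{code}
// Sketch only: assign directly (no null check, so the value can actually be
// updated), and mark the fields volatile because they are written from the
// heartbeat-handling thread and read by the scheduler threads.
private volatile ResourceUtilization containersUtilization;
private volatile ResourceUtilization nodeUtilization;

public void setContainersUtilization(ResourceUtilization containersUtilization) {
  this.containersUtilization = containersUtilization;
}

public ResourceUtilization getAggregatedContainersUtilization() {
  return this.containersUtilization;
}
{code}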

> Plumb resource-utilization info in node heartbeat through to the scheduler
> --
>
> Key: YARN-3980
> URL: https://issues.apache.org/jira/browse/YARN-3980
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.7.1
>Reporter: Karthik Kambatla
>Assignee: Inigo Goiri
> Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, 
> YARN-3980-v2.patch, YARN-3980-v3.patch, YARN-3980-v4.patch
>
>
> YARN-1012 and YARN-3534 collect resource utilization information for all 
> containers and the node respectively and send it to the RM on node heartbeat. 
> We should plumb it through to the scheduler so the scheduler can make use of 
> it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler

2015-11-06 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994811#comment-14994811
 ] 

Inigo Goiri commented on YARN-3980:
---

OK, I'll try it for both.

> Plumb resource-utilization info in node heartbeat through to the scheduler
> --
>
> Key: YARN-3980
> URL: https://issues.apache.org/jira/browse/YARN-3980
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.7.1
>Reporter: Karthik Kambatla
>Assignee: Inigo Goiri
> Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, 
> YARN-3980-v2.patch, YARN-3980-v3.patch, YARN-3980-v4.patch
>
>
> YARN-1012 and YARN-3534 collect resource utilization information for all 
> containers and the node respectively and send it to the RM on node heartbeat. 
> We should plumb it through to the scheduler so the scheduler can make use of 
> it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4184) Remove update reservation state api from state store as its not used by ReservationSystem

2015-11-06 Thread Sean Po (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Po updated YARN-4184:
--
Attachment: YARN-4184.v1.patch

> Remove update reservation state api from state store as its not used by 
> ReservationSystem
> -
>
> Key: YARN-4184
> URL: https://issues.apache.org/jira/browse/YARN-4184
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Sean Po
> Attachments: YARN-4184.v1.patch
>
>
> ReservationSystem uses remove/add for updates and thus update api in state 
> store is not needed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler

2015-11-06 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994805#comment-14994805
 ] 

Inigo Goiri commented on YARN-3980:
---

How does the violate thing work? Do you have any example?

I'll change the naming for getContainersUtilization.

> Plumb resource-utilization info in node heartbeat through to the scheduler
> --
>
> Key: YARN-3980
> URL: https://issues.apache.org/jira/browse/YARN-3980
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.7.1
>Reporter: Karthik Kambatla
>Assignee: Inigo Goiri
> Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, 
> YARN-3980-v2.patch, YARN-3980-v3.patch, YARN-3980-v4.patch
>
>
> YARN-1012 and YARN-3534 collect resource utilization information for all 
> containers and the node respectively and send it to the RM on node heartbeat. 
> We should plumb it through to the scheduler so the scheduler can make use of 
> it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler

2015-11-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994807#comment-14994807
 ] 

Wangda Tan commented on YARN-3980:
--

Sorry, it's a typo, I meant: volatile.

> Plumb resource-utilization info in node heartbeat through to the scheduler
> --
>
> Key: YARN-3980
> URL: https://issues.apache.org/jira/browse/YARN-3980
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.7.1
>Reporter: Karthik Kambatla
>Assignee: Inigo Goiri
> Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, 
> YARN-3980-v2.patch, YARN-3980-v3.patch, YARN-3980-v4.patch
>
>
> YARN-1012 and YARN-3534 collect resource utilization information for all 
> containers and the node respectively and send it to the RM on node heartbeat. 
> We should plumb it through to the scheduler so the scheduler can make use of 
> it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4184) Remove update reservation state api from state store as its not used by ReservationSystem

2015-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995004#comment-14995004
 ] 

Hadoop QA commented on YARN-4184:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 5s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
9s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 8s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 5s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 8s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 128m 32s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_60 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| JDK v1.7.0_79 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.7.1 Server=1.7.1 
Image:test-patch-base-hadoop-date2015-11-07 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12771140/YARN-4184.v1.patch |
| JIRA Issue | YARN-4184 |
| Optional Tests |  asflicense  javac  javadoc  mvninstall  unit  findbugs  
checkstyle  compile  |
| uname | Linux 75b2139115a8 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | 

[jira] [Updated] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188

2015-11-06 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4320:
--
Fix Version/s: 2.6.3

> TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to 
> default port 8188
> ---
>
> Key: YARN-4320
> URL: https://issues.apache.org/jira/browse/YARN-4320
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 3.0.0, 2.8.0, 2.7.2, 2.6.3
>
> Attachments: YARN-4320.01.patch
>
>
> {noformat}
> Running org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 40.256 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler)
>   Time elapsed: 35.764 sec  <<< ERROR!
> java.lang.RuntimeException: Failed to connect to timeline server. Connection 
> retries limit exceeded. The posted timeline event may be missing
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:206)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:245)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:474)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:323)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:320)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:320)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:305)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForTimelineServer(JobHistoryEventHandler.java:1015)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:586)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.handleEvent(TestJobHistoryEventHandler.java:719)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:507)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4320) TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to default port 8188

2015-11-06 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993332#comment-14993332
 ] 

Sangjin Lee commented on YARN-4320:
---

Committed it to branch-2.6 too.

> TestJobHistoryEventHandler fails as AHS in MiniYarnCluster no longer binds to 
> default port 8188
> ---
>
> Key: YARN-4320
> URL: https://issues.apache.org/jira/browse/YARN-4320
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: 3.0.0, 2.8.0, 2.7.2, 2.6.3
>
> Attachments: YARN-4320.01.patch
>
>
> {noformat}
> Running org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 40.256 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
> testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler)
>   Time elapsed: 35.764 sec  <<< ERROR!
> java.lang.RuntimeException: Failed to connect to timeline server. Connection 
> retries limit exceeded. The posted timeline event may be missing
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:206)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:245)
>   at com.sun.jersey.api.client.Client.handle(Client.java:648)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:474)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:323)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:320)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:320)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:305)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForTimelineServer(JobHistoryEventHandler.java:1015)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:586)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.handleEvent(TestJobHistoryEventHandler.java:719)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:507)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)

2015-11-06 Thread Mohammad Shahid Khan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993412#comment-14993412
 ] 

Mohammad Shahid Khan commented on YARN-3840:


UT failure -- not related to the current patch
findbugs -- not related to the current patch

> Resource Manager web ui issue when sorting application by id (with 
> application having id > 9999)
> 
>
> Key: YARN-3840
> URL: https://issues.apache.org/jira/browse/YARN-3840
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: LINTE
>Assignee: Mohammad Shahid Khan
> Attachments: RMApps.png, YARN-3840-1.patch, YARN-3840-2.patch, 
> YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, YARN-3840-6.patch, 
> yarn-3840-7.patch
>
>
> On the WEBUI, the global main view page : 
> http://resourcemanager:8088/cluster/apps doesn't display applications over 
> 9999.
> With command line it works (# yarn application -list).
> Regards,
> Alexandre



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4336) YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP

2015-11-06 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994018#comment-14994018
 ] 

Greg Senia commented on YARN-4336:
--

[~jlowe] just confirmed that HADOOP-12413 does seem to fix it. Just ran a test 
with the else-if block. I think, as a safety measure for the time being, I may 
still keep my tactical fix, but I'll move the regex to compile once, since if 
the LDAP storm from our jobs shows up again I'm going to have a bad day :)

Confirmed that appattempt_1446307555640_0052_01 was never sent out to NSS/LDAP.
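
For reference, the "compile once" change I have in mind is just hoisting the 
pattern into a static constant, roughly like this (a sketch of my tactical fix, 
not a final patch; assumes java.util.regex.Pattern is imported):
{code}
// Sketch only: precompile the appattempt pattern once instead of on every
// isUserInList() call, so the hot path only pays for matcher creation.
private static final Pattern APP_ATTEMPT_PATTERN =
    Pattern.compile("^appattempt_\\d+_\\d+_\\d+$");

public final boolean isUserInList(UserGroupInformation ugi) {
  if (allAllowed || users.contains(ugi.getShortUserName())) {
    return true;
  }
  if (APP_ATTEMPT_PATTERN.matcher(ugi.getShortUserName()).matches()) {
    // Bail out: appattempt_* is not a real user, so skip the UGI group
    // lookup that would otherwise hit NSS/LDAP.
    return false;
  }
  for (String group : ugi.getGroupNames()) {
    if (groups.contains(group)) {
      return true;
    }
  }
  return false;
}
{code}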

> YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP
> 
>
> Key: YARN-4336
> URL: https://issues.apache.org/jira/browse/YARN-4336
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0, 2.4.1, 2.6.0, 2.7.0, 2.6.1, 2.7.1
> Environment: NSS w/ SSSD or Dell/Quest - VASD
>Reporter: Greg Senia
>Assignee: Greg Senia
> Attachments: YARN-4336-tactical.txt
>
>
> Hi folks after performing some debug for our Unix Engineering and Active 
> Directory teams it was discovered that on YARN Container Initialization a 
> call via Hadoop Common AccessControlList.java:
>   for(String group: ugi.getGroupNames()) {
> if (groups.contains(group)) {
>   return true;
> }
>   }
> Unfortunately with the security call to check access on 
> "appattempt_X_X_X" will always return false but will make 
> unnecessary calls to NameSwitch service on linux which will call things like 
> SSSD/Quest VASD which will then initiate LDAP calls looking for non existent 
> userid's causing excessive load on LDAP.
> For now our tactical work around is as follows:
> /**
>* Checks if a user represented by the provided {@link UserGroupInformation}
>* is a member of the Access Control List
>* @param ugi UserGroupInformation to check if contained in the ACL
>* @return true if ugi is member of the list
>*/
>   public final boolean isUserInList(UserGroupInformation ugi) {
> if (allAllowed || users.contains(ugi.getShortUserName())) {
>   return true;
> } else {
> String patternString = "^appattempt_\\d+_\\d+_\\d+$";
> Pattern pattern = Pattern.compile(patternString);
> Matcher matcher = pattern.matcher(ugi.getShortUserName());
> boolean matches = matcher.matches();
> if (matches) {
>   LOG.debug("Bailing !! AppAttempt Matches DONOT call UGI FOR 
> GROUPS!!");;
>   return false;
> }
>   
>   
>   for(String group: ugi.getGroupNames()) {
> if (groups.contains(group)) {
>   return true;
> }
>   }
> }
> return false;
>   }
>   public boolean isUserAllowed(UserGroupInformation ugi) {
> return isUserInList(ugi);
>   }
> Example of VASD Debug log showing the lookups for one task attempt 32 of them:
> One task:
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> 

[jira] [Commented] (YARN-4330) MiniYARNCluster prints multiple Failed to instantiate default resource calculator warning messages

2015-11-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994272#comment-14994272
 ] 

Varun Saxena commented on YARN-4330:


Patch does the following:
1. If the node resource monitoring interval or container monitoring interval is 
<= 0, this is treated as disabling monitoring (an interval <= 0 doesn't make 
much sense anyway); see the sketch after this list. The resource calculator 
plugin (even the default one) won't be required if the interval is <= 0. I have 
made changes in the relevant classes to take care of this. Also, I have set 
this config to 0 in MiniYARNCluster, so the dummy plugin won't be required in 
this case.
2. In NodeManagerHardwareUtils, we take the memory and CPU from config if 
hardware detection is disabled, irrespective of whether the resource calculator 
plugin can be created or not. Moved the code around in the class to check the 
disable config first and return the values from config in that case. In 
MiniYARNCluster I have explicitly set it to false; I don't think hardware 
detection is required for tests.
3. Catching UnsupportedOperationException and logging it at info level, with no 
stack trace printed. For other exceptions, the stack trace will be printed 
(keeping it consistent with the previous behavior), since a stack trace for 
other unexpected exceptions may be useful.
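
To illustrate point 1, here is a small self-contained sketch of the intended 
convention (class and method names are made up for illustration, not the actual 
patch):
{code}
// Sketch only: a non-positive monitoring interval means "monitoring disabled",
// so no resource calculator plugin (not even the default one) should be
// instantiated at all.
public class MonitoringIntervalSketch {

  /** Returns true when monitoring should run for the configured interval. */
  static boolean monitoringEnabled(long configuredIntervalMs) {
    return configuredIntervalMs > 0;
  }

  public static void main(String[] args) {
    // MiniYARNCluster case: interval configured to 0, so monitoring (and the
    // dummy resource calculator plugin) is skipped entirely.
    System.out.println(monitoringEnabled(0));     // false
    System.out.println(monitoringEnabled(3000));  // true
  }
}
{code}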

> MiniYARNCluster prints multiple  Failed to instantiate default resource 
> calculator warning messages
> ---
>
> Key: YARN-4330
> URL: https://issues.apache.org/jira/browse/YARN-4330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test, yarn
>Affects Versions: 2.8.0
> Environment: OSX, JUnit
>Reporter: Steve Loughran
>Assignee: Varun Saxena
>Priority: Blocker
> Attachments: YARN-4330.01.patch
>
>
> Whenever I try to start a MiniYARNCluster on Branch-2 (commit #0b61cca), I 
> see multiple stack traces warning me that a resource calculator plugin could 
> not be created
> {code}
> (ResourceCalculatorPlugin.java:getResourceCalculatorPlugin(184)) - 
> java.lang.UnsupportedOperationException: Could not determine OS: Failed to 
> instantiate default resource calculator.
> java.lang.UnsupportedOperationException: Could not determine OS
> {code}
> This is a minicluster. It doesn't need resource calculation. It certainly 
> doesn't need test logs being cluttered with even more stack traces which will 
> only generate false alarms about tests failing. 
> There needs to be a way to turn this off, and the minicluster should have it 
> that way by default.
> Being ruthless and marking this as a blocker, because it's a fairly major 
> regression for anyone testing with the minicluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4337) Resolve all docs errors in *.apt.vm for YARN

2015-11-06 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994275#comment-14994275
 ] 

Chen He commented on YARN-4337:
---

Thank you for the reply, [~adaniels], I changed the affects version to 2.6.x.

> Resolve all docs errors in *.apt.vm for YARN
> 
>
> Key: YARN-4337
> URL: https://issues.apache.org/jira/browse/YARN-4337
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.6.1
>Reporter: Chen He
>Priority: Minor
>  Labels: documentation, newbie
>
> This is a newbie++ docs ticket.
> Simple example, In WebServiceInfo.apt.vm
> *** JSON response with single resource
>   HTTP Request:
>   GET 
> http://rmhost.domain:8088/ws/v1/cluster/app/application_1324057493980_0001
>   Response Status Line:
>   HTTP/1.1 200 OK
>   Response Header:
> +---+
> The URI
> http://rmhost.domain:8088/ws/v1/cluster/app/application_1324057493980_0001 is 
> invalid. It should be "apps" instead of "app" in the URI. It may mislead 
> first time users to think that YARN REST API does not work. Similarly, we 
> should remove all similar typos or minor errors in *.apt.vm file for YARN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3842) NMProxy should retry on NMNotYetReadyException

2015-11-06 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3842:
--
Target Version/s: 2.7.1, 2.6.3  (was: 2.7.1)

Running into this in a couple of places, we should get this into 2.6.3.

> NMProxy should retry on NMNotYetReadyException
> --
>
> Key: YARN-3842
> URL: https://issues.apache.org/jira/browse/YARN-3842
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Robert Kanter
>Priority: Critical
> Fix For: 2.7.1
>
> Attachments: MAPREDUCE-6409.001.patch, MAPREDUCE-6409.002.patch, 
> YARN-3842.001.patch, YARN-3842.002.patch
>
>
> Consider the following scenario:
> 1. RM assigns a container on node N to an app A.
> 2. Node N is restarted
> 3. A tries to launch container on node N.
> 3 could lead to an NMNotYetReadyException depending on whether NM N has 
> registered with the RM. In MR, this is considered a task attempt failure. A 
> few of these could lead to a task/job failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4337) Resolve all docs errors in *.apt.vm for YARN

2015-11-06 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994283#comment-14994283
 ] 

Daniel Templeton commented on YARN-4337:


Yep, the error is there in 2.6.1, but it's fixed in 2.7.0 by YARN-3436.  Any 
reason not to close this as a dup?

> Resolve all docs errors in *.apt.vm for YARN
> 
>
> Key: YARN-4337
> URL: https://issues.apache.org/jira/browse/YARN-4337
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.6.1
>Reporter: Chen He
>Priority: Minor
>  Labels: documentation, newbie
>
> This is a newbie++ docs ticket.
> Simple example, In WebServiceInfo.apt.vm
> *** JSON response with single resource
>   HTTP Request:
>   GET 
> http://rmhost.domain:8088/ws/v1/cluster/app/application_1324057493980_0001
>   Response Status Line:
>   HTTP/1.1 200 OK
>   Response Header:
> +---+
> The URI
> http://rmhost.domain:8088/ws/v1/cluster/app/application_1324057493980_0001 is 
> invalid. It should be "apps" instead of "app" in the URI. It may mislead 
> first time users to think that YARN REST API does not work. Similarly, we 
> should remove all similar typos or minor errors in *.apt.vm file for YARN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4337) Resolve all docs errors in *.apt.vm for YARN

2015-11-06 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He resolved YARN-4337.
---
Resolution: Duplicate

> Resolve all docs errors in *.apt.vm for YARN
> 
>
> Key: YARN-4337
> URL: https://issues.apache.org/jira/browse/YARN-4337
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.6.1
>Reporter: Chen He
>Priority: Minor
>  Labels: documentation, newbie
>
> This is a newbie++ docs ticket.
> Simple example, In WebServiceInfo.apt.vm
> *** JSON response with single resource
>   HTTP Request:
>   GET 
> http://rmhost.domain:8088/ws/v1/cluster/app/application_1324057493980_0001
>   Response Status Line:
>   HTTP/1.1 200 OK
>   Response Header:
> +---+
> The URI
> http://rmhost.domain:8088/ws/v1/cluster/app/application_1324057493980_0001 is 
> invalid. It should be "apps" instead of "app" in the URI. It may mislead 
> first time users to think that YARN REST API does not work. Similarly, we 
> should remove all similar typos or minor errors in *.apt.vm file for YARN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4337) Resolve all docs errors in *.apt.vm for YARN

2015-11-06 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated YARN-4337:
--
Affects Version/s: (was: 2.7.1)
   2.6.1

> Resolve all docs errors in *.apt.vm for YARN
> 
>
> Key: YARN-4337
> URL: https://issues.apache.org/jira/browse/YARN-4337
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.6.1
>Reporter: Chen He
>Priority: Minor
>  Labels: documentation, newbie
>
> This is a newbie++ docs ticket.
> Simple example, In WebServiceInfo.apt.vm
> *** JSON response with single resource
>   HTTP Request:
>   GET 
> http://rmhost.domain:8088/ws/v1/cluster/app/application_1324057493980_0001
>   Response Status Line:
>   HTTP/1.1 200 OK
>   Response Header:
> +---+
> The URI
> http://rmhost.domain:8088/ws/v1/cluster/app/application_1324057493980_0001 is 
> invalid. It should be "apps" instead of "app" in the URI. It may mislead 
> first time users to think that YARN REST API does not work. Similarly, we 
> should remove all similar typos or minor errors in *.apt.vm file for YARN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4337) Resolve all docs errors in *.apt.vm for YARN

2015-11-06 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994279#comment-14994279
 ] 

Chen He commented on YARN-4337:
---

I think after branch-2.7, we are not using apt.vm files anymore. Those *.apt.vm 
files are in 2.6.x.

> Resolve all docs errors in *.apt.vm for YARN
> 
>
> Key: YARN-4337
> URL: https://issues.apache.org/jira/browse/YARN-4337
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.6.1
>Reporter: Chen He
>Priority: Minor
>  Labels: documentation, newbie
>
> This is a newbie++ docs ticket.
> Simple example, In WebServiceInfo.apt.vm
> *** JSON response with single resource
>   HTTP Request:
>   GET 
> http://rmhost.domain:8088/ws/v1/cluster/app/application_1324057493980_0001
>   Response Status Line:
>   HTTP/1.1 200 OK
>   Response Header:
> +---+
> The URI
> http://rmhost.domain:8088/ws/v1/cluster/app/application_1324057493980_0001 is 
> invalid. It should be "apps" instead of "app" in the URI. It may mislead 
> first time users to think that YARN REST API does not work. Similarly, we 
> should remove all similar typos or minor errors in *.apt.vm file for YARN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4241) Typo in yarn-default.xml

2015-11-06 Thread Anthony Rojas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994278#comment-14994278
 ] 

Anthony Rojas commented on YARN-4241:
-

Linking to YARN-3943 to add the 
"yarn.nodemanager.disk-health-checker.disk-utilization-watermark-low-per-disk-percentage"
 property in yarn-default.xml

> Typo in yarn-default.xml
> 
>
> Key: YARN-4241
> URL: https://issues.apache.org/jira/browse/YARN-4241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, yarn
>Reporter: Anthony Rojas
>Assignee: Anthony Rojas
>Priority: Trivial
>  Labels: newbie
> Attachments: YARN-4241.patch, YARN-4241.patch.1
>
>
> Typo in description section of yarn-default.xml, under the properties:
> yarn.nodemanager.disk-health-checker.min-healthy-disks
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
> yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb
> The reference to yarn-nodemanager.local-dirs should be 
> yarn.nodemanager.local-dirs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4241) Typo in yarn-default.xml

2015-11-06 Thread Anthony Rojas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994297#comment-14994297
 ] 

Anthony Rojas commented on YARN-4241:
-

Correction, please disregard this comment.


> Typo in yarn-default.xml
> 
>
> Key: YARN-4241
> URL: https://issues.apache.org/jira/browse/YARN-4241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, yarn
>Reporter: Anthony Rojas
>Assignee: Anthony Rojas
>Priority: Trivial
>  Labels: newbie
> Attachments: YARN-4241.patch, YARN-4241.patch.1
>
>
> Typo in description section of yarn-default.xml, under the properties:
> yarn.nodemanager.disk-health-checker.min-healthy-disks
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
> yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb
> The reference to yarn-nodemanager.local-dirs should be 
> yarn.nodemanager.local-dirs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4336) YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP

2015-11-06 Thread Greg Senia (JIRA)
Greg Senia created YARN-4336:


 Summary: YARN NodeManager - Container Initialization - Excessive 
load on NSS/LDAP
 Key: YARN-4336
 URL: https://issues.apache.org/jira/browse/YARN-4336
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.1, 2.6.1, 2.7.0, 2.6.0, 2.4.1, 2.4.0
 Environment: NSS w/ SSSD or Dell/Quest - VASD
Reporter: Greg Senia


Hi folks, after performing some debugging for our Unix Engineering and Active 
Directory teams, it was discovered that on YARN container initialization the 
following call is made via Hadoop Common AccessControlList.java:

  for(String group: ugi.getGroupNames()) {
if (groups.contains(group)) {
  return true;
}
  }

Unfortunately, the security call to check access on 
"appattempt_X_X_X" will always return false, but it makes 
unnecessary calls to the NameSwitch service on Linux, which calls things like 
SSSD/Quest VASD, which then initiate LDAP calls looking for non-existent 
userids, causing excessive load on LDAP.

For now our tactical work around is as follows:

/**
 * Checks if a user represented by the provided {@link UserGroupInformation}
 * is a member of the Access Control List
 * @param ugi UserGroupInformation to check if contained in the ACL
 * @return true if ugi is member of the list
 */
public final boolean isUserInList(UserGroupInformation ugi) {
  if (allAllowed || users.contains(ugi.getShortUserName())) {
    return true;
  } else {
    String patternString = "^appattempt_\\d+_\\d+_\\d+$";
    Pattern pattern = Pattern.compile(patternString);
    Matcher matcher = pattern.matcher(ugi.getShortUserName());
    boolean matches = matcher.matches();
    if (matches) {
      LOG.debug("Bailing !! AppAttempt Matches DONOT call UGI FOR GROUPS!!");
      return false;
    }

    for (String group : ugi.getGroupNames()) {
      if (groups.contains(group)) {
        return true;
      }
    }
  }
  return false;
}

public boolean isUserAllowed(UserGroupInformation ugi) {
  return isUserInList(ugi);
}


Example of VASD Debug log showing the lookups for one task attempt 32 of them:

One task:
Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
Oct 30 22:56:45 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:56:45 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
 with 
filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
 base=<>, scope=
Oct 30 22:57:18 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching GC 
for host service domain EXNSD.EXA.EXAMPLE.COM with filter 

[jira] [Updated] (YARN-4336) YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP

2015-11-06 Thread Greg Senia (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Senia updated YARN-4336:
-
Attachment: YARN-4336-tactical.txt

tactical fix

> YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP
> 
>
> Key: YARN-4336
> URL: https://issues.apache.org/jira/browse/YARN-4336
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0, 2.4.1, 2.6.0, 2.7.0, 2.6.1, 2.7.1
> Environment: NSS w/ SSSD or Dell/Quest - VASD
>Reporter: Greg Senia
>Assignee: Greg Senia
> Attachments: YARN-4336-tactical.txt
>
>
> Hi folks after performing some debug for our Unix Engineering and Active 
> Directory teams it was discovered that on YARN Container Initialization a 
> call via Hadoop Common AccessControlList.java:
>   for(String group: ugi.getGroupNames()) {
> if (groups.contains(group)) {
>   return true;
> }
>   }
> Unfortunately with the security call to check access on 
> "appattempt_X_X_X" will always return false but will make 
> unnecessary calls to NameSwitch service on linux which will call things like 
> SSSD/Quest VASD which will then initiate LDAP calls looking for non existent 
> userid's causing excessive load on LDAP.
> For now our tactical work around is as follows:
> /**
>* Checks if a user represented by the provided {@link UserGroupInformation}
>* is a member of the Access Control List
>* @param ugi UserGroupInformation to check if contained in the ACL
>* @return true if ugi is member of the list
>*/
>   public final boolean isUserInList(UserGroupInformation ugi) {
> if (allAllowed || users.contains(ugi.getShortUserName())) {
>   return true;
> } else {
> String patternString = "^appattempt_\\d+_\\d+_\\d+$";
> Pattern pattern = Pattern.compile(patternString);
> Matcher matcher = pattern.matcher(ugi.getShortUserName());
> boolean matches = matcher.matches();
> if (matches) {
>   LOG.debug("Bailing !! AppAttempt Matches DONOT call UGI FOR 
> GROUPS!!");;
>   return false;
> }
>   
>   
>   for(String group: ugi.getGroupNames()) {
> if (groups.contains(group)) {
>   return true;
> }
>   }
> }
> return false;
>   }
>   public boolean isUserAllowed(UserGroupInformation ugi) {
> return isUserInList(ugi);
>   }
> Example of VASD Debug log showing the lookups for one task attempt 32 of them:
> One task:
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:56:45 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> 

[jira] [Commented] (YARN-4336) YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP

2015-11-06 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993836#comment-14993836
 ] 

Jason Lowe commented on YARN-4336:
--

I believe this is a duplicate of YARN-3452.  We fixed it by reverting  
HADOOP-10650 in our internal build since we don't need the blacklisting 
functionality added by that feature, and that's what caused the excess lookups. 
 IMHO the real fix is to have YARN not use bogus user names, but I don't know 
if that's going to be an easy change to make.

> YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP
> 
>
> Key: YARN-4336
> URL: https://issues.apache.org/jira/browse/YARN-4336
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0, 2.4.1, 2.6.0, 2.7.0, 2.6.1, 2.7.1
> Environment: NSS w/ SSSD or Dell/Quest - VASD
>Reporter: Greg Senia
>Assignee: Greg Senia
> Attachments: YARN-4336-tactical.txt
>
>
> Hi folks after performing some debug for our Unix Engineering and Active 
> Directory teams it was discovered that on YARN Container Initialization a 
> call via Hadoop Common AccessControlList.java:
>   for(String group: ugi.getGroupNames()) {
> if (groups.contains(group)) {
>   return true;
> }
>   }
> Unfortunately with the security call to check access on 
> "appattempt_X_X_X" will always return false but will make 
> unnecessary calls to NameSwitch service on linux which will call things like 
> SSSD/Quest VASD which will then initiate LDAP calls looking for non existent 
> userid's causing excessive load on LDAP.
> For now our tactical work around is as follows:
> /**
>* Checks if a user represented by the provided {@link UserGroupInformation}
>* is a member of the Access Control List
>* @param ugi UserGroupInformation to check if contained in the ACL
>* @return true if ugi is member of the list
>*/
>   public final boolean isUserInList(UserGroupInformation ugi) {
> if (allAllowed || users.contains(ugi.getShortUserName())) {
>   return true;
> } else {
> String patternString = "^appattempt_\\d+_\\d+_\\d+$";
> Pattern pattern = Pattern.compile(patternString);
> Matcher matcher = pattern.matcher(ugi.getShortUserName());
> boolean matches = matcher.matches();
> if (matches) {
>   LOG.debug("Bailing !! AppAttempt Matches DONOT call UGI FOR 
> GROUPS!!");;
>   return false;
> }
>   
>   
>   for(String group: ugi.getGroupNames()) {
> if (groups.contains(group)) {
>   return true;
> }
>   }
> }
> return false;
>   }
>   public boolean isUserAllowed(UserGroupInformation ugi) {
> return isUserInList(ugi);
>   }
> Example of VASD Debug log showing the lookups for one task attempt 32 of them:
> One task:
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 

[jira] [Commented] (YARN-4219) New levelDB cache storage for timeline v1.5

2015-11-06 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993878#comment-14993878
 ] 

Jason Lowe commented on YARN-4219:
--

Shouldn't we commit some form of YARN-3942 before committing this?  I got the 
impression this was an underlying storage to be used by the 
EntityFileTimelineStore, or is there a use-case for this outside of the v1.5 
core code in YARN-3942?

Patch looks pretty good except for one thing I noticed: The pom file should not 
have hardcoded versions in it.  It should omit the versions and leave it up to 
hadoop-project/pom.xml to define that.  Otherwise we risk having different 
portions of code needing different versions of the same dependency.


> New levelDB cache storage for timeline v1.5
> ---
>
> Key: YARN-4219
> URL: https://issues.apache.org/jira/browse/YARN-4219
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.8.0
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4219-trunk.001.patch, YARN-4219-trunk.002.patch, 
> YARN-4219-trunk.003.patch
>
>
> We need to have an "offline" caching storage for timeline server v1.5 after 
> the changes in YARN-3942. The in memory timeline storage may run into OOM 
> issues when used as a cache storage for entity file timeline storage. We can 
> refactor the code and have a level db based caching storage for this use 
> case. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4336) YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP

2015-11-06 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993899#comment-14993899
 ] 

Greg Senia commented on YARN-4336:
--

[~jlowe] I dumped stack traces below, and they seem to match what was done in 
HADOOP-10650. Do you see an issue with my workaround for now in my own env 
until HWX can provide a final solution?

Seems like this could also be related...

https://issues.apache.org/jira/browse/HADOOP-12413

Stack Trace:
2015-11-06 11:25:52,313 DEBUG ipc.Server (Server.java:processOneRpc(1762)) -  
got #-33
2015-11-06 11:25:52,313 DEBUG security.SaslRpcServer 
(SaslRpcServer.java:create(174)) - Created SASL server with mechanism = DIGEST-
MD5
2015-11-06 11:25:52,314 DEBUG ipc.Server (Server.java:doSaslReply(1424)) - 
Sending sasl message state: NEGOTIATE
auths {
  method: "TOKEN"
  mechanism: "DIGEST-MD5"
  protocol: ""
  serverId: "default"
  challenge: 
"realm=\"default\",nonce=\"389ZufpXfkC6CKunYceHayMBI3KM7v3keu9nPC/b\",qop=\"auth\",charset=utf-8,algorithm=md5-sess"
}
auths {
  method: "KERBEROS"
  mechanism: "GSSAPI"
  protocol: "nm"
  serverId: "xhadoopm5d.example.com"
}

2015-11-06 11:25:52,314 DEBUG ipc.Server (Server.java:processResponse(972)) - 
Socket Reader #1 for port 8040: responding to null fro
m 157.121.72.167:64599 Call#-33 Retry#-1
2015-11-06 11:25:52,314 DEBUG ipc.Server (Server.java:processResponse(991)) - 
Socket Reader #1 for port 8040: responding to null fro
m 157.121.72.167:64599 Call#-33 Retry#-1 Wrote 212 bytes.
2015-11-06 11:25:52,343 DEBUG ipc.Server (Server.java:processOneRpc(1762)) -  
got #-33
2015-11-06 11:25:52,343 DEBUG ipc.Server (Server.java:processSaslToken(1393)) - 
Have read input token of size 246 for processing by 
saslServer.evaluateResponse()
2015-11-06 11:25:52,344 DEBUG security.SaslRpcServer 
(SaslRpcServer.java:handle(308)) - SASL server DIGEST-MD5 callback: setting pas
sword for client: testing (auth:SIMPLE)
2015-11-06 11:25:52,344 DEBUG security.SaslRpcServer 
(SaslRpcServer.java:handle(325)) - SASL server DIGEST-MD5 callback: setting can
onicalized client ID: testing
2015-11-06 11:25:52,345 DEBUG ipc.Server (Server.java:buildSaslResponse(1410)) 
- Will send SUCCESS token of size 40 from saslServer.
2015-11-06 11:25:52,345 DEBUG ipc.Server (Server.java:saslProcess(1298)) - SASL 
server context established. Negotiated QoP is auth
2015-11-06 11:25:52,345 DEBUG ipc.Server (Server.java:saslProcess(1303)) - SASL 
server successfully authenticated client: testing (a
uth:SIMPLE)
2015-11-06 11:25:52,345 INFO  ipc.Server (Server.java:saslProcess(1306)) - Auth 
successful for testing (auth:SIMPLE)
2015-11-06 11:25:52,345 DEBUG ipc.Server (Server.java:doSaslReply(1424)) - 
Sending sasl message state: SUCCESS
token: "rspauth=9bfdf3e61c489664e885d7043b352c24"

2015-11-06 11:25:52,345 DEBUG ipc.Server (Server.java:processResponse(972)) - 
Socket Reader #1 for port 8040: responding to null fro
m 157.121.72.167:64599 Call#-33 Retry#-1
2015-11-06 11:25:52,346 DEBUG ipc.Server (Server.java:processResponse(991)) - 
Socket Reader #1 for port 8040: responding to null fro
m 157.121.72.167:64599 Call#-33 Retry#-1 Wrote 64 bytes.
2015-11-06 11:25:52,357 DEBUG ipc.Server (Server.java:processOneRpc(1762)) -  
got #-3
2015-11-06 11:25:52,357 DEBUG security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1488)) - java.lang.Thread.getSt
ackTrace(Thread.java:1589)
2015-11-06 11:25:52,357 DEBUG security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1488)) - 
org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1487)
2015-11-06 11:25:52,357 DEBUG security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1488)) - 
org.apache.hadoop.security.authorize.AccessControlList.isUserInList(AccessControlList.java:252)
2015-11-06 11:25:52,357 DEBUG security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1488)) - 
org.apache.hadoop.security.authorize.AccessControlList.isUserAllowed(AccessControlList.java:262)
2015-11-06 11:25:52,357 DEBUG security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1488)) - 
org.apache.hadoop.security.authorize.ServiceAuthorizationManager.authorize(ServiceAuthorizationManager.java:110)
2015-11-06 11:25:52,357 DEBUG security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1488)) - 
org.apache.hadoop.ipc.Server.authorize(Server.java:2507)
2015-11-06 11:25:52,358 DEBUG security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1488)) - 
org.apache.hadoop.ipc.Server.access$3300(Server.java:135)
2015-11-06 11:25:52,358 DEBUG security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1488)) - 
org.apache.hadoop.ipc.Server$Connection.authorizeConnection(Server.java:1923)
2015-11-06 11:25:52,358 DEBUG security.UserGroupInformation 
(UserGroupInformation.java:getGroupNames(1488)) - 

[jira] [Commented] (YARN-3452) Bogus token usernames cause many invalid group lookups

2015-11-06 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993927#comment-14993927
 ] 

Allen Wittenauer commented on YARN-3452:


I wonder what happens when they *do* resolve...  

> Bogus token usernames cause many invalid group lookups
> --
>
> Key: YARN-3452
> URL: https://issues.apache.org/jira/browse/YARN-3452
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Reporter: Jason Lowe
>
> YARN uses a number of bogus usernames for tokens, like application attempt 
> IDs for NM tokens or even the hardcoded "testing" for the container localizer 
> token.  These tokens cause the RPC layer to do group lookups on these bogus 
> usernames which will never succeed but can take a long time to perform.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4336) YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP

2015-11-06 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993932#comment-14993932
 ] 

Jason Lowe commented on YARN-4336:
--

bq. Seems like this could also be related...  
https://issues.apache.org/jira/browse/HADOOP-12413
Nice find!  I totally missed that when it went by.  I'll pull that fix into the 
2.6 and 2.7 lines.  I think that could eliminate the bogus lookups in practice 
when the reverse ACL isn't being used.

bq.  Do you see an issue with my workaround for now in my own env until HWX can 
provide a final solution?
It will work.  Nit: it's pricey to compile the pattern on every call; it could 
be compiled once (see the sketch below).  Or, as I mentioned above, I think 
pulling HADOOP-12413 into your build could also eliminate the bogus lookups 
(assuming you don't use the reverse ACL feature).
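
A minimal sketch of that nit, based on the isUserInList workaround quoted in the 
description below; the constant name and log message are illustrative assumptions, 
not the final patch:
{code}
// Compile the app-attempt pattern once, instead of on every ACL check.
// Relies on the existing allAllowed, users, groups and LOG fields of
// AccessControlList; APP_ATTEMPT_PATTERN is an illustrative name.
private static final Pattern APP_ATTEMPT_PATTERN =
    Pattern.compile("^appattempt_\\d+_\\d+_\\d+$");

public final boolean isUserInList(UserGroupInformation ugi) {
  if (allAllowed || users.contains(ugi.getShortUserName())) {
    return true;
  } else {
    if (APP_ATTEMPT_PATTERN.matcher(ugi.getShortUserName()).matches()) {
      LOG.debug("App attempt name matched; skipping group lookup");
      return false;
    }
    for (String group : ugi.getGroupNames()) {
      if (groups.contains(group)) {
        return true;
      }
    }
  }
  return false;
}
{code}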


> YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP
> 
>
> Key: YARN-4336
> URL: https://issues.apache.org/jira/browse/YARN-4336
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0, 2.4.1, 2.6.0, 2.7.0, 2.6.1, 2.7.1
> Environment: NSS w/ SSSD or Dell/Quest - VASD
>Reporter: Greg Senia
>Assignee: Greg Senia
> Attachments: YARN-4336-tactical.txt
>
>
> Hi folks after performing some debug for our Unix Engineering and Active 
> Directory teams it was discovered that on YARN Container Initialization a 
> call via Hadoop Common AccessControlList.java:
>   for(String group: ugi.getGroupNames()) {
> if (groups.contains(group)) {
>   return true;
> }
>   }
> Unfortunately with the security call to check access on 
> "appattempt_X_X_X" will always return false but will make 
> unnecessary calls to NameSwitch service on linux which will call things like 
> SSSD/Quest VASD which will then initiate LDAP calls looking for non existent 
> userid's causing excessive load on LDAP.
> For now our tactical work around is as follows:
> /**
>* Checks if a user represented by the provided {@link UserGroupInformation}
>* is a member of the Access Control List
>* @param ugi UserGroupInformation to check if contained in the ACL
>* @return true if ugi is member of the list
>*/
>   public final boolean isUserInList(UserGroupInformation ugi) {
> if (allAllowed || users.contains(ugi.getShortUserName())) {
>   return true;
> } else {
> String patternString = "^appattempt_\\d+_\\d+_\\d+$";
> Pattern pattern = Pattern.compile(patternString);
> Matcher matcher = pattern.matcher(ugi.getShortUserName());
> boolean matches = matcher.matches();
> if (matches) {
>   LOG.debug("Bailing !! AppAttempt Matches DONOT call UGI FOR 
> GROUPS!!");;
>   return false;
> }
>   
>   
>   for(String group: ugi.getGroupNames()) {
> if (groups.contains(group)) {
>   return true;
> }
>   }
> }
> return false;
>   }
>   public boolean isUserAllowed(UserGroupInformation ugi) {
> return isUserInList(ugi);
>   }
> Example of VASD Debug log showing the lookups for one task attempt 32 of them:
> One task:
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> 

[jira] [Commented] (YARN-3452) Bogus token usernames cause many invalid group lookups

2015-11-06 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993941#comment-14993941
 ] 

Jason Lowe commented on YARN-3452:
--

Yeah, probably nothing good.  [~gss2002] pointed out HADOOP-12413, which I think 
will also remove the bogus lookups in practice when users aren't using the 
reverse-ACL feature that was added in HADOOP-10650.  I'll pull that into 2.6 
and 2.7, since I think most users won't be using that new feature.  We'll still 
need to stop using the bogus usernames, though, both for those that are using the 
reverse-ACL feature and for anything else that handles the ugi assuming it is a 
valid user.

> Bogus token usernames cause many invalid group lookups
> --
>
> Key: YARN-3452
> URL: https://issues.apache.org/jira/browse/YARN-3452
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Reporter: Jason Lowe
>
> YARN uses a number of bogus usernames for tokens, like application attempt 
> IDs for NM tokens or even the hardcoded "testing" for the container localizer 
> token.  These tokens cause the RPC layer to do group lookups on these bogus 
> usernames which will never succeed but can take a long time to perform.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4326) Fix TestDistributedShell timeout as AHS in MiniYarnCluster no longer binds to default port 8188

2015-11-06 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4326:
--
Fix Version/s: 2.7.3
   2.6.3

Pulled the fix to branch-2.7 and branch-2.6.

> Fix TestDistributedShell timeout as AHS in MiniYarnCluster no longer binds to 
> default port 8188
> ---
>
> Key: YARN-4326
> URL: https://issues.apache.org/jira/browse/YARN-4326
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: MENG DING
>Assignee: MENG DING
> Fix For: 2.8.0, 2.6.3, 2.7.3
>
> Attachments: YARN-4326.patch
>
>
> The timeout originates in ApplicationMaster, where it fails to connect to 
> timeline server, and retry exceeds limits:
> {code}
> 2015-11-02 21:57:38,066 INFO  [main] impl.TimelineClientImpl 
> (TimelineClientImpl.java:serviceInit(299)) - Timeline service address: 
> http://mdinglin02:0/ws/v1/timeline/
> 2015-11-02 21:57:38,099 INFO  [main] impl.TimelineClientImpl 
> (TimelineClientImpl.java:logException(213)) - Exception caught by 
> TimelineClientConnectionRetry, will try 30 more time(s).
> ...
> ...
> java.lang.RuntimeException: Failed to connect to timeline server. Connection 
> retries limit exceeded. The posted timeline event may be missing
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:206)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:245)
> at com.sun.jersey.api.client.Client.handle(Client.java:648)
> at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
> at 
> com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
> at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:477)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:326)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:323)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:323)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:308)
> at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1184)
> at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:571)
> at 
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:302)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4330) MiniYARNCluster prints multiple Failed to instantiate default resource calculator warning messages

2015-11-06 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993711#comment-14993711
 ] 

Steve Loughran commented on YARN-4330:
--

+1 for downgrading the stack trace to DEBUG level; anything at INFO/WARN should 
include the calculator plugin conf value in case that is the problem.

And another +1 for having a way to turn this off for minicluster tests. Having 
a dummy plugin would be more generally useful, and would avoid adding yet another 
config option.

> MiniYARNCluster prints multiple  Failed to instantiate default resource 
> calculator warning messages
> ---
>
> Key: YARN-4330
> URL: https://issues.apache.org/jira/browse/YARN-4330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test, yarn
>Affects Versions: 2.8.0
> Environment: OSX, JUnit
>Reporter: Steve Loughran
>Assignee: Varun Saxena
>Priority: Blocker
>
> Whenever I try to start a MiniYARNCluster on Branch-2 (commit #0b61cca), I 
> see multiple stack traces warning me that a resource calculator plugin could 
> not be created
> {code}
> (ResourceCalculatorPlugin.java:getResourceCalculatorPlugin(184)) - 
> java.lang.UnsupportedOperationException: Could not determine OS: Failed to 
> instantiate default resource calculator.
> java.lang.UnsupportedOperationException: Could not determine OS
> {code}
> This is a minicluster. It doesn't need resource calculation. It certainly 
> doesn't need test logs being cluttered with even more stack traces which will 
> only generate false alarms about tests failing. 
> There needs to be a way to turn this off, and the minicluster should have it 
> that way by default.
> Being ruthless and marking as a blocker, because its a fairly major 
> regression for anyone testing with the minicluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3452) Bogus token usernames cause many invalid group lookups

2015-11-06 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994017#comment-14994017
 ] 

Greg Senia commented on YARN-3452:
--

[~jlowe] I just confirmed that HADOOP-12413 does seem to fix it. I ran a test 
with the else-if block. As a safety measure I may still keep my tactical fix for 
the time being, but I'll move the regex so it is compiled once, because if the 
LDAP storm from our jobs shows up again I'm going to have a bad day :)


Confirmed that appattempt_1446307555640_0052_01 was never sent out to NSS/LDAP.
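
For context, a rough sketch of the kind of guard the HADOOP-12413-style else-if 
adds, again based on the isUserInList shape quoted in the description below; this 
is illustrative, not the committed patch:
{code}
public final boolean isUserInList(UserGroupInformation ugi) {
  if (allAllowed || users.contains(ugi.getShortUserName())) {
    return true;
  } else if (!groups.isEmpty()) {
    // Only resolve the caller's groups when the ACL actually lists groups,
    // so bogus usernames such as appattempt_* no longer trigger an
    // NSS/LDAP lookup from this path.
    for (String group : ugi.getGroupNames()) {
      if (groups.contains(group)) {
        return true;
      }
    }
  }
  return false;
}
{code}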

> Bogus token usernames cause many invalid group lookups
> --
>
> Key: YARN-3452
> URL: https://issues.apache.org/jira/browse/YARN-3452
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Reporter: Jason Lowe
>
> YARN uses a number of bogus usernames for tokens, like application attempt 
> IDs for NM tokens or even the hardcoded "testing" for the container localizer 
> token.  These tokens cause the RPC layer to do group lookups on these bogus 
> usernames which will never succeed but can take a long time to perform.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4219) New levelDB cache storage for timeline v1.5

2015-11-06 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994141#comment-14994141
 ] 

Li Lu commented on YARN-4219:
-

Yes, this is a caching storage for the ATS v1.5 design. We can prioritize any 
"up-level" storage that uses this caching storage. I'll fix the Maven problem 
soon.

> New levelDB cache storage for timeline v1.5
> ---
>
> Key: YARN-4219
> URL: https://issues.apache.org/jira/browse/YARN-4219
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.8.0
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4219-trunk.001.patch, YARN-4219-trunk.002.patch, 
> YARN-4219-trunk.003.patch
>
>
> We need to have an "offline" caching storage for timeline server v1.5 after 
> the changes in YARN-3942. The in memory timeline storage may run into OOM 
> issues when used as a cache storage for entity file timeline storage. We can 
> refactor the code and have a level db based caching storage for this use 
> case. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4330) MiniYARNCluster prints multiple Failed to instantiate default resource calculator warning messages

2015-11-06 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4330:
---
Attachment: YARN-4330.01.patch

> MiniYARNCluster prints multiple  Failed to instantiate default resource 
> calculator warning messages
> ---
>
> Key: YARN-4330
> URL: https://issues.apache.org/jira/browse/YARN-4330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test, yarn
>Affects Versions: 2.8.0
> Environment: OSX, JUnit
>Reporter: Steve Loughran
>Assignee: Varun Saxena
>Priority: Blocker
> Attachments: YARN-4330.01.patch
>
>
> Whenever I try to start a MiniYARNCluster on Branch-2 (commit #0b61cca), I 
> see multiple stack traces warning me that a resource calculator plugin could 
> not be created
> {code}
> (ResourceCalculatorPlugin.java:getResourceCalculatorPlugin(184)) - 
> java.lang.UnsupportedOperationException: Could not determine OS: Failed to 
> instantiate default resource calculator.
> java.lang.UnsupportedOperationException: Could not determine OS
> {code}
> This is a minicluster. It doesn't need resource calculation. It certainly 
> doesn't need test logs being cluttered with even more stack traces which will 
> only generate false alarms about tests failing. 
> There needs to be a way to turn this off, and the minicluster should have it 
> that way by default.
> Being ruthless and marking as a blocker, because its a fairly major 
> regression for anyone testing with the minicluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4337) Resolve all docs errors in *.apt.vm for YARN

2015-11-06 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994137#comment-14994137
 ] 

Daniel Templeton commented on YARN-4337:


I'm looking at the WebServicesIntro.md file, and I don't see that error:

{code}
 JSON response with single resource

HTTP Request: GET 
http://rmhost.domain:8088/ws/v1/cluster/apps/application\_1324057493980\_0001

Response Status Line: HTTP/1.1 200 OK

Response Header:
{code}

Are you looking at the current version of the docs?

> Resolve all docs errors in *.apt.vm for YARN
> 
>
> Key: YARN-4337
> URL: https://issues.apache.org/jira/browse/YARN-4337
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Chen He
>Priority: Minor
>  Labels: documentation, newbie
>
> This is a newbie++ docs ticket.
> Simple example, In WebServiceInfo.apt.vm
> *** JSON response with single resource
>   HTTP Request:
>   GET 
> http://rmhost.domain:8088/ws/v1/cluster/app/application_1324057493980_0001
>   Response Status Line:
>   HTTP/1.1 200 OK
>   Response Header:
> +---+
> The URI
> http://rmhost.domain:8088/ws/v1/cluster/app/application_1324057493980_0001 is 
> invalid. It should be "apps" instead of "app" in the URI. It may mislead 
> first time users to think that YARN REST API does not work. Similarly, we 
> should remove all similar typos or minor errors in *.apt.vm file for YARN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server

2015-11-06 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994140#comment-14994140
 ] 

Xuan Gong commented on YARN-2556:
-

Thanks for the work.
[~sjlee0] and [~lichangleo], could you give us some instructions on how to run 
this performance tool? Maybe add a section to the related ATS md and at least 
give us an example command to run this tool.

> Tool to measure the performance of the timeline server
> --
>
> Key: YARN-2556
> URL: https://issues.apache.org/jira/browse/YARN-2556
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Jonathan Eagles
>Assignee: Chang Li
>  Labels: BB2015-05-TBR
> Fix For: 2.8.0
>
> Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, 
> YARN-2556.1.patch, YARN-2556.10.patch, YARN-2556.11.patch, 
> YARN-2556.12.patch, YARN-2556.13.patch, YARN-2556.13.whitespacefix.patch, 
> YARN-2556.14.patch, YARN-2556.14.whitespacefix.patch, YARN-2556.15.patch, 
> YARN-2556.2.patch, YARN-2556.3.patch, YARN-2556.4.patch, YARN-2556.5.patch, 
> YARN-2556.6.patch, YARN-2556.7.patch, YARN-2556.8.patch, YARN-2556.9.patch, 
> YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch
>
>
> We need to be able to understand the capacity model for the timeline server 
> to give users the tools they need to deploy a timeline server with the 
> correct capacity.
> I propose we create a mapreduce job that can measure timeline server write 
> and read performance. Transactions per second, I/O for both read and write 
> would be a good start.
> This could be done as an example or test job that could be tied into gridmix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4337) Resolve all docs errors in *.apt.vm for YARN

2015-11-06 Thread Chen He (JIRA)
Chen He created YARN-4337:
-

 Summary: Resolve all docs errors in *.apt.vm for YARN
 Key: YARN-4337
 URL: https://issues.apache.org/jira/browse/YARN-4337
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.7.1
Reporter: Chen He
Priority: Minor


This is a newbie++ docs ticket.
A simple example from WebServiceInfo.apt.vm:

*** JSON response with single resource
  HTTP Request:
  GET http://rmhost.domain:8088/ws/v1/cluster/app/application_1324057493980_0001
  Response Status Line:
  HTTP/1.1 200 OK
  Response Header:
+---+

The URI
http://rmhost.domain:8088/ws/v1/cluster/app/application_1324057493980_0001 is 
invalid: it should be "apps" instead of "app". This may mislead first-time users 
into thinking that the YARN REST API does not work. Similarly, we should fix 
all similar typos and minor errors in the *.apt.vm files for YARN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4336) YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP

2015-11-06 Thread Greg Senia (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Senia updated YARN-4336:
-
Attachment: tactical_defense.patch

> YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP
> 
>
> Key: YARN-4336
> URL: https://issues.apache.org/jira/browse/YARN-4336
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0, 2.4.1, 2.6.0, 2.7.0, 2.6.1, 2.7.1
> Environment: NSS w/ SSSD or Dell/Quest - VASD
>Reporter: Greg Senia
>Assignee: Greg Senia
> Attachments: YARN-4336-tactical.txt, tactical_defense.patch
>
>
> Hi folks after performing some debug for our Unix Engineering and Active 
> Directory teams it was discovered that on YARN Container Initialization a 
> call via Hadoop Common AccessControlList.java:
>   for(String group: ugi.getGroupNames()) {
> if (groups.contains(group)) {
>   return true;
> }
>   }
> Unfortunately with the security call to check access on 
> "appattempt_X_X_X" will always return false but will make 
> unnecessary calls to NameSwitch service on linux which will call things like 
> SSSD/Quest VASD which will then initiate LDAP calls looking for non existent 
> userid's causing excessive load on LDAP.
> For now our tactical work around is as follows:
> /**
>* Checks if a user represented by the provided {@link UserGroupInformation}
>* is a member of the Access Control List
>* @param ugi UserGroupInformation to check if contained in the ACL
>* @return true if ugi is member of the list
>*/
>   public final boolean isUserInList(UserGroupInformation ugi) {
> if (allAllowed || users.contains(ugi.getShortUserName())) {
>   return true;
> } else {
> String patternString = "^appattempt_\\d+_\\d+_\\d+$";
> Pattern pattern = Pattern.compile(patternString);
> Matcher matcher = pattern.matcher(ugi.getShortUserName());
> boolean matches = matcher.matches();
> if (matches) {
>   LOG.debug("Bailing !! AppAttempt Matches DONOT call UGI FOR 
> GROUPS!!");;
>   return false;
> }
>   
>   
>   for(String group: ugi.getGroupNames()) {
> if (groups.contains(group)) {
>   return true;
> }
>   }
> }
> return false;
>   }
>   public boolean isUserAllowed(UserGroupInformation ugi) {
> return isUserInList(ugi);
>   }
> Example of VASD Debug log showing the lookups for one task attempt 32 of them:
> One task:
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:56:45 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> 

[jira] [Commented] (YARN-4330) MiniYARNCluster prints multiple Failed to instantiate default resource calculator warning messages

2015-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994386#comment-14994386
 ] 

Hadoop QA commented on YARN-4330:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 5s 
{color} | {color:blue} docker + precommit patch detected. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
39s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 17s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
trunk has 3 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s 
{color} | {color:green} trunk passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s 
{color} | {color:green} trunk passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
7s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 23s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 29, now 29). {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s 
{color} | {color:green} the patch passed with JDK v1.8.0_60 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s 
{color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 46s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_60. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 23s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_60. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 5m 8s {color} | 
{color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_60. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 2s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 51s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_79. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 5m 10s {color} 
| {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.7.0_79. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | 

[jira] [Updated] (YARN-3452) Bogus token usernames cause many invalid group lookups

2015-11-06 Thread Greg Senia (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Senia updated YARN-3452:
-
Attachment: tactical_defense.patch

> Bogus token usernames cause many invalid group lookups
> --
>
> Key: YARN-3452
> URL: https://issues.apache.org/jira/browse/YARN-3452
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Reporter: Jason Lowe
> Attachments: tactical_defense.patch
>
>
> YARN uses a number of bogus usernames for tokens, like application attempt 
> IDs for NM tokens or even the hardcoded "testing" for the container localizer 
> token.  These tokens cause the RPC layer to do group lookups on these bogus 
> usernames which will never succeed but can take a long time to perform.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4336) YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP

2015-11-06 Thread Greg Senia (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Senia updated YARN-4336:
-
Attachment: (was: YARN-4336-tactical.txt)

> YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP
> 
>
> Key: YARN-4336
> URL: https://issues.apache.org/jira/browse/YARN-4336
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0, 2.4.1, 2.6.0, 2.7.0, 2.6.1, 2.7.1
> Environment: NSS w/ SSSD or Dell/Quest - VASD
>Reporter: Greg Senia
>Assignee: Greg Senia
> Attachments: tactical_defense.patch
>
>
> Hi folks after performing some debug for our Unix Engineering and Active 
> Directory teams it was discovered that on YARN Container Initialization a 
> call via Hadoop Common AccessControlList.java:
>   for(String group: ugi.getGroupNames()) {
> if (groups.contains(group)) {
>   return true;
> }
>   }
> Unfortunately with the security call to check access on 
> "appattempt_X_X_X" will always return false but will make 
> unnecessary calls to NameSwitch service on linux which will call things like 
> SSSD/Quest VASD which will then initiate LDAP calls looking for non existent 
> userid's causing excessive load on LDAP.
> For now our tactical work around is as follows:
> /**
>* Checks if a user represented by the provided {@link UserGroupInformation}
>* is a member of the Access Control List
>* @param ugi UserGroupInformation to check if contained in the ACL
>* @return true if ugi is member of the list
>*/
>   public final boolean isUserInList(UserGroupInformation ugi) {
> if (allAllowed || users.contains(ugi.getShortUserName())) {
>   return true;
> } else {
> String patternString = "^appattempt_\\d+_\\d+_\\d+$";
> Pattern pattern = Pattern.compile(patternString);
> Matcher matcher = pattern.matcher(ugi.getShortUserName());
> boolean matches = matcher.matches();
> if (matches) {
>   LOG.debug("Bailing !! AppAttempt Matches DONOT call UGI FOR 
> GROUPS!!");;
>   return false;
> }
>   
>   
>   for(String group: ugi.getGroupNames()) {
> if (groups.contains(group)) {
>   return true;
> }
>   }
> }
> return false;
>   }
>   public boolean isUserAllowed(UserGroupInformation ugi) {
> return isUserInList(ugi);
>   }
> Example of VASD Debug log showing the lookups for one task attempt 32 of them:
> One task:
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:56:45 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> 

[jira] [Updated] (YARN-4241) Typo in yarn-default.xml

2015-11-06 Thread Anthony Rojas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Rojas updated YARN-4241:

Attachment: YARN-4241.002.patch

> Typo in yarn-default.xml
> 
>
> Key: YARN-4241
> URL: https://issues.apache.org/jira/browse/YARN-4241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, yarn
>Reporter: Anthony Rojas
>Assignee: Anthony Rojas
>Priority: Trivial
>  Labels: newbie
> Attachments: YARN-4241.002.patch, YARN-4241.patch, YARN-4241.patch.1
>
>
> Typo in description section of yarn-default.xml, under the properties:
> yarn.nodemanager.disk-health-checker.min-healthy-disks
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
> yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb
> The reference to yarn-nodemanager.local-dirs should be 
> yarn.nodemanager.local-dirs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4336) YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP

2015-11-06 Thread Greg Senia (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Senia updated YARN-4336:
-
Affects Version/s: (was: 2.4.1)
   (was: 2.4.0)

> YARN NodeManager - Container Initialization - Excessive load on NSS/LDAP
> 
>
> Key: YARN-4336
> URL: https://issues.apache.org/jira/browse/YARN-4336
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0, 2.7.0, 2.6.1, 2.7.1
> Environment: NSS w/ SSSD or Dell/Quest - VASD
>Reporter: Greg Senia
>Assignee: Greg Senia
> Attachments: tactical_defense.patch
>
>
> Hi folks after performing some debug for our Unix Engineering and Active 
> Directory teams it was discovered that on YARN Container Initialization a 
> call via Hadoop Common AccessControlList.java:
>   for(String group: ugi.getGroupNames()) {
> if (groups.contains(group)) {
>   return true;
> }
>   }
> Unfortunately with the security call to check access on 
> "appattempt_X_X_X" will always return false but will make 
> unnecessary calls to NameSwitch service on linux which will call things like 
> SSSD/Quest VASD which will then initiate LDAP calls looking for non existent 
> userid's causing excessive load on LDAP.
> For now our tactical work around is as follows:
> /**
>* Checks if a user represented by the provided {@link UserGroupInformation}
>* is a member of the Access Control List
>* @param ugi UserGroupInformation to check if contained in the ACL
>* @return true if ugi is member of the list
>*/
>   public final boolean isUserInList(UserGroupInformation ugi) {
> if (allAllowed || users.contains(ugi.getShortUserName())) {
>   return true;
> } else {
> String patternString = "^appattempt_\\d+_\\d+_\\d+$";
> Pattern pattern = Pattern.compile(patternString);
> Matcher matcher = pattern.matcher(ugi.getShortUserName());
> boolean matches = matcher.matches();
> if (matches) {
>   LOG.debug("Bailing !! AppAttempt Matches DONOT call UGI FOR 
> GROUPS!!");;
>   return false;
> }
>   
>   
>   for(String group: ugi.getGroupNames()) {
> if (groups.contains(group)) {
>   return true;
> }
>   }
> }
> return false;
>   }
>   public boolean isUserAllowed(UserGroupInformation ugi) {
> return isUserInList(ugi);
>   }
> Example of VASD Debug log showing the lookups for one task attempt 32 of them:
> One task:
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:55:43 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:56:15 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> filter=<(&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))>,
>  base=<>, scope=
> Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:56:45 xhadoopm5d vasd[20741]: _vasug_user_namesearch_gc: searching 
> GC for host service domain EXNSD.EXA.EXAMPLE.COM with filter 
> (&(objectCategory=Person)(samaccountname=appattempt_1446145939879_0022_01))
> Oct 30 22:56:45 xhadoopm5d vasd[20741]: libvas_attrs_find_uri: Searching 
>  with 
> 

[jira] [Comment Edited] (YARN-2556) Tool to measure the performance of the timeline server

2015-11-06 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994140#comment-14994140
 ] 

Xuan Gong edited comment on YARN-2556 at 11/6/15 9:57 PM:
--

Thanks for the work.
[~sjlee0] and [~lichangleo], could you give us some instructions on how to run 
this performance tool? Maybe add a section to the related ATS docs and at least 
give us an example command to run this tool.


was (Author: xgong):
Thanks for the work.
[~sjlee0] and [~lichangleo]  Could you give us some instruction on how to run 
this performance tool ? Maybe add a document on related ats md and at least 
give us an example command to run this tool.

> Tool to measure the performance of the timeline server
> --
>
> Key: YARN-2556
> URL: https://issues.apache.org/jira/browse/YARN-2556
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Jonathan Eagles
>Assignee: Chang Li
>  Labels: BB2015-05-TBR
> Fix For: 2.8.0
>
> Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, 
> YARN-2556.1.patch, YARN-2556.10.patch, YARN-2556.11.patch, 
> YARN-2556.12.patch, YARN-2556.13.patch, YARN-2556.13.whitespacefix.patch, 
> YARN-2556.14.patch, YARN-2556.14.whitespacefix.patch, YARN-2556.15.patch, 
> YARN-2556.2.patch, YARN-2556.3.patch, YARN-2556.4.patch, YARN-2556.5.patch, 
> YARN-2556.6.patch, YARN-2556.7.patch, YARN-2556.8.patch, YARN-2556.9.patch, 
> YARN-2556.patch, yarn2556.patch, yarn2556.patch, yarn2556_wip.patch
>
>
> We need to be able to understand the capacity model for the timeline server 
> to give users the tools they need to deploy a timeline server with the 
> correct capacity.
> I propose we create a mapreduce job that can measure timeline server write 
> and read performance. Transactions per second, I/O for both read and write 
> would be a good start.
> This could be done as an example or test job that could be tied into gridmix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)