[jira] [Commented] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size

2018-11-07 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679303#comment-16679303
 ] 

Bibin A Chundatt commented on YARN-8972:


[~giovanni.fumarola]

The ZK state store was already validating the applicationStateData size, and the 
ApplicationSubmissionContext size check is applicable to other store 
implementations too, so I think we shouldn't reuse the property.

Thoughts?
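
For illustration, a minimal sketch of the kind of size check being discussed (the variable names, limit, and placement are assumptions for the example, not taken from the patch):

{code:java}
// Hypothetical check in a Router interceptor: reject oversized submission contexts.
ApplicationSubmissionContextPBImpl ascPb =
    (ApplicationSubmissionContextPBImpl) submissionContext;
int ascSize = ascPb.getProto().getSerializedSize();
if (ascSize > maxAscSizeBytes) {  // limit read from an assumed configuration key
  throw new YarnException("ApplicationSubmissionContext of " + ascSize
      + " bytes exceeds the configured limit of " + maxAscSizeBytes + " bytes");
}
{code}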


> [Router] Add support to prevent DoS attack over ApplicationSubmissionContext 
> size
> -
>
> Key: YARN-8972
> URL: https://issues.apache.org/jira/browse/YARN-8972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8972.v1.patch, YARN-8972.v2.patch, 
> YARN-8972.v3.patch, YARN-8972.v4.patch
>
>
> This jira tracks the effort to add a new interceptor in the Router to prevent 
> users from submitting applications with an oversized ASC.
> This avoids causing the YARN cluster to fail over.






[jira] [Comment Edited] (YARN-8925) Updating distributed node attributes only when necessary

2018-11-07 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679371#comment-16679371
 ] 

Tao Yang edited comment on YARN-8925 at 11/8/18 7:23 AM:
-

Thanks [~cheersyang] for the review and comments.
{quote}NodeLabelUtil#isNodeAttributesEquals
 if leftNodeAttributes is a subset of rightNodeAttributes seems also equals.
 And except for the name and value, we also need to compare prefix right?
 It would be good if we have a separate UT for this method, to verify various 
of cases.
{quote}
Comparison between the two node attribute sets already considers the set size via 
this clause: {{leftNodeAttributes.size() != rightNodeAttributes.size()}}, and the 
attribute prefix is considered in the comparison inside NodeAttributeKey 
(reference code: {{nodeAttributes.stream().anyMatch(e -> 
e.equals(checkNodeAttribute))}} in NodeLabelUtil#isNodeAttributeIncludes). 
 A separate UT for this method makes sense to me; I will add it in the next patch.
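
For reference, a sketch of the comparison logic described above (a paraphrase of the description, not the exact patch code):

{code:java}
// Sizes must match, and every attribute on the left must be matched
// (via NodeAttribute/NodeAttributeKey equality) by an attribute on the right.
public static boolean isNodeAttributesEquals(Set<NodeAttribute> left,
    Set<NodeAttribute> right) {
  if (left == null || right == null) {
    return left == right;
  }
  if (left.size() != right.size()) {
    return false;
  }
  return left.stream()
      .allMatch(attr -> right.stream().anyMatch(e -> e.equals(attr)));
}
{code}
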
{quote}HeartbeatSyncIfNeededHandler
 Can we rename this to CachedNodeDescriptorHandler? As this class caches the 
last value of node label/attribute and leverages the cache to reduce the 
overhead.
{quote}
Agree, CachedNodeDescriptorHandler is a better name.
{quote}TestResourceTrackerService#testNodeRegistrationWithAttributes
 File tempDir = File.createTempFile("nattr", ".tmp");
 can we put tmp dir under TEMP_DIR that to be consistent with rest of tests.
{quote}
Makes sense to me. I copied this code from testNodeHeartbeatWithNodeAttributes 
and will update that method too.
{quote}TestNodeStatusUpdaterForAttributes
 waitTillHeartbeat/waitTillHeartbeat
 can these methods be simplified with GenericTestUtils.waitFor?
{quote}
Makes sense to me.
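
A minimal sketch of that simplification (assuming the heartbeat counter is visible to the test; the names are placeholders, not the actual patch code):

{code:java}
// Replace the hand-rolled waitTillHeartbeat loop with GenericTestUtils.waitFor:
// poll the condition every 100 ms and fail if it is not met within 30 seconds.
GenericTestUtils.waitFor(
    () -> heartbeatCount.get() > lastSeenHeartbeat, 100, 30000);
{code}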

I will upload a new patch in a few hours. 


was (Author: tao yang):
Thanks [~cheersyang] for the review and comments.
{quote}NodeLabelUtil#isNodeAttributesEquals
 if leftNodeAttributes is a subset of rightNodeAttributes seems also equals.
 And except for the name and value, we also need to compare prefix right?
 It would be good if we have a separate UT for this method, to verify various 
of cases.
{quote}
Comparison between two node attributes sets has considered set size through 
this clause: {{leftNodeAttributes.size() != rightNodeAttributes.size())}} , and 
considered attribute name in comparison for NodeAttribute (reference code is 
{{nodeAttributes.stream().anyMatch(e -> e.equals(checkNodeAttribute)}} in 
NodeLabelUtil#isNodeAttributeIncludes). 
 Separate UT for this method makes sense to me, I will add it in next patch.
{quote}HeartbeatSyncIfNeededHandler
 Can we rename this to CachedNodeDescriptorHandler? As this class caches the 
last value of node label/attribute and leverages the cache to reduce the 
overhead.
{quote}
Agree, CachedNodeDescriptorHandler is a better name.
{quote}TestResourceTrackerService#testNodeRegistrationWithAttributes
 File tempDir = File.createTempFile("nattr", ".tmp");
 can we put tmp dir under TEMP_DIR that to be consistent with rest of tests.
{quote}
Make sense to me, I copied this code from testNodeHeartbeatWithNodeAttributes 
and will update this method too.
{quote}TestNodeStatusUpdaterForAttributes
 waitTillHeartbeat/waitTillHeartbeat
 can these methods be simplified with GenericTestUtils.waitFor?
{quote}
Make sense to me.

I will upload a new patch a few hours later. 

> Updating distributed node attributes only when necessary
> 
>
> Key: YARN-8925
> URL: https://issues.apache.org/jira/browse/YARN-8925
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.2.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: performance
> Attachments: YARN-8925.001.patch, YARN-8925.002.patch, 
> YARN-8925.003.patch
>
>
> Currently, if distributed node attributes exist, an update for distributed node 
> attributes happens in every heartbeat between the NM and the RM, even when 
> nothing has changed. The updating process holds 
> NodeAttributesManagerImpl#writeLock and may have some impact in a large 
> cluster. We have found that the nodes UI of a large cluster opens slowly, and 
> most of the time it is waiting for the lock in NodeAttributesManagerImpl. I think 
> this update should be performed only when necessary to improve the performance 
> of the related processes.






[jira] [Commented] (YARN-8880) Add configurations for pluggable plugin framework

2018-11-07 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679292#comment-16679292
 ] 

Hudson commented on YARN-8880:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15385 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15385/])
YARN-8880. Add configurations for pluggable plugin framework. (wwei: rev 
f8c72d7b3acca8285bbc3024f491c4586805be1e)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/ResourcePluginManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/TestResourcePluginManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


> Add configurations for pluggable plugin framework
> -
>
> Key: YARN-8880
> URL: https://issues.apache.org/jira/browse/YARN-8880
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-8880-trunk.001.patch, YARN-8880-trunk.002.patch, 
> YARN-8880-trunk.003.patch, YARN-8880-trunk.004.patch, 
> YARN-8880-trunk.005.patch
>
>
> Added two configurations for the pluggable device framework.
> {code:java}
> 
>  yarn.nodemanager.pluggable-device-framework.enabled
>  true/false
>  
>  
>  yarn.nodemanager.pluggable-device-framework.device-classes
>  com.cmp1.hdw1,...
>  {code}
> The admin needs to know the registered resource name of every plugin class 
> configured, and declare them in resource-types.xml.
> Please note that the count value defined in node-resource.xml will be 
> overridden by the plugin.
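
For illustration, the corresponding resource-types.xml declaration might look like the following (the resource name here is a made-up example, not taken from the issue):

{code:xml}
<configuration>
  <property>
    <!-- comma-separated list of resource names registered by the plugins -->
    <name>yarn.resource-types</name>
    <value>cmp.com/hdw1</value>
  </property>
</configuration>
{code}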






[jira] [Commented] (YARN-8233) NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal whose allocatedOrReservedContainer is null

2018-11-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679318#comment-16679318
 ] 

Hadoop QA commented on YARN-8233:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-3.1 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
56s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
35s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
57s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
14s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
18s{color} | {color:green} branch-3.1 passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 16m 
40s{color} | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} branch-3.1 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 11m 
19s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
22s{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  1m 27s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 93m 40s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:080e9d0 |
| JIRA Issue | YARN-8233 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12947343/YARN-8233.001-branch-3.1-test.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  xml  findbugs  

[jira] [Updated] (YARN-8880) Add configurations for pluggable plugin framework

2018-11-07 Thread Zhankun Tang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-8880:
---
Attachment: YARN-8880-trunk.005.patch

> Add configurations for pluggable plugin framework
> -
>
> Key: YARN-8880
> URL: https://issues.apache.org/jira/browse/YARN-8880
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
> Attachments: YARN-8880-trunk.001.patch, YARN-8880-trunk.002.patch, 
> YARN-8880-trunk.003.patch, YARN-8880-trunk.004.patch, 
> YARN-8880-trunk.005.patch
>
>
> Added two configurations for the pluggable device framework.
> {code:java}
> 
>  yarn.nodemanager.pluggable-device-framework.enabled
>  true/false
>  
>  
>  yarn.nodemanager.pluggable-device-framework.device-classes
>  com.cmp1.hdw1,...
>  {code}
> The admin needs to know the registered resource name of every plugin class 
> configured, and declare them in resource-types.xml.
> Please note that the count value defined in node-resource.xml will be 
> overridden by the plugin.






[jira] [Commented] (YARN-8985) FSParentQueue: debug log missing when assigning container

2018-11-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679348#comment-16679348
 ] 

Hadoop QA commented on YARN-8985:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 12m 
21s{color} | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 12m 
12s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
22s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  1m 36s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 53m 32s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
|  |  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSSchedulerNode)
 does not release lock on all exception paths  At FSParentQueue.java:on all 
exception paths  At FSParentQueue.java:[line 214] |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8985 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12947346/YARN-8985.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 543364109e4e 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f8c72d7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | 

[jira] [Commented] (YARN-8233) NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal whose allocatedOrReservedContainer is null

2018-11-07 Thread Akira Ajisaka (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679159#comment-16679159
 ] 

Akira Ajisaka commented on YARN-8233:
-

Thanks [~Tao Yang] for the additional patches. They look good to me.

bq. anything wrong on branch-3.1?
Kicked precommit job again manually 
https://builds.apache.org/job/PreCommit-YARN-Build/22453/

> NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal 
> whose allocatedOrReservedContainer is null
> -
>
> Key: YARN-8233
> URL: https://issues.apache.org/jira/browse/YARN-8233
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Fix For: 3.3.0, 3.2.1
>
> Attachments: YARN-8233.001.branch-2.patch, 
> YARN-8233.001.branch-3.0.patch, YARN-8233.001.branch-3.1.patch, 
> YARN-8233.001.patch, YARN-8233.002.patch, YARN-8233.003.patch
>
>
> Recently we saw an NPE in CapacityScheduler#tryCommit when trying to find 
> the attemptId by calling {{c.getAllocatedOrReservedContainer().get...}} from 
> an allocate/reserve proposal: allocatedOrReservedContainer was null and an 
> NPE was thrown.
> Reference code:
> {code:java}
> // find the application to accept and apply the ResourceCommitRequest
> if (request.anythingAllocatedOrReserved()) {
>   ContainerAllocationProposal c =
>   request.getFirstAllocatedOrReservedContainer();
>   attemptId =
>   c.getAllocatedOrReservedContainer().getSchedulerApplicationAttempt()
>   .getApplicationAttemptId();   //NPE happens here
> } else { ...
> {code}
> The proposal was constructed in 
> {{CapacityScheduler#createResourceCommitRequest}} and 
> allocatedOrReservedContainer is possibly null in async-scheduling process 
> when node was lost or application was finished (details in 
> {{CapacityScheduler#getSchedulerContainer}}).
> Reference code:
> {code:java}
>   // Allocated something
>   List allocations =
>   csAssignment.getAssignmentInformation().getAllocationDetails();
>   if (!allocations.isEmpty()) {
> RMContainer rmContainer = allocations.get(0).rmContainer;
> allocated = new ContainerAllocationProposal<>(
> getSchedulerContainer(rmContainer, true),   //possibly null
> getSchedulerContainersToRelease(csAssignment),
> 
> getSchedulerContainer(csAssignment.getFulfilledReservedContainer(),
> false), csAssignment.getType(),
> csAssignment.getRequestLocalityType(),
> csAssignment.getSchedulingMode() != null ?
> csAssignment.getSchedulingMode() :
> SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY,
> csAssignment.getResource());
>   }
> {code}
> I think we should add a null check for allocatedOrReservedContainer before 
> creating allocate/reserve proposals. Besides, the allocation process has already 
> increased the unconfirmed resource of the app when creating an allocate 
> assignment, so if this check finds null, we should decrease the unconfirmed 
> resource of the live app.
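
A rough sketch of the guard proposed above, following the snippet from the description (the rollback call and surrounding names are assumptions, not the actual patch):

{code:java}
// Skip the proposal when the scheduler container cannot be resolved
// (node lost / app finished) and give back the unconfirmed resource.
SchedulerContainer allocatedContainer = getSchedulerContainer(rmContainer, true);
if (allocatedContainer == null) {
  // assumed rollback of the resource counted as unconfirmed for this assignment
  application.decUnconfirmedRes(csAssignment.getResource());
} else {
  allocated = new ContainerAllocationProposal<>(allocatedContainer,
      getSchedulerContainersToRelease(csAssignment),
      getSchedulerContainer(csAssignment.getFulfilledReservedContainer(), false),
      csAssignment.getType(), csAssignment.getRequestLocalityType(),
      csAssignment.getSchedulingMode() != null
          ? csAssignment.getSchedulingMode()
          : SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY,
      csAssignment.getResource());
}
{code}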






[jira] [Commented] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size

2018-11-07 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679064#comment-16679064
 ] 

Íñigo Goiri commented on YARN-8972:
---

{quote}
No, there is no need to add options in yarn-default.
{quote}
{{TestYarnConfigurationFields}} disagrees.

> [Router] Add support to prevent DoS attack over ApplicationSubmissionContext 
> size
> -
>
> Key: YARN-8972
> URL: https://issues.apache.org/jira/browse/YARN-8972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8972.v1.patch, YARN-8972.v2.patch, 
> YARN-8972.v3.patch, YARN-8972.v4.patch
>
>
> This jira tracks the effort to add a new interceptor in the Router to prevent 
> users from submitting applications with an oversized ASC.
> This avoids causing the YARN cluster to fail over.






[jira] [Comment Edited] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size

2018-11-07 Thread Giovanni Matteo Fumarola (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679068#comment-16679068
 ] 

Giovanni Matteo Fumarola edited comment on YARN-8972 at 11/7/18 11:30 PM:
--

Thanks [~elgoiri] for the comment.
There is no need to add options in yarn-default since I will reuse a 
configuration. The change was already done in  [^YARN-8972.v4.patch] .


was (Author: giovanni.fumarola):
Thanks [~elgoiri] for the comment.
There is no need to add options in yarn-default since I will reuse a 
configuration.

> [Router] Add support to prevent DoS attack over ApplicationSubmissionContext 
> size
> -
>
> Key: YARN-8972
> URL: https://issues.apache.org/jira/browse/YARN-8972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8972.v1.patch, YARN-8972.v2.patch, 
> YARN-8972.v3.patch, YARN-8972.v4.patch
>
>
> This jira tracks the effort to add a new interceptor in the Router to prevent 
> users from submitting applications with an oversized ASC.
> This avoids causing the YARN cluster to fail over.






[jira] [Commented] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size

2018-11-07 Thread Giovanni Matteo Fumarola (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679068#comment-16679068
 ] 

Giovanni Matteo Fumarola commented on YARN-8972:


Thanks [~elgoiri] for the comment.
There is no need to add options in yarn-default since I will reuse a 
configuration.

> [Router] Add support to prevent DoS attack over ApplicationSubmissionContext 
> size
> -
>
> Key: YARN-8972
> URL: https://issues.apache.org/jira/browse/YARN-8972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8972.v1.patch, YARN-8972.v2.patch, 
> YARN-8972.v3.patch, YARN-8972.v4.patch
>
>
> This jira tracks the effort to add a new interceptor in the Router to prevent 
> users from submitting applications with an oversized ASC.
> This avoids causing the YARN cluster to fail over.






[jira] [Updated] (YARN-8933) [AMRMProxy] Fix potential empty AvailableResource and NumClusterNode in allocation response

2018-11-07 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8933:
---
Attachment: YARN-8933.v3.patch

> [AMRMProxy] Fix potential empty AvailableResource and NumClusterNode in 
> allocation response
> ---
>
> Key: YARN-8933
> URL: https://issues.apache.org/jira/browse/YARN-8933
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: amrmproxy, federation
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8933.v1.patch, YARN-8933.v2.patch, 
> YARN-8933.v3.patch
>
>
> After YARN-8696, the allocate response by FederationInterceptor is merged 
> from the responses from a random subset of all sub-clusters, depending on the 
> async heartbeat timing. As a result, cluster-wide information fields in the 
> response, e.g. AvailableResources and NumClusterNodes, are not consistent at 
> all. They can even be null/zero because the specific response is merged from an 
> empty set of sub-cluster responses. 
> In this patch, we let FederationInterceptor remember the last allocate 
> response from all known sub-clusters, and always construct the cluster-wide 
> info fields from all of them. We also moved sub-cluster timeout from 
> LocalityMulticastAMRMProxyPolicy to FederationInterceptor, so that 
> sub-clusters that expired (haven't had a successful allocate response for a 
> while) won't be included in the computation.
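
A simplified sketch of the merge described above (field and cache names are assumptions for illustration, not the actual FederationInterceptor code):

{code:java}
// Rebuild cluster-wide fields from the last response cached for each
// (non-expired) sub-cluster, instead of only the responses in this round.
Resource available = Resource.newInstance(0, 0);
int numNodes = 0;
for (AllocateResponse last : lastResponsePerSubCluster.values()) { // assumed cache
  if (last.getAvailableResources() != null) {
    Resources.addTo(available, last.getAvailableResources());
  }
  numNodes += last.getNumClusterNodes();
}
mergedResponse.setAvailableResources(available);
mergedResponse.setNumClusterNodes(numNodes);
{code}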






[jira] [Updated] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size

2018-11-07 Thread Giovanni Matteo Fumarola (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8972:
---
Attachment: YARN-8972.v4.patch

> [Router] Add support to prevent DoS attack over ApplicationSubmissionContext 
> size
> -
>
> Key: YARN-8972
> URL: https://issues.apache.org/jira/browse/YARN-8972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8972.v1.patch, YARN-8972.v2.patch, 
> YARN-8972.v3.patch, YARN-8972.v4.patch
>
>
> This jira tracks the effort to add a new interceptor in the Router to prevent 
> users from submitting applications with an oversized ASC.
> This avoids causing the YARN cluster to fail over.






[jira] [Updated] (YARN-8933) [AMRMProxy] Fix potential empty fields in allocation response, move SubClusterTimeout to FederationInterceptor

2018-11-07 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8933:
---
Summary: [AMRMProxy] Fix potential empty fields in allocation response, 
move SubClusterTimeout to FederationInterceptor  (was: [AMRMProxy] Fix 
potential empty AvailableResource and NumClusterNode in allocation response)

> [AMRMProxy] Fix potential empty fields in allocation response, move 
> SubClusterTimeout to FederationInterceptor
> --
>
> Key: YARN-8933
> URL: https://issues.apache.org/jira/browse/YARN-8933
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: amrmproxy, federation
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8933.v1.patch, YARN-8933.v2.patch, 
> YARN-8933.v3.patch
>
>
> After YARN-8696, the allocate response by FederationInterceptor is merged 
> from the responses from a random subset of all sub-clusters, depending on the 
> async heartbeat timing. As a result, cluster-wide information fields in the 
> response, e.g. AvailableResources and NumClusterNodes, are not consistent at 
> all. They can even be null/zero because the specific response is merged from an 
> empty set of sub-cluster responses. 
> In this patch, we let FederationInterceptor remember the last allocate 
> response from all known sub-clusters, and always construct the cluster-wide 
> info fields from all of them. We also moved sub-cluster timeout from 
> LocalityMulticastAMRMProxyPolicy to FederationInterceptor, so that 
> sub-clusters that expired (haven't had a successful allocate response for a 
> while) won't be included in the computation.






[jira] [Commented] (YARN-8983) YARN container with docker: hostname entry not in /etc/hosts

2018-11-07 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678924#comment-16678924
 ] 

Eric Yang commented on YARN-8983:
-

[~oliverhuh...@gmail.com] I don't know of any reason, from YARN's point of view, 
that might cause the entry to be removed.  When docker is started with --net=host, 
the host /etc/hosts file is used.  Could that be the cause of the missing 
entry?

> YARN container with docker: hostname entry not in /etc/hosts
> 
>
> Key: YARN-8983
> URL: https://issues.apache.org/jira/browse/YARN-8983
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.1
>Reporter: Keqiu Hu
>Priority: Critical
>
> I'm experimenting to use Hadoop 2.9.1 to launch applications with docker 
> containers. Inside the container task, we try to get the hostname of the 
> container using
> {code:java}
> InetAddress.getLocalHost().getHostName(){code}
> This works fine with LXC, however it throws the following exception when I 
> enable docker container using: 
> {code:java}
> YARN_CONTAINER_RUNTIME_TYPE=docker 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=test4
> {code}
> The exception:
>  
> {noformat}
> java.net.UnknownHostException: ctr-1541488751855-0023-01-03: 
> ctr-1541488751855-0023-01-03: Temporary failure in name resolution at 
> java.net.InetAddress.getLocalHost(InetAddress.java:1506)
>  at 
> com.linkedin.tony.TaskExecutor.registerAndGetClusterSpec(TaskExecutor.java:204)
>  
> at com.linkedin.tony.TaskExecutor.main(TaskExecutor.java:109) Caused by: 
> java.net.UnknownHostException: ctr-1541488751855-0023-01-03: Temporary 
> failure in name resolution at 
> java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) 
> at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929) 
> at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324) at 
> java.net.InetAddress.getLocalHost(InetAddress.java:1501) ... 2 more
> {noformat}
>  
> Did some research online, it seems to be related to missing entry in 
> /etc/hosts on the hostname. So I took a look at the /etc/hosts, it is missing 
> the entry : 
> {noformat}
> pi@pi-aw:~/docker/$ docker ps
> CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
> 71e3e9df8bc6 test4 "/entrypoint.sh bash..." 1 second ago Up Less than a 
> second container_1541488751855_0028_01_01
> 29d31f0327d1 test3 "/entrypoint.sh bash" 18 hours ago Up 18 hours 
> blissful_turing
> pi@pi-aw:~/docker/$ de 71e3e9df8bc6
> groups: cannot find name for group ID 1000
> groups: cannot find name for group ID 116
> groups: cannot find name for group ID 126
> To run a command as administrator (user "root"), use "sudo ".
> See "man sudo_root" for details.
> pi@ctr-1541488751855-0028-01-01:/tmp/hadoop-pi/nm-local-dir/usercache/pi/appcache/application_1541488751855_0028/container_1541488751855_0028_01_01$
>  cat /etc/hosts
> 127.0.0.1 localhost
> 192.168.0.14 pi-aw
> # The following lines are desirable for IPv6 capable hosts
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> pi@ctr-1541488751855-0028-01-01:/tmp/hadoop-pi/nm-local-dir/usercache/pi/appcache/application_1541488751855_0028/container_1541488751855_0028_01_01$
> {noformat}
> If I launch the image without YARN, I saw the entry in /etc/hosts:
> {noformat}
> pi@61f173f95631:~$ cat /etc/hosts
> 127.0.0.1 localhost
> ::1 localhost ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> 172.17.0.3 61f173f95631 {noformat}
> Here is my container-executor.cfg
> {code:java}
>  1 min.user.id=100
>  2 yarn.nodemanager.linux-container-executor.group=hadoop
>  3 [docker]
>  4 module.enabled=true
>  5 docker.binary=/usr/bin/docker
>  6 
> docker.allowed.capabilities=SYS_CHROOT,MKNOD,SETFCAP,SETPCAP,FSETID,CHOWN,AUDIT_WRITE,SETGID,NET_RAW,FOWNER,SETUID,DAC_OVERRIDE,KILL,NET_BIND_SERVICE
>  7 docker.allowed.networks=bridge,host,none
>  8 
> docker.allowed.rw-mounts=/tmp,/etc/hadoop/logs/,/private/etc/hadoop-2.9.1/logs/{code}
>  Since I'm using an older version of Hadoop 2.9.1, let me know if this is 
> something already fixed in later version :) 






[jira] [Commented] (YARN-8983) YARN container with docker: hostname entry not in /etc/hosts

2018-11-07 Thread Keqiu Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678892#comment-16678892
 ] 

Keqiu Hu commented on YARN-8983:


Thanks guys for the replies!

[~tangzhankun] That also misses a line like this:

 
{code:java}
172.17.0.3 61f173f95631 {code}
which is a mapping between the IP address and the hostname (from /etc/hostname). I 
guess that is what Java's 
InetAddress.getLocalHost()
API uses to get the local host name and IP address.

[~eyang], yeah, it would be better to piggyback on RegistryDNS resolution for 
hostnames. However, as you mentioned, it is only available post 3.x :(, which 
we can't upgrade to in the short term. I'll check the Docker overlay network.

I'm still curious why we remove that [IP HOSTNAME] line from */etc/hosts*. 
Is that intentional? By default, if you launch a docker container, it is 
there.
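
As a stop-gap inside the task code, a defensive fallback is possible (a sketch only, relying on the standard container environment rather than name resolution; it works around the symptom, not the missing /etc/hosts entry):

{code:java}
// If the container hostname cannot be resolved, fall back to the
// NodeManager host exported in the container environment (NM_HOST).
String host;
try {
  host = InetAddress.getLocalHost().getHostName();
} catch (UnknownHostException e) {
  host = System.getenv(ApplicationConstants.Environment.NM_HOST.name());
}
{code}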

 

 

> YARN container with docker: hostname entry not in /etc/hosts
> 
>
> Key: YARN-8983
> URL: https://issues.apache.org/jira/browse/YARN-8983
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.1
>Reporter: Keqiu Hu
>Priority: Critical
>
> I'm experimenting to use Hadoop 2.9.1 to launch applications with docker 
> containers. Inside the container task, we try to get the hostname of the 
> container using
> {code:java}
> InetAddress.getLocalHost().getHostName(){code}
> This works fine with LXC, however it throws the following exception when I 
> enable docker container using: 
> {code:java}
> YARN_CONTAINER_RUNTIME_TYPE=docker 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=test4
> {code}
> The exception:
>  
> {noformat}
> java.net.UnknownHostException: ctr-1541488751855-0023-01-03: 
> ctr-1541488751855-0023-01-03: Temporary failure in name resolution at 
> java.net.InetAddress.getLocalHost(InetAddress.java:1506)
>  at 
> com.linkedin.tony.TaskExecutor.registerAndGetClusterSpec(TaskExecutor.java:204)
>  
> at com.linkedin.tony.TaskExecutor.main(TaskExecutor.java:109) Caused by: 
> java.net.UnknownHostException: ctr-1541488751855-0023-01-03: Temporary 
> failure in name resolution at 
> java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) 
> at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929) 
> at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324) at 
> java.net.InetAddress.getLocalHost(InetAddress.java:1501) ... 2 more
> {noformat}
>  
> Did some research online, it seems to be related to missing entry in 
> /etc/hosts on the hostname. So I took a look at the /etc/hosts, it is missing 
> the entry : 
> {noformat}
> pi@pi-aw:~/docker/$ docker ps
> CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
> 71e3e9df8bc6 test4 "/entrypoint.sh bash..." 1 second ago Up Less than a 
> second container_1541488751855_0028_01_01
> 29d31f0327d1 test3 "/entrypoint.sh bash" 18 hours ago Up 18 hours 
> blissful_turing
> pi@pi-aw:~/docker/$ de 71e3e9df8bc6
> groups: cannot find name for group ID 1000
> groups: cannot find name for group ID 116
> groups: cannot find name for group ID 126
> To run a command as administrator (user "root"), use "sudo ".
> See "man sudo_root" for details.
> pi@ctr-1541488751855-0028-01-01:/tmp/hadoop-pi/nm-local-dir/usercache/pi/appcache/application_1541488751855_0028/container_1541488751855_0028_01_01$
>  cat /etc/hosts
> 127.0.0.1 localhost
> 192.168.0.14 pi-aw
> # The following lines are desirable for IPv6 capable hosts
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> pi@ctr-1541488751855-0028-01-01:/tmp/hadoop-pi/nm-local-dir/usercache/pi/appcache/application_1541488751855_0028/container_1541488751855_0028_01_01$
> {noformat}
> If I launch the image without YARN, I saw the entry in /etc/hosts:
> {noformat}
> pi@61f173f95631:~$ cat /etc/hosts
> 127.0.0.1 localhost
> ::1 localhost ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> 172.17.0.3 61f173f95631 {noformat}
> Here is my container-executor.cfg
> {code:java}
>  1 min.user.id=100
>  2 yarn.nodemanager.linux-container-executor.group=hadoop
>  3 [docker]
>  4 module.enabled=true
>  5 docker.binary=/usr/bin/docker
>  6 
> docker.allowed.capabilities=SYS_CHROOT,MKNOD,SETFCAP,SETPCAP,FSETID,CHOWN,AUDIT_WRITE,SETGID,NET_RAW,FOWNER,SETUID,DAC_OVERRIDE,KILL,NET_BIND_SERVICE
>  7 docker.allowed.networks=bridge,host,none
>  8 
> docker.allowed.rw-mounts=/tmp,/etc/hadoop/logs/,/private/etc/hadoop-2.9.1/logs/{code}
>  Since I'm using an older version of Hadoop 2.9.1, let me know if this is 
> something already fixed in later version :) 





[jira] [Comment Edited] (YARN-8983) YARN container with docker: hostname entry not in /etc/hosts

2018-11-07 Thread Keqiu Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678892#comment-16678892
 ] 

Keqiu Hu edited comment on YARN-8983 at 11/7/18 10:33 PM:
--

Thanks guys for the replies!

[~tangzhankun] That also misses a line like this:

 
{code:java}
172.17.0.3 61f173f95631 {code}
which is a mapping between the IP address and the hostname (from /etc/hostname). I 
guess that is what Java's 
 InetAddress.getLocalHost()
 API uses to get the local host name and IP address.

[~eyang], yeah, it would be better to piggyback on RegistryDNS resolution for 
hostnames. However, as you mentioned, it is only available post 3.x :(, which 
we can't upgrade to in the short term. I'll check the Docker overlay network.

I'm still curious why we remove that [IP HOSTNAME] line from */etc/hosts*. 
Is that intentional? By default, if you launch a docker container, it is 
there.

 

 


was (Author: oliverhuh...@gmail.com):
Thanks guys for the replies!

[~tangzhankun] That also misses a line like this:

 
{code:java}
172.17.0.3 61f173f95631 {code}
Which is a mapping between ip address and hostname (from /etc/hostname). I 
guess that is used by the Java networking's 
InetAddress.getLocalHost()
API to get the local host name & ip address.

[~eyang], yah, it is better to piggyback on RegistryDNS resolution for 
hostnames. However, as mentioned by you, it is only available post 3.x :(, 
which we can't upgrade to in short term. I'll check the Docker overlay network.

I'm still curious why do we remove that [IP HOSTNAME] line from */etc/hosts* ? 
Is that intentional, cause by default if you launch a docker container, it is 
there.

 

 

> YARN container with docker: hostname entry not in /etc/hosts
> 
>
> Key: YARN-8983
> URL: https://issues.apache.org/jira/browse/YARN-8983
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.1
>Reporter: Keqiu Hu
>Priority: Critical
>
> I'm experimenting to use Hadoop 2.9.1 to launch applications with docker 
> containers. Inside the container task, we try to get the hostname of the 
> container using
> {code:java}
> InetAddress.getLocalHost().getHostName(){code}
> This works fine with LXC, however it throws the following exception when I 
> enable docker container using: 
> {code:java}
> YARN_CONTAINER_RUNTIME_TYPE=docker 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=test4
> {code}
> The exception:
>  
> {noformat}
> java.net.UnknownHostException: ctr-1541488751855-0023-01-03: 
> ctr-1541488751855-0023-01-03: Temporary failure in name resolution at 
> java.net.InetAddress.getLocalHost(InetAddress.java:1506)
>  at 
> com.linkedin.tony.TaskExecutor.registerAndGetClusterSpec(TaskExecutor.java:204)
>  
> at com.linkedin.tony.TaskExecutor.main(TaskExecutor.java:109) Caused by: 
> java.net.UnknownHostException: ctr-1541488751855-0023-01-03: Temporary 
> failure in name resolution at 
> java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) 
> at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929) 
> at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324) at 
> java.net.InetAddress.getLocalHost(InetAddress.java:1501) ... 2 more
> {noformat}
>  
> Did some research online, it seems to be related to missing entry in 
> /etc/hosts on the hostname. So I took a look at the /etc/hosts, it is missing 
> the entry : 
> {noformat}
> pi@pi-aw:~/docker/$ docker ps
> CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
> 71e3e9df8bc6 test4 "/entrypoint.sh bash..." 1 second ago Up Less than a 
> second container_1541488751855_0028_01_01
> 29d31f0327d1 test3 "/entrypoint.sh bash" 18 hours ago Up 18 hours 
> blissful_turing
> pi@pi-aw:~/docker/$ de 71e3e9df8bc6
> groups: cannot find name for group ID 1000
> groups: cannot find name for group ID 116
> groups: cannot find name for group ID 126
> To run a command as administrator (user "root"), use "sudo ".
> See "man sudo_root" for details.
> pi@ctr-1541488751855-0028-01-01:/tmp/hadoop-pi/nm-local-dir/usercache/pi/appcache/application_1541488751855_0028/container_1541488751855_0028_01_01$
>  cat /etc/hosts
> 127.0.0.1 localhost
> 192.168.0.14 pi-aw
> # The following lines are desirable for IPv6 capable hosts
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> pi@ctr-1541488751855-0028-01-01:/tmp/hadoop-pi/nm-local-dir/usercache/pi/appcache/application_1541488751855_0028/container_1541488751855_0028_01_01$
> {noformat}
> If I launch the image without YARN, I saw the entry in /etc/hosts:
> {noformat}
> pi@61f173f95631:~$ cat /etc/hosts
> 127.0.0.1 localhost
> ::1 localhost ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 

[jira] [Commented] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size

2018-11-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678704#comment-16678704
 ] 

Hadoop QA commented on YARN-8972:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 28s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 17s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 48s{color} 
| {color:red} hadoop-yarn-api in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
45s{color} | {color:green} hadoop-yarn-server-router in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
39s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 84m 15s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.conf.TestYarnConfigurationFields |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8972 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12947264/YARN-8972.v3.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a5ce5b07ed37 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c96cbe8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| unit | 

[jira] [Commented] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty

2018-11-07 Thread Konstantinos Karanasos (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678649#comment-16678649
 ] 

Konstantinos Karanasos commented on YARN-8984:
--

If I remember correctly, the Scheduling Requests are used only in case we have 
placement constraints (at least this was the initial design, not sure if things 
have changed recently).

Given that, at successful container allocation, the Container Request of 
unconstrained requests will be properly removed through a different code-path.

That said, if we want to start using Scheduling Request objects even without 
constraints (I don't see a reason to do this urgently, but we can do it in the 
long run), then I think we should fix the code. As [~cheersyang] said, I don't 
think the current fix will work, since {{schedReqs}} will be null when 
there are no tags, and the map of {{outstandingSchedRequests}} is keyed by the 
allocation tags.

Adding [~asuresh] (Arun, I think you worked on that code last, so let me know 
if I am missing something) and [~leftnoteasy].
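
For context, a sketch of the shape of the data structure being discussed (a paraphrase of the comments above, not the actual AMRMClient code):

{code:java}
// outstandingSchedRequests is keyed by the request's allocation tags. On
// allocation, removal looks up the allocated container's tags; with null or
// empty tags no key matches, so the outstanding entries are never cleaned up.
Map<Set<String>, List<SchedulingRequest>> outstandingSchedRequests = new HashMap<>();

void removeFromOutstandingSchedulingRequests(Container allocated) {
  Set<String> tags = allocated.getAllocationTags();
  if (tags == null || tags.isEmpty()) {
    return; // nothing removed -> the leak described in this issue
  }
  List<SchedulingRequest> requests = outstandingSchedRequests.get(tags);
  // ... decrement numAllocations / remove satisfied requests ...
}
{code}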

> AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty
> --
>
> Key: YARN-8984
> URL: https://issues.apache.org/jira/browse/YARN-8984
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Critical
> Attachments: YARN-8984-001.patch, YARN-8984-002.patch, 
> YARN-8984-003.patch
>
>
> In AMRMClient, outstandingSchedRequests should be removed or decreased when a 
> container is allocated. However, this does not work when the allocation tag is 
> null or empty.






[jira] [Assigned] (YARN-8963) Add flag to disable interactive shell

2018-11-07 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang reassigned YARN-8963:
---

Assignee: Eric Yang

> Add flag to disable interactive shell
> -
>
> Key: YARN-8963
> URL: https://issues.apache.org/jira/browse/YARN-8963
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8963.001.patch
>
>
> For some production jobs, the application admin might choose to disable 
> debugging to prevent developers or system admins from accessing the 
> containers.  It would be nice to add an environment variable flag to disable 
> the interactive shell during application submission.






[jira] [Assigned] (YARN-8962) Add ability to use interactive shell with normal yarn container

2018-11-07 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang reassigned YARN-8962:
---

Assignee: Eric Yang

> Add ability to use interactive shell with normal yarn container
> ---
>
> Key: YARN-8962
> URL: https://issues.apache.org/jira/browse/YARN-8962
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8962.001.patch, YARN-8962.002.patch
>
>
> This task focuses on extending the interactive shell capability to YARN 
> containers without docker.  This will improve some aspects of debugging 
> MapReduce or Spark applications.






[jira] [Comment Edited] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty

2018-11-07 Thread Botong Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678606#comment-16678606
 ] 

Botong Huang edited comment on YARN-8984 at 11/7/18 6:27 PM:
-

Took a quick look. It is expected for AMRMClient to re-send all 
outstanding/pending requests after an RM master-slave switch. When a container 
is allocated, we should remove it from the outstanding list, which is exactly 
what _removeFromOutstandingSchedulingRequests()_ is doing here. If we are not 
cleaning it up properly, it is very likely because the RM is not feeding the 
proper allocationTags into the allocated _Container_ object. So shouldn't we fix 
that instead of removing the null check here? 


was (Author: botong):
Took a quick look. It is expected for AMRMClient to re-send all pending request 
after an RM failover. Whenever a container is allocated, we should remove it 
from the pending list, which is exactly what 
_removeFromOutstandingSchedulingRequests()_ is doing here. If we are not 
cleaning it up properly, very likely is it because RM is not feeding in the 
proper allocationTags in the allocated Container? So we need to fix this 
instead of removing the null check here? 

> AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty
> --
>
> Key: YARN-8984
> URL: https://issues.apache.org/jira/browse/YARN-8984
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Critical
> Attachments: YARN-8984-001.patch, YARN-8984-002.patch, 
> YARN-8984-003.patch
>
>
> In AMRMClient, outstandingSchedRequests should be removed or decreased when a 
> container is allocated. However, this does not work when the allocation tags are 
> null or empty.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size

2018-11-07 Thread Giovanni Matteo Fumarola (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678607#comment-16678607
 ] 

Giovanni Matteo Fumarola commented on YARN-8972:


Thanks [~bibinchundatt].
I pushed  [^YARN-8972.v3.patch]  with the whitespace fix.

> [Router] Add support to prevent DoS attack over ApplicationSubmissionContext 
> size
> -
>
> Key: YARN-8972
> URL: https://issues.apache.org/jira/browse/YARN-8972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8972.v1.patch, YARN-8972.v2.patch, 
> YARN-8972.v3.patch
>
>
> This jira tracks the effort to add a new interceptor in the Router to prevent 
> users from submitting applications with an oversized ASC.
> This avoids causing the YARN cluster to fail over.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size

2018-11-07 Thread Giovanni Matteo Fumarola (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8972:
---
Attachment: YARN-8972.v3.patch

> [Router] Add support to prevent DoS attack over ApplicationSubmissionContext 
> size
> -
>
> Key: YARN-8972
> URL: https://issues.apache.org/jira/browse/YARN-8972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8972.v1.patch, YARN-8972.v2.patch, 
> YARN-8972.v3.patch
>
>
> This jira tracks the effort to add a new interceptor in the Router to prevent 
> users from submitting applications with an oversized ASC.
> This avoids causing the YARN cluster to fail over.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty

2018-11-07 Thread Botong Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678606#comment-16678606
 ] 

Botong Huang commented on YARN-8984:


Took a quick look. It is expected for AMRMClient to re-send all pending request 
after an RM failover. Whenever a container is allocated, we should remove it 
from the pending list, which is exactly what 
_removeFromOutstandingSchedulingRequests()_ is doing here. If we are not 
cleaning it up properly, very likely is it because RM is not feeding in the 
proper allocationTags in the allocated Container? So we need to fix this 
instead of removing the null check here? 

> AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty
> --
>
> Key: YARN-8984
> URL: https://issues.apache.org/jira/browse/YARN-8984
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Critical
> Attachments: YARN-8984-001.patch, YARN-8984-002.patch, 
> YARN-8984-003.patch
>
>
> In AMRMClient, outstandingSchedRequests should be removed or decreased when a 
> container is allocated. However, this does not work when the allocation tags are 
> null or empty.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8914) Add xtermjs to YARN UI2

2018-11-07 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678592#comment-16678592
 ] 

Eric Yang commented on YARN-8914:
-

Patch 006 added the user.name query parameter for non-secure clusters.

> Add xtermjs to YARN UI2
> ---
>
> Key: YARN-8914
> URL: https://issues.apache.org/jira/browse/YARN-8914
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-ui-v2
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8914.001.patch, YARN-8914.002.patch, 
> YARN-8914.003.patch, YARN-8914.004.patch, YARN-8914.005.patch, 
> YARN-8914.006.patch
>
>
> In the container listing from UI2, we can add a link to connect to docker 
> container using xtermjs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8914) Add xtermjs to YARN UI2

2018-11-07 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8914:

Attachment: YARN-8914.006.patch

> Add xtermjs to YARN UI2
> ---
>
> Key: YARN-8914
> URL: https://issues.apache.org/jira/browse/YARN-8914
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-ui-v2
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8914.001.patch, YARN-8914.002.patch, 
> YARN-8914.003.patch, YARN-8914.004.patch, YARN-8914.005.patch, 
> YARN-8914.006.patch
>
>
> In the container listing from UI2, we can add a link to connect to docker 
> container using xtermjs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8978) For fair scheduler, application with higher priority should also get priority resources for running AM

2018-11-07 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678577#comment-16678577
 ] 

Yufei Gu commented on YARN-8978:


[~qiuliang988], not sure if you still need this jira, but you shouldn't mark it 
as "Fixed".  Please mark it as "Invalid"/"Won't Fix" if you no longer need it.

> For fair scheduler, application with higher priority should also get priority 
> resources for running AM
> --
>
> Key: YARN-8978
> URL: https://issues.apache.org/jira/browse/YARN-8978
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: qiuliang
>Priority: Major
> Attachments: YARN-8978.001.patch
>
>
> In order to allow important applications to run earlier, we use priority 
> scheduling in the fair scheduler, and FairSharePolicy uses YARN-6307. 
> Consider this situation: there are two applications (with different 
> priorities) in the same queue and both are accepted. Both applications are 
> demanding and hungry when dispatched to the queue. Next, the weight 
> ratio is calculated. Since the used resources of both applications are 0, the weight 
> ratio is also 0, so the priority has no effect in this case. Low-priority applications 
> may get resources to run their AM earlier than high-priority applications.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out

2018-11-07 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678585#comment-16678585
 ] 

Chandni Singh commented on YARN-8672:
-

[~eyang] yes, I looked into having the token filename as an optional argument. 
The problem with that is that right now the last argument to {{ContainerLocalizer}} 
is a list of local dirs: local dirs come from {{argv[5]...argv.length}}. This is 
why we cannot make the token file an optional argument -- the optional 
argument has to go at the end, but then the program will not know whether it is a 
local dir or the token file.

If we have to make it optional, we either do a hack, e.g. if the 
last argument is "tokenFileName=filename" then it is a token file, otherwise it 
is a local dir.

Or we change the way arguments are parsed by {{ContainerLocalizer}}, that is, 
use flags so that the order of arguments doesn't matter. This would be backward 
incompatible.

I think, for now, making the argument mandatory will be better.
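
To make the ambiguity concrete, a minimal sketch of the positional-argument problem -- simplified, not the actual {{ContainerLocalizer}} parser; the leading argument names are illustrative:
{code:java}
import java.util.Arrays;
import java.util.List;

public class LocalizerArgsDemo {
  public static void main(String[] args) {
    // today everything from argv[5] onwards is treated as a local dir
    String[] argv = {"user", "appId", "locId", "nmHost", "nmPort",
        "/data1/nm-local", "/data2/nm-local",
        "container_1_0001_01_000002.tokens"};   // intended as the token file
    List<String> localDirs = Arrays.asList(argv).subList(5, argv.length);
    // with a trailing *optional* token-file argument there is no way to tell
    // whether the last element is another local dir or the token filename,
    // which is why making it a mandatory positional argument is simpler
    System.out.println("parsed as local dirs: " + localDirs);
  }
}
{code}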

> TestContainerManager#testLocalingResourceWhileContainerRunning occasionally 
> times out
> -
>
> Key: YARN-8672
> URL: https://issues.apache.org/jira/browse/YARN-8672
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Jason Lowe
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8672.001.patch, YARN-8672.002.patch, 
> YARN-8672.003.patch, YARN-8672.004.patch, YARN-8672.005.patch
>
>
> Precommit builds have been failing in 
> TestContainerManager#testLocalingResourceWhileContainerRunning.  I have been 
> able to reproduce the problem without any patch applied if I run the test 
> enough times.  It looks like something is removing container tokens from the 
> nmPrivate area just as a new localizer starts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8880) Add configurations for pluggable plugin framework

2018-11-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678575#comment-16678575
 ] 

Hadoop QA commented on YARN-8880:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
54s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
49s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 28s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 1 new + 222 unchanged - 3 fixed = 223 total (was 225) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  8s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
46s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
20s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
46s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}112m 44s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8880 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12947245/YARN-8880-trunk.004.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 54e8a5fc8ddb 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 

[jira] [Commented] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out

2018-11-07 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678556#comment-16678556
 ] 

Eric Yang commented on YARN-8672:
-

Option 2 seems like a safer approach to address this issue.  We only need to 
change the filename in the few places where we know rapid creation and deletion 
of the token file happens during localization.  Would it be possible that, if the token 
filename is not given, it would use the default pattern?  Existing applications that 
depend on containerid.tokens in the working directory could then maintain backward 
compatibility.
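
A rough sketch of that fallback idea (the helper name is hypothetical, not an existing NodeManager API):
{code:java}
public final class TokenFileResolver {
  private TokenFileResolver() {}

  /**
   * If no explicit token filename was passed, fall back to the legacy
   * containerId.tokens pattern in the container working directory.
   */
  public static String resolveTokenFile(String explicitTokenFile, String containerId) {
    if (explicitTokenFile != null && !explicitTokenFile.isEmpty()) {
      return explicitTokenFile;          // new, uniquely-named token file
    }
    return containerId + ".tokens";      // legacy default, keeps old apps working
  }
}
{code}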

> TestContainerManager#testLocalingResourceWhileContainerRunning occasionally 
> times out
> -
>
> Key: YARN-8672
> URL: https://issues.apache.org/jira/browse/YARN-8672
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Jason Lowe
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8672.001.patch, YARN-8672.002.patch, 
> YARN-8672.003.patch, YARN-8672.004.patch, YARN-8672.005.patch
>
>
> Precommit builds have been failing in 
> TestContainerManager#testLocalingResourceWhileContainerRunning.  I have been 
> able to reproduce the problem without any patch applied if I run the test 
> enough times.  It looks like something is removing container tokens from the 
> nmPrivate area just as a new localizer starts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty

2018-11-07 Thread Botong Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678554#comment-16678554
 ] 

Botong Huang commented on YARN-8984:


+[~kkaranasos]

> AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty
> --
>
> Key: YARN-8984
> URL: https://issues.apache.org/jira/browse/YARN-8984
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Critical
> Attachments: YARN-8984-001.patch, YARN-8984-002.patch, 
> YARN-8984-003.patch
>
>
> In AMRMClient, outstandingSchedRequests should be removed or decreased when a 
> container is allocated. However, this does not work when the allocation tags are 
> null or empty.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8983) YARN container with docker: hostname entry not in /etc/hosts

2018-11-07 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678519#comment-16678519
 ] 

Eric Yang commented on YARN-8983:
-

[~oliverhuh...@gmail.com] [~tangzhankun] The recommendation is to use 
RegistryDNS to manage hostnames, or a Docker overlay network which comes with its 
own built-in DNS.  This is because the hostname can change frequently 
between peers, and there is no easy way to update /etc/hosts once the docker 
container is running.  RegistryDNS only exists in Hadoop 3+, and requires the YARN 
service AM to populate the information.  Therefore, your best bet would be 
using a Docker overlay network with built-in DNS on Hadoop 2.9.1.

> YARN container with docker: hostname entry not in /etc/hosts
> 
>
> Key: YARN-8983
> URL: https://issues.apache.org/jira/browse/YARN-8983
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.1
>Reporter: Keqiu Hu
>Priority: Critical
>
> I'm experimenting with using Hadoop 2.9.1 to launch applications with docker 
> containers. Inside the container task, we try to get the hostname of the 
> container using
> {code:java}
> InetAddress.getLocalHost().getHostName(){code}
> This works fine with LXC; however, it throws the following exception when I 
> enable the docker container using: 
> {code:java}
> YARN_CONTAINER_RUNTIME_TYPE=docker 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=test4
> {code}
> The exception:
>  
> {noformat}
> java.net.UnknownHostException: ctr-1541488751855-0023-01-03: 
> ctr-1541488751855-0023-01-03: Temporary failure in name resolution at 
> java.net.InetAddress.getLocalHost(InetAddress.java:1506)
>  at 
> com.linkedin.tony.TaskExecutor.registerAndGetClusterSpec(TaskExecutor.java:204)
>  
> at com.linkedin.tony.TaskExecutor.main(TaskExecutor.java:109) Caused by: 
> java.net.UnknownHostException: ctr-1541488751855-0023-01-03: Temporary 
> failure in name resolution at 
> java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) 
> at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929) 
> at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324) at 
> java.net.InetAddress.getLocalHost(InetAddress.java:1501) ... 2 more
> {noformat}
>  
> Did some research online, it seems to be related to missing entry in 
> /etc/hosts on the hostname. So I took a look at the /etc/hosts, it is missing 
> the entry : 
> {noformat}
> pi@pi-aw:~/docker/$ docker ps
> CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
> 71e3e9df8bc6 test4 "/entrypoint.sh bash..." 1 second ago Up Less than a 
> second container_1541488751855_0028_01_01
> 29d31f0327d1 test3 "/entrypoint.sh bash" 18 hours ago Up 18 hours 
> blissful_turing
> pi@pi-aw:~/docker/$ de 71e3e9df8bc6
> groups: cannot find name for group ID 1000
> groups: cannot find name for group ID 116
> groups: cannot find name for group ID 126
> To run a command as administrator (user "root"), use "sudo <command>".
> See "man sudo_root" for details.
> pi@ctr-1541488751855-0028-01-01:/tmp/hadoop-pi/nm-local-dir/usercache/pi/appcache/application_1541488751855_0028/container_1541488751855_0028_01_01$
>  cat /etc/hosts
> 127.0.0.1 localhost
> 192.168.0.14 pi-aw
> # The following lines are desirable for IPv6 capable hosts
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> pi@ctr-1541488751855-0028-01-01:/tmp/hadoop-pi/nm-local-dir/usercache/pi/appcache/application_1541488751855_0028/container_1541488751855_0028_01_01$
> {noformat}
> If I launch the image without YARN, I saw the entry in /etc/hosts:
> {noformat}
> pi@61f173f95631:~$ cat /etc/hosts
> 127.0.0.1 localhost
> ::1 localhost ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> 172.17.0.3 61f173f95631 {noformat}
> Here is my container-executor.cfg
> {code:java}
>  1 min.user.id=100
>  2 yarn.nodemanager.linux-container-executor.group=hadoop
>  3 [docker]
>  4 module.enabled=true
>  5 docker.binary=/usr/bin/docker
>  6 
> docker.allowed.capabilities=SYS_CHROOT,MKNOD,SETFCAP,SETPCAP,FSETID,CHOWN,AUDIT_WRITE,SETGID,NET_RAW,FOWNER,SETUID,DAC_OVERRIDE,KILL,NET_BIND_SERVICE
>  7 docker.allowed.networks=bridge,host,none
>  8 
> docker.allowed.rw-mounts=/tmp,/etc/hadoop/logs/,/private/etc/hadoop-2.9.1/logs/{code}
> Since I'm using an older version of Hadoop (2.9.1), let me know if this is 
> something already fixed in a later version :) 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8925) Updating distributed node attributes only when necessary

2018-11-07 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678431#comment-16678431
 ] 

Weiwei Yang edited comment on YARN-8925 at 11/7/18 3:58 PM:


Hi [~Tao Yang]

Thanks for the patch. It's a nice refactor in {{NodeStatusUpdaterImpl}}, looks 
pretty good. And also thanks for adding unit tests, the coverage seems good 
too. Some comments,

*NodeLabelUtil#isNodeAttributesEquals*

if {{leftNodeAttributes}} is a subset of {{rightNodeAttributes}}, it also seems to 
be treated as equal.

And besides the name and value, we also need to compare the prefix, right?

It would be good to have a separate UT for this method, to verify the various 
cases.

*HeartbeatSyncIfNeededHandler*

Can we rename this to {{CachedNodeDescriptorHandler}}? As this class caches the 
last value of node label/attribute and leverages the cache to reduce the 
overhead.

 *TestResourceTrackerService#testNodeRegistrationWithAttributes*
{code:java}
File tempDir = File.createTempFile("nattr", ".tmp");
{code}
can we put the tmp dir under {{TEMP_DIR}} so that it is consistent with the rest of the tests?

*TestNodeStatusUpdaterForAttributes*

waitTillHeartbeat/waitTillHeartbeat

can these methods be simplified with GenericTestUtils.waitFor?
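For example, something along these lines inside the test class (a minimal sketch of the suggested simplification; the flag name is just illustrative):
{code:java}
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.hadoop.test.GenericTestUtils;

// flipped to true from the test's heartbeat callback
private final AtomicBoolean heartbeatReceived = new AtomicBoolean(false);

private void waitTillHeartbeat() throws TimeoutException, InterruptedException {
  // poll every 100 ms, give up after 10 seconds
  GenericTestUtils.waitFor(heartbeatReceived::get, 100, 10000);
}
{code}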

Thanks

 


was (Author: cheersyang):
Hi [~Tao Yang]

Thanks for the patch. It's a nice refactor in {{NodeStatusUpdaterImpl}}, looks 
pretty good. And also thanks for adding unit tests, the coverage seems good 
too. Some comments,

*NodeLabelUtil#isNodeAttributesEquals*

if {{leftNodeAttributes}} is a subset of \{{rightNodeAttributes}} seems also 
equals.

And except for the name and value, we also need to compare prefix right?

It would be good if we have a separate UT for this method, to verify various of 
cases.

*HeartbeatSyncIfNeededHandler*

Can we rename this to \{{CachedNodeDescriptorHandler}}? As this class caches 
the last value of node label/attribute and leverages the cache to reduce the 
overhead.

 

*TestResourceTrackerService#testNodeRegistrationWithAttributes*

{code}

File tempDir = File.createTempFile("nattr", ".tmp");

{code}

can we put tmp dir under \{{TEMP_DIR}} that to be consistent with rest of tests.

*TestNodeStatusUpdaterForAttributes*

waitTillHeartbeat/waitTillHeartbeat

can these methods be simplified with GenericTestUtils.waitFor?

Thanks

 

> Updating distributed node attributes only when necessary
> 
>
> Key: YARN-8925
> URL: https://issues.apache.org/jira/browse/YARN-8925
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.2.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: performance
> Attachments: YARN-8925.001.patch, YARN-8925.002.patch, 
> YARN-8925.003.patch
>
>
> Currently, if distributed node attributes exist, even when there is no 
> change, an update of the distributed node attributes happens on every 
> heartbeat between the NM and the RM. The updating process holds 
> NodeAttributesManagerImpl#writeLock and may have some impact in a large 
> cluster. We have found that the nodes UI of a large cluster opens slowly, and most 
> of the time it is waiting for the lock in NodeAttributesManagerImpl. I think this 
> update should be performed only when necessary, to improve the performance of 
> the related process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8880) Add configurations for pluggable plugin framework

2018-11-07 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678434#comment-16678434
 ] 

Zhankun Tang commented on YARN-8880:


{quote}"This settings" -> "setting"
{quote}
Fixed.
{quote}ResourcePluginManager#initializePluggableDevicePlugins currently only 
has the null check. will more checks be added here? Just want to make sure 
initializing problems can be found as early as possible, like class type check 
etc.
{quote}
Yeah. Will add more tests in the next basic framework patch. It will check the 
class type and the methods implemented (see the sketch at the end of this comment).
{quote}TestResourcePluginManager: can we make sure mock NMs are stopped in each 
test cases?
{quote}
The NM will be stopped in "teardown", is this ok?
{quote}Checkstyle issues need to be fixed
{quote}
Fixed.
{quote}And there seems to have some unnecessary changes, e.g
{quote}
Fixed
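
An illustrative sketch of the kind of early class-type check meant here (hypothetical names -- it assumes a plugin interface called {{DevicePlugin}} and the property from the description; this is not the actual patch):
{code:java}
// load each configured plugin class and fail fast if it is not a device plugin
Class<?> pluginClazz = Class.forName(pluginClassName);
if (!DevicePlugin.class.isAssignableFrom(pluginClazz)) {
  throw new YarnRuntimeException(pluginClassName
      + " is listed in yarn.nodemanager.pluggable-device-framework.device-classes"
      + " but does not implement the expected plugin interface");
}
DevicePlugin plugin = (DevicePlugin) pluginClazz.newInstance();
{code}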

> Add configurations for pluggable plugin framework
> -
>
> Key: YARN-8880
> URL: https://issues.apache.org/jira/browse/YARN-8880
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
> Attachments: YARN-8880-trunk.001.patch, YARN-8880-trunk.002.patch, 
> YARN-8880-trunk.003.patch, YARN-8880-trunk.004.patch
>
>
> Added two configurations for the pluggable device framework.
> {code:java}
> 
> <property>
>   <name>yarn.nodemanager.pluggable-device-framework.enabled</name>
>   <value>true/false</value>
> </property>
> <property>
>   <name>yarn.nodemanager.pluggable-device-framework.device-classes</name>
>   <value>com.cmp1.hdw1,...</value>
> </property>
>  {code}
> The admin needs to know the registered resource name of every plugin class 
> configured, and declare them in resource-types.xml.
> Please note that the count value defined in node-resource.xml will be 
> overridden by the plugin.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8925) Updating distributed node attributes only when necessary

2018-11-07 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678431#comment-16678431
 ] 

Weiwei Yang commented on YARN-8925:
---

Hi [~Tao Yang]

Thanks for the patch. It's a nice refactor in {{NodeStatusUpdaterImpl}}, looks 
pretty good. And also thanks for adding unit tests, the coverage seems good 
too. Some comments,

*NodeLabelUtil#isNodeAttributesEquals*

if {{leftNodeAttributes}} is a subset of \{{rightNodeAttributes}} seems also 
equals.

And except for the name and value, we also need to compare prefix right?

It would be good if we have a separate UT for this method, to verify various of 
cases.

*HeartbeatSyncIfNeededHandler*

Can we rename this to \{{CachedNodeDescriptorHandler}}? As this class caches 
the last value of node label/attribute and leverages the cache to reduce the 
overhead.

 

*TestResourceTrackerService#testNodeRegistrationWithAttributes*

{code}

File tempDir = File.createTempFile("nattr", ".tmp");

{code}

can we put tmp dir under \{{TEMP_DIR}} that to be consistent with rest of tests.

*TestNodeStatusUpdaterForAttributes*

waitTillHeartbeat/waitTillHeartbeat

can these methods be simplified with GenericTestUtils.waitFor?

Thanks

 

> Updating distributed node attributes only when necessary
> 
>
> Key: YARN-8925
> URL: https://issues.apache.org/jira/browse/YARN-8925
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.2.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: performance
> Attachments: YARN-8925.001.patch, YARN-8925.002.patch, 
> YARN-8925.003.patch
>
>
> Currently, if distributed node attributes exist, even when there is no 
> change, an update of the distributed node attributes happens on every 
> heartbeat between the NM and the RM. The updating process holds 
> NodeAttributesManagerImpl#writeLock and may have some impact in a large 
> cluster. We have found that the nodes UI of a large cluster opens slowly, and most 
> of the time it is waiting for the lock in NodeAttributesManagerImpl. I think this 
> update should be performed only when necessary, to improve the performance of 
> the related process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8880) Add configurations for pluggable plugin framework

2018-11-07 Thread Zhankun Tang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-8880:
---
Attachment: YARN-8880-trunk.004.patch

> Add configurations for pluggable plugin framework
> -
>
> Key: YARN-8880
> URL: https://issues.apache.org/jira/browse/YARN-8880
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
> Attachments: YARN-8880-trunk.001.patch, YARN-8880-trunk.002.patch, 
> YARN-8880-trunk.003.patch, YARN-8880-trunk.004.patch
>
>
> Added two configurations for the pluggable device framework.
> {code:java}
> 
> <property>
>   <name>yarn.nodemanager.pluggable-device-framework.enabled</name>
>   <value>true/false</value>
> </property>
> <property>
>   <name>yarn.nodemanager.pluggable-device-framework.device-classes</name>
>   <value>com.cmp1.hdw1,...</value>
> </property>
>  {code}
> The admin needs to know the registered resource name of every plugin class 
> configured, and declare them in resource-types.xml.
> Please note that the count value defined in node-resource.xml will be 
> overridden by the plugin.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8977) Remove unnecessary type casting when calling AbstractYarnScheduler#getSchedulerNode

2018-11-07 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678342#comment-16678342
 ] 

Weiwei Yang commented on YARN-8977:
---

Committed to trunk, cherry picked to branch-3.0, branch-3.1 and branch-3.2. 
Fixed in all 3.x streams. Thanks for the contribution [~jiwq].
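
For context, a minimal before/after sketch of the kind of redundant cast this change removes (illustrative code based on the warning quoted in the description below, not the literal patch):
{code:java}
// before: explicit cast flagged as redundant
FiCaSchedulerNode node = (FiCaSchedulerNode) getSchedulerNode(
    nonKillableContainer.getAllocatedNode());

// after: the generic return type of AbstractYarnScheduler#getSchedulerNode
// already resolves to FiCaSchedulerNode inside CapacityScheduler
FiCaSchedulerNode node = getSchedulerNode(
    nonKillableContainer.getAllocatedNode());
{code}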

> Remove unnecessary type casting when calling 
> AbstractYarnScheduler#getSchedulerNode
> ---
>
> Key: YARN-8977
> URL: https://issues.apache.org/jira/browse/YARN-8977
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Trivial
> Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1
>
> Attachments: YARN-8977.001.patch, YARN-8977.002.patch
>
>
> Since the AbstractYarnScheduler#getSchedulerNode method returns the generic 
> type, I think the explicit type is not needed. 
> I found this issue in the CapacityScheduler class. The warning message is like:
> {quote}Casting 'getSchedulerNode( nonKillableContainer.getAllocatedNode())' 
> to 'FiCaSchedulerNode' is redundant
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8977) Remove unnecessary type casting when calling AbstractYarnScheduler#getSchedulerNode

2018-11-07 Thread Wanqiang Ji (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678349#comment-16678349
 ] 

Wanqiang Ji commented on YARN-8977:
---

Thanks for your review and work, [~cheersyang].

> Remove unnecessary type casting when calling 
> AbstractYarnScheduler#getSchedulerNode
> ---
>
> Key: YARN-8977
> URL: https://issues.apache.org/jira/browse/YARN-8977
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Trivial
> Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1
>
> Attachments: YARN-8977.001.patch, YARN-8977.002.patch
>
>
> Since the AbstractYarnScheduler#getSchedulerNode method returns the generic 
> type, I think the explicit type is not needed. 
> I found this issue in the CapacityScheduler class. The warning message is like:
> {quote}Casting 'getSchedulerNode( nonKillableContainer.getAllocatedNode())' 
> to 'FiCaSchedulerNode' is redundant
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8977) Remove unnecessary type casting when calling AbstractYarnScheduler#getSchedulerNode

2018-11-07 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678343#comment-16678343
 ] 

Hudson commented on YARN-8977:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15384 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15384/])
YARN-8977. Remove unnecessary type casting when calling (wwei: rev 
c96cbe8659587cfc114a96aab1be5cc85029fe44)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestContinuousScheduling.java


> Remove unnecessary type casting when calling 
> AbstractYarnScheduler#getSchedulerNode
> ---
>
> Key: YARN-8977
> URL: https://issues.apache.org/jira/browse/YARN-8977
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Trivial
> Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1
>
> Attachments: YARN-8977.001.patch, YARN-8977.002.patch
>
>
> Since the AbstractYarnScheduler#getSchedulerNode method returns the generic 
> type, I think the explicit type is not needed. 
> I found this issue in the CapacityScheduler class. The warning message is like:
> {quote}Casting 'getSchedulerNode( nonKillableContainer.getAllocatedNode())' 
> to 'FiCaSchedulerNode' is redundant
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8977) Remove unnecessary type casting when calling AbstractYarnScheduler#getSchedulerNode

2018-11-07 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8977:
--
Fix Version/s: (was: 3.0.2)
   3.0.4

> Remove unnecessary type casting when calling 
> AbstractYarnScheduler#getSchedulerNode
> ---
>
> Key: YARN-8977
> URL: https://issues.apache.org/jira/browse/YARN-8977
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Trivial
> Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1
>
> Attachments: YARN-8977.001.patch, YARN-8977.002.patch
>
>
> Since the AbstractYarnScheduler#getSchedulerNode method returns the generic 
> type, I think the explicit type is not needed. 
> I found this issue in the CapacityScheduler class. The warning message is like:
> {quote}Casting 'getSchedulerNode( nonKillableContainer.getAllocatedNode())' 
> to 'FiCaSchedulerNode' is redundant
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8977) Remove unnecessary type casting when calling AbstractYarnScheduler#getSchedulerNode

2018-11-07 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8977:
--
Fix Version/s: 3.0.2

> Remove unnecessary type casting when calling 
> AbstractYarnScheduler#getSchedulerNode
> ---
>
> Key: YARN-8977
> URL: https://issues.apache.org/jira/browse/YARN-8977
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Trivial
> Fix For: 3.0.2, 3.1.2, 3.3.0, 3.2.1
>
> Attachments: YARN-8977.001.patch, YARN-8977.002.patch
>
>
> Since the AbstractYarnScheduler#getSchedulerNode method returns the generic 
> type, I think the explicit type is not needed. 
> I found this issue in the CapacityScheduler class. The warning message is like:
> {quote}Casting 'getSchedulerNode( nonKillableContainer.getAllocatedNode())' 
> to 'FiCaSchedulerNode' is redundant
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8977) Remove unnecessary type casting when calling AbstractYarnScheduler#getSchedulerNode

2018-11-07 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8977:
--
Fix Version/s: 3.1.2

> Remove unnecessary type casting when calling 
> AbstractYarnScheduler#getSchedulerNode
> ---
>
> Key: YARN-8977
> URL: https://issues.apache.org/jira/browse/YARN-8977
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Trivial
> Fix For: 3.1.2, 3.3.0, 3.2.1
>
> Attachments: YARN-8977.001.patch, YARN-8977.002.patch
>
>
> Since the AbstractYarnScheduler#getSchedulerNode method returns the generic 
> type, I think the explicit type is not needed. 
> I found this issue in the CapacityScheduler class. The warning message is like:
> {quote}Casting 'getSchedulerNode( nonKillableContainer.getAllocatedNode())' 
> to 'FiCaSchedulerNode' is redundant
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8880) Add configurations for pluggable plugin framework

2018-11-07 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678245#comment-16678245
 ] 

Weiwei Yang edited comment on YARN-8880 at 11/7/18 2:52 PM:


Hi [~tangzhankun]

Thanks for the patch. It looks good. Some small nits
 # "This settings" -> "setting"
 # ResourcePluginManager#initializePluggableDevicePlugins currently only has 
the null check. will more checks be added here? Just want to make sure 
initializing problems can be found as early as possible, like class type check 
etc.
 # TestResourcePluginManager: can we make sure mock NMs are stopped in each 
test case?
 # Checkstyle issues need to be fixed

And there seem to be some unnecessary changes, e.g.
{code:java}
- ((NMContext)this.getNMContext()).setResourcePluginManager(rpm);
+ ((NMContext) this.getNMContext()).setResourcePluginManager(rpm);
{code}
and
{code:java}
- metrics, diskhandler);
+ metrics, diskhandler);
{code}
thanks.


was (Author: cheersyang):
Hi [~tangzhankun]

Thanks for the patch. It looks good. Some small nits
 # "This settings" -> "setting"
 # ResourcePluginManager#initializePluggableDevicePlugins currently only has 
the null check. will more checks be added here? Just want to make sure 
initializing problems can be found as early as possible, like class type check 
etc.
 # TestResourcePluginManager: can we make sure mock NMs are stopped in each 
test cases?
 # Checkstyle issues need to be fixed

And there seems to have some unnecessary changes, e.g

{code}

- ((NMContext)this.getNMContext()).setResourcePluginManager(rpm);

+ ((NMContext) this.getNMContext()).setResourcePluginManager(rpm);

{code}

and

{code}

- metrics, diskhandler);

+ metrics, diskhandler);

{code}

thanks.

> Add configurations for pluggable plugin framework
> -
>
> Key: YARN-8880
> URL: https://issues.apache.org/jira/browse/YARN-8880
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
> Attachments: YARN-8880-trunk.001.patch, YARN-8880-trunk.002.patch, 
> YARN-8880-trunk.003.patch
>
>
> Added two configurations for the pluggable device framework.
> {code:java}
> 
> <property>
>   <name>yarn.nodemanager.pluggable-device-framework.enabled</name>
>   <value>true/false</value>
> </property>
> <property>
>   <name>yarn.nodemanager.pluggable-device-framework.device-classes</name>
>   <value>com.cmp1.hdw1,...</value>
> </property>
>  {code}
> The admin needs to know the registered resource name of every plugin class 
> configured, and declare them in resource-types.xml.
> Please note that the count value defined in node-resource.xml will be 
> overridden by the plugin.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8977) Remove unnecessary type casting when calling AbstractYarnScheduler#getSchedulerNode

2018-11-07 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8977:
--
Fix Version/s: 3.2.1

> Remove unnecessary type casting when calling 
> AbstractYarnScheduler#getSchedulerNode
> ---
>
> Key: YARN-8977
> URL: https://issues.apache.org/jira/browse/YARN-8977
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Trivial
> Fix For: 3.3.0, 3.2.1
>
> Attachments: YARN-8977.001.patch, YARN-8977.002.patch
>
>
> Since the AbstractYarnScheduler#getSchedulerNode method returns the generic 
> type, I think the explicit type is not needed. 
> I found this issue in the CapacityScheduler class. The warning message is like:
> {quote}Casting 'getSchedulerNode( nonKillableContainer.getAllocatedNode())' 
> to 'FiCaSchedulerNode' is redundant
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty

2018-11-07 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678324#comment-16678324
 ] 

Weiwei Yang edited comment on YARN-8984 at 11/7/18 2:48 PM:


Thanks for the updates [~fly_in_gis], it looks almost good to me. One doubt:

if we remove that null check
{code:java}
List<SchedulingRequest> schedReqs =
    outstandingSchedRequests.get(container.getAllocationTags());
{code}
this now seems like it could throw an NPE when container.getAllocationTags() 
is null.
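
If the guard is dropped, one possible way to keep the lookup NPE-safe (illustrative only, not a proposed final patch) would be to normalize missing tags to an empty set before the lookup:
{code:java}
Set<String> tags = container.getAllocationTags() == null
    ? Collections.<String>emptySet() : container.getAllocationTags();
List<SchedulingRequest> schedReqs = outstandingSchedRequests.get(tags);
if (schedReqs != null && !schedReqs.isEmpty()) {
  // ... existing removal logic ...
}
{code}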


was (Author: cheersyang):
Thanks for the updates [~fly_in_gis], it almost seems good to me. One thought

if we remove that null check

{code}

List schedReqs = 
outstandingSchedRequests.get(container.getAllocationTags());

{code}

now this seems to be possible to throw NPE, when container.getAllocationTags() 
is null.

> AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty
> --
>
> Key: YARN-8984
> URL: https://issues.apache.org/jira/browse/YARN-8984
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Critical
> Attachments: YARN-8984-001.patch, YARN-8984-002.patch, 
> YARN-8984-003.patch
>
>
> In AMRMClient, outstandingSchedRequests should be removed or decreased when a 
> container is allocated. However, this does not work when the allocation tags are 
> null or empty.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty

2018-11-07 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678324#comment-16678324
 ] 

Weiwei Yang commented on YARN-8984:
---

Thanks for the updates [~fly_in_gis], it almost seems good to me. One thought

if we remove that null check

{code}

List schedReqs = 
outstandingSchedRequests.get(container.getAllocationTags());

{code}

now this seems to be possible to throw NPE, when container.getAllocationTags() 
is null.

> AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty
> --
>
> Key: YARN-8984
> URL: https://issues.apache.org/jira/browse/YARN-8984
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Critical
> Attachments: YARN-8984-001.patch, YARN-8984-002.patch, 
> YARN-8984-003.patch
>
>
> In AMRMClient, outstandingSchedRequests should be removed or decreased when a 
> container is allocated. However, this does not work when the allocation tags are 
> null or empty.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8977) Remove unnecessary type casting when calling AbstractYarnScheduler#getSchedulerNode

2018-11-07 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8977:
--
Priority: Trivial  (was: Major)

> Remove unnecessary type casting when calling 
> AbstractYarnScheduler#getSchedulerNode
> ---
>
> Key: YARN-8977
> URL: https://issues.apache.org/jira/browse/YARN-8977
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Trivial
> Attachments: YARN-8977.001.patch, YARN-8977.002.patch
>
>
> Since the AbstractYarnScheduler#getSchedulerNode method returns the generic 
> type, I think the explicit type is not needed. 
> I found this issue in the CapacityScheduler class. The warning message is like:
> {quote}Casting 'getSchedulerNode( nonKillableContainer.getAllocatedNode())' 
> to 'FiCaSchedulerNode' is redundant
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8977) Remove explicit type when called AbstractYarnScheduler#getSchedulerNode to avoid type casting

2018-11-07 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678272#comment-16678272
 ] 

Weiwei Yang commented on YARN-8977:
---

The UT failure should be irrelevant, I tested locally it can work correctly.

+1 for the v2 patch, committing soon.

> Remove explicit type when called AbstractYarnScheduler#getSchedulerNode to 
> avoid type casting
> -
>
> Key: YARN-8977
> URL: https://issues.apache.org/jira/browse/YARN-8977
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Attachments: YARN-8977.001.patch, YARN-8977.002.patch
>
>
> Since the AbstractYarnScheduler#getSchedulerNode method returns the generic 
> type, I think the explicit type is not needed. 
> I found this issue in the CapacityScheduler class. The warning message is like:
> {quote}Casting 'getSchedulerNode( nonKillableContainer.getAllocatedNode())' 
> to 'FiCaSchedulerNode' is redundant
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8977) Remove unnecessary type casting when calling AbstractYarnScheduler#getSchedulerNode

2018-11-07 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8977:
--
Summary: Remove unnecessary type casting when calling 
AbstractYarnScheduler#getSchedulerNode  (was: Remove explicit type when called 
AbstractYarnScheduler#getSchedulerNode to avoid type casting)

> Remove unnecessary type casting when calling 
> AbstractYarnScheduler#getSchedulerNode
> ---
>
> Key: YARN-8977
> URL: https://issues.apache.org/jira/browse/YARN-8977
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Attachments: YARN-8977.001.patch, YARN-8977.002.patch
>
>
> Since the AbstractYarnScheduler#getSchedulerNode method returns the generic 
> type, I think the explicit type is not needed. 
> I found this issue in the CapacityScheduler class. The warning message is like:
> {quote}Casting 'getSchedulerNode( nonKillableContainer.getAllocatedNode())' 
> to 'FiCaSchedulerNode' is redundant
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8233) NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal whose allocatedOrReservedContainer is null

2018-11-07 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678260#comment-16678260
 ] 

Weiwei Yang commented on YARN-8233:
---

The patch for branch-3.1 looks good; however, the jenkins job ran into some 
issues. From 
[https://builds.apache.org/job/PreCommit-YARN-Build/22446/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt],
 I see
{quote}[ERROR] ExecutionException The forked VM terminated without properly 
saying goodbye. VM crash or System.exit called?
{quote}
Is there anything wrong on branch-3.1?

> NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal 
> whose allocatedOrReservedContainer is null
> -
>
> Key: YARN-8233
> URL: https://issues.apache.org/jira/browse/YARN-8233
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Fix For: 3.3.0, 3.2.1
>
> Attachments: YARN-8233.001.branch-2.patch, 
> YARN-8233.001.branch-3.0.patch, YARN-8233.001.branch-3.1.patch, 
> YARN-8233.001.patch, YARN-8233.002.patch, YARN-8233.003.patch
>
>
> Recently we saw an NPE problem in CapacityScheduler#tryCommit when trying to find 
> the attemptId by calling {{c.getAllocatedOrReservedContainer().get...}} from 
> an allocate/reserve proposal: the allocatedOrReservedContainer was null and an 
> NPE was thrown.
> Reference code:
> {code:java}
> // find the application to accept and apply the ResourceCommitRequest
> if (request.anythingAllocatedOrReserved()) {
>   ContainerAllocationProposal c =
>   request.getFirstAllocatedOrReservedContainer();
>   attemptId =
>   c.getAllocatedOrReservedContainer().getSchedulerApplicationAttempt()
>   .getApplicationAttemptId();   //NPE happens here
> } else { ...
> {code}
> The proposal was constructed in 
> {{CapacityScheduler#createResourceCommitRequest}} and 
> allocatedOrReservedContainer is possibly null in the async-scheduling process 
> when the node was lost or the application finished (details in 
> {{CapacityScheduler#getSchedulerContainer}}).
> Reference code:
> {code:java}
>   // Allocated something
>   List allocations =
>   csAssignment.getAssignmentInformation().getAllocationDetails();
>   if (!allocations.isEmpty()) {
> RMContainer rmContainer = allocations.get(0).rmContainer;
> allocated = new ContainerAllocationProposal<>(
> getSchedulerContainer(rmContainer, true),   //possibly null
> getSchedulerContainersToRelease(csAssignment),
> 
> getSchedulerContainer(csAssignment.getFulfilledReservedContainer(),
> false), csAssignment.getType(),
> csAssignment.getRequestLocalityType(),
> csAssignment.getSchedulingMode() != null ?
> csAssignment.getSchedulingMode() :
> SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY,
> csAssignment.getResource());
>   }
> {code}
> I think we should add a null check for the allocated/reserved container before 
> creating allocate/reserve proposals. Besides, the allocation process has increased 
> the unconfirmed resources of the app when creating an allocate assignment, so if this 
> check finds null, we should decrease the unconfirmed resources of the live app 
> (a rough sketch follows below).
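
A rough, illustrative sketch of the guard described above, reusing the proposal-construction code quoted in the description (the generic types and the {{decreaseUnconfirmedResourceForApp}} helper are hypothetical, not the committed patch):
{code:java}
RMContainer rmContainer = allocations.get(0).rmContainer;
SchedulerContainer<FiCaSchedulerApp, FiCaSchedulerNode> allocatedContainer =
    getSchedulerContainer(rmContainer, true);
if (allocatedContainer == null) {
  // node lost or application finished while the async proposal was being built:
  // skip the proposal and give back the unconfirmed resource already charged
  // to the live app (hypothetical helper name)
  decreaseUnconfirmedResourceForApp(csAssignment);
} else {
  allocated = new ContainerAllocationProposal<>(
      allocatedContainer,
      getSchedulerContainersToRelease(csAssignment),
      getSchedulerContainer(csAssignment.getFulfilledReservedContainer(), false),
      csAssignment.getType(),
      csAssignment.getRequestLocalityType(),
      csAssignment.getSchedulingMode() != null
          ? csAssignment.getSchedulingMode()
          : SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY,
      csAssignment.getResource());
}
{code}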



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8880) Add configurations for pluggable plugin framework

2018-11-07 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678245#comment-16678245
 ] 

Weiwei Yang commented on YARN-8880:
---

Hi [~tangzhankun]

Thanks for the patch. It looks good. Some small nits
 # "This settings" -> "setting"
 # ResourcePluginManager#initializePluggableDevicePlugins currently only has 
the null check. will more checks be added here? Just want to make sure 
initializing problems can be found as early as possible, like class type check 
etc.
 # TestResourcePluginManager: can we make sure mock NMs are stopped in each 
test cases?
 # Checkstyle issues need to be fixed

And there seems to have some unnecessary changes, e.g

{code}

- ((NMContext)this.getNMContext()).setResourcePluginManager(rpm);

+ ((NMContext) this.getNMContext()).setResourcePluginManager(rpm);

{code}

and

{code}

- metrics, diskhandler);

+ metrics, diskhandler);

{code}

thanks.

> Add configurations for pluggable plugin framework
> -
>
> Key: YARN-8880
> URL: https://issues.apache.org/jira/browse/YARN-8880
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
> Attachments: YARN-8880-trunk.001.patch, YARN-8880-trunk.002.patch, 
> YARN-8880-trunk.003.patch
>
>
> Added two configurations for the pluggable device framework.
> {code:java}
> <property>
>  <name>yarn.nodemanager.pluggable-device-framework.enabled</name>
>  <value>true/false</value>
> </property>
> <property>
>  <name>yarn.nodemanager.pluggable-device-framework.device-classes</name>
>  <value>com.cmp1.hdw1,...</value>
> </property>
> {code}
> The admin needs to know the registered resource name of every configured 
> plugin class and declare them in resource-types.xml.
> Please note that the count value defined in node-resource.xml will be 
> overridden by the plugin.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8902) Add volume manager that manages CSI volume lifecycle

2018-11-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678218#comment-16678218
 ] 

Hadoop QA commented on YARN-8902:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 16s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m 
22s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m  7s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 7 new + 
58 unchanged - 0 fixed = 65 total (was 58) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 50s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
42s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}106m 14s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}187m 36s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8902 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12947215/YARN-8902.008.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux cbfb9ab2ca25 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8dc1f6d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Commented] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty

2018-11-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678192#comment-16678192
 ] 

Hadoop QA commented on YARN-8984:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
35s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
33s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 14m 
14s{color} | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m  
0s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 20s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 24 new + 35 unchanged - 0 fixed = 59 total (was 35) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 11m 
51s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  1m  2s{color} 
| {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 44s{color} 
| {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 75m 39s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8984 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12947223/YARN-8984-003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux e2095fc2c363 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8dc1f6d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Commented] (YARN-8880) Add configurations for pluggable plugin framework

2018-11-07 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678151#comment-16678151
 ] 

Zhankun Tang commented on YARN-8880:


[~Weiwei Yang], [~sunilg], the failed test seems unrelated to this patch. 
Please help review.

> Add configurations for pluggable plugin framework
> -
>
> Key: YARN-8880
> URL: https://issues.apache.org/jira/browse/YARN-8880
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
> Attachments: YARN-8880-trunk.001.patch, YARN-8880-trunk.002.patch, 
> YARN-8880-trunk.003.patch
>
>
> Added two configurations for the pluggable device framework.
> {code:java}
> <property>
>  <name>yarn.nodemanager.pluggable-device-framework.enabled</name>
>  <value>true/false</value>
> </property>
> <property>
>  <name>yarn.nodemanager.pluggable-device-framework.device-classes</name>
>  <value>com.cmp1.hdw1,...</value>
> </property>
> {code}
> The admin needs to know the registered resource name of every configured 
> plugin class and declare them in resource-types.xml.
> Please note that the count value defined in node-resource.xml will be 
> overridden by the plugin.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8233) NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal whose allocatedOrReservedContainer is null

2018-11-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678122#comment-16678122
 ] 

Hadoop QA commented on YARN-8233:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-3.1 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
 0s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} branch-3.1 passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 11m 
45s{color} | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} branch-3.1 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 10m 
34s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  1m 15s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 63m 46s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:080e9d0 |
| JIRA Issue | YARN-8233 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12947218/YARN-8233.001.branch-3.1.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 541f37412400 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.1 / eb426db |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/22446/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/22446/testReport/ |
| Max. process+thread count | 100 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 

[jira] [Commented] (YARN-8880) Add configurations for pluggable plugin framework

2018-11-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678111#comment-16678111
 ] 

Hadoop QA commented on YARN-8880:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
58s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 29s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
37s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m  
8s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 51s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 7 new + 222 unchanged - 3 fixed = 229 total (was 225) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 58s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
4s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
13s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 23s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
45s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}142m 57s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.TestContainerManager |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8880 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12947204/YARN-8880-trunk.003.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  

[jira] [Updated] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty

2018-11-07 Thread Yang Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated YARN-8984:

Attachment: YARN-8984-003.patch

> AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty
> --
>
> Key: YARN-8984
> URL: https://issues.apache.org/jira/browse/YARN-8984
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Critical
> Attachments: YARN-8984-001.patch, YARN-8984-002.patch, 
> YARN-8984-003.patch
>
>
> In AMRMClient, outstandingSchedRequests should be removed or decreased when a 
> container is allocated. However, this does not work when the allocation tag is 
> null or empty.
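A simplified, self-contained sketch of how bookkeeping keyed by allocation tags can leak when the tags are null or empty; the types and method names below are hypothetical stand-ins, not the actual AMRMClientImpl code.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class OutstandingRequests {
  // Outstanding scheduling requests keyed by their allocation tags.
  private final Map<Set<String>, List<String>> outstanding = new HashMap<>();

  void add(Set<String> tags, String request) {
    outstanding.computeIfAbsent(normalize(tags), k -> new ArrayList<>()).add(request);
  }

  // Buggy variant: containers without tags never trigger cleanup, so requests
  // registered under the empty tag set stay in the map forever (the leak).
  void onContainerAllocatedBuggy(Set<String> containerTags) {
    if (containerTags == null || containerTags.isEmpty()) {
      return;
    }
    shrink(containerTags);
  }

  // Fixed variant: treat "no tags" as the empty key so its entry is decreased too.
  void onContainerAllocatedFixed(Set<String> containerTags) {
    shrink(normalize(containerTags));
  }

  private void shrink(Set<String> key) {
    List<String> reqs = outstanding.get(key);
    if (reqs != null && !reqs.isEmpty()) {
      reqs.remove(reqs.size() - 1);
      if (reqs.isEmpty()) {
        outstanding.remove(key);
      }
    }
  }

  private static Set<String> normalize(Set<String> tags) {
    return tags == null ? Collections.<String>emptySet() : tags;
  }

  int size() { return outstanding.size(); }

  public static void main(String[] args) {
    OutstandingRequests r = new OutstandingRequests();
    r.add(null, "req-1");                // request submitted without allocation tags
    r.onContainerAllocatedBuggy(null);   // cleanup skipped
    System.out.println(r.size());        // 1 -> leaked entry
    r.onContainerAllocatedFixed(null);   // cleanup applied
    System.out.println(r.size());        // 0
  }
}
{code}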



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty

2018-11-07 Thread Yang Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678081#comment-16678081
 ] 

Yang Wang commented on YARN-8984:
-

There's no difference between running it in a separate class and running it in 
TestAMRMClientPlacementConstraints.

When YarnConfiguration.RM_PLACEMENT_CONSTRAINTS_HANDLER is set to scheduler, we 
could not get rejectedSchedulingRequests from the AllocateResponse; it is not 
set by the capacity scheduler. So I added another test in 
TestAMRMClientPlacementConstraints.

[~cheersyang] Please help review.

> AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty
> --
>
> Key: YARN-8984
> URL: https://issues.apache.org/jira/browse/YARN-8984
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Critical
> Attachments: YARN-8984-001.patch, YARN-8984-002.patch
>
>
> In AMRMClient, outstandingSchedRequests should be removed or decreased when a 
> container is allocated. However, this does not work when the allocation tag is 
> null or empty.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8977) Remove explicit type when called AbstractYarnScheduler#getSchedulerNode to avoid type casting

2018-11-07 Thread Wanqiang Ji (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678065#comment-16678065
 ] 

Wanqiang Ji commented on YARN-8977:
---

I don't think the hadoop-yarn-server-resourcemanager failures are caused by 
this patch, so I'm waiting on Jenkins again.

> Remove explicit type when called AbstractYarnScheduler#getSchedulerNode to 
> avoid type casting
> -
>
> Key: YARN-8977
> URL: https://issues.apache.org/jira/browse/YARN-8977
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Attachments: YARN-8977.001.patch, YARN-8977.002.patch
>
>
> Since the AbstractYarnScheduler#getSchedulerNode method returns the generic 
> type, I think the explicit type is not needed. 
> I found this issue in the CapacityScheduler class. The warning message is like:
> {quote}Casting 'getSchedulerNode( nonKillableContainer.getAllocatedNode())' 
> to 'FiCaSchedulerNode' is redundant
> {quote}
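A small self-contained illustration of why the explicit cast is redundant when the method itself is generic over the node type; SchedulerNode, FiCaNode and SimpleScheduler below are made-up stand-ins, not the actual YARN classes.

{code:java}
// Illustrative only: stand-in types, not the real scheduler classes.
class SchedulerNode { }
class FiCaNode extends SchedulerNode { }

class SimpleScheduler<N extends SchedulerNode> {
  private final N node;
  SimpleScheduler(N node) { this.node = node; }

  // Returns the generic node type, so callers already get N without casting.
  N getSchedulerNode() { return node; }
}

class CastDemo {
  public static void main(String[] args) {
    SimpleScheduler<FiCaNode> scheduler = new SimpleScheduler<>(new FiCaNode());

    FiCaNode redundant = (FiCaNode) scheduler.getSchedulerNode(); // IDE: redundant cast
    FiCaNode clean = scheduler.getSchedulerNode();                // same result, no cast

    System.out.println(redundant == clean);                       // true
  }
}
{code}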



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8233) NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal whose allocatedOrReservedContainer is null

2018-11-07 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678045#comment-16678045
 ] 

Tao Yang edited comment on YARN-8233 at 11/7/18 11:06 AM:
--

Hi, [~ajisakaa], [~cheersyang]
I have attached 3 patches: 
(1) The patch for branch-3.1 only updates the test case, since the 
SchedulerApplicationAttempt#hasPendingResourceRequest API has changed in 3.2 
and trunk.
(2) The patch for branch-3.0 includes the modification above and drops the 
modification in CapacityScheduler#attemptAllocationOnNode, since that method 
does not exist there yet.
(3) The patch for branch-2 includes the modifications above and updates the UT 
to add final keywords for the variables used in Mockito#doAnswer. branch-2.9 
can use the branch-2 patch.
I have applied these patches in my local environment, ran the UTs, and did not 
find any problems. Just in case, please help review these new patches before 
committing. Thanks!


was (Author: tao yang):
Hi, [~ajisakaa], [~cheersyang]
I have attached 3 patches: 
(1) The patch for branch-3.1 only updates the test case, since the 
SchedulerApplicationAttempt#hasPendingResourceRequest API has changed in 3.2 
and trunk.
(2) The patch for branch-3.0 includes the modification above and drops the 
modification in CapacityScheduler#attemptAllocationOnNode, since that method 
does not exist there yet.
(3) The patch for branch-2 includes the modifications above and updates the UT 
to add final keywords for the variables used in Mockito#doAnswer. branch-2.9 
can use the branch-2 patch.
Please help review these new patches before committing. Thanks.

> NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal 
> whose allocatedOrReservedContainer is null
> -
>
> Key: YARN-8233
> URL: https://issues.apache.org/jira/browse/YARN-8233
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Fix For: 3.3.0, 3.2.1
>
> Attachments: YARN-8233.001.branch-2.patch, 
> YARN-8233.001.branch-3.0.patch, YARN-8233.001.branch-3.1.patch, 
> YARN-8233.001.patch, YARN-8233.002.patch, YARN-8233.003.patch
>
>
> Recently we saw an NPE in CapacityScheduler#tryCommit when trying to find 
> the attemptId by calling {{c.getAllocatedOrReservedContainer().get...}} from 
> an allocate/reserve proposal, but got a null allocatedOrReservedContainer and 
> an NPE was thrown.
> Reference code:
> {code:java}
> // find the application to accept and apply the ResourceCommitRequest
> if (request.anythingAllocatedOrReserved()) {
>   ContainerAllocationProposal c =
>   request.getFirstAllocatedOrReservedContainer();
>   attemptId =
>   c.getAllocatedOrReservedContainer().getSchedulerApplicationAttempt()
>   .getApplicationAttemptId();   //NPE happens here
> } else { ...
> {code}
> The proposal was constructed in 
> {{CapacityScheduler#createResourceCommitRequest}}, and 
> allocatedOrReservedContainer can be null in the async-scheduling process 
> when the node was lost or the application had finished (details in 
> {{CapacityScheduler#getSchedulerContainer}}).
> Reference code:
> {code:java}
>   // Allocated something
>   List allocations =
>   csAssignment.getAssignmentInformation().getAllocationDetails();
>   if (!allocations.isEmpty()) {
> RMContainer rmContainer = allocations.get(0).rmContainer;
> allocated = new ContainerAllocationProposal<>(
> getSchedulerContainer(rmContainer, true),   //possibly null
> getSchedulerContainersToRelease(csAssignment),
> 
> getSchedulerContainer(csAssignment.getFulfilledReservedContainer(),
> false), csAssignment.getType(),
> csAssignment.getRequestLocalityType(),
> csAssignment.getSchedulingMode() != null ?
> csAssignment.getSchedulingMode() :
> SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY,
> csAssignment.getResource());
>   }
> {code}
> I think we should add a null check for allocateOrReserveContainer before 
> creating allocate/reserve proposals. Besides, the allocation process has 
> already increased the unconfirmed resource of the app when creating an 
> allocate assignment, so if this check finds null, we should decrease the 
> unconfirmed resource of the live app.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8233) NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal whose allocatedOrReservedContainer is null

2018-11-07 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678045#comment-16678045
 ] 

Tao Yang commented on YARN-8233:


Hi, [~ajisakaa], [~cheersyang]
I have attached 3 patches: 
(1) The patch for branch-3.1 only updates the test case, since the 
SchedulerApplicationAttempt#hasPendingResourceRequest API has changed in 3.2 
and trunk.
(2) The patch for branch-3.0 includes the modification above and drops the 
modification in CapacityScheduler#attemptAllocationOnNode, since that method 
does not exist there yet.
(3) The patch for branch-2 includes the modifications above and updates the UT 
to add final keywords for the variables used in Mockito#doAnswer. branch-2.9 
can use the branch-2 patch.
Please help review these new patches before committing. Thanks.

> NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal 
> whose allocatedOrReservedContainer is null
> -
>
> Key: YARN-8233
> URL: https://issues.apache.org/jira/browse/YARN-8233
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Fix For: 3.3.0, 3.2.1
>
> Attachments: YARN-8233.001.branch-2.patch, 
> YARN-8233.001.branch-3.0.patch, YARN-8233.001.branch-3.1.patch, 
> YARN-8233.001.patch, YARN-8233.002.patch, YARN-8233.003.patch
>
>
> Recently we saw an NPE in CapacityScheduler#tryCommit when trying to find 
> the attemptId by calling {{c.getAllocatedOrReservedContainer().get...}} from 
> an allocate/reserve proposal, but got a null allocatedOrReservedContainer and 
> an NPE was thrown.
> Reference code:
> {code:java}
> // find the application to accept and apply the ResourceCommitRequest
> if (request.anythingAllocatedOrReserved()) {
>   ContainerAllocationProposal c =
>   request.getFirstAllocatedOrReservedContainer();
>   attemptId =
>   c.getAllocatedOrReservedContainer().getSchedulerApplicationAttempt()
>   .getApplicationAttemptId();   //NPE happens here
> } else { ...
> {code}
> The proposal was constructed in 
> {{CapacityScheduler#createResourceCommitRequest}}, and 
> allocatedOrReservedContainer can be null in the async-scheduling process 
> when the node was lost or the application had finished (details in 
> {{CapacityScheduler#getSchedulerContainer}}).
> Reference code:
> {code:java}
>   // Allocated something
>   List allocations =
>   csAssignment.getAssignmentInformation().getAllocationDetails();
>   if (!allocations.isEmpty()) {
> RMContainer rmContainer = allocations.get(0).rmContainer;
> allocated = new ContainerAllocationProposal<>(
> getSchedulerContainer(rmContainer, true),   //possibly null
> getSchedulerContainersToRelease(csAssignment),
> 
> getSchedulerContainer(csAssignment.getFulfilledReservedContainer(),
> false), csAssignment.getType(),
> csAssignment.getRequestLocalityType(),
> csAssignment.getSchedulingMode() != null ?
> csAssignment.getSchedulingMode() :
> SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY,
> csAssignment.getResource());
>   }
> {code}
> I think we should add a null check for allocateOrReserveContainer before 
> creating allocate/reserve proposals. Besides, the allocation process has 
> already increased the unconfirmed resource of the app when creating an 
> allocate assignment, so if this check finds null, we should decrease the 
> unconfirmed resource of the live app.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8983) YARN container with docker: hostname entry not in /etc/hosts

2018-11-07 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678029#comment-16678029
 ] 

Zhankun Tang commented on YARN-8983:


[~oliverhuh...@gmail.com], I just ran a quick test with DistributedShell on 
YARN 3.3.0 with the command "cat /etc/hosts" in a Docker container:
{code:java}
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters{code}

> YARN container with docker: hostname entry not in /etc/hosts
> 
>
> Key: YARN-8983
> URL: https://issues.apache.org/jira/browse/YARN-8983
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.1
>Reporter: Keqiu Hu
>Priority: Critical
>
> I'm experimenting with Hadoop 2.9.1 to launch applications with docker 
> containers. Inside the container task, we try to get the hostname of the 
> container using
> {code:java}
> InetAddress.getLocalHost().getHostName(){code}
> This works fine with LXC; however, it throws the following exception when I 
> enable the docker container using: 
> {code:java}
> YARN_CONTAINER_RUNTIME_TYPE=docker 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=test4
> {code}
> The exception:
>  
> {noformat}
> java.net.UnknownHostException: ctr-1541488751855-0023-01-03: 
> ctr-1541488751855-0023-01-03: Temporary failure in name resolution at 
> java.net.InetAddress.getLocalHost(InetAddress.java:1506)
>  at 
> com.linkedin.tony.TaskExecutor.registerAndGetClusterSpec(TaskExecutor.java:204)
>  
> at com.linkedin.tony.TaskExecutor.main(TaskExecutor.java:109) Caused by: 
> java.net.UnknownHostException: ctr-1541488751855-0023-01-03: Temporary 
> failure in name resolution at 
> java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) 
> at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929) 
> at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324) at 
> java.net.InetAddress.getLocalHost(InetAddress.java:1501) ... 2 more
> {noformat}
>  
> I did some research online, and it seems to be related to a missing hostname 
> entry in /etc/hosts. So I took a look at /etc/hosts, and it is indeed missing 
> the entry: 
> {noformat}
> pi@pi-aw:~/docker/$ docker ps
> CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
> 71e3e9df8bc6 test4 "/entrypoint.sh bash..." 1 second ago Up Less than a 
> second container_1541488751855_0028_01_01
> 29d31f0327d1 test3 "/entrypoint.sh bash" 18 hours ago Up 18 hours 
> blissful_turing
> pi@pi-aw:~/docker/$ de 71e3e9df8bc6
> groups: cannot find name for group ID 1000
> groups: cannot find name for group ID 116
> groups: cannot find name for group ID 126
> To run a command as administrator (user "root"), use "sudo ".
> See "man sudo_root" for details.
> pi@ctr-1541488751855-0028-01-01:/tmp/hadoop-pi/nm-local-dir/usercache/pi/appcache/application_1541488751855_0028/container_1541488751855_0028_01_01$
>  cat /etc/hosts
> 127.0.0.1 localhost
> 192.168.0.14 pi-aw
> # The following lines are desirable for IPv6 capable hosts
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> pi@ctr-1541488751855-0028-01-01:/tmp/hadoop-pi/nm-local-dir/usercache/pi/appcache/application_1541488751855_0028/container_1541488751855_0028_01_01$
> {noformat}
> If I launch the image without YARN, I saw the entry in /etc/hosts:
> {noformat}
> pi@61f173f95631:~$ cat /etc/hosts
> 127.0.0.1 localhost
> ::1 localhost ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> 172.17.0.3 61f173f95631 {noformat}
> Here is my container-executor.cfg
> {code:java}
> min.user.id=100
> yarn.nodemanager.linux-container-executor.group=hadoop
> [docker]
> module.enabled=true
> docker.binary=/usr/bin/docker
> docker.allowed.capabilities=SYS_CHROOT,MKNOD,SETFCAP,SETPCAP,FSETID,CHOWN,AUDIT_WRITE,SETGID,NET_RAW,FOWNER,SETUID,DAC_OVERRIDE,KILL,NET_BIND_SERVICE
> docker.allowed.networks=bridge,host,none
> docker.allowed.rw-mounts=/tmp,/etc/hadoop/logs/,/private/etc/hadoop-2.9.1/logs/{code}
>  Since I'm using an older version of Hadoop 2.9.1, let me know if this is 
> something already fixed in a later version :) 
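Until the runtime writes the container hostname into /etc/hosts, a defensive lookup on the task side can avoid the hard failure. This is only a workaround sketch; the HOSTNAME environment variable fallback is an assumption (commonly set inside Docker containers), not part of any YARN fix.

{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;

class HostnameLookup {
  /**
   * Best-effort hostname lookup for a process running inside a container whose
   * hostname has no /etc/hosts entry. Falls back to the HOSTNAME environment
   * variable instead of propagating UnknownHostException.
   */
  static String resolveHostname() {
    try {
      return InetAddress.getLocalHost().getHostName();
    } catch (UnknownHostException e) {
      String fromEnv = System.getenv("HOSTNAME");  // assumption: set by the container runtime
      return fromEnv != null ? fromEnv : "localhost";
    }
  }

  public static void main(String[] args) {
    System.out.println(resolveHostname());
  }
}
{code}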



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8233) NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal whose allocatedOrReservedContainer is null

2018-11-07 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-8233:
---
Attachment: YARN-8233.001.branch-3.1.patch
YARN-8233.001.branch-3.0.patch
YARN-8233.001.branch-2.patch

> NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal 
> whose allocatedOrReservedContainer is null
> -
>
> Key: YARN-8233
> URL: https://issues.apache.org/jira/browse/YARN-8233
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Fix For: 3.3.0, 3.2.1
>
> Attachments: YARN-8233.001.branch-2.patch, 
> YARN-8233.001.branch-3.0.patch, YARN-8233.001.branch-3.1.patch, 
> YARN-8233.001.patch, YARN-8233.002.patch, YARN-8233.003.patch
>
>
> Recently we saw an NPE in CapacityScheduler#tryCommit when trying to find 
> the attemptId by calling {{c.getAllocatedOrReservedContainer().get...}} from 
> an allocate/reserve proposal, but got a null allocatedOrReservedContainer and 
> an NPE was thrown.
> Reference code:
> {code:java}
> // find the application to accept and apply the ResourceCommitRequest
> if (request.anythingAllocatedOrReserved()) {
>   ContainerAllocationProposal c =
>   request.getFirstAllocatedOrReservedContainer();
>   attemptId =
>   c.getAllocatedOrReservedContainer().getSchedulerApplicationAttempt()
>   .getApplicationAttemptId();   //NPE happens here
> } else { ...
> {code}
> The proposal was constructed in 
> {{CapacityScheduler#createResourceCommitRequest}}, and 
> allocatedOrReservedContainer can be null in the async-scheduling process 
> when the node was lost or the application had finished (details in 
> {{CapacityScheduler#getSchedulerContainer}}).
> Reference code:
> {code:java}
>   // Allocated something
>   List allocations =
>   csAssignment.getAssignmentInformation().getAllocationDetails();
>   if (!allocations.isEmpty()) {
> RMContainer rmContainer = allocations.get(0).rmContainer;
> allocated = new ContainerAllocationProposal<>(
> getSchedulerContainer(rmContainer, true),   //possibly null
> getSchedulerContainersToRelease(csAssignment),
> 
> getSchedulerContainer(csAssignment.getFulfilledReservedContainer(),
> false), csAssignment.getType(),
> csAssignment.getRequestLocalityType(),
> csAssignment.getSchedulingMode() != null ?
> csAssignment.getSchedulingMode() :
> SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY,
> csAssignment.getResource());
>   }
> {code}
> I think we should add a null check for allocateOrReserveContainer before 
> creating allocate/reserve proposals. Besides, the allocation process has 
> already increased the unconfirmed resource of the app when creating an 
> allocate assignment, so if this check finds null, we should decrease the 
> unconfirmed resource of the live app.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8902) Add volume manager that manages CSI volume lifecycle

2018-11-07 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677981#comment-16677981
 ] 

Weiwei Yang commented on YARN-8902:
---

Hi [~sunilg]

Thanks for the review,
{quote}Does CsiAdaptorClientProtocol need to have an interface for unpublish 
volume?
{quote}
Yes, it will have an unpublish-volume interface too; that will be added when we 
work on the unpublish support.
{quote}CsiAdaptorClientProtocol impl will be done when we do adapter code in nm 
?
{quote}
Correct. Actually, if you take a look at YARN-8953, it has a more detailed 
implementation. A sample workflow can also be found here: 
https://issues.apache.org/jira/secure/attachment/12947186/csi_adaptor_workflow.png.
{quote}In CsiConstants, there are duplicate issues.
{quote}
Removed the 2nd one in the v8 patch.
{quote}VolumeCapability.validateCapability checks only minCapacity? Do we need 
to define a range or something similar here.? I am also thinking whether we 
need to normalize unit with min or max capacity and keep a common value. Could 
help to avoid run time conversions.
{quote}
This validation is just a user-input validation; the real validation happens on 
the CSI driver side. The resource min/max values are our capacity range (I have 
renamed VolumeCapacity to VolumeCapacityRange accordingly to avoid confusion). 
And I checked the CSI spec: the capacity is specified in bytes. To be compatible 
with the resource definition, we allow users to set units, but underneath we 
need to convert them to bytes (this will be handled when integrating with the 
adaptor code), as sketched below.
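For illustration, a minimal sketch of the kind of unit normalization mentioned above; the supported unit set and the method name are assumptions for this example, not the adaptor code.

{code:java}
class CapacityUnits {
  /** Convert a value expressed with a binary unit suffix into plain bytes. */
  static long toBytes(long value, String unit) {
    long multiplier;
    switch (unit) {
      case "":   multiplier = 1L; break;
      case "Ki": multiplier = 1024L; break;
      case "Mi": multiplier = 1024L * 1024; break;
      case "Gi": multiplier = 1024L * 1024 * 1024; break;
      default:   throw new IllegalArgumentException("Unsupported unit: " + unit);
    }
    return Math.multiplyExact(value, multiplier);  // fail fast on overflow
  }

  public static void main(String[] args) {
    System.out.println(toBytes(5, "Gi"));  // 5368709120
  }
}
{code}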

Apart from the above, the v8 patch also simplifies the interface 
{{CsiAdaptorClientProtocol}}; since this interface will be fully implemented 
in YARN-8953, let's keep it as simple as possible here because it is only used 
for testing within this patch.

Hope it makes sense.

Thanks

> Add volume manager that manages CSI volume lifecycle
> 
>
> Key: YARN-8902
> URL: https://issues.apache.org/jira/browse/YARN-8902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8902.001.patch, YARN-8902.002.patch, 
> YARN-8902.003.patch, YARN-8902.004.patch, YARN-8902.005.patch, 
> YARN-8902.006.patch, YARN-8902.007.patch, YARN-8902.008.patch
>
>
> The CSI volume manager is a service running in the RM process that manages the 
> lifecycle of all CSI volumes. The details about the volume lifecycle states can be 
> found in [CSI 
> spec|https://github.com/container-storage-interface/spec/blob/master/spec.md].
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8902) Add volume manager that manages CSI volume lifecycle

2018-11-07 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8902:
--
Attachment: YARN-8902.008.patch

> Add volume manager that manages CSI volume lifecycle
> 
>
> Key: YARN-8902
> URL: https://issues.apache.org/jira/browse/YARN-8902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8902.001.patch, YARN-8902.002.patch, 
> YARN-8902.003.patch, YARN-8902.004.patch, YARN-8902.005.patch, 
> YARN-8902.006.patch, YARN-8902.007.patch, YARN-8902.008.patch
>
>
> The CSI volume manager is a service running in the RM process that manages the 
> lifecycle of all CSI volumes. The details about the volume lifecycle states can be 
> found in [CSI 
> spec|https://github.com/container-storage-interface/spec/blob/master/spec.md].
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8866) Fix a parsing error for crossdomain.xml

2018-11-07 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677935#comment-16677935
 ] 

Hudson commented on YARN-8866:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15382 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15382/])
YARN-8866. Fix a parsing error for crossdomain.xml. (tasanuma: rev 
8dc1f6dbf712a65390a9a6859f62fec0481af31b)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml


> Fix a parsing error for crossdomain.xml
> ---
>
> Key: YARN-8866
> URL: https://issues.apache.org/jira/browse/YARN-8866
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build, yarn-ui-v2
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
> Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1
>
> Attachments: YARN-8866.1.patch
>
>
> [QBT|https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/] reports 
> a parsing error for crossdomain.xml in hadoop-yarn-ui.
> {noformat}
> Parsing Error(s): 
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml
>  
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty

2018-11-07 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677917#comment-16677917
 ] 

Weiwei Yang commented on YARN-8984:
---

Hi [~fly_in_gis]

I suppose you should have set it to use the "scheduler" handler,

+ conf.set(YarnConfiguration.RM_PLACEMENT_CONSTRAINTS_HANDLER, "scheduler");

then I am not sure what the difference is between running it in 
{{TestAMRMClientPlacementConstraints}} and in a separate class. Could you 
please take a look? Thanks

> AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty
> --
>
> Key: YARN-8984
> URL: https://issues.apache.org/jira/browse/YARN-8984
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Critical
> Attachments: YARN-8984-001.patch, YARN-8984-002.patch
>
>
> In AMRMClient, outstandingSchedRequests should be removed or decreased when a 
> container is allocated. However, this does not work when the allocation tag is 
> null or empty.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8945) Calculation of maximum applications should respect specified and global maximum applications for absolute resource

2018-11-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677914#comment-16677914
 ] 

Hadoop QA commented on YARN-8945:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 21s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 18s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}117m 38s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}173m 13s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8945 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12945697/YARN-8945.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 169cd72c0fbf 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / addec29 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/22443/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/22443/testReport/ |
| Max. process+thread count | 933 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 

[jira] [Commented] (YARN-8866) Fix a parsing error for crossdomain.xml

2018-11-07 Thread Takanobu Asanuma (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677909#comment-16677909
 ] 

Takanobu Asanuma commented on YARN-8866:


Committed to branch-3.0, branch-3.1, branch-3.2, trunk.

> Fix a parsing error for crossdomain.xml
> ---
>
> Key: YARN-8866
> URL: https://issues.apache.org/jira/browse/YARN-8866
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build, yarn-ui-v2
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
> Attachments: YARN-8866.1.patch
>
>
> [QBT|https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/] reports 
> a parsing error for crossdomain.xml in hadoop-yarn-ui.
> {noformat}
> Parsing Error(s): 
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml
>  
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8866) Fix a parsing error for crossdomain.xml

2018-11-07 Thread Takanobu Asanuma (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677901#comment-16677901
 ] 

Takanobu Asanuma commented on YARN-8866:


Thanks for the review, [~leftnoteasy]! I'd like to commit it now.

> Fix a parsing error for crossdomain.xml
> ---
>
> Key: YARN-8866
> URL: https://issues.apache.org/jira/browse/YARN-8866
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build, yarn-ui-v2
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
> Attachments: YARN-8866.1.patch
>
>
> [QBT|https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/] reports 
> a parsing error for crossdomain.xml in hadoop-yarn-ui.
> {noformat}
> Parsing Error(s): 
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml
>  
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8880) Add configurations for pluggable plugin framework

2018-11-07 Thread Zhankun Tang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-8880:
---
Attachment: YARN-8880-trunk.003.patch

> Add configurations for pluggable plugin framework
> -
>
> Key: YARN-8880
> URL: https://issues.apache.org/jira/browse/YARN-8880
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
> Attachments: YARN-8880-trunk.001.patch, YARN-8880-trunk.002.patch, 
> YARN-8880-trunk.003.patch
>
>
> Added two configurations for the pluggable device framework.
> {code:java}
> <property>
>   <name>yarn.nodemanager.pluggable-device-framework.enabled</name>
>   <value>true/false</value>
> </property>
> <property>
>   <name>yarn.nodemanager.pluggable-device-framework.device-classes</name>
>   <value>com.cmp1.hdw1,...</value>
> </property>
> {code}
> The admin needs to know the resource name registered by every configured plugin 
> class and declare it in resource-types.xml.
> Please note that the count value defined in node-resources.xml will be 
> overridden by the plugin.
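
For illustration, a minimal sketch of setting these two properties programmatically, 
assuming the property names stay exactly as quoted above (the plugin class name is 
the placeholder from the description, not a real plugin):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PluggableDeviceConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // Enable the pluggable device framework (hypothetical usage sketch).
    conf.setBoolean("yarn.nodemanager.pluggable-device-framework.enabled", true);
    // Comma-separated plugin classes; "com.cmp1.hdw1" is the placeholder taken
    // from the description above.
    conf.set("yarn.nodemanager.pluggable-device-framework.device-classes",
        "com.cmp1.hdw1");

    System.out.println(conf.get(
        "yarn.nodemanager.pluggable-device-framework.device-classes"));
  }
}
{code}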



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8898) Fix FederationInterceptor#allocate to set application priority in allocateResponse

2018-11-07 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677535#comment-16677535
 ] 

Bibin A Chundatt edited comment on YARN-8898 at 11/7/18 8:35 AM:
-

As per the current implementation, the UAM submission to secondary clusters doesn't 
set priority, tags, application type, etc. If these fields are not set during submit, 
the getApplications API results, container allocation, etc. might not be as expected.

{quote}
 what are the client APIs that you are referring to
{quote}
Client APIs - ApplicationClientProtocol (YarnClient API) and 
WebServiceProtocol (REST API). For application-specific / container-specific API 
calls, filters are based on a few of the above-mentioned fields.



was (Author: bibinchundatt):
{quote}
 what are the client APIs that you are referring to
{quote}
Client API's - ApplicationClientProtocol(YarnClient API) and 
WebServiceProtcol(Rest API). For application specific/ Container specific API 
calls filters are based on few of the  above mentioned fields


> Fix FederationInterceptor#allocate to set application priority in 
> allocateResponse
> --
>
> Key: YARN-8898
> URL: https://issues.apache.org/jira/browse/YARN-8898
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Major
>
> FederationInterceptor#mergeAllocateResponses skips application_priority in the 
> returned response.
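
A rough, hypothetical sketch of the kind of fix the title suggests: after the 
per-subcluster responses are merged, carry the application priority over from the 
home cluster's response. The helper name is made up; this is not the actual patch.
{code:java}
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;

public final class AllocateResponseMergeSketch {
  private AllocateResponseMergeSketch() {
  }

  // Copy the application priority from the home cluster's response into the
  // merged response so the AM still sees it (illustrative only).
  public static void copyApplicationPriority(AllocateResponse homeResponse,
      AllocateResponse mergedResponse) {
    if (homeResponse.getApplicationPriority() != null) {
      mergedResponse.setApplicationPriority(homeResponse.getApplicationPriority());
    }
  }
}
{code}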



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8902) Add volume manager that manages CSI volume lifecycle

2018-11-07 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677829#comment-16677829
 ] 

Sunil Govindan commented on YARN-8902:
--

Hi [~cheersyang]. Thank you. A few initial comments:

1. Does CsiAdaptorClientProtocol need to have an interface to unpublish a volume?
 2. Will the CsiAdaptorClientProtocol implementation be done when we do the adaptor code in the NM?
 3. In CsiConstants, there are duplicate constants.
{code:java}
public static final String CSI_DRIVER_NAME = "driver.name";
public static final String CSI_VOLUME_DRIVER_NAME = "driver.name";
{code}
4. Does VolumeCapability need to cover access modes and access capabilities as 
well?
 5. VolumeCapability.validateCapability checks only minCapacity? Do we need to 
define a range or something similar here? I am also thinking about whether we need 
to normalize the unit of the min/max capacity and keep a common value. That could 
help avoid run-time conversions (a rough sketch follows below).
 6. We can add Evolving and Unstable annotations to all interfaces/classes which are 
public.
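
A rough sketch of the normalization idea in comment 5, assuming a plain 
value-plus-unit representation; the class and method names are made up for 
illustration and are not the CSI volume manager code:
{code:java}
public final class NormalizedVolumeCapabilitySketch {
  private final long minCapacityInBytes;
  private final long maxCapacityInBytes;

  // Convert both bounds to one common unit (bytes) once, at construction time,
  // so later validations and comparisons need no run-time unit conversion.
  public NormalizedVolumeCapabilitySketch(long minCapacity, long maxCapacity,
      String unit) {
    long multiplier = toBytesMultiplier(unit);
    this.minCapacityInBytes = minCapacity * multiplier;
    this.maxCapacityInBytes = maxCapacity * multiplier;
  }

  // Validate a range instead of only the minimum.
  public boolean isValid() {
    return minCapacityInBytes >= 0 && maxCapacityInBytes >= minCapacityInBytes;
  }

  private static long toBytesMultiplier(String unit) {
    switch (unit) {
    case "Gi":
      return 1024L * 1024L * 1024L;
    case "Mi":
      return 1024L * 1024L;
    case "Ki":
      return 1024L;
    default:
      return 1L;
    }
  }
}
{code}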

> Add volume manager that manages CSI volume lifecycle
> 
>
> Key: YARN-8902
> URL: https://issues.apache.org/jira/browse/YARN-8902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8902.001.patch, YARN-8902.002.patch, 
> YARN-8902.003.patch, YARN-8902.004.patch, YARN-8902.005.patch, 
> YARN-8902.006.patch, YARN-8902.007.patch
>
>
> The CSI volume manager is a service running in the RM process that manages the 
> lifecycle of all CSI volumes. The details about a volume's lifecycle states can be 
> found in [CSI 
> spec|https://github.com/container-storage-interface/spec/blob/master/spec.md].
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size

2018-11-07 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677536#comment-16677536
 ] 

Bibin A Chundatt edited comment on YARN-8972 at 11/7/18 8:28 AM:
-

[~giovanni.fumarola] 

The advantage of having the interceptor at the router side is that it avoids adding 
the home-cluster mapping to the federation store and then submitting to the RM, etc.
Since it's optional, let's add this.



was (Author: bibinchundatt):
[~giovanni.fumarola] 

I advantage with interceptor at router side is, it will avoids router home 
cluster addition, then submit to RM etc..
Since its optional lets add this.


> [Router] Add support to prevent DoS attack over ApplicationSubmissionContext 
> size
> -
>
> Key: YARN-8972
> URL: https://issues.apache.org/jira/browse/YARN-8972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8972.v1.patch, YARN-8972.v2.patch
>
>
> This jira tracks the effort to add a new interceptor in the Router to prevent 
> users from submitting applications with an oversized ASC.
> This avoids causing the YARN cluster to fail over.
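
A minimal, hypothetical sketch of what such a guard could look like, measuring the 
size of the serialized proto; the class name and threshold handling are assumptions, 
not the actual patch:
{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.impl.pb.ApplicationSubmissionContextPBImpl;
import org.apache.hadoop.yarn.exceptions.YarnException;

public final class AscSizeGuardSketch {
  private final long maxAscSizeInBytes;

  public AscSizeGuardSketch(long maxAscSizeInBytes) {
    this.maxAscSizeInBytes = maxAscSizeInBytes;
  }

  // Reject a submission whose serialized ApplicationSubmissionContext exceeds
  // the configured threshold (assumes the PB-backed record implementation).
  public void check(ApplicationSubmissionContext asc) throws YarnException {
    long size = ((ApplicationSubmissionContextPBImpl) asc)
        .getProto().getSerializedSize();
    if (size > maxAscSizeInBytes) {
      throw new YarnException("ApplicationSubmissionContext of " + size
          + " bytes exceeds the allowed maximum of " + maxAscSizeInBytes + " bytes");
    }
  }
}
{code}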



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty

2018-11-07 Thread Yang Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated YARN-8984:

Attachment: YARN-8984-002.patch

> AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty
> --
>
> Key: YARN-8984
> URL: https://issues.apache.org/jira/browse/YARN-8984
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Critical
> Attachments: YARN-8984-001.patch, YARN-8984-002.patch
>
>
> In AMRMClient, outstandingSchedRequests should be removed or decreased when a 
> container is allocated. However, this does not work when the allocation tags are 
> null or empty.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty

2018-11-07 Thread Yang Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677802#comment-16677802
 ] 

Yang Wang commented on YARN-8984:
-

Hi, [~cheersyang]

I have tried to move the test to TestAMRMClientPlacementConstraints and found that 
the case failed, because containers could not be allocated when allocationTags 
is empty.

I think it is another issue with the placement processor.

> AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty
> --
>
> Key: YARN-8984
> URL: https://issues.apache.org/jira/browse/YARN-8984
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Yang Wang
>Priority: Critical
> Attachments: YARN-8984-001.patch
>
>
> In AMRMClient, outstandingSchedRequests should be removed or decreased when a 
> container is allocated. However, this does not work when the allocation tags are 
> null or empty.
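
A self-contained, hypothetical sketch of the idea behind a fix (not the AMRMClient 
implementation): map a null or empty allocation-tag set to a single well-defined 
key, so the bookkeeping can still find and decrement the outstanding entry when a 
matching container is allocated.
{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public final class OutstandingRequestTrackerSketch<T> {
  // Sentinel key used for requests that carry no allocation tags.
  private static final Set<String> EMPTY_TAGS_KEY = Collections.emptySet();

  private final Map<Set<String>, List<T>> outstanding = new HashMap<>();

  private static Set<String> keyOf(Set<String> allocationTags) {
    return (allocationTags == null || allocationTags.isEmpty())
        ? EMPTY_TAGS_KEY : allocationTags;
  }

  public void add(Set<String> allocationTags, T request) {
    outstanding.computeIfAbsent(keyOf(allocationTags), k -> new LinkedList<>())
        .add(request);
  }

  // Remove one outstanding request when a matching container is allocated,
  // even if its allocation tags were null or empty.
  public void onContainerAllocated(Set<String> allocationTags) {
    List<T> requests = outstanding.get(keyOf(allocationTags));
    if (requests != null && !requests.isEmpty()) {
      requests.remove(0);
      if (requests.isEmpty()) {
        outstanding.remove(keyOf(allocationTags));
      }
    }
  }
}
{code}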



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org