[jira] [Commented] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size
[ https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679303#comment-16679303 ] Bibin A Chundatt commented on YARN-8972: [~giovanni.fumarola] The ZK state store was validating the applicationStateData size, and the ApplicationSubmissionContext size check is applicable to other store implementations too, so I think we shouldn't reuse the property. Thoughts? > [Router] Add support to prevent DoS attack over ApplicationSubmissionContext > size > - > > Key: YARN-8972 > URL: https://issues.apache.org/jira/browse/YARN-8972 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8972.v1.patch, YARN-8972.v2.patch, > YARN-8972.v3.patch, YARN-8972.v4.patch > > > This jira tracks the effort to add a new interceptor in the Router to prevent > users from submitting applications with oversized ASCs. > This avoids YARN cluster failover. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
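The size check under discussion can be sketched as a simple guard on the serialized context. The class name, method name, and limit below are illustrative assumptions, not the actual YARN-8972 patch, which puts the check in a Router interceptor:

```java
// Hedged sketch of a size guard for an incoming ApplicationSubmissionContext.
// AscSizeGuard and checkSize are hypothetical names; the real check lives in
// the Router interceptor added by YARN-8972 and reads its limit from config.
public class AscSizeGuard {

    /** Rejects a serialized context larger than maxBytes. */
    public static void checkSize(byte[] serializedContext, int maxBytes) {
        if (serializedContext.length > maxBytes) {
            throw new IllegalArgumentException(
                "ApplicationSubmissionContext is " + serializedContext.length
                + " bytes, exceeding the configured limit of " + maxBytes);
        }
    }
}
```

Rejecting oversized contexts at the Router keeps a single misbehaving client from bloating the RM state store, which is the failover risk the description mentions.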
[jira] [Comment Edited] (YARN-8925) Updating distributed node attributes only when necessary
[ https://issues.apache.org/jira/browse/YARN-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679371#comment-16679371 ] Tao Yang edited comment on YARN-8925 at 11/8/18 7:23 AM: - Thanks [~cheersyang] for the review and comments. {quote}NodeLabelUtil#isNodeAttributesEquals if leftNodeAttributes is a subset of rightNodeAttributes it seems to also count as equal. And except for the name and value, we also need to compare the prefix, right? It would be good if we had a separate UT for this method, to verify various cases. {quote} The comparison between two node-attribute sets already considers set size through this clause: {{leftNodeAttributes.size() != rightNodeAttributes.size())}}, and considers the attribute prefix in the comparison inside NodeAttributeKey (reference code: {{nodeAttributes.stream().anyMatch(e -> e.equals(checkNodeAttribute)}} in NodeLabelUtil#isNodeAttributeIncludes). A separate UT for this method makes sense to me; I will add it in the next patch. {quote}HeartbeatSyncIfNeededHandler Can we rename this to CachedNodeDescriptorHandler? As this class caches the last value of the node label/attribute and leverages the cache to reduce the overhead. {quote} Agree, CachedNodeDescriptorHandler is a better name. {quote}TestResourceTrackerService#testNodeRegistrationWithAttributes File tempDir = File.createTempFile("nattr", ".tmp"); can we put the tmp dir under TEMP_DIR to be consistent with the rest of the tests. {quote} Makes sense to me; I copied this code from testNodeHeartbeatWithNodeAttributes and will update this method too. {quote}TestNodeStatusUpdaterForAttributes waitTillHeartbeat/waitTillHeartbeat can these methods be simplified with GenericTestUtils.waitFor? {quote} Makes sense to me. I will upload a new patch in a few hours. was (Author: tao yang): Thanks [~cheersyang] for the review and comments. {quote}NodeLabelUtil#isNodeAttributesEquals if leftNodeAttributes is a subset of rightNodeAttributes it seems to also count as equal.
And except for the name and value, we also need to compare the prefix, right? It would be good if we had a separate UT for this method, to verify various cases. {quote} The comparison between two node-attribute sets already considers set size through this clause: {{leftNodeAttributes.size() != rightNodeAttributes.size())}}, and considers the attribute name in the comparison for NodeAttribute (reference code: {{nodeAttributes.stream().anyMatch(e -> e.equals(checkNodeAttribute)}} in NodeLabelUtil#isNodeAttributeIncludes). A separate UT for this method makes sense to me; I will add it in the next patch. {quote}HeartbeatSyncIfNeededHandler Can we rename this to CachedNodeDescriptorHandler? As this class caches the last value of the node label/attribute and leverages the cache to reduce the overhead. {quote} Agree, CachedNodeDescriptorHandler is a better name. {quote}TestResourceTrackerService#testNodeRegistrationWithAttributes File tempDir = File.createTempFile("nattr", ".tmp"); can we put the tmp dir under TEMP_DIR to be consistent with the rest of the tests. {quote} Makes sense to me; I copied this code from testNodeHeartbeatWithNodeAttributes and will update this method too. {quote}TestNodeStatusUpdaterForAttributes waitTillHeartbeat/waitTillHeartbeat can these methods be simplified with GenericTestUtils.waitFor? {quote} Makes sense to me. I will upload a new patch in a few hours. > Updating distributed node attributes only when necessary > > > Key: YARN-8925 > URL: https://issues.apache.org/jira/browse/YARN-8925 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 3.2.1 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Labels: performance > Attachments: YARN-8925.001.patch, YARN-8925.002.patch, > YARN-8925.003.patch > > > Currently, if distributed node attributes exist, an update for them happens in every > heartbeat between NM and RM even when there is no change.
The updating process holds > NodeAttributesManagerImpl#writeLock and may have some impact in a large > cluster. We have found that the nodes UI of a large cluster opens slowly, and most > of the time is spent waiting for the lock in NodeAttributesManagerImpl. I think this > update should run only when necessary, to improve the performance of the > related processes.
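The two-sided check discussed in the review (an equal-size guard to rule out the proper-subset case, then per-attribute containment with the prefix included in equality) can be sketched as follows. The Attr record is a hypothetical stand-in for YARN's NodeAttribute, not the real hadoop-yarn-api class:

```java
// Minimal sketch of the set-equality logic discussed in the review.
// Attr is an illustrative stand-in for NodeAttribute: prefix + name form the
// key, and equality must cover prefix, name, and value.
import java.util.Objects;
import java.util.Set;

public class NodeAttrCompare {

    public record Attr(String prefix, String name, String value) {}

    /** Equal size rules out the proper-subset case; then every attribute on
     *  the left must appear on the right with the same prefix, name, and value. */
    public static boolean attributesEqual(Set<Attr> left, Set<Attr> right) {
        if (left == right) {
            return true;
        }
        if (left == null || right == null || left.size() != right.size()) {
            return false;
        }
        return left.stream().allMatch(
            l -> right.stream().anyMatch(r -> Objects.equals(l, r)));
    }
}
```

Without the size guard, a left set that is a strict subset of the right set would pass the containment loop, which is exactly the concern raised in the review.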
[jira] [Commented] (YARN-8880) Add configurations for pluggable plugin framework
[ https://issues.apache.org/jira/browse/YARN-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679292#comment-16679292 ] Hudson commented on YARN-8880: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15385 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15385/]) YARN-8880. Add configurations for pluggable plugin framework. (wwei: rev f8c72d7b3acca8285bbc3024f491c4586805be1e) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/ResourcePluginManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/TestResourcePluginManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java > Add configurations for pluggable plugin framework > - > > Key: YARN-8880 > URL: https://issues.apache.org/jira/browse/YARN-8880 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-8880-trunk.001.patch, YARN-8880-trunk.002.patch, > YARN-8880-trunk.003.patch, YARN-8880-trunk.004.patch, > YARN-8880-trunk.005.patch > > > Added two configurations for the pluggable device framework. > {code:xml}
> <property>
>   <name>yarn.nodemanager.pluggable-device-framework.enabled</name>
>   <value>true/false</value>
> </property>
> <property>
>   <name>yarn.nodemanager.pluggable-device-framework.device-classes</name>
>   <value>com.cmp1.hdw1,...</value>
> </property>
> {code}
> The admin needs to know the registered resource name of every plugin class > configured, and declare them in resource-types.xml. > Please note that the count value defined in node-resource.xml will be > overridden by the plugin.
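A comma-separated class list like the device-classes value above is typically instantiated reflectively. The loader below is an illustration of that pattern only; PluginLoader is a hypothetical name and this is not the actual ResourcePluginManager code:

```java
// Hedged sketch: instantiate each class named in a comma-separated config
// value, as a pluggable-plugin framework might. PluginLoader is hypothetical;
// the real loading happens inside YARN's ResourcePluginManager.
import java.util.ArrayList;
import java.util.List;

public class PluginLoader {

    /** Splits the configured value and instantiates each class via its
     *  no-argument constructor. */
    public static List<Object> loadPlugins(String deviceClasses)
            throws ReflectiveOperationException {
        List<Object> plugins = new ArrayList<>();
        for (String cls : deviceClasses.split(",")) {
            plugins.add(Class.forName(cls.trim())
                .getDeclaredConstructor().newInstance());
        }
        return plugins;
    }
}
```

In the real framework each loaded class would also be validated against an expected plugin interface before use, which is why the admin must know each plugin's registered resource name up front.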
[jira] [Commented] (YARN-8233) NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal whose allocatedOrReservedContainer is null
[ https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679318#comment-16679318 ] Hadoop QA commented on YARN-8233: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-3.1 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 56s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 35s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 57s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 14s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 18s{color} | {color:green} branch-3.1 passed {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 16m 40s{color} | {color:red} branch has errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 17s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} branch-3.1 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 11m 19s{color} | {color:red} patch has errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s{color} | {color:green} hadoop-project in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 27s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 38s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 93m 40s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:080e9d0 | | JIRA Issue | YARN-8233 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12947343/YARN-8233.001-branch-3.1-test.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs
[jira] [Updated] (YARN-8880) Add configurations for pluggable plugin framework
[ https://issues.apache.org/jira/browse/YARN-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8880: --- Attachment: YARN-8880-trunk.005.patch > Add configurations for pluggable plugin framework > - > > Key: YARN-8880 > URL: https://issues.apache.org/jira/browse/YARN-8880 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Major > Attachments: YARN-8880-trunk.001.patch, YARN-8880-trunk.002.patch, > YARN-8880-trunk.003.patch, YARN-8880-trunk.004.patch, > YARN-8880-trunk.005.patch > > > Added two configurations for the pluggable device framework. > {code:xml}
> <property>
>   <name>yarn.nodemanager.pluggable-device-framework.enabled</name>
>   <value>true/false</value>
> </property>
> <property>
>   <name>yarn.nodemanager.pluggable-device-framework.device-classes</name>
>   <value>com.cmp1.hdw1,...</value>
> </property>
> {code}
> The admin needs to know the registered resource name of every plugin class > configured, and declare them in resource-types.xml. > Please note that the count value defined in node-resource.xml will be > overridden by the plugin.
[jira] [Commented] (YARN-8985) FSParentQueue: debug log missing when assigning container
[ https://issues.apache.org/jira/browse/YARN-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679348#comment-16679348 ] Hadoop QA commented on YARN-8985: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 12m 21s{color} | {color:red} branch has errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 12m 12s{color} | {color:red} patch has errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 22s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 36s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 53m 32s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | | org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSSchedulerNode) does not release lock on all exception paths At FSParentQueue.java:on all exception paths At FSParentQueue.java:[line 214] | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-8985 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12947346/YARN-8985.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 543364109e4e 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / f8c72d7 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs |
[jira] [Commented] (YARN-8233) NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal whose allocatedOrReservedContainer is null
[ https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679159#comment-16679159 ] Akira Ajisaka commented on YARN-8233: - Thanks [~Tao Yang] for the additional patches. They look good to me. bq. anything wrong on branch-3.1? Kicked the precommit job again manually: https://builds.apache.org/job/PreCommit-YARN-Build/22453/ > NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal > whose allocatedOrReservedContainer is null > - > > Key: YARN-8233 > URL: https://issues.apache.org/jira/browse/YARN-8233 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Critical > Fix For: 3.3.0, 3.2.1 > > Attachments: YARN-8233.001.branch-2.patch, > YARN-8233.001.branch-3.0.patch, YARN-8233.001.branch-3.1.patch, > YARN-8233.001.patch, YARN-8233.002.patch, YARN-8233.003.patch > > > Recently we saw an NPE in CapacityScheduler#tryCommit when trying to find > the attemptId by calling {{c.getAllocatedOrReservedContainer().get...}} from > an allocate/reserve proposal, but allocatedOrReservedContainer was null and an NPE was thrown. > Reference code: > {code:java}
> // find the application to accept and apply the ResourceCommitRequest
> if (request.anythingAllocatedOrReserved()) {
>   ContainerAllocationProposal c =
>       request.getFirstAllocatedOrReservedContainer();
>   attemptId =
>       c.getAllocatedOrReservedContainer().getSchedulerApplicationAttempt()
>           .getApplicationAttemptId(); // NPE happens here
> } else { ...
> {code}
> The proposal was constructed in > {{CapacityScheduler#createResourceCommitRequest}} and > allocatedOrReservedContainer is possibly null in the async-scheduling process > when the node was lost or the application finished (details in > {{CapacityScheduler#getSchedulerContainer}}).
> Reference code: > {code:java}
> // Allocated something
> List<AssignmentInformation.AssignmentDetails> allocations =
>     csAssignment.getAssignmentInformation().getAllocationDetails();
> if (!allocations.isEmpty()) {
>   RMContainer rmContainer = allocations.get(0).rmContainer;
>   allocated = new ContainerAllocationProposal<>(
>       getSchedulerContainer(rmContainer, true), // possibly null
>       getSchedulerContainersToRelease(csAssignment),
>       getSchedulerContainer(csAssignment.getFulfilledReservedContainer(), false),
>       csAssignment.getType(),
>       csAssignment.getRequestLocalityType(),
>       csAssignment.getSchedulingMode() != null ?
>           csAssignment.getSchedulingMode() :
>           SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY,
>       csAssignment.getResource());
> }
> {code}
> I think we should add a null check for allocatedOrReservedContainer before creating > allocate/reserve proposals. Besides, the allocation process has already increased the > unconfirmed resource of the app when creating an allocate assignment, so if this > check finds null, we should decrease the unconfirmed resource of the live app.
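The suggested fix (check for null, and undo the earlier unconfirmed-resource increment when the container has vanished) can be modeled with a small self-contained sketch. ProposalGuard and its counter are illustrative, not the actual CapacityScheduler code:

```java
// Simplified model of the null-check-with-rollback pattern proposed above.
// ProposalGuard is hypothetical; in YARN the increment is the app's
// unconfirmed resource, and the null comes from
// CapacityScheduler#getSchedulerContainer when a node is lost or an app finishes.
import java.util.concurrent.atomic.AtomicLong;

public class ProposalGuard {

    /** Stands in for the app's unconfirmed-resource accounting. */
    public static final AtomicLong unconfirmed = new AtomicLong();

    /** Optimistically accounts the resource, then rolls it back and skips the
     *  proposal when the scheduler container turned out to be null. */
    public static boolean tryCreateProposal(Object schedulerContainer, long resource) {
        unconfirmed.addAndGet(resource);       // done earlier in the allocation path
        if (schedulerContainer == null) {
            unconfirmed.addAndGet(-resource);  // decrease unconfirmed resource of the live app
            return false;                      // no allocate/reserve proposal created
        }
        return true;
    }
}
```

The rollback matters because the increment happens before the null is discovered; skipping the proposal without it would leak unconfirmed resource for the lifetime of the app.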
[jira] [Commented] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size
[ https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679064#comment-16679064 ] Íñigo Goiri commented on YARN-8972: --- {quote} No, there is no need to add options in yarn-default. {quote} {{TestYarnConfigurationFields}} disagrees. > [Router] Add support to prevent DoS attack over ApplicationSubmissionContext > size > - > > Key: YARN-8972 > URL: https://issues.apache.org/jira/browse/YARN-8972 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8972.v1.patch, YARN-8972.v2.patch, > YARN-8972.v3.patch, YARN-8972.v4.patch > > > This jira tracks the effort to add a new interceptor in the Router to prevent > users from submitting applications with oversized ASCs. > This avoids YARN cluster failover.
[jira] [Comment Edited] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size
[ https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679068#comment-16679068 ] Giovanni Matteo Fumarola edited comment on YARN-8972 at 11/7/18 11:30 PM: -- Thanks [~elgoiri] for the comment. There is no need to add options in yarn-default since I will reuse a configuration. The change was already done in [^YARN-8972.v4.patch]. was (Author: giovanni.fumarola): Thanks [~elgoiri] for the comment. There is no need to add options in yarn-default since I will reuse a configuration. > [Router] Add support to prevent DoS attack over ApplicationSubmissionContext > size > - > > Key: YARN-8972 > URL: https://issues.apache.org/jira/browse/YARN-8972 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8972.v1.patch, YARN-8972.v2.patch, > YARN-8972.v3.patch, YARN-8972.v4.patch > > > This jira tracks the effort to add a new interceptor in the Router to prevent > users from submitting applications with oversized ASCs. > This avoids YARN cluster failover.
[jira] [Commented] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size
[ https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679068#comment-16679068 ] Giovanni Matteo Fumarola commented on YARN-8972: Thanks [~elgoiri] for the comment. There is no need to add options in yarn-default since I will reuse a configuration. > [Router] Add support to prevent DoS attack over ApplicationSubmissionContext > size > - > > Key: YARN-8972 > URL: https://issues.apache.org/jira/browse/YARN-8972 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8972.v1.patch, YARN-8972.v2.patch, > YARN-8972.v3.patch, YARN-8972.v4.patch > > > This jira tracks the effort to add a new interceptor in the Router to prevent > users from submitting applications with oversized ASCs. > This avoids YARN cluster failover.
[jira] [Updated] (YARN-8933) [AMRMProxy] Fix potential empty AvailableResource and NumClusterNode in allocation response
[ https://issues.apache.org/jira/browse/YARN-8933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-8933: --- Attachment: YARN-8933.v3.patch > [AMRMProxy] Fix potential empty AvailableResource and NumClusterNode in > allocation response > --- > > Key: YARN-8933 > URL: https://issues.apache.org/jira/browse/YARN-8933 > Project: Hadoop YARN > Issue Type: Sub-task > Components: amrmproxy, federation >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Major > Attachments: YARN-8933.v1.patch, YARN-8933.v2.patch, > YARN-8933.v3.patch > > > After YARN-8696, the allocate response by FederationInterceptor is merged > from the responses of a random subset of all sub-clusters, depending on the > async heartbeat timing. As a result, cluster-wide information fields in the > response, e.g. AvailableResources and NumClusterNodes, are not consistent at > all. They can even be null/zero because a specific response is merged from an > empty set of sub-cluster responses. > In this patch, we let FederationInterceptor remember the last allocate > response from all known sub-clusters, and always construct the cluster-wide > info fields from all of them. We also moved the sub-cluster timeout from > LocalityMulticastAMRMProxyPolicy to FederationInterceptor, so that > sub-clusters that expired (haven't had a successful allocate response for a > while) won't be included in the computation.
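The merge strategy described in this patch (remember the last response per sub-cluster, rebuild cluster-wide fields from all non-expired ones) can be sketched with a small self-contained model. The types and timeout handling below are hypothetical stand-ins for the real FederationInterceptor state:

```java
// Illustrative sketch of the YARN-8933 merge strategy: keep the last allocate
// response per sub-cluster and rebuild cluster-wide fields from all of the
// non-expired ones. ClusterWideMerger and LastResponse are hypothetical names.
import java.util.HashMap;
import java.util.Map;

public class ClusterWideMerger {

    record LastResponse(int numClusterNodes, long timestampMs) {}

    private final Map<String, LastResponse> lastBySubCluster = new HashMap<>();
    private final long timeoutMs;

    public ClusterWideMerger(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Record the latest heartbeat response from one sub-cluster. */
    public void onAllocateResponse(String subClusterId, int numNodes, long nowMs) {
        lastBySubCluster.put(subClusterId, new LastResponse(numNodes, nowMs));
    }

    /** Cluster-wide node count over all sub-clusters that have not expired. */
    public int mergedNumClusterNodes(long nowMs) {
        return lastBySubCluster.values().stream()
            .filter(r -> nowMs - r.timestampMs() <= timeoutMs)  // drop expired
            .mapToInt(LastResponse::numClusterNodes)
            .sum();
    }
}
```

Because the merged value always covers every known, non-expired sub-cluster, a heartbeat round that happens to reach no sub-cluster can no longer produce a zero or null cluster-wide field.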
[jira] [Updated] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size
[ https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8972: --- Attachment: YARN-8972.v4.patch > [Router] Add support to prevent DoS attack over ApplicationSubmissionContext > size > - > > Key: YARN-8972 > URL: https://issues.apache.org/jira/browse/YARN-8972 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8972.v1.patch, YARN-8972.v2.patch, > YARN-8972.v3.patch, YARN-8972.v4.patch > > > This jira tracks the effort to add a new interceptor in the Router to prevent > users from submitting applications with oversized ASCs. > This avoids YARN cluster failover.
[jira] [Updated] (YARN-8933) [AMRMProxy] Fix potential empty fields in allocation response, move SubClusterTimeout to FederationInterceptor
[ https://issues.apache.org/jira/browse/YARN-8933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-8933: --- Summary: [AMRMProxy] Fix potential empty fields in allocation response, move SubClusterTimeout to FederationInterceptor (was: [AMRMProxy] Fix potential empty AvailableResource and NumClusterNode in allocation response) > [AMRMProxy] Fix potential empty fields in allocation response, move > SubClusterTimeout to FederationInterceptor > -- > > Key: YARN-8933 > URL: https://issues.apache.org/jira/browse/YARN-8933 > Project: Hadoop YARN > Issue Type: Sub-task > Components: amrmproxy, federation >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Major > Attachments: YARN-8933.v1.patch, YARN-8933.v2.patch, > YARN-8933.v3.patch > > > After YARN-8696, the allocate response by FederationInterceptor is merged > from the responses of a random subset of all sub-clusters, depending on the > async heartbeat timing. As a result, cluster-wide information fields in the > response, e.g. AvailableResources and NumClusterNodes, are not consistent at > all. They can even be null/zero because a specific response is merged from an > empty set of sub-cluster responses. > In this patch, we let FederationInterceptor remember the last allocate > response from all known sub-clusters, and always construct the cluster-wide > info fields from all of them. We also moved the sub-cluster timeout from > LocalityMulticastAMRMProxyPolicy to FederationInterceptor, so that > sub-clusters that expired (haven't had a successful allocate response for a > while) won't be included in the computation.
[jira] [Commented] (YARN-8983) YARN container with docker: hostname entry not in /etc/hosts
[ https://issues.apache.org/jira/browse/YARN-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678924#comment-16678924 ] Eric Yang commented on YARN-8983: - [~oliverhuh...@gmail.com] I don't know of any reason, from YARN's point of view, that might cause the entry to be removed. When docker is started with --net=host, the host's /etc/hosts file is used. Is this a probable cause for the missing entry? > YARN container with docker: hostname entry not in /etc/hosts > > > Key: YARN-8983 > URL: https://issues.apache.org/jira/browse/YARN-8983 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.1 >Reporter: Keqiu Hu >Priority: Critical > > I'm experimenting with Hadoop 2.9.1 to launch applications in docker > containers. Inside the container task, we try to get the hostname of the > container using > {code:java} > InetAddress.getLocalHost().getHostName(){code} > This works fine with LXC; however, it throws the following exception when I > enable docker containers using: > {code:java} > YARN_CONTAINER_RUNTIME_TYPE=docker > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=test4 > {code} > The exception: > > {noformat} > java.net.UnknownHostException: ctr-1541488751855-0023-01-03: > ctr-1541488751855-0023-01-03: Temporary failure in name resolution at > java.net.InetAddress.getLocalHost(InetAddress.java:1506) > at > com.linkedin.tony.TaskExecutor.registerAndGetClusterSpec(TaskExecutor.java:204) > > at com.linkedin.tony.TaskExecutor.main(TaskExecutor.java:109) Caused by: > java.net.UnknownHostException: ctr-1541488751855-0023-01-03: Temporary > failure in name resolution at > java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) > at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929) > at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324) at > java.net.InetAddress.getLocalHost(InetAddress.java:1501) ... 
2 more > {noformat} > > Did some research online, it seems to be related to missing entry in > /etc/hosts on the hostname. So I took a look at the /etc/hosts, it is missing > the entry : > {noformat} > pi@pi-aw:~/docker/$ docker ps > CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES > 71e3e9df8bc6 test4 "/entrypoint.sh bash..." 1 second ago Up Less than a > second container_1541488751855_0028_01_01 > 29d31f0327d1 test3 "/entrypoint.sh bash" 18 hours ago Up 18 hours > blissful_turing > pi@pi-aw:~/docker/$ de 71e3e9df8bc6 > groups: cannot find name for group ID 1000 > groups: cannot find name for group ID 116 > groups: cannot find name for group ID 126 > To run a command as administrator (user "root"), use "sudo ". > See "man sudo_root" for details. > pi@ctr-1541488751855-0028-01-01:/tmp/hadoop-pi/nm-local-dir/usercache/pi/appcache/application_1541488751855_0028/container_1541488751855_0028_01_01$ > cat /etc/hosts > 127.0.0.1 localhost > 192.168.0.14 pi-aw > # The following lines are desirable for IPv6 capable hosts > ::1 ip6-localhost ip6-loopback > fe00::0 ip6-localnet > ff00::0 ip6-mcastprefix > ff02::1 ip6-allnodes > ff02::2 ip6-allrouters > pi@ctr-1541488751855-0028-01-01:/tmp/hadoop-pi/nm-local-dir/usercache/pi/appcache/application_1541488751855_0028/container_1541488751855_0028_01_01$ > {noformat} > If I launch the image without YARN, I saw the entry in /etc/hosts: > {noformat} > pi@61f173f95631:~$ cat /etc/hosts > 127.0.0.1 localhost > ::1 localhost ip6-localhost ip6-loopback > fe00::0 ip6-localnet > ff00::0 ip6-mcastprefix > ff02::1 ip6-allnodes > ff02::2 ip6-allrouters > 172.17.0.3 61f173f95631 {noformat} > Here is my container-executor.cfg > {code:java} > 1 min.user.id=100 > 2 yarn.nodemanager.linux-container-executor.group=hadoop > 3 [docker] > 4 module.enabled=true > 5 docker.binary=/usr/bin/docker > 6 > 
docker.allowed.capabilities=SYS_CHROOT,MKNOD,SETFCAP,SETPCAP,FSETID,CHOWN,AUDIT_WRITE,SETGID,NET_RAW,FOWNER,SETUID,DAC_OVERRIDE,KILL,NET_BIND_SERVICE > 7 docker.allowed.networks=bridge,host,none > 8 > docker.allowed.rw-mounts=/tmp,/etc/hadoop/logs/,/private/etc/hadoop-2.9.1/logs/{code} > Since I'm using an older version of Hadoop 2.9.1, let me know if this is > something already fixed in later version :) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
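As an application-side stopgap for the failure reported above, the lookup exception can at least be caught rather than crashing the task. This is a hedged sketch of a client-side fallback, not a fix for the missing /etc/hosts entry itself:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Defensive wrapper around the call that fails in the report above.
// When the container hostname has no /etc/hosts (or DNS) entry,
// InetAddress.getLocalHost() throws UnknownHostException; here we fall
// back to the loopback address instead of letting the task die.
public class HostnameFallbackSketch {

    static String localHostName() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            // No resolvable entry for our own hostname; degrade gracefully.
            return InetAddress.getLoopbackAddress().getHostName();
        }
    }

    public static void main(String[] args) {
        System.out.println(localHostName());
    }
}
```

This only masks the symptom; the container still cannot be addressed by its hostname until the /etc/hosts entry (or a DNS mechanism such as RegistryDNS) is in place.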
[jira] [Commented] (YARN-8983) YARN container with docker: hostname entry not in /etc/hosts
[ https://issues.apache.org/jira/browse/YARN-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678892#comment-16678892 ] Keqiu Hu commented on YARN-8983: Thanks guys for the replies! [~tangzhankun] That also misses a line like this: {code:java} 172.17.0.3 61f173f95631 {code} which maps the IP address to the hostname (from /etc/hostname). I guess that is what Java's InetAddress.getLocalHost() API uses to get the local hostname and IP address. [~eyang], yeah, it would be better to piggyback on RegistryDNS resolution for hostnames. However, as you mentioned, it is only available post 3.x :(, which we can't upgrade to in the short term. I'll check the Docker overlay network. I'm still curious why we remove that [IP HOSTNAME] line from */etc/hosts*. Is that intentional? Because by default, if you launch a docker container, it is there. > YARN container with docker: hostname entry not in /etc/hosts > > > Key: YARN-8983 > URL: https://issues.apache.org/jira/browse/YARN-8983 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.1 >Reporter: Keqiu Hu >Priority: Critical > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (YARN-8983) YARN container with docker: hostname entry not in /etc/hosts
[ https://issues.apache.org/jira/browse/YARN-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678892#comment-16678892 ] Keqiu Hu edited comment on YARN-8983 at 11/7/18 10:33 PM: -- Thanks guys for the replies! [~tangzhankun] That also misses a line like this: {code:java} 172.17.0.3 61f173f95631 {code} which maps the IP address to the hostname (from /etc/hostname). I guess that is what Java's InetAddress.getLocalHost() API uses to get the local hostname and IP address. [~eyang], yeah, it would be better to piggyback on RegistryDNS resolution for hostnames. However, as you mentioned, it is only available post 3.x :(, which we can't upgrade to in the short term. I'll check the Docker overlay network. I'm still curious why we remove that [IP HOSTNAME] line from */etc/hosts*. Is that intentional? Because by default, if you launch a docker container, it is there. was (Author: oliverhuh...@gmail.com): Thanks guys for the replies! [~tangzhankun] That also misses a line like this: {code:java} 172.17.0.3 61f173f95631 {code} Which is a mapping between ip address and hostname (from /etc/hostname). I guess that is used by the Java networking's InetAddress.getLocalHost() API to get the local host name & ip address. [~eyang], yah, it is better to piggyback on RegistryDNS resolution for hostnames. However, as mentioned by you, it is only available post 3.x :(, which we can't upgrade to in short term. I'll check the Docker overlay network. I'm still curious why do we remove that [IP HOSTNAME] line from */etc/hosts* ? Is that intentional, cause by default if you launch a docker container, it is there. > YARN container with docker: hostname entry not in /etc/hosts > > > Key: YARN-8983 > URL: https://issues.apache.org/jira/browse/YARN-8983 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.1 >Reporter: Keqiu Hu >Priority: Critical >
[jira] [Commented] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size
[ https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678704#comment-16678704 ] Hadoop QA commented on YARN-8972: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 28s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 17s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 48s{color} | {color:red} hadoop-yarn-api in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 45s{color} | {color:green} hadoop-yarn-server-router in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 39s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 84m 15s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.conf.TestYarnConfigurationFields | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-8972 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12947264/YARN-8972.v3.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux a5ce5b07ed37 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / c96cbe8 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | unit |
[jira] [Commented] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty
[ https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678649#comment-16678649 ] Konstantinos Karanasos commented on YARN-8984: -- If I remember correctly, the Scheduling Requests are used only when we have placement constraints (at least this was the initial design; not sure if things have changed recently). Given that, at successful container allocation, the Container Request of unconstrained requests will be properly removed through a different code path. That said, if we want to start using Scheduling Request objects even without constraints (I don't see a reason to do this urgently, but we can do it in the long run), then I think we should fix the code. As [~cheersyang] said, I don't think the current fix will work, since the {{schedReqs}} will be null when there are no tags, and the map of {{outstandingSchedRequests}} is keyed by the allocation tags. Adding [~asuresh] (Arun, I think you worked on that code last, so let me know if I am missing something) and [~leftnoteasy]. > AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty > -- > > Key: YARN-8984 > URL: https://issues.apache.org/jira/browse/YARN-8984 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yang Wang >Assignee: Yang Wang >Priority: Critical > Attachments: YARN-8984-001.patch, YARN-8984-002.patch, > YARN-8984-003.patch > > > In AMRMClient, outstandingSchedRequests should be removed or decreased when a > container is allocated. However, this does not work when the allocation tags are null > or empty. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
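The null/empty-tag leak under discussion can be illustrated with a small model. This is a hypothetical sketch of the bookkeeping, not the real AMRMClient data structures: outstanding requests are tracked in a map keyed by the allocation-tag set, so removal only succeeds when the allocated container reports exactly the same tags.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy model of outstandingSchedRequests: a count of pending scheduling
// requests keyed by their allocation-tag set (names are illustrative).
public class OutstandingSketch {
    private final Map<Set<String>, Integer> outstanding = new HashMap<>();

    void addRequest(Set<String> tags) {
        outstanding.merge(normalize(tags), 1, Integer::sum);
    }

    // Models removeFromOutstandingSchedulingRequests(): decrement only
    // when the container's tags match a tracked key exactly.
    void onContainerAllocated(Set<String> containerTags) {
        outstanding.computeIfPresent(normalize(containerTags),
            (k, v) -> v > 1 ? v - 1 : null);
    }

    int pendingCount() {
        return outstanding.values().stream()
            .mapToInt(Integer::intValue).sum();
    }

    private static Set<String> normalize(Set<String> tags) {
        return tags == null ? Collections.emptySet() : tags;
    }

    public static void main(String[] args) {
        OutstandingSketch s = new OutstandingSketch();
        s.addRequest(Set.of("gpu"));
        // If the RM reports the allocation without tags, the lookup key
        // (the empty set) never matches {"gpu"} and the entry leaks:
        s.onContainerAllocated(null);
        System.out.println(s.pendingCount()); // prints 1: the leak
        // With the matching tags, the entry is drained as intended:
        s.onContainerAllocated(Set.of("gpu"));
        System.out.println(s.pendingCount()); // prints 0
    }
}
```

This mirrors the point made in the comment: either the RM must feed the proper allocationTags back in the allocated Container, or the removal path must handle the untagged case explicitly.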
[jira] [Assigned] (YARN-8963) Add flag to disable interactive shell
[ https://issues.apache.org/jira/browse/YARN-8963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang reassigned YARN-8963: --- Assignee: Eric Yang > Add flag to disable interactive shell > - > > Key: YARN-8963 > URL: https://issues.apache.org/jira/browse/YARN-8963 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-8963.001.patch > > > For some production jobs, the application admin might choose to disable debugging > to prevent developers or system admins from accessing the > containers. It would be nice to add an environment-variable flag to disable the > interactive shell during application submission. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8962) Add ability to use interactive shell with normal yarn container
[ https://issues.apache.org/jira/browse/YARN-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang reassigned YARN-8962: --- Assignee: Eric Yang > Add ability to use interactive shell with normal yarn container > --- > > Key: YARN-8962 > URL: https://issues.apache.org/jira/browse/YARN-8962 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-8962.001.patch, YARN-8962.002.patch > > > This task focuses on extending the interactive shell capability to regular yarn > containers without docker. This will improve some aspects of debugging > mapreduce or spark applications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty
[ https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678606#comment-16678606 ] Botong Huang edited comment on YARN-8984 at 11/7/18 6:27 PM: - Took a quick look. It is expected for AMRMClient to re-send all outstanding/pending requests after an RM master-slave switch. When a container is allocated, we should remove it from the outstanding list, which is exactly what _removeFromOutstandingSchedulingRequests()_ is doing here. If we are not cleaning it up properly, it is very likely because the RM is not feeding in the proper allocationTags in the allocated _Container_ object. If so, we need to fix that instead of removing the null check here. was (Author: botong): Took a quick look. It is expected for AMRMClient to re-send all pending request after an RM failover. Whenever a container is allocated, we should remove it from the pending list, which is exactly what _removeFromOutstandingSchedulingRequests()_ is doing here. If we are not cleaning it up properly, very likely is it because RM is not feeding in the proper allocationTags in the allocated Container? So we need to fix this instead of removing the null check here? > AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty > -- > > Key: YARN-8984 > URL: https://issues.apache.org/jira/browse/YARN-8984 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yang Wang >Assignee: Yang Wang >Priority: Critical > Attachments: YARN-8984-001.patch, YARN-8984-002.patch, > YARN-8984-003.patch > > > In AMRMClient, outstandingSchedRequests should be removed or decreased when a > container is allocated. However, this does not work when the allocation tags are null > or empty. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size
[ https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678607#comment-16678607 ] Giovanni Matteo Fumarola commented on YARN-8972: Thanks [~bibinchundatt]. I pushed [^YARN-8972.v3.patch] with the whitespace fix. > [Router] Add support to prevent DoS attack over ApplicationSubmissionContext > size > - > > Key: YARN-8972 > URL: https://issues.apache.org/jira/browse/YARN-8972 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8972.v1.patch, YARN-8972.v2.patch, > YARN-8972.v3.patch > > > This jira tracks the effort to add a new interceptor in the Router to prevent > users from submitting applications with an oversized ASC. > This avoids YARN cluster failover. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size
[ https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8972: --- Attachment: YARN-8972.v3.patch > [Router] Add support to prevent DoS attack over ApplicationSubmissionContext > size > - > > Key: YARN-8972 > URL: https://issues.apache.org/jira/browse/YARN-8972 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8972.v1.patch, YARN-8972.v2.patch, > YARN-8972.v3.patch > > > This jira tracks the effort to add a new interceptor in the Router to prevent > users from submitting applications with an oversized ASC. > This avoids YARN cluster failover. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
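The size guard that the YARN-8972 interceptor applies could look roughly like the sketch below. This is a hedged illustration only: the limit, the property that configures it, and the method names are hypothetical, not what the attached patch actually does.

```java
// Sketch: reject a submission whose serialized
// ApplicationSubmissionContext exceeds a configured byte limit, so an
// oversized ASC is refused at the Router instead of being written to
// the RM state store.
public class AscSizeGuardSketch {
    // Illustrative default; the real limit and its configuration
    // property would come from the patch, which we are not assuming here.
    static final int DEFAULT_MAX_ASC_BYTES = 1024 * 1024;

    static void checkSize(byte[] serializedAsc, int maxBytes) {
        if (serializedAsc.length > maxBytes) {
            throw new IllegalArgumentException(
                "ApplicationSubmissionContext is " + serializedAsc.length
                + " bytes, exceeding the configured limit of " + maxBytes);
        }
    }

    public static void main(String[] args) {
        checkSize(new byte[512], DEFAULT_MAX_ASC_BYTES); // small ASC: OK
        try {
            checkSize(new byte[2 * 1024 * 1024], DEFAULT_MAX_ASC_BYTES);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected oversized ASC");
        }
    }
}
```

As the comment in the header discussion notes, the same kind of check also matters for state-store implementations other than ZooKeeper, which argues for a dedicated property rather than reusing the ZK one.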
[jira] [Commented] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty
[ https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678606#comment-16678606 ] Botong Huang commented on YARN-8984: Took a quick look. It is expected for AMRMClient to re-send all pending requests after an RM failover. Whenever a container is allocated, we should remove it from the pending list, which is exactly what _removeFromOutstandingSchedulingRequests()_ is doing here. If we are not cleaning it up properly, it is very likely because the RM is not feeding in the proper allocationTags in the allocated Container. If so, we need to fix that instead of removing the null check here. > AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty > -- > > Key: YARN-8984 > URL: https://issues.apache.org/jira/browse/YARN-8984 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yang Wang >Assignee: Yang Wang >Priority: Critical > Attachments: YARN-8984-001.patch, YARN-8984-002.patch, > YARN-8984-003.patch > > > In AMRMClient, outstandingSchedRequests should be removed or decreased when a > container is allocated. However, this does not work when the allocation tags are null > or empty. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8914) Add xtermjs to YARN UI2
[ https://issues.apache.org/jira/browse/YARN-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678592#comment-16678592 ] Eric Yang commented on YARN-8914: - Patch 006 added a user.name query parameter for non-secure clusters. > Add xtermjs to YARN UI2 > --- > > Key: YARN-8914 > URL: https://issues.apache.org/jira/browse/YARN-8914 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-ui-v2 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-8914.001.patch, YARN-8914.002.patch, > YARN-8914.003.patch, YARN-8914.004.patch, YARN-8914.005.patch, > YARN-8914.006.patch > > > In the container listing from UI2, we can add a link to connect to the docker > container using xtermjs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8914) Add xtermjs to YARN UI2
[ https://issues.apache.org/jira/browse/YARN-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-8914: Attachment: YARN-8914.006.patch > Add xtermjs to YARN UI2 > --- > > Key: YARN-8914 > URL: https://issues.apache.org/jira/browse/YARN-8914 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-ui-v2 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-8914.001.patch, YARN-8914.002.patch, > YARN-8914.003.patch, YARN-8914.004.patch, YARN-8914.005.patch, > YARN-8914.006.patch > > > In the container listing from UI2, we can add a link to connect to docker > container using xtermjs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8978) For fair scheduler, application with higher priority should also get priority resources for running AM
[ https://issues.apache.org/jira/browse/YARN-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678577#comment-16678577 ] Yufei Gu commented on YARN-8978: [~qiuliang988], not sure if you still need this jira, but you shouldn't resolve it as "Fixed". Please resolve it as "Invalid" or "Won't Fix" if you no longer need it. > For fair scheduler, application with higher priority should also get priority > resources for running AM > -- > > Key: YARN-8978 > URL: https://issues.apache.org/jira/browse/YARN-8978 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: qiuliang >Priority: Major > Attachments: YARN-8978.001.patch > > > In order to allow important applications to run earlier, we used priority > scheduling in the fair scheduler, and FairSharePolicy uses YARN-6307. > Consider this situation: there are two applications (with different > priorities) in the same queue, and both are accepted. Both applications are > demanding and resource-hungry when dispatched to the queue. Next, the weight > ratio is calculated. Since the used resources of both applications are 0, the weight ratio > is also 0, so the priority has no effect in this case. Low-priority applications > may get resources to run their AM earlier than high-priority applications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
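The zero-ratio tie described in the YARN-8978 report can be modeled with a small comparator. This is an illustrative sketch, not FairSharePolicy's actual code: when both apps have zero usage, the usage/weight ratios tie at 0 and priority is never consulted unless it is added as an explicit tie-breaker.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of the ordering problem: ratio first (lower schedules
// sooner), then higher priority wins ties such as the 0-vs-0 case.
public class PriorityTieBreakSketch {
    static class App {
        final String name;
        final double usage;   // resources currently used
        final double weight;  // fair-share weight
        final int priority;   // higher = more important

        App(String name, double usage, double weight, int priority) {
            this.name = name; this.usage = usage;
            this.weight = weight; this.priority = priority;
        }
    }

    static final Comparator<App> RATIO_THEN_PRIORITY =
        Comparator.comparingDouble((App a) -> a.usage / a.weight)
                  .thenComparing(Comparator.comparingInt(
                      (App a) -> a.priority).reversed());

    public static void main(String[] args) {
        List<App> apps = new ArrayList<>();
        apps.add(new App("low", 0.0, 1.0, 1));
        apps.add(new App("high", 0.0, 1.0, 5));
        apps.sort(RATIO_THEN_PRIORITY);
        System.out.println(apps.get(0).name); // prints high
    }
}
```

Without the `.thenComparing(...)` clause, both apps compare equal and the scheduler is free to serve the low-priority AM first, which is exactly the behavior the report complains about.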
[jira] [Commented] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out
[ https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678585#comment-16678585 ] Chandni Singh commented on YARN-8672: - [~eyang] yes, I looked into having the token filename as an optional argument. The problem is that right now the last argument to {{ContainerLocalizer}} is a list of local dirs: local dirs span {{argv[5]...argv.length}}. We cannot make the token file an optional argument because an optional argument would have to go at the end, and then the program could not tell whether it is a local dir or the token file. If we have to make it optional, we would either need a hack (e.g. if the last argument is "tokenFileName=filename" then it is a token file, otherwise it is a local dir), or we would need to change the way {{ContainerLocalizer}} parses arguments, that is, use flags so that the order of arguments doesn't matter. That would be backward-incompatible. I think, for now, making the argument mandatory is better. > TestContainerManager#testLocalingResourceWhileContainerRunning occasionally > times out > - > > Key: YARN-8672 > URL: https://issues.apache.org/jira/browse/YARN-8672 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: Jason Lowe >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8672.001.patch, YARN-8672.002.patch, > YARN-8672.003.patch, YARN-8672.004.patch, YARN-8672.005.patch > > > Precommit builds have been failing in > TestContainerManager#testLocalingResourceWhileContainerRunning. I have been > able to reproduce the problem without any patch applied if I run the test > enough times. It looks like something is removing container tokens from the > nmPrivate area just as a new localizer starts. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
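The positional-argument constraint discussed in that comment can be sketched as follows. The argv layout here is illustrative, not ContainerLocalizer's exact contract: the point is that once all trailing arguments are local dirs, an *optional* token file at the end would be indistinguishable from one more dir, while a *mandatory* token file before the dirs parses unambiguously.

```java
import java.util.Arrays;

// Toy model of the localizer CLI: with a mandatory token file at a
// fixed position, everything after it is unambiguously a local dir.
public class LocalizerArgsSketch {
    static final int TOKEN_FILE_INDEX = 3; // hypothetical position

    static class Parsed {
        final String tokenFile;
        final String[] localDirs;

        Parsed(String tokenFile, String[] localDirs) {
            this.tokenFile = tokenFile;
            this.localDirs = localDirs;
        }
    }

    static Parsed parse(String[] argv) {
        String tokenFile = argv[TOKEN_FILE_INDEX];
        String[] dirs = Arrays.copyOfRange(
            argv, TOKEN_FILE_INDEX + 1, argv.length);
        return new Parsed(tokenFile, dirs);
    }

    public static void main(String[] args) {
        Parsed p = parse(new String[] {
            "user", "app_1", "loc_1", "container.tokens",
            "/data1/nm", "/data2/nm" });
        System.out.println(p.tokenFile);        // prints container.tokens
        System.out.println(p.localDirs.length); // prints 2
    }
}
```

If the token file were optional and trailing instead, `parse` would need a sentinel such as the "tokenFileName=" prefix mentioned in the comment, which is the hack the mandatory-argument approach avoids.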
[jira] [Commented] (YARN-8880) Add configurations for pluggable plugin framework
[ https://issues.apache.org/jira/browse/YARN-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678575#comment-16678575 ] Hadoop QA commented on YARN-8880: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 7s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 54s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 49s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 28s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 222 unchanged - 3 fixed = 223 total (was 225) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 8s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 53s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 46s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 20s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 46s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 38s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}112m 44s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-8880 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12947245/YARN-8880-trunk.004.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux 54e8a5fc8ddb 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64
[jira] [Commented] (YARN-8672) TestContainerManager#testLocalingResourceWhileContainerRunning occasionally times out
[ https://issues.apache.org/jira/browse/YARN-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678556#comment-16678556 ] Eric Yang commented on YARN-8672: - Option 2 seems like the safer approach to address this issue. We only need to change the filename in the few places where we know the token file is rapidly created and deleted during localization. Would it be possible to fall back to the default pattern when no token filename is given? Existing applications that depend on containerid.tokens in the working directory could then keep backward compatibility.
[jira] [Commented] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty
[ https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678554#comment-16678554 ] Botong Huang commented on YARN-8984: +[~kkaranasos] > AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty > -- > > Key: YARN-8984 > URL: https://issues.apache.org/jira/browse/YARN-8984 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yang Wang >Assignee: Yang Wang >Priority: Critical > Attachments: YARN-8984-001.patch, YARN-8984-002.patch, > YARN-8984-003.patch > > > In AMRMClient, outstandingSchedRequests should be removed or decreased when > container allocated. However, it could not work when allocation tag is null > or empty.
[jira] [Commented] (YARN-8983) YARN container with docker: hostname entry not in /etc/hosts
[ https://issues.apache.org/jira/browse/YARN-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678519#comment-16678519 ] Eric Yang commented on YARN-8983: - [~oliverhuh...@gmail.com] [~tangzhankun] The recommendation is to use RegistryDNS to manage hostnames, or a Docker overlay network, which comes with its own built-in DNS. This is because the hostname can change frequently between peers, and there is no easy way to update /etc/hosts once the docker container is running. RegistryDNS only exists in Hadoop 3+ and requires the YARN service AM to populate the information. Therefore, your best bet on Hadoop 2.9.1 is a Docker overlay network with built-in DNS. > YARN container with docker: hostname entry not in /etc/hosts > > > Key: YARN-8983 > URL: https://issues.apache.org/jira/browse/YARN-8983 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.1 >Reporter: Keqiu Hu >Priority: Critical > > I'm experimenting with using Hadoop 2.9.1 to launch applications with docker > containers.
Inside the container task, we try to get the hostname of the > container using > {code:java} > InetAddress.getLocalHost().getHostName(){code} > This works fine with LXC, however it throws the following exception when I > enable docker container using: > {code:java} > YARN_CONTAINER_RUNTIME_TYPE=docker > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=test4 > {code} > The exception: > > {noformat} > java.net.UnknownHostException: ctr-1541488751855-0023-01-03: > ctr-1541488751855-0023-01-03: Temporary failure in name resolution at > java.net.InetAddress.getLocalHost(InetAddress.java:1506) > at > com.linkedin.tony.TaskExecutor.registerAndGetClusterSpec(TaskExecutor.java:204) > > at com.linkedin.tony.TaskExecutor.main(TaskExecutor.java:109) Caused by: > java.net.UnknownHostException: ctr-1541488751855-0023-01-03: Temporary > failure in name resolution at > java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) > at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929) > at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324) at > java.net.InetAddress.getLocalHost(InetAddress.java:1501) ... 2 more > {noformat} > > Did some research online, it seems to be related to missing entry in > /etc/hosts on the hostname. So I took a look at the /etc/hosts, it is missing > the entry : > {noformat} > pi@pi-aw:~/docker/$ docker ps > CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES > 71e3e9df8bc6 test4 "/entrypoint.sh bash..." 1 second ago Up Less than a > second container_1541488751855_0028_01_01 > 29d31f0327d1 test3 "/entrypoint.sh bash" 18 hours ago Up 18 hours > blissful_turing > pi@pi-aw:~/docker/$ de 71e3e9df8bc6 > groups: cannot find name for group ID 1000 > groups: cannot find name for group ID 116 > groups: cannot find name for group ID 126 > To run a command as administrator (user "root"), use "sudo ". > See "man sudo_root" for details. 
> pi@ctr-1541488751855-0028-01-01:/tmp/hadoop-pi/nm-local-dir/usercache/pi/appcache/application_1541488751855_0028/container_1541488751855_0028_01_01$ > cat /etc/hosts > 127.0.0.1 localhost > 192.168.0.14 pi-aw > # The following lines are desirable for IPv6 capable hosts > ::1 ip6-localhost ip6-loopback > fe00::0 ip6-localnet > ff00::0 ip6-mcastprefix > ff02::1 ip6-allnodes > ff02::2 ip6-allrouters > pi@ctr-1541488751855-0028-01-01:/tmp/hadoop-pi/nm-local-dir/usercache/pi/appcache/application_1541488751855_0028/container_1541488751855_0028_01_01$ > {noformat} > If I launch the image without YARN, I saw the entry in /etc/hosts: > {noformat} > pi@61f173f95631:~$ cat /etc/hosts > 127.0.0.1 localhost > ::1 localhost ip6-localhost ip6-loopback > fe00::0 ip6-localnet > ff00::0 ip6-mcastprefix > ff02::1 ip6-allnodes > ff02::2 ip6-allrouters > 172.17.0.3 61f173f95631 {noformat} > Here is my container-executor.cfg > {code:java} > 1 min.user.id=100 > 2 yarn.nodemanager.linux-container-executor.group=hadoop > 3 [docker] > 4 module.enabled=true > 5 docker.binary=/usr/bin/docker > 6 > docker.allowed.capabilities=SYS_CHROOT,MKNOD,SETFCAP,SETPCAP,FSETID,CHOWN,AUDIT_WRITE,SETGID,NET_RAW,FOWNER,SETUID,DAC_OVERRIDE,KILL,NET_BIND_SERVICE > 7 docker.allowed.networks=bridge,host,none > 8 > docker.allowed.rw-mounts=/tmp,/etc/hadoop/logs/,/private/etc/hadoop-2.9.1/logs/{code} > Since I'm using an older version of Hadoop 2.9.1, let me know if this is > something already fixed in later version :) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
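The stack trace above comes from {{InetAddress.getLocalHost()}} throwing {{UnknownHostException}} when the container's hostname has no /etc/hosts entry. A defensive sketch of how application code can degrade gracefully is below; this is an assumption-level workaround, not part of any YARN patch, and falling back to loopback is only acceptable when the address is used locally (e.g. for logging), not for registering with remote peers.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch: tolerate a missing /etc/hosts entry for the container hostname
// by falling back to the loopback address instead of failing outright.
public class HostnameSketch {
    public static InetAddress localHostOrLoopback() {
        try {
            return InetAddress.getLocalHost();
        } catch (UnknownHostException e) {
            // Name resolution failed, e.g. the docker container's hostname
            // is not in /etc/hosts and no DNS resolves it.
            return InetAddress.getLoopbackAddress();
        }
    }
}
```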
[jira] [Comment Edited] (YARN-8925) Updating distributed node attributes only when necessary
[ https://issues.apache.org/jira/browse/YARN-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678431#comment-16678431 ] Weiwei Yang edited comment on YARN-8925 at 11/7/18 3:58 PM: Hi [~Tao Yang] Thanks for the patch. It's a nice refactor in {{NodeStatusUpdaterImpl}}, looks pretty good. And also thanks for adding unit tests, the coverage seems good too. Some comments, *NodeLabelUtil#isNodeAttributesEquals* if {{leftNodeAttributes}} is a subset of {{rightNodeAttributes}}, it would also compare as equal. And besides the name and value, we also need to compare the prefix, right? It would be good to have a separate UT for this method, to verify various cases. *HeartbeatSyncIfNeededHandler* Can we rename this to {{CachedNodeDescriptorHandler}}? As this class caches the last value of the node label/attribute and leverages the cache to reduce the overhead. *TestResourceTrackerService#testNodeRegistrationWithAttributes* {code:java} File tempDir = File.createTempFile("nattr", ".tmp"); {code} can we put the tmp dir under {{TEMP_DIR}} so it is consistent with the rest of the tests. *TestNodeStatusUpdaterForAttributes* waitTillHeartbeat/waitTillHeartbeat can these methods be simplified with GenericTestUtils.waitFor? Thanks > Updating distributed node attributes only when necessary > > > Key: YARN-8925 > URL: https://issues.apache.org/jira/browse/YARN-8925 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 3.2.1 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Labels: performance > Attachments: YARN-8925.001.patch, YARN-8925.002.patch, > YARN-8925.003.patch > > > Currently if distributed node attributes exist, even though there is no > change, updating for distributed node attributes will happen in every > heartbeat between NM and RM. The updating process holds > NodeAttributesManagerImpl#writeLock and may have some influence in a large > cluster. We have found the nodes UI of a large cluster opens slowly, and most > of the time it is waiting for the lock in NodeAttributesManagerImpl. I think this > update should be triggered only when necessary to enhance the performance of > the related process.
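The subset pitfall raised in the review can be made concrete with a small sketch. This is an illustrative stand-in, not the {{NodeLabelUtil}} code: the {{Attr}} class and method names are invented. It shows that a per-element "anyMatch" alone would accept a proper subset as equal, while adding the size check turns the comparison into true set equality, with prefix, name, and value all participating.

```java
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of set equality for node attributes.
public class AttributeEqualitySketch {
    public static final class Attr {
        final String prefix, name, value;
        public Attr(String prefix, String name, String value) {
            this.prefix = prefix; this.name = name; this.value = value;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Attr)) return false;
            Attr a = (Attr) o;
            // Prefix, name, and value all take part in the comparison.
            return Objects.equals(prefix, a.prefix)
                && Objects.equals(name, a.name)
                && Objects.equals(value, a.value);
        }
        @Override public int hashCode() {
            return Objects.hash(prefix, name, value);
        }
    }

    public static boolean attributesEqual(Set<Attr> left, Set<Attr> right) {
        if (left.size() != right.size()) {
            return false;  // rejects proper subsets; without this check a
                           // subset would wrongly compare as equal
        }
        // Every element on the left must match some element on the right.
        return left.stream()
            .allMatch(l -> right.stream().anyMatch(l::equals));
    }
}
```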
[jira] [Commented] (YARN-8880) Add configurations for pluggable plugin framework
[ https://issues.apache.org/jira/browse/YARN-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678434#comment-16678434 ] Zhankun Tang commented on YARN-8880: {quote}"This settings" -> "setting" {quote} Fixed. {quote}ResourcePluginManager#initializePluggableDevicePlugins currently only has the null check. Will more checks be added here? Just want to make sure initialization problems can be found as early as possible, like a class type check etc. {quote} Yeah. Will add more checks in the next basic framework patch. It will check the type and the methods implemented. {quote}TestResourcePluginManager: can we make sure mock NMs are stopped in each test case? {quote} The NM will be stopped in "teardown", is this ok? {quote}Checkstyle issues need to be fixed {quote} Fixed. {quote}And there seem to be some unnecessary changes, e.g. {quote} Fixed > Add configurations for pluggable plugin framework > - > > Key: YARN-8880 > URL: https://issues.apache.org/jira/browse/YARN-8880 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Major > Attachments: YARN-8880-trunk.001.patch, YARN-8880-trunk.002.patch, > YARN-8880-trunk.003.patch, YARN-8880-trunk.004.patch > > > Added two configurations for the pluggable device framework. > {code:java} > > yarn.nodemanager.pluggable-device-framework.enabled > true/false > > > yarn.nodemanager.pluggable-device-framework.device-classes > com.cmp1.hdw1,... > {code} > The admin needs to know the registered resource name of every plugin class > configured, and declare them in resource-types.xml. > Please note that the count value defined in node-resource.xml will be > overridden by the plugin.
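The "check the type as early as possible" point above can be sketched with plain reflection. This is a hedged illustration, not the actual {{ResourcePluginManager}} code: the {{DevicePlugin}} interface and method names here are stand-ins. Each class name from the comma-separated config value is loaded and verified against the expected plugin interface before instantiation, so a misconfigured class fails fast with a clear error.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of eager type-checking for configured plugin classes.
public class PluginLoaderSketch {
    public interface DevicePlugin { }              // stand-in interface

    public static class FakeGpuPlugin implements DevicePlugin { }

    public static List<DevicePlugin> loadPlugins(String commaSeparatedClasses) {
        List<DevicePlugin> plugins = new ArrayList<>();
        for (String name : commaSeparatedClasses.split(",")) {
            try {
                Class<?> clazz = Class.forName(name.trim());
                // Fail fast if the configured class is not a plugin at all.
                if (!DevicePlugin.class.isAssignableFrom(clazz)) {
                    throw new IllegalArgumentException(
                        name + " does not implement DevicePlugin");
                }
                plugins.add((DevicePlugin) clazz.getDeclaredConstructor()
                    .newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException(
                    "cannot load plugin class " + name, e);
            }
        }
        return plugins;
    }
}
```

The same early validation could be extended to check that required methods are actually implemented, as the comment suggests for the next patch.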
[jira] [Commented] (YARN-8925) Updating distributed node attributes only when necessary
[ https://issues.apache.org/jira/browse/YARN-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678431#comment-16678431 ] Weiwei Yang commented on YARN-8925: --- Hi [~Tao Yang] Thanks for the patch. It's a nice refactor in {{NodeStatusUpdaterImpl}}, looks pretty good. And also thanks for adding unit tests, the coverage seems good too. Some comments, *NodeLabelUtil#isNodeAttributesEquals* if {{leftNodeAttributes}} is a subset of {{rightNodeAttributes}}, it would also compare as equal. And besides the name and value, we also need to compare the prefix, right? It would be good to have a separate UT for this method, to verify various cases. *HeartbeatSyncIfNeededHandler* Can we rename this to {{CachedNodeDescriptorHandler}}? As this class caches the last value of the node label/attribute and leverages the cache to reduce the overhead. *TestResourceTrackerService#testNodeRegistrationWithAttributes* {code} File tempDir = File.createTempFile("nattr", ".tmp"); {code} can we put the tmp dir under {{TEMP_DIR}} so it is consistent with the rest of the tests. *TestNodeStatusUpdaterForAttributes* waitTillHeartbeat/waitTillHeartbeat can these methods be simplified with GenericTestUtils.waitFor? Thanks
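The GenericTestUtils.waitFor suggestion above boils down to a poll-until-true-or-timeout loop. The sketch below is a minimal re-implementation of that pattern, not the Hadoop utility itself (the real GenericTestUtils.waitFor throws a TimeoutException rather than returning a boolean); it shows the polling logic such test helpers replace.

```java
import java.util.function.Supplier;

// Minimal sketch of the waitFor polling pattern used by test utilities:
// re-check a condition every checkEveryMillis until it holds or the
// timeout elapses. Returns true if the condition was met in time.
public class WaitForSketch {
    public static boolean waitFor(Supplier<Boolean> check,
                                  long checkEveryMillis,
                                  long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!check.get()) {
            if (System.currentTimeMillis() >= deadline) {
                return false;  // timed out waiting for the condition
            }
            try {
                Thread.sleep(checkEveryMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;  // treat interruption as failure
            }
        }
        return true;
    }
}
```

A test can then replace a hand-rolled wait loop with a single call such as waitFor(() -> heartbeatCount.get() > 0, 100, 10000).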
[jira] [Updated] (YARN-8880) Add configurations for pluggable plugin framework
[ https://issues.apache.org/jira/browse/YARN-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8880: --- Attachment: YARN-8880-trunk.004.patch
[jira] [Commented] (YARN-8977) Remove unnecessary type casting when calling AbstractYarnScheduler#getSchedulerNode
[ https://issues.apache.org/jira/browse/YARN-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678342#comment-16678342 ] Weiwei Yang commented on YARN-8977: --- Committed to trunk, cherry-picked to branch-3.0, branch-3.1 and branch-3.2. Fixed in all 3.x streams. Thanks for the contribution [~jiwq]. > Remove unnecessary type casting when calling > AbstractYarnScheduler#getSchedulerNode > --- > > Key: YARN-8977 > URL: https://issues.apache.org/jira/browse/YARN-8977 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Trivial > Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1 > > Attachments: YARN-8977.001.patch, YARN-8977.002.patch > > > Since the AbstractYarnScheduler#getSchedulerNode method returns the generic > type, the explicit cast is not needed. > I found this issue in the CapacityScheduler class. The warning message is: > {quote}Casting 'getSchedulerNode( nonKillableContainer.getAllocatedNode())' > to 'FiCaSchedulerNode' is redundant > {quote}
[jira] [Commented] (YARN-8977) Remove unnecessary type casting when calling AbstractYarnScheduler#getSchedulerNode
[ https://issues.apache.org/jira/browse/YARN-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678349#comment-16678349 ] Wanqiang Ji commented on YARN-8977: --- Thanks for your review and work [~cheersyang]
[jira] [Commented] (YARN-8977) Remove unnecessary type casting when calling AbstractYarnScheduler#getSchedulerNode
[ https://issues.apache.org/jira/browse/YARN-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678343#comment-16678343 ] Hudson commented on YARN-8977: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15384 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15384/]) YARN-8977. Remove unnecessary type casting when calling (wwei: rev c96cbe8659587cfc114a96aab1be5cc85029fe44) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestContinuousScheduling.java
[jira] [Updated] (YARN-8977) Remove unnecessary type casting when calling AbstractYarnScheduler#getSchedulerNode
[ https://issues.apache.org/jira/browse/YARN-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8977: -- Fix Version/s: (was: 3.0.2) 3.0.4
[jira] [Updated] (YARN-8977) Remove unnecessary type casting when calling AbstractYarnScheduler#getSchedulerNode
[ https://issues.apache.org/jira/browse/YARN-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8977: -- Fix Version/s: 3.0.2
[jira] [Updated] (YARN-8977) Remove unnecessary type casting when calling AbstractYarnScheduler#getSchedulerNode
[ https://issues.apache.org/jira/browse/YARN-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8977: -- Fix Version/s: 3.1.2
[jira] [Comment Edited] (YARN-8880) Add configurations for pluggable plugin framework
[ https://issues.apache.org/jira/browse/YARN-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678245#comment-16678245 ] Weiwei Yang edited comment on YARN-8880 at 11/7/18 2:52 PM: Hi [~tangzhankun] Thanks for the patch. It looks good. Some small nits # "This settings" -> "setting" # ResourcePluginManager#initializePluggableDevicePlugins currently only has the null check. Will more checks be added here? Just want to make sure initialization problems can be found as early as possible, like a class type check etc. # TestResourcePluginManager: can we make sure mock NMs are stopped in each test case? # Checkstyle issues need to be fixed And there seem to be some unnecessary changes, e.g. {code:java} - ((NMContext)this.getNMContext()).setResourcePluginManager(rpm); + ((NMContext) this.getNMContext()).setResourcePluginManager(rpm); {code} and {code:java} - metrics, diskhandler); + metrics, diskhandler); {code} thanks.
[jira] [Updated] (YARN-8977) Remove unnecessary type casting when calling AbstractYarnScheduler#getSchedulerNode
[ https://issues.apache.org/jira/browse/YARN-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8977: -- Fix Version/s: 3.2.1 > Remove unnecessary type casting when calling > AbstractYarnScheduler#getSchedulerNode > --- > > Key: YARN-8977 > URL: https://issues.apache.org/jira/browse/YARN-8977 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Trivial > Fix For: 3.3.0, 3.2.1 > > Attachments: YARN-8977.001.patch, YARN-8977.002.patch > > > Because the AbstractYarnScheduler#getSchedulerNode method returns the generic > type, I think the explicit type cast is not needed. > I found this issue in the CapacityScheduler class. The warning message is like: > {quote}Casting 'getSchedulerNode( nonKillableContainer.getAllocatedNode())' > to 'FiCaSchedulerNode' is redundant > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty
[ https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678324#comment-16678324 ] Weiwei Yang edited comment on YARN-8984 at 11/7/18 2:48 PM: Thanks for the updates [~fly_in_gis], it almost seems good to me. One doubt if we remove that null check {code:java} List schedReqs = outstandingSchedRequests.get(container.getAllocationTags()); {code} now this seems to be possible to throw NPE, when container.getAllocationTags() is null. was (Author: cheersyang): Thanks for the updates [~fly_in_gis], it almost seems good to me. One thought if we remove that null check {code} List schedReqs = outstandingSchedRequests.get(container.getAllocationTags()); {code} now this seems to be possible to throw NPE, when container.getAllocationTags() is null. > AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty > -- > > Key: YARN-8984 > URL: https://issues.apache.org/jira/browse/YARN-8984 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yang Wang >Assignee: Yang Wang >Priority: Critical > Attachments: YARN-8984-001.patch, YARN-8984-002.patch, > YARN-8984-003.patch > > > In AMRMClient, outstandingSchedRequests should be removed or decreased when > container allocated. However, it could not work when allocation tag is null > or empty. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty
[ https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678324#comment-16678324 ] Weiwei Yang commented on YARN-8984: --- Thanks for the updates [~fly_in_gis], it looks almost good to me. One thought: if we remove that null check {code} List schedReqs = outstandingSchedRequests.get(container.getAllocationTags()); {code} this now seems able to throw an NPE when container.getAllocationTags() is null. > AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty > -- > > Key: YARN-8984 > URL: https://issues.apache.org/jira/browse/YARN-8984 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yang Wang >Assignee: Yang Wang >Priority: Critical > Attachments: YARN-8984-001.patch, YARN-8984-002.patch, > YARN-8984-003.patch > > > In AMRMClient, outstandingSchedRequests should be removed or decreased when > a container is allocated. However, this does not work when the allocation tag is null > or empty. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
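As an illustration of the leak and the NPE risk discussed above, here is a minimal, self-contained Java sketch (the class and method names are hypothetical stand-ins, not the actual AMRMClient code) that normalizes null/empty allocation tags to one canonical key, so outstanding requests are always found and decremented instead of leaking, and the map lookup never NPEs:

```java
import java.util.*;

class OutstandingRequestTracker {
    // Keyed by allocation tags; null/empty tags are normalized to one
    // canonical empty set so lookups and removals always hit the same bucket.
    private final Map<Set<String>, List<String>> outstanding = new HashMap<>();

    private static Set<String> normalize(Set<String> tags) {
        return (tags == null || tags.isEmpty()) ? Collections.emptySet() : tags;
    }

    void addRequest(Set<String> tags, String requestId) {
        outstanding.computeIfAbsent(normalize(tags), k -> new ArrayList<>())
                   .add(requestId);
    }

    // Called when a container is allocated: decrease the outstanding
    // count instead of leaking entries whose tags were null or empty.
    void onContainerAllocated(Set<String> containerTags) {
        Set<String> key = normalize(containerTags); // no NPE on null tags
        List<String> reqs = outstanding.get(key);
        if (reqs != null && !reqs.isEmpty()) {
            reqs.remove(reqs.size() - 1);
            if (reqs.isEmpty()) {
                outstanding.remove(key);
            }
        }
    }

    int outstandingCount(Set<String> tags) {
        List<String> reqs = outstanding.get(normalize(tags));
        return reqs == null ? 0 : reqs.size();
    }
}
```

The point of the sketch is the `normalize` step: whichever fix the patch takes, the key used at add time and at allocation time must agree for untagged requests.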
[jira] [Updated] (YARN-8977) Remove unnecessary type casting when calling AbstractYarnScheduler#getSchedulerNode
[ https://issues.apache.org/jira/browse/YARN-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8977: -- Priority: Trivial (was: Major) > Remove unnecessary type casting when calling > AbstractYarnScheduler#getSchedulerNode > --- > > Key: YARN-8977 > URL: https://issues.apache.org/jira/browse/YARN-8977 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Trivial > Attachments: YARN-8977.001.patch, YARN-8977.002.patch > > > Because the AbstractYarnScheduler#getSchedulerNode method returns the generic > type, I think the explicit type cast is not needed. > I found this issue in the CapacityScheduler class. The warning message is like: > {quote}Casting 'getSchedulerNode( nonKillableContainer.getAllocatedNode())' > to 'FiCaSchedulerNode' is redundant > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8977) Remove explicit type when called AbstractYarnScheduler#getSchedulerNode to avoid type casting
[ https://issues.apache.org/jira/browse/YARN-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678272#comment-16678272 ] Weiwei Yang commented on YARN-8977: --- The UT failure should be unrelated; I tested locally and it works correctly. +1 for the v2 patch, committing soon. > Remove explicit type when called AbstractYarnScheduler#getSchedulerNode to > avoid type casting > - > > Key: YARN-8977 > URL: https://issues.apache.org/jira/browse/YARN-8977 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Attachments: YARN-8977.001.patch, YARN-8977.002.patch > > > Because the AbstractYarnScheduler#getSchedulerNode method returns the generic > type, I think the explicit type cast is not needed. > I found this issue in the CapacityScheduler class. The warning message is like: > {quote}Casting 'getSchedulerNode( nonKillableContainer.getAllocatedNode())' > to 'FiCaSchedulerNode' is redundant > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
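The redundancy flagged by that warning can be seen in a small self-contained Java sketch (the classes below are stand-ins, not the real YARN types): when the scheduler class is generic in its node type, as AbstractYarnScheduler is, getSchedulerNode already returns the concrete node type and no cast is needed at the call site:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the SchedulerNode base type.
class SchedulerNode {
    private final String host;
    SchedulerNode(String host) { this.host = host; }
    String getHost() { return host; }
}

// Stand-in for a concrete node type such as FiCaSchedulerNode.
class FiCaSchedulerNode extends SchedulerNode {
    FiCaSchedulerNode(String host) { super(host); }
    int rank() { return 1; }
}

// Mirrors AbstractYarnScheduler's shape: generic in the node type N.
abstract class AbstractScheduler<N extends SchedulerNode> {
    private final Map<String, N> nodes = new HashMap<>();
    void addNode(String id, N node) { nodes.put(id, node); }
    // Returns the concrete node type N, so callers never need a cast.
    N getSchedulerNode(String id) { return nodes.get(id); }
}

class CapacitySchedulerSketch extends AbstractScheduler<FiCaSchedulerNode> {
    int nodeRank(String id) {
        // No "(FiCaSchedulerNode)" cast needed: getSchedulerNode already
        // returns FiCaSchedulerNode in this subclass.
        FiCaSchedulerNode node = getSchedulerNode(id);
        return node.rank();
    }
}
```

This is why the patch can simply delete the casts: the compiler already knows the return type through the type parameter.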
[jira] [Updated] (YARN-8977) Remove unnecessary type casting when calling AbstractYarnScheduler#getSchedulerNode
[ https://issues.apache.org/jira/browse/YARN-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8977: -- Summary: Remove unnecessary type casting when calling AbstractYarnScheduler#getSchedulerNode (was: Remove explicit type when called AbstractYarnScheduler#getSchedulerNode to avoid type casting) > Remove unnecessary type casting when calling > AbstractYarnScheduler#getSchedulerNode > --- > > Key: YARN-8977 > URL: https://issues.apache.org/jira/browse/YARN-8977 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Attachments: YARN-8977.001.patch, YARN-8977.002.patch > > > Because the AbstractYarnScheduler#getSchedulerNode method returns the generic > type, I think the explicit type cast is not needed. > I found this issue in the CapacityScheduler class. The warning message is like: > {quote}Casting 'getSchedulerNode( nonKillableContainer.getAllocatedNode())' > to 'FiCaSchedulerNode' is redundant > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8233) NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal whose allocatedOrReservedContainer is null
[ https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678260#comment-16678260 ] Weiwei Yang commented on YARN-8233: --- The patch for branch-3.1 looks good, however the jenkins job runs into some issues. From [https://builds.apache.org/job/PreCommit-YARN-Build/22446/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt], I see {quote}[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called? {quote} anything wrong on branch-3.1? > NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal > whose allocatedOrReservedContainer is null > - > > Key: YARN-8233 > URL: https://issues.apache.org/jira/browse/YARN-8233 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Critical > Fix For: 3.3.0, 3.2.1 > > Attachments: YARN-8233.001.branch-2.patch, > YARN-8233.001.branch-3.0.patch, YARN-8233.001.branch-3.1.patch, > YARN-8233.001.patch, YARN-8233.002.patch, YARN-8233.003.patch > > > Recently we saw a NPE problem in CapacityScheduler#tryCommit when try to find > the attemptId by calling {{c.getAllocatedOrReservedContainer().get...}} from > an allocate/reserve proposal. But got null allocatedOrReservedContainer and > thrown NPE. > Reference code: > {code:java} > // find the application to accept and apply the ResourceCommitRequest > if (request.anythingAllocatedOrReserved()) { > ContainerAllocationProposal c = > request.getFirstAllocatedOrReservedContainer(); > attemptId = > c.getAllocatedOrReservedContainer().getSchedulerApplicationAttempt() > .getApplicationAttemptId(); //NPE happens here > } else { ... 
> {code} > The proposal was constructed in > {{CapacityScheduler#createResourceCommitRequest}} and > allocatedOrReservedContainer is possibly null in the async-scheduling process > when the node was lost or the application finished (details in > {{CapacityScheduler#getSchedulerContainer}}). > Reference code: > {code:java} > // Allocated something > List allocations = > csAssignment.getAssignmentInformation().getAllocationDetails(); > if (!allocations.isEmpty()) { > RMContainer rmContainer = allocations.get(0).rmContainer; > allocated = new ContainerAllocationProposal<>( > getSchedulerContainer(rmContainer, true), //possibly null > getSchedulerContainersToRelease(csAssignment), > > getSchedulerContainer(csAssignment.getFulfilledReservedContainer(), > false), csAssignment.getType(), > csAssignment.getRequestLocalityType(), > csAssignment.getSchedulingMode() != null ? > csAssignment.getSchedulingMode() : > SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY, > csAssignment.getResource()); > } > {code} > I think we should add a null check for allocatedOrReservedContainer before creating > allocate/reserve proposals. Besides, the allocation process has increased the > unconfirmed resource of the app when creating an allocate assignment, so if this > check finds null, we should decrease the unconfirmed resource of the live app. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
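A minimal Java sketch of the fix proposed in the description (the types and method names below are hypothetical stand-ins, not the actual CapacityScheduler code): skip building the proposal when the scheduler container resolves to null, and roll back the app's unconfirmed resource so the accounting stays consistent:

```java
import java.util.concurrent.atomic.AtomicLong;

// Stand-in for the app attempt's unconfirmed-resource accounting.
class AppAttemptSketch {
    // Resource (e.g. memory in MB) optimistically added while an
    // allocation proposal is in flight in the async-scheduling path.
    private final AtomicLong unconfirmedMb = new AtomicLong();

    void incUnconfirmed(long mb) { unconfirmedMb.addAndGet(mb); }
    void decUnconfirmed(long mb) { unconfirmedMb.addAndGet(-mb); }
    long unconfirmed() { return unconfirmedMb.get(); }
}

class ProposalBuilderSketch {
    // Mirrors the suggested guard in createResourceCommitRequest: if the
    // scheduler container is null (node lost / app finished), do not build
    // a proposal, and undo the unconfirmed-resource bump instead of letting
    // tryCommit hit an NPE later. Returns the proposal "payload" (here just
    // the container's string form) or null when the proposal is skipped.
    static String buildAllocateProposal(Object schedulerContainer,
                                        AppAttemptSketch app, long mb) {
        app.incUnconfirmed(mb);        // done when the assignment was created
        if (schedulerContainer == null) {
            app.decUnconfirmed(mb);    // roll back rather than NPE in tryCommit
            return null;
        }
        return schedulerContainer.toString();
    }
}
```

The invariant the sketch captures: every increment of unconfirmed resource is matched either by a committed proposal or by an explicit rollback on the null path.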
[jira] [Commented] (YARN-8880) Add configurations for pluggable plugin framework
[ https://issues.apache.org/jira/browse/YARN-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678245#comment-16678245 ] Weiwei Yang commented on YARN-8880: --- Hi [~tangzhankun] Thanks for the patch. It looks good. Some small nits # "This settings" -> "setting" # ResourcePluginManager#initializePluggableDevicePlugins currently only has the null check. Will more checks be added here? Just want to make sure initialization problems can be found as early as possible, like a class type check etc. # TestResourcePluginManager: can we make sure mock NMs are stopped in each test case? # Checkstyle issues need to be fixed. And there seem to be some unnecessary changes, e.g. {code} - ((NMContext)this.getNMContext()).setResourcePluginManager(rpm); + ((NMContext) this.getNMContext()).setResourcePluginManager(rpm); {code} and {code} - metrics, diskhandler); + metrics, diskhandler); {code} thanks. > Add configurations for pluggable plugin framework > - > > Key: YARN-8880 > URL: https://issues.apache.org/jira/browse/YARN-8880 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Major > Attachments: YARN-8880-trunk.001.patch, YARN-8880-trunk.002.patch, > YARN-8880-trunk.003.patch > > > Added two configurations for the pluggable device framework. > {code:java} > <property> > <name>yarn.nodemanager.pluggable-device-framework.enabled</name> > <value>true/false</value> > </property> > <property> > <name>yarn.nodemanager.pluggable-device-framework.device-classes</name> > <value>com.cmp1.hdw1,...</value> > </property> > {code} > The admin needs to know the registered resource name of every plugin class > configured, and declare them in resource-types.xml. > Please note that the count value defined in node-resource.xml will be > overridden by the plugin. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
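The early "class type check" suggested in nit #2 above could look like the following self-contained Java sketch (DevicePluginSketch, the helper class, and its method names are hypothetical stand-ins for the framework's actual plugin interface): resolve each configured class name from the device-classes value up front, and fail fast if a class is missing or does not implement the plugin interface:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the device plugin interface the framework expects.
interface DevicePluginSketch {
    String getRegisteredResourceName();
}

class PluginLoaderSketch {
    // Validates a comma-separated class list (the shape of the
    // ...device-classes value) at initialization time, so misconfiguration
    // surfaces as early as possible rather than at first use.
    static List<Class<?>> validatePluginClasses(String deviceClassesCsv) {
        List<Class<?>> result = new ArrayList<>();
        if (deviceClassesCsv == null || deviceClassesCsv.trim().isEmpty()) {
            return result; // nothing configured
        }
        for (String name : deviceClassesCsv.split(",")) {
            Class<?> clazz;
            try {
                clazz = Class.forName(name.trim());
            } catch (ClassNotFoundException e) {
                throw new IllegalArgumentException(
                    "Plugin class not found: " + name, e);
            }
            if (!DevicePluginSketch.class.isAssignableFrom(clazz)) {
                throw new IllegalArgumentException(
                    name + " does not implement the device plugin interface");
            }
            result.add(clazz);
        }
        return result;
    }
}

// Example plugin used only to exercise the validator.
class FakeGpuPlugin implements DevicePluginSketch {
    public String getRegisteredResourceName() { return "cmp1.com/hdw1"; }
}
```

With this kind of check in initializePluggableDevicePlugins, a typo in the configured class name or a class of the wrong type fails NM startup with a clear message instead of surfacing later.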
[jira] [Commented] (YARN-8902) Add volume manager that manages CSI volume lifecycle
[ https://issues.apache.org/jira/browse/YARN-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678218#comment-16678218 ] Hadoop QA commented on YARN-8902: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 30s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 16s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 22s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 7s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 7 new + 58 unchanged - 0 fixed = 65 total (was 58) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 50s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 42s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}106m 14s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}187m 36s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-8902 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12947215/YARN-8902.008.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux cbfb9ab2ca25 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8dc1f6d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | checkstyle |
[jira] [Commented] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty
[ https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678192#comment-16678192 ] Hadoop QA commented on YARN-8984: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 35s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 33s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 14m 14s{color} | {color:red} branch has errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 0s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 20s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 24 new + 35 unchanged - 0 fixed = 59 total (was 35) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 11m 51s{color} | {color:red} patch has errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 2s{color} | {color:red} hadoop-yarn-common in the patch failed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 44s{color} | {color:red} hadoop-yarn-client in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 75m 39s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-8984 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12947223/YARN-8984-003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux e2095fc2c363 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8dc1f6d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | checkstyle |
[jira] [Commented] (YARN-8880) Add configurations for pluggable plugin framework
[ https://issues.apache.org/jira/browse/YARN-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678151#comment-16678151 ] Zhankun Tang commented on YARN-8880: [~Weiwei Yang] , [~sunilg] , the failed test seems unrelated. Please help review. > Add configurations for pluggable plugin framework > - > > Key: YARN-8880 > URL: https://issues.apache.org/jira/browse/YARN-8880 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Major > Attachments: YARN-8880-trunk.001.patch, YARN-8880-trunk.002.patch, > YARN-8880-trunk.003.patch > > > Added two configurations for the pluggable device framework. > {code:java} > <property> > <name>yarn.nodemanager.pluggable-device-framework.enabled</name> > <value>true/false</value> > </property> > <property> > <name>yarn.nodemanager.pluggable-device-framework.device-classes</name> > <value>com.cmp1.hdw1,...</value> > </property> > {code} > The admin needs to know the registered resource name of every plugin class > configured, and declare them in resource-types.xml. > Please note that the count value defined in node-resource.xml will be > overridden by the plugin. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8233) NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal whose allocatedOrReservedContainer is null
[ https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678122#comment-16678122 ] Hadoop QA commented on YARN-8233: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 32s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-3.1 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 0s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green} branch-3.1 passed {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 11m 45s{color} | {color:red} branch has errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 11s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} branch-3.1 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 10m 34s{color} | {color:red} patch has errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 15s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 63m 46s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:080e9d0 | | JIRA Issue | YARN-8233 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12947218/YARN-8233.001.branch-3.1.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 541f37412400 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-3.1 / eb426db | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/22446/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/22446/testReport/ | | Max. process+thread count | 100 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output |
[jira] [Commented] (YARN-8880) Add configurations for pluggable plugin framework
[ https://issues.apache.org/jira/browse/YARN-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678111#comment-16678111 ] Hadoop QA commented on YARN-8880: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 58s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 29s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 37s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m 8s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 51s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 7 new + 222 unchanged - 3 fixed = 229 total (was 225) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 58s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 28s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 4s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 13s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 23s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 45s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}142m 57s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.TestContainerManager | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-8880 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12947204/YARN-8880-trunk.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs
[jira] [Updated] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty
[ https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Wang updated YARN-8984: Attachment: YARN-8984-003.patch > AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty > -- > > Key: YARN-8984 > URL: https://issues.apache.org/jira/browse/YARN-8984 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yang Wang >Assignee: Yang Wang >Priority: Critical > Attachments: YARN-8984-001.patch, YARN-8984-002.patch, > YARN-8984-003.patch > > > In AMRMClient, outstandingSchedRequests should be removed or decreased when > container allocated. However, it could not work when allocation tag is null > or empty. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty
[ https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678081#comment-16678081 ] Yang Wang commented on YARN-8984: - There's no difference between running it in a separate class and in TestAMRMClientPlacementConstraints. When YarnConfiguration.RM_PLACEMENT_CONSTRAINTS_HANDLER is set to scheduler, we cannot get rejectedSchedulingRequests from AllocateResponse, since it is not set by the capacity scheduler. So I added another test in TestAMRMClientPlacementConstraints. [~cheersyang] Please help to review. > AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty > -- > > Key: YARN-8984 > URL: https://issues.apache.org/jira/browse/YARN-8984 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yang Wang >Assignee: Yang Wang >Priority: Critical > Attachments: YARN-8984-001.patch, YARN-8984-002.patch > > > In AMRMClient, outstandingSchedRequests should be removed or decreased when > container allocated. However, it could not work when allocation tag is null > or empty. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
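The leak discussed above can be illustrated with a simplified, hypothetical model of the client-side bookkeeping (the class, map, and method names here are illustrative, not the actual AMRMClient code): if outstanding requests are grouped by their allocation tags, an allocation carrying null or empty tags must still be matched against the empty-tag bucket, or those requests are never removed.

```java
import java.util.*;

// Hypothetical sketch of AMRMClient-style bookkeeping: outstanding
// SchedulingRequests are grouped by their allocation tags.
class OutstandingRequestsSketch {
  // key: allocation tags of the request; value: number of outstanding requests
  private final Map<Set<String>, Integer> outstanding = new HashMap<>();

  void addRequest(Set<String> allocationTags) {
    Set<String> key =
        allocationTags == null ? Collections.emptySet() : allocationTags;
    outstanding.merge(key, 1, Integer::sum);
  }

  // Buggy variant: only decrements when the allocation carries non-empty
  // tags, so requests submitted with null/empty tags are never cleaned up.
  void onContainerAllocatedBuggy(Set<String> containerTags) {
    if (containerTags == null || containerTags.isEmpty()) {
      return; // <-- leak: the empty-tag bucket keeps growing
    }
    outstanding.computeIfPresent(containerTags, (k, v) -> v > 1 ? v - 1 : null);
  }

  // Fixed variant: treat null/empty tags as a valid (empty-set) key.
  void onContainerAllocatedFixed(Set<String> containerTags) {
    Set<String> key =
        containerTags == null ? Collections.emptySet() : containerTags;
    outstanding.computeIfPresent(key, (k, v) -> v > 1 ? v - 1 : null);
  }

  int outstandingCount(Set<String> tags) {
    return outstanding.getOrDefault(
        tags == null ? Collections.emptySet() : tags, 0);
  }
}
```

With the buggy variant, a request submitted without tags stays outstanding forever after its container is allocated; the fixed variant drains the empty-tag bucket like any other.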
[jira] [Commented] (YARN-8977) Remove explicit type when called AbstractYarnScheduler#getSchedulerNode to avoid type casting
[ https://issues.apache.org/jira/browse/YARN-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678065#comment-16678065 ] Wanqiang Ji commented on YARN-8977: --- I don't think the hadoop-yarn-server-resourcemanager issues are caused by this patch, so let's wait on Jenkins again. > Remove explicit type when called AbstractYarnScheduler#getSchedulerNode to > avoid type casting > - > > Key: YARN-8977 > URL: https://issues.apache.org/jira/browse/YARN-8977 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Attachments: YARN-8977.001.patch, YARN-8977.002.patch > > > Since the AbstractYarnScheduler#getSchedulerNode method returns the generic > type, I think the explicit type is not needed. > I found this issue in the CapacityScheduler class. The warning message is: > {quote}Casting 'getSchedulerNode( nonKillableContainer.getAllocatedNode())' > to 'FiCaSchedulerNode' is redundant > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
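Why the cast is redundant can be shown with a minimal generic method of the same shape as AbstractYarnScheduler#getSchedulerNode (the classes below are simplified stand-ins, not the real YARN types): the type parameter is inferred from the assignment target, so the explicit cast adds nothing.

```java
// Simplified stand-ins for SchedulerNode / FiCaSchedulerNode.
class SchedulerNode { }
class FiCaSchedulerNode extends SchedulerNode { }

class GenericReturnDemo {
  // Same shape as AbstractYarnScheduler#getSchedulerNode(NodeId):
  // the generic return type is inferred at the call site.
  @SuppressWarnings("unchecked")
  static <N extends SchedulerNode> N getSchedulerNode() {
    return (N) new FiCaSchedulerNode();
  }

  static FiCaSchedulerNode redundant() {
    // Before: the explicit cast an IDE flags as redundant.
    FiCaSchedulerNode node = (FiCaSchedulerNode) getSchedulerNode();
    return node;
  }

  static FiCaSchedulerNode inferred() {
    // After: assignment-target typing infers N = FiCaSchedulerNode.
    FiCaSchedulerNode node = getSchedulerNode();
    return node;
  }
}
```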
[jira] [Comment Edited] (YARN-8233) NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal whose allocatedOrReservedContainer is null
[ https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678045#comment-16678045 ] Tao Yang edited comment on YARN-8233 at 11/7/18 11:06 AM: -- Hi, [~ajisakaa], [~cheersyang] I have attached 3 patches: (1) the patch for branch-3.1 just updates the test case, since the SchedulerApplicationAttempt#hasPendingResourceRequest API has changed in 3.2 and trunk. (2) the patch for branch-3.0 includes the modification above and drops the change in CapacityScheduler#attemptAllocationOnNode, since that method does not exist yet. (3) the patch for branch-2 includes the modifications above and updates the UT to add final keywords for variables which are used in Mockito#doAnswer. branch-2.9 can use the branch-2 patch. I have applied these patches in my local environment, ran the UTs, and did not find any problems. Just in case, please help to review these new patches before committing. Thanks! was (Author: tao yang): Hi, [~ajisakaa], [~cheersyang] I have attached 3 patches: (1) patch for branch-3.1 just update test case since SchedulerApplicationAttempt#hasPendingResourceRequest API has changed in 3.2 and trunk. (2) patch for branch-3.0 includes modification above and drop the modification in CapacityScheduler#attemptAllocationOnNode since the method is not exist yet. (3) patch for branch-2 includes modifications above and update UT to add final keywords for variables which are used in Mockito#doAnswer. branch-2.9 can use branch-2 patch. Please help to review these new patches before committing, Thanks. 
> NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal > whose allocatedOrReservedContainer is null > - > > Key: YARN-8233 > URL: https://issues.apache.org/jira/browse/YARN-8233 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Critical > Fix For: 3.3.0, 3.2.1 > > Attachments: YARN-8233.001.branch-2.patch, > YARN-8233.001.branch-3.0.patch, YARN-8233.001.branch-3.1.patch, > YARN-8233.001.patch, YARN-8233.002.patch, YARN-8233.003.patch > > > Recently we saw a NPE problem in CapacityScheduler#tryCommit when try to find > the attemptId by calling {{c.getAllocatedOrReservedContainer().get...}} from > an allocate/reserve proposal. But got null allocatedOrReservedContainer and > thrown NPE. > Reference code: > {code:java} > // find the application to accept and apply the ResourceCommitRequest > if (request.anythingAllocatedOrReserved()) { > ContainerAllocationProposal c = > request.getFirstAllocatedOrReservedContainer(); > attemptId = > c.getAllocatedOrReservedContainer().getSchedulerApplicationAttempt() > .getApplicationAttemptId(); //NPE happens here > } else { ... > {code} > The proposal was constructed in > {{CapacityScheduler#createResourceCommitRequest}} and > allocatedOrReservedContainer is possibly null in async-scheduling process > when node was lost or application was finished (details in > {{CapacityScheduler#getSchedulerContainer}}). 
> Reference code: > {code:java} > // Allocated something > List allocations = > csAssignment.getAssignmentInformation().getAllocationDetails(); > if (!allocations.isEmpty()) { > RMContainer rmContainer = allocations.get(0).rmContainer; > allocated = new ContainerAllocationProposal<>( > getSchedulerContainer(rmContainer, true), //possibly null > getSchedulerContainersToRelease(csAssignment), > > getSchedulerContainer(csAssignment.getFulfilledReservedContainer(), > false), csAssignment.getType(), > csAssignment.getRequestLocalityType(), > csAssignment.getSchedulingMode() != null ? > csAssignment.getSchedulingMode() : > SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY, > csAssignment.getResource()); > } > {code} > I think we should add null check for allocateOrReserveContainer before create > allocate/reserve proposals. Besides the allocation process has increase > unconfirmed resource of app when creating an allocate assignment, so if this > check is null, we should decrease the unconfirmed resource of live app. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8233) NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal whose allocatedOrReservedContainer is null
[ https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678045#comment-16678045 ] Tao Yang commented on YARN-8233: Hi, [~ajisakaa], [~cheersyang] I have attached 3 patches: (1) the patch for branch-3.1 just updates the test case, since the SchedulerApplicationAttempt#hasPendingResourceRequest API has changed in 3.2 and trunk. (2) the patch for branch-3.0 includes the modification above and drops the change in CapacityScheduler#attemptAllocationOnNode, since that method does not exist yet. (3) the patch for branch-2 includes the modifications above and updates the UT to add final keywords for variables which are used in Mockito#doAnswer. branch-2.9 can use the branch-2 patch. Please help to review these new patches before committing. Thanks. > NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal > whose allocatedOrReservedContainer is null > - > > Key: YARN-8233 > URL: https://issues.apache.org/jira/browse/YARN-8233 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Critical > Fix For: 3.3.0, 3.2.1 > > Attachments: YARN-8233.001.branch-2.patch, > YARN-8233.001.branch-3.0.patch, YARN-8233.001.branch-3.1.patch, > YARN-8233.001.patch, YARN-8233.002.patch, YARN-8233.003.patch > > > Recently we saw a NPE problem in CapacityScheduler#tryCommit when try to find > the attemptId by calling {{c.getAllocatedOrReservedContainer().get...}} from > an allocate/reserve proposal. But got null allocatedOrReservedContainer and > thrown NPE. > Reference code: > {code:java} > // find the application to accept and apply the ResourceCommitRequest > if (request.anythingAllocatedOrReserved()) { > ContainerAllocationProposal c = > request.getFirstAllocatedOrReservedContainer(); > attemptId = > c.getAllocatedOrReservedContainer().getSchedulerApplicationAttempt() > .getApplicationAttemptId(); //NPE happens here > } else { ... 
> {code} > The proposal was constructed in > {{CapacityScheduler#createResourceCommitRequest}} and > allocatedOrReservedContainer is possibly null in async-scheduling process > when node was lost or application was finished (details in > {{CapacityScheduler#getSchedulerContainer}}). > Reference code: > {code:java} > // Allocated something > List allocations = > csAssignment.getAssignmentInformation().getAllocationDetails(); > if (!allocations.isEmpty()) { > RMContainer rmContainer = allocations.get(0).rmContainer; > allocated = new ContainerAllocationProposal<>( > getSchedulerContainer(rmContainer, true), //possibly null > getSchedulerContainersToRelease(csAssignment), > > getSchedulerContainer(csAssignment.getFulfilledReservedContainer(), > false), csAssignment.getType(), > csAssignment.getRequestLocalityType(), > csAssignment.getSchedulingMode() != null ? > csAssignment.getSchedulingMode() : > SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY, > csAssignment.getResource()); > } > {code} > I think we should add null check for allocateOrReserveContainer before create > allocate/reserve proposals. Besides the allocation process has increase > unconfirmed resource of app when creating an allocate assignment, so if this > check is null, we should decrease the unconfirmed resource of live app. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
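The fix direction proposed in the description above can be illustrated with a minimal, self-contained model (the class and field names are illustrative, not the actual CapacityScheduler code): the unconfirmed resource added while building the assignment must be rolled back when the scheduler container turns out to be null.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative model of the proposed fix: the allocation path has already
// increased the app's unconfirmed resource while creating the assignment,
// so a null SchedulerContainer (node lost / app finished) must decrease it
// again instead of building an allocate/reserve proposal.
class UnconfirmedResourceSketch {
  static class App {
    final AtomicLong unconfirmedMB = new AtomicLong();
  }

  // Returns true if a proposal would be created, false if rolled back.
  static boolean tryCreateProposal(App app, Object schedulerContainer,
      long allocMB) {
    app.unconfirmedMB.addAndGet(allocMB);     // added during assignment creation
    if (schedulerContainer == null) {
      app.unconfirmedMB.addAndGet(-allocMB);  // roll back, skip the proposal
      return false;
    }
    return true;                              // safe to commit the proposal
  }
}
```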
[jira] [Commented] (YARN-8983) YARN container with docker: hostname entry not in /etc/hosts
[ https://issues.apache.org/jira/browse/YARN-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678029#comment-16678029 ] Zhankun Tang commented on YARN-8983: [~oliverhuh...@gmail.com] , I just ran a quick test with DistributedShell on YARN 3.3.0, running the command "cat /etc/hosts" in a Docker container. {code:java} # The following lines are desirable for IPv6 capable hosts ::1 ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters{code} > YARN container with docker: hostname entry not in /etc/hosts > > > Key: YARN-8983 > URL: https://issues.apache.org/jira/browse/YARN-8983 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.1 >Reporter: Keqiu Hu >Priority: Critical > > I'm experimenting to use Hadoop 2.9.1 to launch applications with docker > containers. Inside the container task, we try to get the hostname of the > container using > {code:java} > InetAddress.getLocalHost().getHostName(){code} > This works fine with LXC, however it throws the following exception when I > enable docker container using: > {code:java} > YARN_CONTAINER_RUNTIME_TYPE=docker > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=test4 > {code} > The exception: > > {noformat} > java.net.UnknownHostException: ctr-1541488751855-0023-01-03: > ctr-1541488751855-0023-01-03: Temporary failure in name resolution at > java.net.InetAddress.getLocalHost(InetAddress.java:1506) > at > com.linkedin.tony.TaskExecutor.registerAndGetClusterSpec(TaskExecutor.java:204) > > at com.linkedin.tony.TaskExecutor.main(TaskExecutor.java:109) Caused by: > java.net.UnknownHostException: ctr-1541488751855-0023-01-03: Temporary > failure in name resolution at > java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) > at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929) > at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324) at > java.net.InetAddress.getLocalHost(InetAddress.java:1501) ... 
2 more > {noformat} > > Did some research online, it seems to be related to missing entry in > /etc/hosts on the hostname. So I took a look at the /etc/hosts, it is missing > the entry : > {noformat} > pi@pi-aw:~/docker/$ docker ps > CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES > 71e3e9df8bc6 test4 "/entrypoint.sh bash..." 1 second ago Up Less than a > second container_1541488751855_0028_01_01 > 29d31f0327d1 test3 "/entrypoint.sh bash" 18 hours ago Up 18 hours > blissful_turing > pi@pi-aw:~/docker/$ de 71e3e9df8bc6 > groups: cannot find name for group ID 1000 > groups: cannot find name for group ID 116 > groups: cannot find name for group ID 126 > To run a command as administrator (user "root"), use "sudo ". > See "man sudo_root" for details. > pi@ctr-1541488751855-0028-01-01:/tmp/hadoop-pi/nm-local-dir/usercache/pi/appcache/application_1541488751855_0028/container_1541488751855_0028_01_01$ > cat /etc/hosts > 127.0.0.1 localhost > 192.168.0.14 pi-aw > # The following lines are desirable for IPv6 capable hosts > ::1 ip6-localhost ip6-loopback > fe00::0 ip6-localnet > ff00::0 ip6-mcastprefix > ff02::1 ip6-allnodes > ff02::2 ip6-allrouters > pi@ctr-1541488751855-0028-01-01:/tmp/hadoop-pi/nm-local-dir/usercache/pi/appcache/application_1541488751855_0028/container_1541488751855_0028_01_01$ > {noformat} > If I launch the image without YARN, I saw the entry in /etc/hosts: > {noformat} > pi@61f173f95631:~$ cat /etc/hosts > 127.0.0.1 localhost > ::1 localhost ip6-localhost ip6-loopback > fe00::0 ip6-localnet > ff00::0 ip6-mcastprefix > ff02::1 ip6-allnodes > ff02::2 ip6-allrouters > 172.17.0.3 61f173f95631 {noformat} > Here is my container-executor.cfg > {code:java} > 1 min.user.id=100 > 2 yarn.nodemanager.linux-container-executor.group=hadoop > 3 [docker] > 4 module.enabled=true > 5 docker.binary=/usr/bin/docker > 6 > 
docker.allowed.capabilities=SYS_CHROOT,MKNOD,SETFCAP,SETPCAP,FSETID,CHOWN,AUDIT_WRITE,SETGID,NET_RAW,FOWNER,SETUID,DAC_OVERRIDE,KILL,NET_BIND_SERVICE > 7 docker.allowed.networks=bridge,host,none > 8 > docker.allowed.rw-mounts=/tmp,/etc/hadoop/logs/,/private/etc/hadoop-2.9.1/logs/{code} > Since I'm using an older version of Hadoop 2.9.1, let me know if this is > something already fixed in later version :) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
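On the application side, one defensive pattern (a workaround sketch, not a fix for the missing /etc/hosts entry itself) is to guard the lookup and fall back when in-container name resolution fails; the HOSTNAME environment-variable fallback is a common Docker-image convention, not something YARN guarantees:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class HostnameFallback {
  // Returns a usable local hostname even when the container's own hostname
  // has no /etc/hosts entry and resolution throws UnknownHostException.
  static String localHostName() {
    try {
      return InetAddress.getLocalHost().getHostName();
    } catch (UnknownHostException e) {
      String env = System.getenv("HOSTNAME"); // often set inside Docker images
      return (env != null && !env.isEmpty()) ? env : "localhost";
    }
  }
}
```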
[jira] [Updated] (YARN-8233) NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal whose allocatedOrReservedContainer is null
[ https://issues.apache.org/jira/browse/YARN-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-8233: --- Attachment: YARN-8233.001.branch-3.1.patch YARN-8233.001.branch-3.0.patch YARN-8233.001.branch-2.patch > NPE in CapacityScheduler#tryCommit when handling allocate/reserve proposal > whose allocatedOrReservedContainer is null > - > > Key: YARN-8233 > URL: https://issues.apache.org/jira/browse/YARN-8233 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.2.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Critical > Fix For: 3.3.0, 3.2.1 > > Attachments: YARN-8233.001.branch-2.patch, > YARN-8233.001.branch-3.0.patch, YARN-8233.001.branch-3.1.patch, > YARN-8233.001.patch, YARN-8233.002.patch, YARN-8233.003.patch > > > Recently we saw a NPE problem in CapacityScheduler#tryCommit when try to find > the attemptId by calling {{c.getAllocatedOrReservedContainer().get...}} from > an allocate/reserve proposal. But got null allocatedOrReservedContainer and > thrown NPE. > Reference code: > {code:java} > // find the application to accept and apply the ResourceCommitRequest > if (request.anythingAllocatedOrReserved()) { > ContainerAllocationProposal c = > request.getFirstAllocatedOrReservedContainer(); > attemptId = > c.getAllocatedOrReservedContainer().getSchedulerApplicationAttempt() > .getApplicationAttemptId(); //NPE happens here > } else { ... > {code} > The proposal was constructed in > {{CapacityScheduler#createResourceCommitRequest}} and > allocatedOrReservedContainer is possibly null in async-scheduling process > when node was lost or application was finished (details in > {{CapacityScheduler#getSchedulerContainer}}). 
> Reference code: > {code:java} > // Allocated something > List allocations = > csAssignment.getAssignmentInformation().getAllocationDetails(); > if (!allocations.isEmpty()) { > RMContainer rmContainer = allocations.get(0).rmContainer; > allocated = new ContainerAllocationProposal<>( > getSchedulerContainer(rmContainer, true), //possibly null > getSchedulerContainersToRelease(csAssignment), > > getSchedulerContainer(csAssignment.getFulfilledReservedContainer(), > false), csAssignment.getType(), > csAssignment.getRequestLocalityType(), > csAssignment.getSchedulingMode() != null ? > csAssignment.getSchedulingMode() : > SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY, > csAssignment.getResource()); > } > {code} > I think we should add null check for allocateOrReserveContainer before create > allocate/reserve proposals. Besides the allocation process has increase > unconfirmed resource of app when creating an allocate assignment, so if this > check is null, we should decrease the unconfirmed resource of live app. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8902) Add volume manager that manages CSI volume lifecycle
[ https://issues.apache.org/jira/browse/YARN-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677981#comment-16677981 ] Weiwei Yang commented on YARN-8902: --- Hi [~sunilg] Thanks for the review. {quote}Does CsiAdaptorClientProtocol need to have an interface for unpublish volume? {quote} Yes, it will have unpublish volume too; that will be added when we work on the unpublish stuff. {quote}CsiAdaptorClientProtocol impl will be done when we do adapter code in nm ? {quote} Correct. Actually, if you take a look at YARN-8953, it has a more detailed implementation. A sample workflow can also be found here: https://issues.apache.org/jira/secure/attachment/12947186/csi_adaptor_workflow.png. {quote}In CsiConstants, there are duplicate issues. {quote} Removed the 2nd one in the v8 patch. {quote}VolumeCapability.validateCapability checks only minCapacity? Do we need to define a range or something similar here.? I am also thinking whether we need to normalize unit with min or max capacity and keep a common value. Could help to avoid run time conversions. {quote} This validation is just a user-input validation; the real validation happens on the CSI driver side. The resource min/max values form our capacity range (I have renamed VolumeCapacity to VolumeCapacityRange accordingly to avoid confusion). I checked the CSI spec, and capacity is specified in bytes. To be compatible with the resource definition, we allow users to set units, but underneath we need to convert them to bytes (this will be handled when integrating with the adaptor code). Apart from the above, the v8 patch also simplifies the interface {{CsiAdaptorClientProtocol}}; since this interface will be fully implemented in YARN-8953, let's keep it as simple as possible here because it is only used for testing within this patch. Hope it makes sense. 
Thanks > Add volume manager that manages CSI volume lifecycle > > > Key: YARN-8902 > URL: https://issues.apache.org/jira/browse/YARN-8902 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-8902.001.patch, YARN-8902.002.patch, > YARN-8902.003.patch, YARN-8902.004.patch, YARN-8902.005.patch, > YARN-8902.006.patch, YARN-8902.007.patch, YARN-8902.008.patch > > > The CSI volume manager is a service running in RM process, that manages all > CSI volumes' lifecycle. The details about volume's lifecycle states can be > found in [CSI > spec|https://github.com/container-storage-interface/spec/blob/master/spec.md]. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
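The unit-to-bytes conversion mentioned in the comment above can be sketched as follows; the class and method names are hypothetical, and the binary multipliers are assumed to match the Mi/Gi style used by YARN resource values:

```java
import java.util.Map;

class CapacityUnits {
  // Binary multipliers in the Ki/Mi/Gi/Ti style YARN resource values use;
  // CSI itself specifies volume capacity in plain bytes.
  private static final Map<String, Long> MULTIPLIERS = Map.of(
      "", 1L,
      "Ki", 1024L,
      "Mi", 1024L * 1024,
      "Gi", 1024L * 1024 * 1024,
      "Ti", 1024L * 1024 * 1024 * 1024);

  static long toBytes(long value, String unit) {
    Long multiplier = MULTIPLIERS.get(unit == null ? "" : unit);
    if (multiplier == null) {
      throw new IllegalArgumentException("Unsupported unit: " + unit);
    }
    return Math.multiplyExact(value, multiplier); // fail fast on overflow
  }
}
```

Doing this normalization once, at admission time, avoids repeated runtime conversions, which is the concern raised in the quoted review comment.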
[jira] [Updated] (YARN-8902) Add volume manager that manages CSI volume lifecycle
[ https://issues.apache.org/jira/browse/YARN-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-8902: -- Attachment: YARN-8902.008.patch > Add volume manager that manages CSI volume lifecycle > > > Key: YARN-8902 > URL: https://issues.apache.org/jira/browse/YARN-8902 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-8902.001.patch, YARN-8902.002.patch, > YARN-8902.003.patch, YARN-8902.004.patch, YARN-8902.005.patch, > YARN-8902.006.patch, YARN-8902.007.patch, YARN-8902.008.patch > > > The CSI volume manager is a service running in RM process, that manages all > CSI volumes' lifecycle. The details about volume's lifecycle states can be > found in [CSI > spec|https://github.com/container-storage-interface/spec/blob/master/spec.md]. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8866) Fix a parsing error for crossdomain.xml
[ https://issues.apache.org/jira/browse/YARN-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677935#comment-16677935 ] Hudson commented on YARN-8866: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15382 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15382/]) YARN-8866. Fix a parsing error for crossdomain.xml. (tasanuma: rev 8dc1f6dbf712a65390a9a6859f62fec0481af31b) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml > Fix a parsing error for crossdomain.xml > --- > > Key: YARN-8866 > URL: https://issues.apache.org/jira/browse/YARN-8866 > Project: Hadoop YARN > Issue Type: Bug > Components: build, yarn-ui-v2 >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1 > > Attachments: YARN-8866.1.patch > > > [QBT|https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/] reports > a parsing error for crossdomain.xml in hadoop-yarn-ui. > {noformat} > Parsing Error(s): > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml > > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty
[ https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677917#comment-16677917 ] Weiwei Yang commented on YARN-8984: --- Hi [~fly_in_gis] I suppose you should have set it to use the "scheduler" handler, + conf.set(YarnConfiguration.RM_PLACEMENT_CONSTRAINTS_HANDLER, "scheduler"); but then I am not sure what the difference is between running it in {{TestAMRMClientPlacementConstraints}} and in a separate class. Could you please take a look? Thanks > AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty > -- > > Key: YARN-8984 > URL: https://issues.apache.org/jira/browse/YARN-8984 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yang Wang >Assignee: Yang Wang >Priority: Critical > Attachments: YARN-8984-001.patch, YARN-8984-002.patch > > > In AMRMClient, outstandingSchedRequests should be removed or decreased when > container allocated. However, it could not work when allocation tag is null > or empty. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8945) Calculation of maximum applications should respect specified and global maximum applications for absolute resource
[ https://issues.apache.org/jira/browse/YARN-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677914#comment-16677914 ] Hadoop QA commented on YARN-8945: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 21s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 18s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}117m 38s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}173m 13s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-8945 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945697/YARN-8945.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 169cd72c0fbf 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / addec29 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/22443/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/22443/testReport/ | | Max. process+thread count | 933 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output |
[jira] [Commented] (YARN-8866) Fix a parsing error for crossdomain.xml
[ https://issues.apache.org/jira/browse/YARN-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677909#comment-16677909 ] Takanobu Asanuma commented on YARN-8866: Committed to branch-3.0, branch-3.1, branch-3.2, trunk. > Fix a parsing error for crossdomain.xml > --- > > Key: YARN-8866 > URL: https://issues.apache.org/jira/browse/YARN-8866 > Project: Hadoop YARN > Issue Type: Bug > Components: build, yarn-ui-v2 >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Attachments: YARN-8866.1.patch > > > [QBT|https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/] reports > a parsing error for crossdomain.xml in hadoop-yarn-ui. > {noformat} > Parsing Error(s): > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml > > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8866) Fix a parsing error for crossdomain.xml
[ https://issues.apache.org/jira/browse/YARN-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677901#comment-16677901 ] Takanobu Asanuma commented on YARN-8866: Thanks for the review, [~leftnoteasy]! I'd like to commit it now. > Fix a parsing error for crossdomain.xml > --- > > Key: YARN-8866 > URL: https://issues.apache.org/jira/browse/YARN-8866 > Project: Hadoop YARN > Issue Type: Bug > Components: build, yarn-ui-v2 >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Attachments: YARN-8866.1.patch > > > [QBT|https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/] reports > a parsing error for crossdomain.xml in hadoop-yarn-ui. > {noformat} > Parsing Error(s): > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml > > {noformat}
[jira] [Updated] (YARN-8880) Add configurations for pluggable plugin framework
[ https://issues.apache.org/jira/browse/YARN-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8880: --- Attachment: YARN-8880-trunk.003.patch > Add configurations for pluggable plugin framework > - > > Key: YARN-8880 > URL: https://issues.apache.org/jira/browse/YARN-8880 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Major > Attachments: YARN-8880-trunk.001.patch, YARN-8880-trunk.002.patch, YARN-8880-trunk.003.patch > > > Added two configurations for the pluggable device framework.
{code:xml}
<property>
  <name>yarn.nodemanager.pluggable-device-framework.enabled</name>
  <value>true/false</value>
</property>
<property>
  <name>yarn.nodemanager.pluggable-device-framework.device-classes</name>
  <value>com.cmp1.hdw1,...</value>
</property>
{code}
The admin needs to know the registered resource name of every configured plugin class and declare them in resource-types.xml. Please note that the count value defined in node-resource.xml will be overridden by the plugin.
[jira] [Comment Edited] (YARN-8898) Fix FederationInterceptor#allocate to set application priority in allocateResponse
[ https://issues.apache.org/jira/browse/YARN-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677535#comment-16677535 ] Bibin A Chundatt edited comment on YARN-8898 at 11/7/18 8:35 AM: - As per the current implementation, a UAM application to secondary clusters doesn't set priority, tags, type, etc. If these fields are not set during submission, the getApplications API result, container allocation, etc. might not be as expected. {quote}what are the client APIs that you are referring to{quote} Client APIs - ApplicationClientProtocol (YarnClient API) and WebServiceProtocol (REST API). For application-specific/container-specific API calls, filters are based on a few of the above-mentioned fields. was (Author: bibinchundatt): {quote}what are the client APIs that you are referring to{quote} Client APIs - ApplicationClientProtocol (YarnClient API) and WebServiceProtocol (REST API). For application-specific/container-specific API calls, filters are based on a few of the above-mentioned fields. > Fix FederationInterceptor#allocate to set application priority in > allocateResponse > -- > > Key: YARN-8898 > URL: https://issues.apache.org/jira/browse/YARN-8898 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bilwa S T >Priority: Major > > FederationInterceptor#mergeAllocateResponses skips > application_priority in the returned response
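The priority-drop reported in YARN-8898 can be illustrated with a plain-Java sketch. The `AllocateResponse` class and `merge` method below are illustrative stand-ins, not the actual FederationInterceptor code:

```java
import java.util.*;

public class MergeAllocateDemo {
    // Simplified stand-in for a YARN allocate response (hypothetical fields).
    static class AllocateResponse {
        List<String> containers = new ArrayList<>();
        Integer appPriority; // null if a sub-cluster did not set it
    }

    // Merge responses from home and secondary clusters. The reported bug is
    // that the merged response dropped the application priority; here we
    // carry through the first non-null priority we see.
    static AllocateResponse merge(List<AllocateResponse> responses) {
        AllocateResponse merged = new AllocateResponse();
        for (AllocateResponse r : responses) {
            merged.containers.addAll(r.containers);
            if (merged.appPriority == null) {
                merged.appPriority = r.appPriority; // preserve the priority
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        AllocateResponse home = new AllocateResponse();
        home.appPriority = 5;
        home.containers.add("c1");
        AllocateResponse secondary = new AllocateResponse();
        secondary.containers.add("c2");
        AllocateResponse merged = merge(Arrays.asList(home, secondary));
        // Containers from both clusters, priority from the home cluster.
        System.out.println(merged.containers.size() + " " + merged.appPriority);
    }
}
```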
[jira] [Commented] (YARN-8902) Add volume manager that manages CSI volume lifecycle
[ https://issues.apache.org/jira/browse/YARN-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677829#comment-16677829 ] Sunil Govindan commented on YARN-8902: -- Hi [~cheersyang]. Thank you. A few initial comments:
1. Does CsiAdaptorClientProtocol need to have an interface for unpublishing a volume?
2. Will the CsiAdaptorClientProtocol implementation be done when we do the adaptor code in the NM?
3. In CsiConstants, there are duplicate definitions:
{code:java}
public static final String CSI_DRIVER_NAME = "driver.name";
public static final String CSI_VOLUME_DRIVER_NAME = "driver.name";
{code}
4. Does VolumeCapability need to cover access modes and access capabilities as well?
5. VolumeCapability.validateCapability checks only minCapacity? Do we need to define a range or something similar here? I am also thinking whether we need to normalize the unit with the min or max capacity and keep a common value. That could help to avoid run-time conversions.
6. We can add Evolving and Unstable tags for all interfaces/classes which are public.
> Add volume manager that manages CSI volume lifecycle > > > Key: YARN-8902 > URL: https://issues.apache.org/jira/browse/YARN-8902 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: YARN-8902.001.patch, YARN-8902.002.patch, > YARN-8902.003.patch, YARN-8902.004.patch, YARN-8902.005.patch, > YARN-8902.006.patch, YARN-8902.007.patch > > > The CSI volume manager is a service running in the RM process that manages all CSI volumes' lifecycle. The details about volume lifecycle states can be found in the [CSI spec|https://github.com/container-storage-interface/spec/blob/master/spec.md].
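Point 5 above (normalizing capacities to a common unit so min/max checks avoid run-time conversions) can be sketched as follows. The method names, unit strings, and range check are hypothetical illustrations, not the patch's actual VolumeCapability API:

```java
public class VolumeCapabilityCheck {
    // Normalize a capacity to bytes so min/max compare in one common unit
    // (unit strings here are illustrative).
    static long toBytes(long value, String unit) {
        switch (unit) {
            case "Gi": return value * 1024L * 1024L * 1024L;
            case "Mi": return value * 1024L * 1024L;
            default:   return value; // already bytes
        }
    }

    // Range check instead of a min-only check: the requested size must fall
    // between the normalized min and max capacities.
    static boolean inRange(long min, String minUnit,
                           long max, String maxUnit, long requestedBytes) {
        long lo = toBytes(min, minUnit);
        long hi = toBytes(max, maxUnit);
        return requestedBytes >= lo && requestedBytes <= hi;
    }

    public static void main(String[] args) {
        // 1536 Mi = 1.5 Gi, inside [1 Gi, 2 Gi]; 512 Mi is below the minimum.
        System.out.println(inRange(1, "Gi", 2, "Gi", toBytes(1536, "Mi")));
        System.out.println(inRange(1, "Gi", 2, "Gi", toBytes(512, "Mi")));
    }
}
```

Normalizing once at validation time means later comparisons are plain long arithmetic, which is the conversion-avoidance the comment suggests.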
[jira] [Comment Edited] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size
[ https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677536#comment-16677536 ] Bibin A Chundatt edited comment on YARN-8972 at 11/7/18 8:28 AM: - [~giovanni.fumarola] The advantage of an interceptor on the router side is that it avoids the router's home-cluster addition to the federation store, then the submit to the RM, etc. Since it's optional, let's add this. was (Author: bibinchundatt): [~giovanni.fumarola] The advantage of an interceptor on the router side is that it avoids the router's home-cluster addition, then the submit to the RM, etc. Since it's optional, let's add this. > [Router] Add support to prevent DoS attack over ApplicationSubmissionContext > size > - > > Key: YARN-8972 > URL: https://issues.apache.org/jira/browse/YARN-8972 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8972.v1.patch, YARN-8972.v2.patch > > > This JIRA tracks the effort to add a new interceptor in the Router to prevent > users from submitting applications with an oversized ASC. > This avoids YARN cluster failover.
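The idea of rejecting an oversized ApplicationSubmissionContext before it reaches the federation store or an RM can be sketched minimally. The `AscSizeGuard` class and limit below are hypothetical, not the patch's interceptor code:

```java
public class AscSizeGuard {
    private final int maxSizeBytes;

    AscSizeGuard(int maxSizeBytes) {
        this.maxSizeBytes = maxSizeBytes;
    }

    // Admit a submission only if its serialized context fits the limit,
    // so oversized ASCs are rejected at the router instead of being
    // persisted and forwarded.
    boolean admit(byte[] serializedContext) {
        return serializedContext.length <= maxSizeBytes;
    }

    public static void main(String[] args) {
        AscSizeGuard guard = new AscSizeGuard(1024); // illustrative 1 KiB cap
        System.out.println(guard.admit(new byte[512]));  // within the limit
        System.out.println(guard.admit(new byte[4096])); // oversized
    }
}
```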
[jira] [Updated] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty
[ https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Wang updated YARN-8984: Attachment: YARN-8984-002.patch > AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty > -- > > Key: YARN-8984 > URL: https://issues.apache.org/jira/browse/YARN-8984 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yang Wang >Assignee: Yang Wang >Priority: Critical > Attachments: YARN-8984-001.patch, YARN-8984-002.patch > > > In AMRMClient, outstandingSchedRequests should be removed or decreased when a > container is allocated. However, this does not work when the allocation tag is null > or empty.
[jira] [Commented] (YARN-8984) AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty
[ https://issues.apache.org/jira/browse/YARN-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677802#comment-16677802 ] Yang Wang commented on YARN-8984: - Hi [~cheersyang], I have tried to move the test to TestAMRMClientPlacementConstraints and found the case failed, because containers could not be allocated when allocationTags is empty. I think that is another issue with the placement processor. > AMRMClient#OutstandingSchedRequests leaks when AllocationTags is null or empty > -- > > Key: YARN-8984 > URL: https://issues.apache.org/jira/browse/YARN-8984 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yang Wang >Assignee: Yang Wang >Priority: Critical > Attachments: YARN-8984-001.patch > > > In AMRMClient, outstandingSchedRequests should be removed or decreased when a > container is allocated. However, this does not work when the allocation tag is null > or empty.
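The leak described in YARN-8984 can be reproduced in miniature with plain collections. The map and method names below are illustrative, not the real AMRMClient internals:

```java
import java.util.*;

public class SchedRequestLeakDemo {
    // Outstanding scheduling requests keyed by allocation tags
    // (empty set = an untagged request).
    static Map<Set<String>, Integer> outstanding = new HashMap<>();

    // Buggy removal: skips requests whose tag set is null or empty, so
    // untagged entries are never decremented and the map leaks.
    static void removeOnAllocationBuggy(Set<String> tags) {
        if (tags == null || tags.isEmpty()) {
            return; // bug: untagged requests are never cleaned up
        }
        outstanding.merge(tags, -1, Integer::sum);
    }

    // Fixed removal: normalize null to the empty set so untagged requests
    // share one key and are decremented like any other request.
    static void removeOnAllocationFixed(Set<String> tags) {
        Set<String> key = (tags == null) ? Collections.<String>emptySet() : tags;
        outstanding.merge(key, -1, Integer::sum);
    }

    public static void main(String[] args) {
        outstanding.put(Collections.<String>emptySet(), 1); // one untagged request
        removeOnAllocationBuggy(null);
        System.out.println("after buggy removal: "
            + outstanding.get(Collections.<String>emptySet()));
        removeOnAllocationFixed(null);
        System.out.println("after fixed removal: "
            + outstanding.get(Collections.<String>emptySet()));
    }
}
```

The buggy path leaves the count at 1 forever, which is exactly the outstanding-request leak the patch addresses.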