[jira] [Commented] (YARN-9561) Add C changes for the new RuncContainerRuntime
[ https://issues.apache.org/jira/browse/YARN-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970666#comment-16970666 ] Hadoop QA commented on YARN-9561: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 27s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 6s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 19m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 20m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 76m 2s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 31s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 19m 16s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 19m 16s{color} | {color:red} root generated 4 new + 22 unchanged - 4 fixed = 26 total (was 26) {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 19m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 16m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 51s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}171m 56s{color} | {color:red} root in the patch failed. {color} | | {color:blue}0{color} | {color:blue} asflicense {color} | {color:blue} 0m 48s{color} | {color:blue} ASF License check generated no output? 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}321m 5s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.shortcircuit.TestShortCircuitCache | | | hadoop.hdfs.TestDFSShell | | | hadoop.hdfs.TestDisableConnCache | | | hadoop.hdfs.server.mover.TestMover | | | hadoop.hdfs.TestErasureCodingPolicyWithSnapshot | | | hadoop.hdfs.TestErasureCodingExerciseAPIs | | | hadoop.hdfs.TestByteBufferPread | | | hadoop.hdfs.client.impl.TestBlockReaderLocalLegacy | | | hadoop.hdfs.TestParallelShortCircuitReadNoChecksum | | | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead | | | hadoop.hdfs.TestHAAuxiliaryPort | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy | | | hadoop.hdfs.TestSafeMode | | | hadoop.hdfs.TestFileCreationEmpty | | | hadoop.hdfs.server.namenode.TestNameNodeMXBean | | | hadoop.hdfs.TestErasureCodingAddConfig | | | hadoop.hdfs.TestWriteReadStripedFile | | | hadoop.hdfs.TestSetrepDecreasing | | | hadoop.hdfs.TestLeaseRecovery2 | | | hadoop.hdfs.TestFileCreation | | | hadoop.hdfs.server.namenode.TestAddStripedBlocks | | | hadoop.hdfs.TestDFSShellGenericOptions | | | hadoop.hdfs.TestRenameWhileOpen | | | hadoop.hdfs.TestStateAlignmentContextWithHA | | | hadoop.hdfs.TestDistributedFileSystemWithECFile | | | hadoop.hdfs.client.impl.TestBlockReaderLocal | | | hadoop.hdfs.client.impl.TestClientBlockVerification | | | hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | YARN-9561 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985387/YARN-9561.011.patch | |
[jira] [Commented] (YARN-9561) Add C changes for the new RuncContainerRuntime
[ https://issues.apache.org/jira/browse/YARN-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970646#comment-16970646 ] Eric Yang commented on YARN-9561: - [~ebadger] What is the right way to run test_runc_util with patch 11? Cetest is crashing on my machine:
{code}
mvn clean test -Dtest=cetest -Pnative
{code}
Maven output looks like this:
{code}
[INFO] --- [INFO] C M A K E B U I L D E RT E S T [INFO] --- [INFO] cetest: running /home/eyang/test/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/test/cetest --gtest_filter=-Perf. --gtest_output=xml:/home/eyang/test/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/surefire-reports/TEST-cetest.xml [INFO] with extra environment variables {} [INFO] STATUS: ERROR CODE 139 after 5 millisecond(s). [INFO] --- [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 01:05 min [INFO] Finished at: 2019-11-08T18:29:30-05:00 [INFO] Final Memory: 56M/575M [INFO] [ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.3.0-SNAPSHOT:cmake-test (cetest) on project hadoop-yarn-server-nodemanager: Test /home/eyang/test/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/test/cetest returned ERROR CODE 139 -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
{code}
Running the test manually reveals:
{code}
$ ./cetest Determining user details Requested user eyang is not whitelisted and has id 501,which is below the minimum allowed 1000 Setting NM UID Segmentation fault
{code}
> Add C changes for the new RuncContainerRuntime > -- > > Key: YARN-9561 > URL: https://issues.apache.org/jira/browse/YARN-9561 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9561.001.patch, YARN-9561.002.patch, > YARN-9561.003.patch, YARN-9561.004.patch, YARN-9561.005.patch, > YARN-9561.006.patch, YARN-9561.007.patch, YARN-9561.008.patch, > YARN-9561.009.patch, YARN-9561.010.patch, YARN-9561.011.patch > > > This JIRA will be used to add the C changes to the container-executor native > binary that are necessary for the new RuncContainerRuntime. There should be > no changes to existing code paths. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9923) Detect missing Docker binary or not running Docker daemon
[ https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970541#comment-16970541 ] Hadoop QA commented on YARN-9923: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 40s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 26 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 8m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 24m 56s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 36s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 16m 19s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 40s{color} | {color:orange} root: The patch generated 9 new + 603 unchanged - 44 fixed = 612 total (was 647) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 8m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 31s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 10s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 26s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 54s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 15s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 59s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green}
[jira] [Updated] (YARN-9561) Add C changes for the new RuncContainerRuntime
[ https://issues.apache.org/jira/browse/YARN-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-9561: -- Attachment: YARN-9561.011.patch > Add C changes for the new RuncContainerRuntime > -- > > Key: YARN-9561 > URL: https://issues.apache.org/jira/browse/YARN-9561 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9561.001.patch, YARN-9561.002.patch, > YARN-9561.003.patch, YARN-9561.004.patch, YARN-9561.005.patch, > YARN-9561.006.patch, YARN-9561.007.patch, YARN-9561.008.patch, > YARN-9561.009.patch, YARN-9561.010.patch, YARN-9561.011.patch > > > This JIRA will be used to add the C changes to the container-executor native > binary that are necessary for the new RuncContainerRuntime. There should be > no changes to existing code paths. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9561) Add C changes for the new RuncContainerRuntime
[ https://issues.apache.org/jira/browse/YARN-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970535#comment-16970535 ] Eric Badger commented on YARN-9561: --- Thanks for the review, [~Jim_Brennan]! bq. stat_file_as_nm should ensure that it restores the calling user/group before returning. As we discussed offline, the check does work as designed. bq. (nit) might want to change the name of stat_file_as_nm - it is not at all clear from the name that it will fail if the file exists and succeed if it doesn't Good call. I updated the name in patch 011. > Add C changes for the new RuncContainerRuntime > -- > > Key: YARN-9561 > URL: https://issues.apache.org/jira/browse/YARN-9561 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9561.001.patch, YARN-9561.002.patch, > YARN-9561.003.patch, YARN-9561.004.patch, YARN-9561.005.patch, > YARN-9561.006.patch, YARN-9561.007.patch, YARN-9561.008.patch, > YARN-9561.009.patch, YARN-9561.010.patch, YARN-9561.011.patch > > > This JIRA will be used to add the C changes to the container-executor native > binary that are necessary for the new RuncContainerRuntime. There should be > no changes to existing code paths. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9952) Continuous scheduling thread crashes
[ https://issues.apache.org/jira/browse/YARN-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-9952: - Summary: Continuous scheduling thread crashes (was: ontinuous scheduling thread crashes) > Continuous scheduling thread crashes > > > Key: YARN-9952 > URL: https://issues.apache.org/jira/browse/YARN-9952 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.2 >Reporter: kailiu_dev >Priority: Major > > {color:#172b4d}2019-10-16 09:14:51,215 ERROR > org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread > Thread{color}[FairSchedulerContinuousScheduling,5,main]{color:#172b4d} threw > an Exception.{color} > {color:#172b4d} java.lang.IllegalArgumentException: Comparison method > violates its general contract!{color} > {color:#172b4d} at java.util.TimSort.mergeHi(TimSort.java:868){color} > {color:#172b4d} at java.util.TimSort.mergeAt(TimSort.java:485){color} > {color:#172b4d} at > java.util.TimSort.mergeForceCollapse(TimSort.java:426){color} > {color:#172b4d} at java.util.TimSort.sort(TimSort.java:223){color} > {color:#172b4d} at java.util.TimSort.sort(TimSort.java:173){color} > {color:#172b4d} at java.util.Arrays.sort(Arrays.java:659){color} > {color:#172b4d} at > java.util.Collections.sort(Collections.java:217){color} > {color:#172b4d} at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117){color} > {color:#172b4d} at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296){color} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9930) Support max running app logic for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970421#comment-16970421 ] Peter Bacsko commented on YARN-9930: [~epayne] thanks for explaining - now it's clear. I do agree that it's important to keep backward compatibility. I guess we can introduce a boolean property like "yarn.scheduler.capacity.maxrunningapps.reject" with default value "true". > Support max running app logic for CapacityScheduler > --- > > Key: YARN-9930 > URL: https://issues.apache.org/jira/browse/YARN-9930 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, capacityscheduler >Affects Versions: 3.1.0, 3.1.1 >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > > In FairScheduler, there is a max-running-apps limit that leaves excess > applications pending. > CapacityScheduler has no such feature; it only has a max-apps limit, and > jobs beyond it are rejected directly at the client. > This JIRA aims to implement the same semantics for CapacityScheduler. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
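A minimal sketch of how the flag suggested in the comment above could be consumed, assuming the hypothetical key name "yarn.scheduler.capacity.maxrunningapps.reject" (this property does not exist in YARN yet; only {{Configuration.getBoolean()}} is real API):
{code:java}
import org.apache.hadoop.conf.Configuration;

public class MaxRunningAppsPolicy {
  // Hypothetical key and default taken from the discussion above.
  static final String REJECT_KEY =
      "yarn.scheduler.capacity.maxrunningapps.reject";
  static final boolean DEFAULT_REJECT = true;

  private final boolean reject;

  MaxRunningAppsPolicy(Configuration conf) {
    this.reject = conf.getBoolean(REJECT_KEY, DEFAULT_REJECT);
  }

  /** true: reject at submit (current CS behaviour); false: leave pending. */
  boolean shouldReject() {
    return reject;
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration(false); // no default resources
    System.out.println(new MaxRunningAppsPolicy(conf).shouldReject()); // true
  }
}
{code}
Defaulting to "true" preserves today's reject-at-submit behaviour, while "false" would opt a queue into FairScheduler-style pending apps.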
[jira] [Commented] (YARN-9930) Support max running app logic for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970417#comment-16970417 ] Eric Payne commented on YARN-9930: -- The Max Apps Per User setting exists in CS, but it's a calculated value and not directly configurable. It's calculated based on the Max Apps per Queue, the User Limit Factor, and the Minimum User Limit Percent, which are all directly configurable. To see the current value of the Max Apps Per User, go to the CS UI and click on the twisty of any queue. In the CS, any user who reaches that value will not be allowed to submit any more apps until one of their running apps completes. For CS's backward compatibility, I feel that it's important to keep that as the default behavior. > Support max running app logic for CapacityScheduler > --- > > Key: YARN-9930 > URL: https://issues.apache.org/jira/browse/YARN-9930 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, capacityscheduler >Affects Versions: 3.1.0, 3.1.1 >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > > In FairScheduler, there is a max-running-apps limit that leaves excess > applications pending. > CapacityScheduler has no such feature; it only has a max-apps limit, and > jobs beyond it are rejected directly at the client. > This JIRA aims to implement the same semantics for CapacityScheduler. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
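For reference, the derivation Eric describes can be written down directly. A hedged sketch of my reading of LeafQueue's arithmetic (the method name is illustrative, not the actual API):
{code:java}
// My reading of how CS derives max-apps-per-user from the three
// directly configurable values; names here are illustrative.
public final class MaxAppsPerUser {
  static int maxAppsPerUser(int maxAppsPerQueue,
                            float minUserLimitPercent, // e.g. 25 means 25%
                            float userLimitFactor) {
    return (int) (maxAppsPerQueue * (minUserLimitPercent / 100.0f)
        * userLimitFactor);
  }

  public static void main(String[] args) {
    // With queue max apps 10000, MULP 25 and ULF 2, a single user may
    // have at most 5000 apps accepted at a time.
    System.out.println(maxAppsPerUser(10000, 25f, 2f)); // prints 5000
  }
}
{code}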
[jira] [Commented] (YARN-8990) Fix fair scheduler race condition in app submit and queue cleanup
[ https://issues.apache.org/jira/browse/YARN-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970395#comment-16970395 ] Wilfred Spiegelenburg commented on YARN-8990: - Thank you [~Steven Rand] for making us aware of the omission. And yes, you are correct: this was checked into 3.2.0 only and not into the later 3.2.x releases. It is in 3.3. For YARN-8992 the fix version is set incorrectly; that one is only in 3.3 (adding a comment there too). [~sunilg], how do we handle these two? > Fix fair scheduler race condition in app submit and queue cleanup > - > > Key: YARN-8990 > URL: https://issues.apache.org/jira/browse/YARN-8990 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 3.2.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Blocker > Fix For: 3.2.0, 3.3.0 > > Attachments: YARN-8990.001.patch, YARN-8990.002.patch > > > With the introduction of dynamic queue deletion in YARN-8191, a race > condition was introduced that can cause a queue to be removed while an > application submit is in progress. > The issue occurs in {{FairScheduler.addApplication()}} when an application is > submitted to a dynamic queue which is empty or does not exist yet. > If, during the processing of the application submit, the > {{AllocationFileLoaderService}} kicks off an update, the queue cleanup > will be run first. The application submit first creates the queue and gets a > reference back to the queue. > Other checks are performed, and as the last action before getting ready to > generate an AppAttempt, the queue is updated to record the submitted application > ID. > The time between the queue creation and the queue update recording the submit > is long enough for the queue to be removed. The application, however, is lost > and will never get any resources assigned. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
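The window described in that report is a classic check-then-act race. A minimal, self-contained sketch of the shape of the bug, with illustrative names rather than actual FairScheduler code:
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

public class QueueRaceDemo {
  static final ConcurrentHashMap<String, List<String>> queues =
      new ConcurrentHashMap<>();

  // Submit path: create-or-get the queue, then register the app a bit
  // later; the other submit checks run in between (the race window).
  static void addApplication(String queue, String appId)
      throws InterruptedException {
    List<String> q = queues.computeIfAbsent(queue, k -> new ArrayList<>());
    Thread.sleep(10);            // stands in for the other submit checks
    q.add(appId);                // may now target an already-removed queue
  }

  // Cleanup path (allocation reload): drop queues that currently look empty.
  static void removeEmptyQueues() {
    queues.entrySet().removeIf(e -> e.getValue().isEmpty());
  }

  public static void main(String[] args) throws InterruptedException {
    Thread submit = new Thread(() -> {
      try {
        addApplication("root.adhoc", "app_0001");
      } catch (InterruptedException ignored) {
      }
    });
    submit.start();
    Thread.sleep(5);
    removeEmptyQueues();         // fires inside the submit's window
    submit.join();
    // The app was added to a queue object the manager no longer tracks.
    System.out.println(queues.containsKey("root.adhoc")); // false: app lost
  }
}
{code}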
[jira] [Commented] (YARN-8992) Fair scheduler can delete a dynamic queue while an application attempt is being added to the queue
[ https://issues.apache.org/jira/browse/YARN-8992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970394#comment-16970394 ] Wilfred Spiegelenburg commented on YARN-8992: - For YARN-8992 the fix version is set incorrectly; this one is only in 3.3. Similar to the related YARN-8990, it is missing from 3.2.1. > Fair scheduler can delete a dynamic queue while an application attempt is > being added to the queue > -- > > Key: YARN-8992 > URL: https://issues.apache.org/jira/browse/YARN-8992 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Haibo Chen >Assignee: Wilfred Spiegelenburg >Priority: Major > Fix For: 3.2.1 > > Attachments: YARN-8992.001.patch, YARN-8992.002.patch > > > As discovered in YARN-8990, QueueManager can see a leaf queue being empty > while FSLeafQueue.addApp() is called in the middle of > {code:java} > return queue.getNumRunnableApps() == 0 && > leafQueue.getNumNonRunnableApps() == 0 && > leafQueue.getNumAssignedApps() == 0;{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8373) RM Received RMFatalEvent of type CRITICAL_THREAD_CRASH
[ https://issues.apache.org/jira/browse/YARN-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970388#comment-16970388 ] Hadoop QA commented on YARN-8373: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 5s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 30s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 31 unchanged - 0 fixed = 32 total (was 31) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 31s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 81m 52s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}133m 55s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | YARN-8373 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985363/YARN-8373.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 4bf072a5f5af 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 42fc888 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/25127/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25127/testReport/ | | Max. process+thread count | 899 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
[jira] [Commented] (YARN-8373) RM Received RMFatalEvent of type CRITICAL_THREAD_CRASH
[ https://issues.apache.org/jira/browse/YARN-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970386#comment-16970386 ] Hadoop QA commented on YARN-8373: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 3s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 30s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 31 unchanged - 0 fixed = 32 total (was 31) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 10s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 82m 2s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}135m 9s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | YARN-8373 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985362/YARN-8373.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux e2f60afc1aae 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 42fc888 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/25126/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25126/testReport/ | | Max. process+thread count | 891 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U:
[jira] [Commented] (YARN-9564) Create docker-to-squash tool for image conversion
[ https://issues.apache.org/jira/browse/YARN-9564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970382#comment-16970382 ] Jim Brennan commented on YARN-9564: --- Thanks for the updates [~ebadger]! I am +1 (non-binding) on patch 006. > Create docker-to-squash tool for image conversion > - > > Key: YARN-9564 > URL: https://issues.apache.org/jira/browse/YARN-9564 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9564.001.patch, YARN-9564.002.patch, > YARN-9564.003.patch, YARN-9564.004.patch, YARN-9564.005.patch, > YARN-9564.006.patch > > > The new runc runtime uses docker images that are converted into multiple > squashfs images. Each layer of the docker image will get its own squashfs > image. We need a tool to help automate the creation of these squashfs images > when all we have is a docker image -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9561) Add C changes for the new RuncContainerRuntime
[ https://issues.apache.org/jira/browse/YARN-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970380#comment-16970380 ] Jim Brennan commented on YARN-9561: --- Thanks for the updates [~ebadger]! A couple comments on the new patch: * stat_file_as_nm should ensure that it restores the calling user/group before returning. * (nit) might want to change the name of stat_file_as_nm - it is not at all clear from the name that it will fail if the file exists and succeed if it doesn't > Add C changes for the new RuncContainerRuntime > -- > > Key: YARN-9561 > URL: https://issues.apache.org/jira/browse/YARN-9561 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9561.001.patch, YARN-9561.002.patch, > YARN-9561.003.patch, YARN-9561.004.patch, YARN-9561.005.patch, > YARN-9561.006.patch, YARN-9561.007.patch, YARN-9561.008.patch, > YARN-9561.009.patch, YARN-9561.010.patch > > > This JIRA will be used to add the C changes to the container-executor native > binary that are necessary for the new RuncContainerRuntime. There should be > no changes to existing code paths. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9562) Add Java changes for the new RuncContainerRuntime
[ https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970381#comment-16970381 ] Jim Brennan commented on YARN-9562: --- Thanks for the updates [~ebadger]! I am +1 (non-binding) on patch 014. > Add Java changes for the new RuncContainerRuntime > - > > Key: YARN-9562 > URL: https://issues.apache.org/jira/browse/YARN-9562 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Attachments: YARN-9562.001.patch, YARN-9562.002.patch, > YARN-9562.003.patch, YARN-9562.004.patch, YARN-9562.005.patch, > YARN-9562.006.patch, YARN-9562.007.patch, YARN-9562.008.patch, > YARN-9562.009.patch, YARN-9562.010.patch, YARN-9562.011.patch, > YARN-9562.012.patch, YARN-9562.013.patch, YARN-9562.014.patch > > > This JIRA will be used to add the Java changes for the new > RuncContainerRuntime. This will work off of YARN-9560 to use much of the > existing DockerLinuxContainerRuntime code once it is moved up into an > abstract class that can be extended. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9920) YarnAuthorizationProvider AccessRequest gets Null RemoteAddress from FairScheduler
[ https://issues.apache.org/jira/browse/YARN-9920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970376#comment-16970376 ] Wilfred Spiegelenburg commented on YARN-9920: - I am not sure what you are after with this change. The {{getRemoteAddress()}} method for the {{AccessRequest}} is never called. Same for {{getForwardedAddresses()}}. ACLs do not support limiting on IP either, so passing the information through from a YARN perspective does not make sense, and having nulls does not cause any issues. If this is to allow auditing or extending in the future then I can understand; otherwise please explain. The other problem I have is with the IPC server call that is made. When we get to the {{Server.getRemoteAddress()}} call, which address are we getting back? The remote address is a thread local for the IPC server. Since we have multiple threads servicing incoming IPC requests, how can we be sure that the scheduler, when checking queue access for example, gets the correct thread from the server pool? The second issue is that the web services, like moving an app, use the {{ClientRMService}} to eventually execute the move. The access check is performed inside the {{ClientRMService}}, in which we call {{Server.getRemoteAddress()}}. There is no IPC request at all, which probably means we get the local node IP back. Based on just a quick look, I think the info is highly suspect, and it would be better to have a proper look at when and where we build the {{AccessRequest}} to make sure we get the proper information in all cases. So a no-go from my side. > YarnAuthorizationProvider AccessRequest gets Null RemoteAddress from > FairScheduler > -- > > Key: YARN-9920 > URL: https://issues.apache.org/jira/browse/YARN-9920 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, security >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9920-001.patch, YARN-9920-002.patch, > YARN-9920-003.patch > > > YarnAuthorizationProvider AccessRequest has null RemoteAddress in case of > FairScheduler. FSQueue#hasAccess uses Server.getRemoteAddress() which will be > null when the call is from RMWebServices and EventDispatcher. It works fine > when called by IPC Server Handler. > FSQueue#hasAccess is called at three places, where (2) and (3) return null. > *1. IPC Server -> RMAppManager#createAndPopulateNewRMApp -> FSQueue#hasAccess > -> Server.getRemoteAddress returns correct Remote IP.* > > *2. IPC Server -> RMAppManager#createAndPopulateNewRMApp -> > AppAddedSchedulerEvent* > *EventDispatcher -> FairScheduler#addApplication -> FSQueue.hasAccess -> > Server.getRemoteAddress returns null* > > {code:java} > org.apache.hadoop.yarn.security.ConfiguredYarnAuthorizer.checkPermission(ConfiguredYarnAuthorizer.java:101) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.hasAccess(FSQueue.java:316) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:509) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1268) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:133) > at > org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66) > {code} > > *3. 
RMWebServices -> QueueACLsManager#checkAccess -> FSQueue.hasAccess -> > Server.getRemoteAddress returns null.* > {code:java} > org.apache.hadoop.yarn.security.ConfiguredYarnAuthorizer.checkPermission(ConfiguredYarnAuthorizer.java:101) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.hasAccess(FSQueue.java:316) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.checkAccess(FairScheduler.java:1610) > at > org.apache.hadoop.yarn.server.resourcemanager.security.QueueACLsManager.checkAccess(QueueACLsManager.java:84) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.hasAccess(RMWebServices.java:270) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:553) > {code} > > Have verified with CapacityScheduler and it works fine. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
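To make the thread-local concern in Wilfred's comment concrete, here is a simplified stand-in for the pattern he describes. This is not the real {{org.apache.hadoop.ipc.Server}}; it only illustrates why a lookup from a dispatcher or web-service thread sees null:
{code:java}
import java.net.InetAddress;

public class IpcAddressDemo {
  private static final ThreadLocal<InetAddress> CURRENT_CALLER =
      new ThreadLocal<>();

  // An IPC handler thread would set this at the start of each RPC.
  static void enterCall(InetAddress caller) {
    CURRENT_CALLER.set(caller);
  }

  // Roughly what Server.getRemoteAddress() amounts to: a per-thread lookup.
  static InetAddress getRemoteAddress() {
    return CURRENT_CALLER.get();
  }

  public static void main(String[] args) throws InterruptedException {
    enterCall(InetAddress.getLoopbackAddress());
    System.out.println(getRemoteAddress()); // handler thread: the caller

    // Any other thread (event dispatcher, web service) sees null.
    Thread dispatcher =
        new Thread(() -> System.out.println(getRemoteAddress())); // null
    dispatcher.start();
    dispatcher.join();
  }
}
{code}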
[jira] [Commented] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970342#comment-16970342 ] kailiu_dev commented on YARN-9940: -- [~wilfreds], thank you. Yes, in our code the method is now {{protected void completedContainer(RMContainer rmContainer, ...)}}, not {{protected synchronized void completedContainer(RMContainer rmContainer, ...)}}. The source code of our hadoop 2.7.2 may have been changed by someone; I will need to review the whole code base. Thank you! > avoid continuous scheduling thread crashes while sorting nodes get > 'Comparison method violates its general contract' > > > Key: YARN-9940 > URL: https://issues.apache.org/jira/browse/YARN-9940 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.2 >Reporter: kailiu_dev >Assignee: kailiu_dev >Priority: Major > Attachments: YARN-9940-branch-2.7.2.001.patch > > > 2019-10-16 09:14:51,215 ERROR > org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread > Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception. > java.lang.IllegalArgumentException: Comparison method violates its general > contract! > at java.util.TimSort.mergeHi(TimSort.java:868) > at java.util.TimSort.mergeAt(TimSort.java:485) > at java.util.TimSort.mergeForceCollapse(TimSort.java:426) > at java.util.TimSort.sort(TimSort.java:223) > at java.util.TimSort.sort(TimSort.java:173) > at java.util.Arrays.sort(Arrays.java:659) > at java.util.Collections.sort(Collections.java:217) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9965) Fix NodeManager failing to start when Hdfs Auxillary Jar is set
[ https://issues.apache.org/jira/browse/YARN-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970341#comment-16970341 ] Hadoop QA commented on YARN-9965: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 40s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 57s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 3s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 57s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 79m 30s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | YARN-9965 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985361/YARN-9965-001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux b81325f64415 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 42fc888 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25125/testReport/ | | Max. process+thread count | 306 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25125/console | | Powered by | Apache Yetus 0.8.0
[jira] [Commented] (YARN-9886) Queue mapping based on userid passed through application tag
[ https://issues.apache.org/jira/browse/YARN-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970269#comment-16970269 ] Hadoop QA commented on YARN-9886: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 8s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 48s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 2s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 34s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 52s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 48s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 86m 23s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 40s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}181m 34s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | YARN-9886 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985350/YARN-9886.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux 01f9e8396635 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality |
[jira] [Updated] (YARN-9923) Detect missing Docker binary or not running Docker daemon
[ https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Antal updated YARN-9923: - Attachment: YARN-9923.002.patch > Detect missing Docker binary or not running Docker daemon > - > > Key: YARN-9923 > URL: https://issues.apache.org/jira/browse/YARN-9923 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, yarn >Affects Versions: 3.2.1 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: YARN-9923.001.patch, YARN-9923.002.patch > > > Currently, if a NodeManager is enabled to allocate Docker containers but the > specified binary (docker.binary in the container-executor.cfg) is missing, > container allocation fails with the following error message: > {noformat} > Container launch fails > Exit code: 29 > Exception message: Launch container failed > Shell error output: sh: : No > such file or directory > Could not inspect docker network to get type /usr/bin/docker network inspect > host --format='{{.Driver}}'. > Error constructing docker command, docker error code=-1, error > message='Unknown error' > {noformat} > I suggest adding a property, say "yarn.nodemanager.runtime.linux.docker.check", > with the following options: > - STARTUP: with this option the NodeManager would not start if Docker > binaries are missing or the Docker daemon is not running (the exception is > considered FATAL during startup) > - RUNTIME: would give a more detailed/user-friendly exception on the > NodeManager's side (NM logs) if Docker binaries are missing or the daemon is > not working. This would also prevent further Docker container allocation as > long as the binaries do not exist and the docker daemon is not running. > - NONE (default): preserves the current behaviour, throwing an exception during > container allocation and carrying on using the default retry procedure. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
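A hedged sketch of how the three-valued check proposed in that description could be modelled. The enum values and key mirror the proposal; none of this is shipped YARN code:
{code:java}
// Sketch only: values and key taken from the YARN-9923 proposal above.
public enum DockerCheck {
  STARTUP, // NM refuses to start if the binary/daemon is missing (FATAL)
  RUNTIME, // NM logs a clear error and stops allocating Docker containers
  NONE;    // current behaviour: fail at container launch, use normal retries

  // Hypothetical configuration key from the description.
  static final String KEY = "yarn.nodemanager.runtime.linux.docker.check";

  static DockerCheck fromConf(String value) {
    return value == null ? NONE
        : valueOf(value.trim().toUpperCase(java.util.Locale.ROOT));
  }

  public static void main(String[] args) {
    System.out.println(fromConf(null));      // NONE
    System.out.println(fromConf("startup")); // STARTUP
  }
}
{code}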
[jira] [Commented] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970251#comment-16970251 ] Wilfred Spiegelenburg commented on YARN-9940: - The changes you are making are also not helpful in hadoop 2.7. When you synchronise a method you do the same as placing all the code inside the method within a synchronised block: only one synchronised method or code block can run at any time on the same object instance. These two code samples are effectively the same:
{code:java}
public synchronized void myMethod() {
  // all my code here...
}
{code}
and
{code:java}
public void myMethod() {
  synchronized (this) {
    // all my code here...
  }
}
{code}
Since the method {{FairScheduler.completedContainer()}} is already synchronised, your change of adding a synchronised block inside the method does not make a difference. It will be optimised away by the compiler. Hadoop 2.7.2 does not have read/write locks in the scheduler at all, so I don't know what version you are running, but it is not hadoop 2.7. The read/write locks were introduced in YARN-3139, which is only in hadoop 2.9 and later. Same for the line numbers in the stack: they do not line up with the 2.7 release. As per YARN-8373: - the test is not really testing anything, as it holds the scheduler lock while calling deductUnallocatedResource(); this does not happen in the real code and should not be there. - the best solution is to move to a PriorityQueue for the sorted list; that really fixes the issue, as the test shows without the lock in place. > avoid continuous scheduling thread crashes while sorting nodes get > 'Comparison method violates its general contract' > > > Key: YARN-9940 > URL: https://issues.apache.org/jira/browse/YARN-9940 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.2 >Reporter: kailiu_dev >Assignee: kailiu_dev >Priority: Major > Attachments: YARN-9940-branch-2.7.2.001.patch > > > 2019-10-16 09:14:51,215 ERROR > org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread > Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception. > java.lang.IllegalArgumentException: Comparison method violates its general > contract! > at java.util.TimSort.mergeHi(TimSort.java:868) > at java.util.TimSort.mergeAt(TimSort.java:485) > at java.util.TimSort.mergeForceCollapse(TimSort.java:426) > at java.util.TimSort.sort(TimSort.java:223) > at java.util.TimSort.sort(TimSort.java:173) > at java.util.Arrays.sort(Arrays.java:659) > at java.util.Collections.sort(Collections.java:217) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
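To illustrate both halves of that comment: the "general contract" crash comes from a comparator that re-reads mutable node state while TimSort is running, and the usual fix is to sort over an immutable per-sort snapshot of the key (which is also what moving to a different sorted structure achieves). A minimal sketch with illustrative names, not the actual scheduler code:
{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class NodeSortDemo {
  static class Node {
    volatile long available;          // mutated by heartbeat threads
    Node(long a) { available = a; }
  }

  // Unsafe: re-reads mutable state on every comparison. If another thread
  // updates 'available' mid-sort, the comparator can become non-transitive
  // and TimSort throws "Comparison method violates its general contract!".
  static final Comparator<Node> LIVE =
      Comparator.comparingLong(n -> n.available);

  // Safe: take one immutable snapshot of the sort key per node, then sort.
  static final class Snap {
    final Node node;
    final long available;
    Snap(Node n) { this.node = n; this.available = n.available; } // one read
  }

  static List<Node> sortedSafely(List<Node> nodes) {
    List<Snap> snaps = new ArrayList<>(nodes.size());
    for (Node n : nodes) {
      snaps.add(new Snap(n));
    }
    snaps.sort(Comparator.comparingLong(s -> s.available)); // stable inputs
    List<Node> out = new ArrayList<>(snaps.size());
    for (Snap s : snaps) {
      out.add(s.node);
    }
    return out;
  }
}
{code}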
[jira] [Resolved] (YARN-9952) Continuous scheduling thread crashes
[ https://issues.apache.org/jira/browse/YARN-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YARN-9952. - Resolution: Duplicate > Continuous scheduling thread crashes > --- > > Key: YARN-9952 > URL: https://issues.apache.org/jira/browse/YARN-9952 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.2 >Reporter: kailiu_dev >Priority: Major > > 2019-10-16 09:14:51,215 ERROR > org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread > Thread[FairSchedulerContinuousScheduling,5,main] threw > an Exception. > java.lang.IllegalArgumentException: Comparison method > violates its general contract! > at java.util.TimSort.mergeHi(TimSort.java:868) > at java.util.TimSort.mergeAt(TimSort.java:485) > at > java.util.TimSort.mergeForceCollapse(TimSort.java:426) > at java.util.TimSort.sort(TimSort.java:223) > at java.util.TimSort.sort(TimSort.java:173) > at java.util.Arrays.sort(Arrays.java:659) > at > java.util.Collections.sort(Collections.java:217) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8373) RM Received RMFatalEvent of type CRITICAL_THREAD_CRASH
[ https://issues.apache.org/jira/browse/YARN-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970241#comment-16970241 ] Wilfred Spiegelenburg commented on YARN-8373: - patch-003 to fix the imports the IDE had replaced with a wildcard > RM Received RMFatalEvent of type CRITICAL_THREAD_CRASH > --- > > Key: YARN-8373 > URL: https://issues.apache.org/jira/browse/YARN-8373 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, resourcemanager >Affects Versions: 2.9.0 >Reporter: Girish Bhat >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: newbie > Attachments: YARN-8373.001.patch, YARN-8373.002.patch, > YARN-8373.003.patch > > > > > {noformat} > sudo -u yarn /usr/local/hadoop/latest/bin/yarn version Hadoop 2.9.0 > Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r > 756ebc8394e473ac25feac05fa493f6d612e6c50 Compiled by arsuresh on > 2017-11-13T23:15Z Compiled with protoc 2.5.0 From source with checksum > 0a76a9a32a5257331741f8d5932f183 This command was run using > /usr/local/hadoop/hadoop-2.9.0/share/hadoop/common/hadoop-common-2.9.0.jar{noformat} > This is for version 2.9.0 > > {noformat} > 2018-05-25 05:53:12,742 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received > RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, Fai > rSchedulerContinuousScheduling, that exited unexpectedly: > java.lang.IllegalArgumentException: Comparison method violates its general > contract! > at java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeForceCollapse(TimSort.java:457) > at java.util.TimSort.sort(TimSort.java:254) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1454) > at java.util.Collections.sort(Collections.java:175) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.sortedNodeList(ClusterNodeTracker.java:340) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:907) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296) > 2018-05-25 05:53:12,743 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Shutting down > the resource manager. > 2018-05-25 05:53:12,749 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status 1: a critical thread, FairSchedulerContinuousScheduling, that exited > unexpectedly: java.lang.IllegalArgumentException: Comparison method violates > its general contract! 
> at java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeForceCollapse(TimSort.java:457) > at java.util.TimSort.sort(TimSort.java:254) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1454) > at java.util.Collections.sort(Collections.java:175) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.sortedNodeList(ClusterNodeTracker.java:340) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:907) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296) > 2018-05-25 05:53:12,772 ERROR > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: > ExpiredTokenRemover received java.lang.InterruptedException: sleep > interrupted{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8373) RM Received RMFatalEvent of type CRITICAL_THREAD_CRASH
[ https://issues.apache.org/jira/browse/YARN-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YARN-8373: Attachment: YARN-8373.003.patch > RM Received RMFatalEvent of type CRITICAL_THREAD_CRASH > --- > > Key: YARN-8373 > URL: https://issues.apache.org/jira/browse/YARN-8373 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, resourcemanager >Affects Versions: 2.9.0 >Reporter: Girish Bhat >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: newbie > Attachments: YARN-8373.001.patch, YARN-8373.002.patch, > YARN-8373.003.patch > > > > > {noformat} > sudo -u yarn /usr/local/hadoop/latest/bin/yarn version Hadoop 2.9.0 > Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r > 756ebc8394e473ac25feac05fa493f6d612e6c50 Compiled by arsuresh on > 2017-11-13T23:15Z Compiled with protoc 2.5.0 From source with checksum > 0a76a9a32a5257331741f8d5932f183 This command was run using > /usr/local/hadoop/hadoop-2.9.0/share/hadoop/common/hadoop-common-2.9.0.jar{noformat} > This is for version 2.9.0 > > {noformat} > 2018-05-25 05:53:12,742 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received > RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, Fai > rSchedulerContinuousScheduling, that exited unexpectedly: > java.lang.IllegalArgumentException: Comparison method violates its general > contract! > at java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeForceCollapse(TimSort.java:457) > at java.util.TimSort.sort(TimSort.java:254) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1454) > at java.util.Collections.sort(Collections.java:175) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.sortedNodeList(ClusterNodeTracker.java:340) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:907) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296) > 2018-05-25 05:53:12,743 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Shutting down > the resource manager. > 2018-05-25 05:53:12,749 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status 1: a critical thread, FairSchedulerContinuousScheduling, that exited > unexpectedly: java.lang.IllegalArgumentException: Comparison method violates > its general contract! 
> at java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeForceCollapse(TimSort.java:457) > at java.util.TimSort.sort(TimSort.java:254) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1454) > at java.util.Collections.sort(Collections.java:175) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.sortedNodeList(ClusterNodeTracker.java:340) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:907) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296) > 2018-05-25 05:53:12,772 ERROR > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: > ExpiredTokenRemover received java.lang.InterruptedException: sleep > interrupted{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8373) RM Received RMFatalEvent of type CRITICAL_THREAD_CRASH
[ https://issues.apache.org/jira/browse/YARN-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YARN-8373: Attachment: YARN-8373.002.patch > RM Received RMFatalEvent of type CRITICAL_THREAD_CRASH > --- > > Key: YARN-8373 > URL: https://issues.apache.org/jira/browse/YARN-8373 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, resourcemanager >Affects Versions: 2.9.0 >Reporter: Girish Bhat >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: newbie > Attachments: YARN-8373.001.patch, YARN-8373.002.patch > > > > > {noformat} > sudo -u yarn /usr/local/hadoop/latest/bin/yarn version Hadoop 2.9.0 > Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r > 756ebc8394e473ac25feac05fa493f6d612e6c50 Compiled by arsuresh on > 2017-11-13T23:15Z Compiled with protoc 2.5.0 From source with checksum > 0a76a9a32a5257331741f8d5932f183 This command was run using > /usr/local/hadoop/hadoop-2.9.0/share/hadoop/common/hadoop-common-2.9.0.jar{noformat} > This is for version 2.9.0 > > {noformat} > 2018-05-25 05:53:12,742 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received > RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, Fai > rSchedulerContinuousScheduling, that exited unexpectedly: > java.lang.IllegalArgumentException: Comparison method violates its general > contract! > at java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeForceCollapse(TimSort.java:457) > at java.util.TimSort.sort(TimSort.java:254) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1454) > at java.util.Collections.sort(Collections.java:175) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.sortedNodeList(ClusterNodeTracker.java:340) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:907) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296) > 2018-05-25 05:53:12,743 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Shutting down > the resource manager. > 2018-05-25 05:53:12,749 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status 1: a critical thread, FairSchedulerContinuousScheduling, that exited > unexpectedly: java.lang.IllegalArgumentException: Comparison method violates > its general contract! 
> at java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeForceCollapse(TimSort.java:457) > at java.util.TimSort.sort(TimSort.java:254) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1454) > at java.util.Collections.sort(Collections.java:175) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.sortedNodeList(ClusterNodeTracker.java:340) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:907) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296) > 2018-05-25 05:53:12,772 ERROR > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: > ExpiredTokenRemover received java.lang.InterruptedException: sleep > interrupted{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8373) RM Received RMFatalEvent of type CRITICAL_THREAD_CRASH
[ https://issues.apache.org/jira/browse/YARN-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970224#comment-16970224 ] Wilfred Spiegelenburg commented on YARN-8373: - Your link points to code in master not in trunk: master has not been updated since 2015, [trunk|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java] has the readlock in the code. I do agree that fixing the data consistency would be the best thing. However locking large numbers of nodes before we sort and then unlock them again will have a huge performance impact. Moving to a PriorityQueue is possible as the FS is the only one that uses the method at the moment. It also fixes the issue as is confirmed by the unit test. The old unit test without special locking modifies the nodes while sorting without issues. This has been confirmed in local runs with extra logging: {code} 2019-11-09 00:57:09,958 INFO [Thread-25] scheduler.SchedulerNode (SchedulerNode.java:deductUnallocatedResource(349)) - deducting resource from null:2147 2019-11-09 00:57:09,958 INFO [FairSchedulerContinuousScheduling] scheduler.ClusterNodeTracker (ClusterNodeTracker.java:sortedNodeList(390)) - sorting node list of size 8000 2019-11-09 00:57:09,958 INFO [Thread-25] scheduler.SchedulerNode (SchedulerNode.java:deductUnallocatedResource(349)) - deducting resource from null:5949 2019-11-09 00:57:09,958 INFO [Thread-25] scheduler.SchedulerNode (SchedulerNode.java:deductUnallocatedResource(349)) - deducting resource from null:4677 ... ... 2019-11-09 00:57:09,961 INFO [Thread-25] scheduler.SchedulerNode (SchedulerNode.java:deductUnallocatedResource(349)) - deducting resource from null:2212 2019-11-09 00:57:09,961 INFO [FairSchedulerContinuousScheduling] fair.FairScheduler (FairScheduler.java:continuousSchedulingAttempt(1005)) - scheduler sorted node list of size 8000 2019-11-09 00:57:09,962 INFO [Thread-25] scheduler.SchedulerNode (SchedulerNode.java:deductUnallocatedResource(349)) - deducting resource from null:2949 2019-11-09 00:57:09,962 INFO [Thread-25] scheduler.SchedulerNode (SchedulerNode.java:deductUnallocatedResource(349)) - deducting resource from null:3866 {code} New patch uploaded > RM Received RMFatalEvent of type CRITICAL_THREAD_CRASH > --- > > Key: YARN-8373 > URL: https://issues.apache.org/jira/browse/YARN-8373 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, resourcemanager >Affects Versions: 2.9.0 >Reporter: Girish Bhat >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: newbie > Attachments: YARN-8373.001.patch, YARN-8373.002.patch > > > > > {noformat} > sudo -u yarn /usr/local/hadoop/latest/bin/yarn version Hadoop 2.9.0 > Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r > 756ebc8394e473ac25feac05fa493f6d612e6c50 Compiled by arsuresh on > 2017-11-13T23:15Z Compiled with protoc 2.5.0 From source with checksum > 0a76a9a32a5257331741f8d5932f183 This command was run using > /usr/local/hadoop/hadoop-2.9.0/share/hadoop/common/hadoop-common-2.9.0.jar{noformat} > This is for version 2.9.0 > > {noformat} > 2018-05-25 05:53:12,742 ERROR > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received > RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, Fai > rSchedulerContinuousScheduling, that exited unexpectedly: > java.lang.IllegalArgumentException: Comparison method violates its 
general > contract! > at java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeForceCollapse(TimSort.java:457) > at java.util.TimSort.sort(TimSort.java:254) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1454) > at java.util.Collections.sort(Collections.java:175) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.ClusterNodeTracker.sortedNodeList(ClusterNodeTracker.java:340) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:907) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296) > 2018-05-25 05:53:12,743 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Shutting down > the resource manager. > 2018-05-25 05:53:12,749 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status 1: a critical thread, FairSchedulerContinuousScheduling, that exited > unexpectedly: java.lang.IllegalArgumentException: Comparison method violates > its general
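A minimal sketch of the PriorityQueue direction discussed in the comment above (generic names; this is not the attached YARN-8373 patch): heap insertion and removal only compare pairs on demand and never run TimSort's merge-invariant checks, so resources changing under the comparator degrade the ordering instead of crashing the scheduling thread:

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch only: drain a PriorityQueue instead of calling Collections.sort().
final class NodeSorter {
    static <N> List<N> sortedNodeList(List<N> nodes, Comparator<N> comparator) {
        PriorityQueue<N> heap = new PriorityQueue<>(comparator);
        heap.addAll(nodes);                 // snapshot the nodes into a heap
        List<N> sorted = new ArrayList<>(heap.size());
        while (!heap.isEmpty()) {
            sorted.add(heap.poll());        // poll yields comparator order
        }
        return sorted;
    }
}
{code}

The trade-off is that the resulting order is only best-effort under concurrent mutation, which is acceptable for a scheduling heuristic and avoids locking large numbers of nodes around the sort.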
[jira] [Updated] (YARN-9965) Fix NodeManager failing to start when Hdfs Auxillary Jar is set
[ https://issues.apache.org/jira/browse/YARN-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9965: Attachment: YARN-9965-001.patch > Fix NodeManager failing to start when Hdfs Auxillary Jar is set > --- > > Key: YARN-9965 > URL: https://issues.apache.org/jira/browse/YARN-9965 > Project: Hadoop YARN > Issue Type: Bug > Components: auxservices, nodemanager >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9965-001.patch > > > Loading an auxiliary jar from a Hdfs location on a node manager works as > expected on first time. The subsequent restart fails with > ClassNotFoundException > {code:java} > 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: > classpath: [] > 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: > system classes: [java., javax.accessibility., javax.activation., > javax.activity., javax.annotation., javax.annotation.processing., > javax.crypto., javax.imageio., javax.jws., javax.lang.model., > -javax.management.j2ee., javax.management., javax.naming., javax.net., > javax.print., javax.rmi., javax.script., -javax.security.auth.message., > javax.security.auth., javax.security.cert., javax.security.sasl., > javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., > -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., > org.xml.sax., org.apache.commons.logging., org.apache.log4j., > -org.apache.hadoop.hbase., org.apache.hadoop., core-default.xml, > hdfs-default.xml, mapred-default.xml, yarn-default.xml] > 2019-11-08 03:59:49,257 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed > in state INITED > java.lang.ClassNotFoundException: org.apache.auxtest.AuxServiceFromHDFS > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at > org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189) > at > org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:270) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:321) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:478) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:936) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1016) > {code} > > The 
issue happens when the previously localized auxiliary service jar location > is still present during setup of the auxiliary services. Removing > /yarn/nm/nmAuxService/org.apache.auxtest.AuxServiceFromHDFS_1573205385991 and > restarting the NM works fine. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9965) Fix NodeManager failing to start when Hdfs Auxillary Jar is set
[ https://issues.apache.org/jira/browse/YARN-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9965: Summary: Fix NodeManager failing to start when Hdfs Auxillary Jar is set (was: ClassNotFoundException when auxiliary service is loaded from HDFS on subsequent restart of NM) > Fix NodeManager failing to start when Hdfs Auxillary Jar is set > --- > > Key: YARN-9965 > URL: https://issues.apache.org/jira/browse/YARN-9965 > Project: Hadoop YARN > Issue Type: Bug > Components: auxservices, nodemanager >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > > Loading an auxiliary jar from a Hdfs location on a node manager works as > expected on first time. The subsequent restart fails with > ClassNotFoundException > {code:java} > 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: > classpath: [] > 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: > system classes: [java., javax.accessibility., javax.activation., > javax.activity., javax.annotation., javax.annotation.processing., > javax.crypto., javax.imageio., javax.jws., javax.lang.model., > -javax.management.j2ee., javax.management., javax.naming., javax.net., > javax.print., javax.rmi., javax.script., -javax.security.auth.message., > javax.security.auth., javax.security.cert., javax.security.sasl., > javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., > -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., > org.xml.sax., org.apache.commons.logging., org.apache.log4j., > -org.apache.hadoop.hbase., org.apache.hadoop., core-default.xml, > hdfs-default.xml, mapred-default.xml, yarn-default.xml] > 2019-11-08 03:59:49,257 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed > in state INITED > java.lang.ClassNotFoundException: org.apache.auxtest.AuxServiceFromHDFS > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at > org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189) > at > org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:270) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:321) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:478) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:936) > at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1016) > {code} > > The issue happens when the previously localized auxiliary service jar location > is still present during setup of the auxiliary services. Removing > /yarn/nm/nmAuxService/org.apache.auxtest.AuxServiceFromHDFS_1573205385991 and > restarting the NM works fine. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9965) ClassNotFoundException when auxiliary service is loaded from HDFS on subsequent restart of NM
[ https://issues.apache.org/jira/browse/YARN-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9965: Description: Loading an auxiliary jar from a Hdfs location on a node manager works as expected on first time. The subsequent restart fails with ClassNotFoundException {code:java} 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: classpath: [] 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: system classes: [java., javax.accessibility., javax.activation., javax.activity., javax.annotation., javax.annotation.processing., javax.crypto., javax.imageio., javax.jws., javax.lang.model., -javax.management.j2ee., javax.management., javax.naming., javax.net., javax.print., javax.rmi., javax.script., -javax.security.auth.message., javax.security.auth., javax.security.cert., javax.security.sasl., javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., org.xml.sax., org.apache.commons.logging., org.apache.log4j., -org.apache.hadoop.hbase., org.apache.hadoop., core-default.xml, hdfs-default.xml, mapred-default.xml, yarn-default.xml] 2019-11-08 03:59:49,257 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed in state INITED java.lang.ClassNotFoundException: org.apache.auxtest.AuxServiceFromHDFS at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189) at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169) at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:270) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:321) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:478) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:936) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1016) {code} The issue happens when the previous localized auxillary service jar location is present when setup of auxillary services. Removing /yarn/nm/nmAuxService/org.apache.auxtest.AuxServiceFromHDFS_1573205385991 and restart NM works fine. was: Loading an auxiliary jar from a Hdfs location on a node manager works as expected on first time. 
The subsequent restart fails with ClassNotFoundException {code:java} 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: classpath: [] 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: system classes: [java., javax.accessibility., javax.activation., javax.activity., javax.annotation., javax.annotation.processing., javax.crypto., javax.imageio., javax.jws., javax.lang.model., -javax.management.j2ee., javax.management., javax.naming., javax.net., javax.print., javax.rmi., javax.script., -javax.security.auth.message., javax.security.auth., javax.security.cert., javax.security.sasl., javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., org.xml.sax., org.apache.commons.logging., org.apache.log4j., -org.apache.hadoop.hbase., org.apache.hadoop., core-default.xml, hdfs-default.xml, mapred-default.xml, yarn-default.xml] 2019-11-08 03:59:49,257 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed in state INITED java.lang.ClassNotFoundException: org.apache.auxtest.AuxServiceFromHDFS at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at
[jira] [Updated] (YARN-9965) ClassNotFoundException when auxiliary service is loaded from HDFS on subsequent restart of NM
[ https://issues.apache.org/jira/browse/YARN-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9965: Description: Loading an auxiliary jar from a Hdfs location on a node manager works as expected on first time. The subsequent restart fails with ClassNotFoundException {code:java} 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: classpath: [] 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: system classes: [java., javax.accessibility., javax.activation., javax.activity., javax.annotation., javax.annotation.processing., javax.crypto., javax.imageio., javax.jws., javax.lang.model., -javax.management.j2ee., javax.management., javax.naming., javax.net., javax.print., javax.rmi., javax.script., -javax.security.auth.message., javax.security.auth., javax.security.cert., javax.security.sasl., javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., org.xml.sax., org.apache.commons.logging., org.apache.log4j., -org.apache.hadoop.hbase., org.apache.hadoop., core-default.xml, hdfs-default.xml, mapred-default.xml, yarn-default.xml] 2019-11-08 03:59:49,257 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed in state INITED java.lang.ClassNotFoundException: org.apache.auxtest.AuxServiceFromHDFS at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189) at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169) at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:270) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:321) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:478) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:936) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1016) {code} The issue happens when the previous localized auxillary service jar location is present when setup of auxillary services. Removing /yarn/nm/nmAuxService/org.apache.auxtest.AuxServiceFromHDFS_1573205385991and restart NM works fine. was: Loading an auxiliary jar from a Hdfs location on a node manager works as expected on first time. 
The subsequent restart fails with ClassNotFoundException {code:java} 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: classpath: [] 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: system classes: [java., javax.accessibility., javax.activation., javax.activity., javax.annotation., javax.annotation.processing., javax.crypto., javax.imageio., javax.jws., javax.lang.model., -javax.management.j2ee., javax.management., javax.naming., javax.net., javax.print., javax.rmi., javax.script., -javax.security.auth.message., javax.security.auth., javax.security.cert., javax.security.sasl., javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., org.xml.sax., org.apache.commons.logging., org.apache.log4j., -org.apache.hadoop.hbase., org.apache.hadoop., core-default.xml, hdfs-default.xml, mapred-default.xml, yarn-default.xml] 2019-11-08 03:59:49,257 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed in state INITED java.lang.ClassNotFoundException: org.apache.auxtest.AuxServiceFromHDFS at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at
[jira] [Created] (YARN-9965) ClassNotFoundException when auxiliary service is loaded from HDFS on subsequent restart of NM
Prabhu Joseph created YARN-9965: --- Summary: ClassNotFoundException when auxiliary service is loaded from HDFS on subsequent restart of NM Key: YARN-9965 URL: https://issues.apache.org/jira/browse/YARN-9965 Project: Hadoop YARN Issue Type: Bug Reporter: Prabhu Joseph Assignee: Prabhu Joseph Loading an auxiliary jar from a Hdfs location on a node manager works as expected on first time. The subsequent restart fails with ClassNotFoundException {code:java} 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: classpath: [] 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: system classes: [java., javax.accessibility., javax.activation., javax.activity., javax.annotation., javax.annotation.processing., javax.crypto., javax.imageio., javax.jws., javax.lang.model., -javax.management.j2ee., javax.management., javax.naming., javax.net., javax.print., javax.rmi., javax.script., -javax.security.auth.message., javax.security.auth., javax.security.cert., javax.security.sasl., javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., org.xml.sax., org.apache.commons.logging., org.apache.log4j., -org.apache.hadoop.hbase., org.apache.hadoop., core-default.xml, hdfs-default.xml, mapred-default.xml, yarn-default.xml] 2019-11-08 03:59:49,257 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed in state INITED java.lang.ClassNotFoundException: org.apache.auxtest.AuxServiceFromHDFS at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189) at org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169) at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:270) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:321) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:478) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:936) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1016) {code} *AuxService.java: Path downloaded = download.call()* On first time - the downloaded jar file is shown as [file:/yarn/nm/nmAuxService/org.apache.auxtest.AuxServiceFromHDFS_1573205385991/aux-service-hdfs.jar|file:///yarn/nm/nmAuxService/org.apache.auxtest.AuxServiceFromHDFS_1573205385991/aux-service-hdfs.jar] On Subsequent restart - it is shown as 
/yarn/nm/nmAuxService/org.apache.auxtest.AuxServiceFromHDFS_1573205385991/aux-service-hdfs.jar/* It looks like the previous local jar location is not removed, which causes the issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
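A hedged sketch of the cleanup direction the analysis above points at (class, helper, and directory names are illustrative assumptions, not the eventual patch): remove the stale localized directory before the jar is downloaded again, so the recorded classpath entry points at the fresh jar rather than a leftover wildcard path:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

// Illustrative only: recursively delete a previously localized
// aux-service directory (children first) before re-downloading the jar.
final class AuxServiceLocalizer {
    static void removeStaleLocalDir(Path localDir) throws IOException {
        if (!Files.exists(localDir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(localDir)) {
            walk.sorted(Comparator.reverseOrder())   // leaves before parents
                .forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        removeStaleLocalDir(Paths.get(
            "/yarn/nm/nmAuxService/org.apache.auxtest.AuxServiceFromHDFS_1573205385991"));
    }
}
{code}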
[jira] [Updated] (YARN-9965) ClassNotFoundException when auxiliary service is loaded from HDFS on subsequent restart of NM
[ https://issues.apache.org/jira/browse/YARN-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9965: Component/s: nodemanager auxservices > ClassNotFoundException when auxiliary service is loaded from HDFS on > subsequent restart of NM > - > > Key: YARN-9965 > URL: https://issues.apache.org/jira/browse/YARN-9965 > Project: Hadoop YARN > Issue Type: Bug > Components: auxservices, nodemanager >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > > Loading an auxiliary jar from a Hdfs location on a node manager works as > expected on first time. The subsequent restart fails with > ClassNotFoundException > {code:java} > 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: > classpath: [] > 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: > system classes: [java., javax.accessibility., javax.activation., > javax.activity., javax.annotation., javax.annotation.processing., > javax.crypto., javax.imageio., javax.jws., javax.lang.model., > -javax.management.j2ee., javax.management., javax.naming., javax.net., > javax.print., javax.rmi., javax.script., -javax.security.auth.message., > javax.security.auth., javax.security.cert., javax.security.sasl., > javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., > -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., > org.xml.sax., org.apache.commons.logging., org.apache.log4j., > -org.apache.hadoop.hbase., org.apache.hadoop., core-default.xml, > hdfs-default.xml, mapred-default.xml, yarn-default.xml] > 2019-11-08 03:59:49,257 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed > in state INITED > java.lang.ClassNotFoundException: org.apache.auxtest.AuxServiceFromHDFS > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at > org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189) > at > org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:270) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:321) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:478) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:936) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1016) > {code} > > 
*AuxService.java: Path downloaded = download.call()* > On the first start, the downloaded jar file is shown as > [file:/yarn/nm/nmAuxService/org.apache.auxtest.AuxServiceFromHDFS_1573205385991/aux-service-hdfs.jar|file:///yarn/nm/nmAuxService/org.apache.auxtest.AuxServiceFromHDFS_1573205385991/aux-service-hdfs.jar] > > On a subsequent restart, it is shown as > /yarn/nm/nmAuxService/org.apache.auxtest.AuxServiceFromHDFS_1573205385991/aux-service-hdfs.jar/* > It looks like the previous local jar location is not removed, which causes the issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9965) ClassNotFoundException when auxiliary service is loaded from HDFS on subsequent restart of NM
[ https://issues.apache.org/jira/browse/YARN-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9965: Affects Version/s: 3.2.0 > ClassNotFoundException when auxiliary service is loaded from HDFS on > subsequent restart of NM > - > > Key: YARN-9965 > URL: https://issues.apache.org/jira/browse/YARN-9965 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > > Loading an auxiliary jar from a Hdfs location on a node manager works as > expected on first time. The subsequent restart fails with > ClassNotFoundException > {code:java} > 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: > classpath: [] > 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: > system classes: [java., javax.accessibility., javax.activation., > javax.activity., javax.annotation., javax.annotation.processing., > javax.crypto., javax.imageio., javax.jws., javax.lang.model., > -javax.management.j2ee., javax.management., javax.naming., javax.net., > javax.print., javax.rmi., javax.script., -javax.security.auth.message., > javax.security.auth., javax.security.cert., javax.security.sasl., > javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., > -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., > org.xml.sax., org.apache.commons.logging., org.apache.log4j., > -org.apache.hadoop.hbase., org.apache.hadoop., core-default.xml, > hdfs-default.xml, mapred-default.xml, yarn-default.xml] > 2019-11-08 03:59:49,257 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed > in state INITED > java.lang.ClassNotFoundException: org.apache.auxtest.AuxServiceFromHDFS > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at > org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189) > at > org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:270) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:321) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:478) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:936) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1016) > {code} > > *AuxService.java: Path downloaded = download.call()* > On 
the first start, the downloaded jar file is shown as > [file:/yarn/nm/nmAuxService/org.apache.auxtest.AuxServiceFromHDFS_1573205385991/aux-service-hdfs.jar|file:///yarn/nm/nmAuxService/org.apache.auxtest.AuxServiceFromHDFS_1573205385991/aux-service-hdfs.jar] > > On a subsequent restart, it is shown as > /yarn/nm/nmAuxService/org.apache.auxtest.AuxServiceFromHDFS_1573205385991/aux-service-hdfs.jar/* > It looks like the previous local jar location is not removed, which causes the issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9886) Queue mapping based on userid passed through application tag
[ https://issues.apache.org/jira/browse/YARN-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kinga Marton updated YARN-9886: --- Attachment: YARN-9886.003.patch > Queue mapping based on userid passed through application tag > > > Key: YARN-9886 > URL: https://issues.apache.org/jira/browse/YARN-9886 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Kinga Marton >Assignee: Kinga Marton >Priority: Major > Attachments: YARN-9886-WIP.patch, YARN-9886.001.patch, > YARN-9886.002.patch, YARN-9886.003.patch > > > There are situations when the real submitting user differs from the user that > arrives to YARN. For example, in the case of a Hive application with Hive > impersonation turned off, the Hive queries run as the Hive user and the > mapping is done based on this username. Unfortunately, in this case YARN > doesn't have any information about the real user, and there are cases when the > customer may want to map these applications to the real submitting user's > queue instead of the Hive queue. > For these cases, if the username is passed in the application tag, we can > read it and use it during queue mapping, provided that the user has rights to > run on the real user's queue. > [~sunilg] please correct me if I missed something. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
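As a rough sketch of the idea described above (the "userid=" tag prefix is purely an assumption for illustration; the actual format is whatever the patch defines): extract the proxied user from the application tags so queue placement can consult that user's mapping and ACLs:

{code:java}
import java.util.Optional;
import java.util.Set;

// Illustrative only: pull a "userid=<name>" entry out of the application
// tags; placement logic would still have to verify the submitting user is
// allowed to run in the mapped user's queue before honouring it.
final class UserIdTagParser {
    private static final String PREFIX = "userid="; // assumed tag format

    static Optional<String> realUserFromTags(Set<String> applicationTags) {
        return applicationTags.stream()
                .filter(tag -> tag.startsWith(PREFIX))
                .map(tag -> tag.substring(PREFIX.length()))
                .filter(name -> !name.isEmpty())
                .findFirst();
    }
}
{code}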
[jira] [Resolved] (YARN-9861) The ResourceManager log reports an error "Too many open files", the analysis is related to the service
[ https://issues.apache.org/jira/browse/YARN-9861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huiyangjian resolved YARN-9861. --- Resolution: Fixed https://issues.apache.org/jira/browse/YARN-9837 The stream is not closed; the load method needs to add "IOUtils.closeStream(dataInputStream);". > The ResourceManager log reports an error "Too many open files", the analysis > is related to the service > -- > > Key: YARN-9861 > URL: https://issues.apache.org/jira/browse/YARN-9861 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-native-services >Affects Versions: 3.3.0 > Environment: yarn version:3.3.0-SNAPSHOT > hdfs version:2.7.1 >Reporter: huiyangjian >Priority: Major > Attachments: picture1.png, picture2.png, picture3.png, picture4.png, > picture5.png, submarine_kerasgesv2date20190807.json > > > The ResourceManager log outputs "Too many open files" and a new task cannot be > submitted. > 1. First is the error in picture 1; > 2. then check the file handles opened by the RM (lsof -p PID), see picture 2; > 3. also read the NameNode audit log (picture 3); > 4. confirm the service is involved from the paths in the service configuration > (picture 4); > 5. note the handle count growth trend (picture 5). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
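A minimal sketch of the pattern the resolution comment asks for (the class name, file argument, and load logic are illustrative; org.apache.hadoop.io.IOUtils.closeStream is the null-safe Hadoop helper named above): close the stream in a finally block so a failed load cannot leak a file descriptor:

{code:java}
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.hadoop.io.IOUtils;

// Illustrative only: every exit path, including exceptions thrown while
// reading, releases the descriptor, preventing "Too many open files".
final class ServiceDefinitionLoader {
    static byte[] load(File file) throws IOException {
        DataInputStream dataInputStream = null;
        try {
            dataInputStream = new DataInputStream(new FileInputStream(file));
            byte[] buffer = new byte[(int) file.length()];
            dataInputStream.readFully(buffer);
            return buffer;
        } finally {
            IOUtils.closeStream(dataInputStream); // null-safe close
        }
    }
}
{code}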
[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970053#comment-16970053 ] Hadoop QA commented on YARN-9537: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 27s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 2s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 27s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 81m 40s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}140m 5s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | YARN-9537 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985328/YARN-9537.004.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 7a74968be9a3 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 42fc888 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/25122/testReport/ | | Max. process+thread count | 858 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25122/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Add configuration to disable AM preemption >
[jira] [Commented] (YARN-9948) Remove attempts that are beyond max-attempt limit from RMAppImpl
[ https://issues.apache.org/jira/browse/YARN-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970038#comment-16970038 ] Hu Ziqian commented on YARN-9948: - Hi [~hex108], in our cluster each attempt uses more than 1k of memory in RMAppImpl, and keeping them also increases the RM's memory footprint and the response size when users call the RESTful API. Most of our apps are streaming apps, and some of them may make more than ten thousand attempts, so it is a significant memory cost. Comment 2 says that _app will always retry if there are some attempts that does not count towards max attempt retry in the attempts we kept._ I think this won't happen, because we only delete attempts whose finish time < (endTime - attemptFailuresValidityInterval). Although we delete some old attempts, this does not change the result of (numberOfFailure < app.maxAppAttempts) and therefore does not change the app state. If an app fails more than app.maxAppAttempts times within attemptFailuresValidityInterval, it will fail as expected. I also added a config that keeps the attempts in memory by default. > Remove attempts that are beyond max-attempt limit from RMAppImpl > > > Key: YARN-9948 > URL: https://issues.apache.org/jira/browse/YARN-9948 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.1.3 >Reporter: Hu Ziqian >Priority: Major > Attachments: YARN-9948.001.patch > > > RM stores app attempts in both the state store and RMAppImpl. YARN-3480 > removes attempts that are beyond the max-attempt limit from the state store. In this > issue we delete those attempts in RMAppImpl to reduce the memory usage > of the RM. > We introduce the flag yarn.resourcemanager.am.delete-old-attempts.enabled to > enable this logic; the default value is false. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
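To make the retention rule above concrete, here is a minimal sketch of the pruning condition the comment describes. The class and method names are illustrative only, not the actual RMAppImpl code:
{code:java}
import java.util.Iterator;
import java.util.Map;

/** Illustrative sketch of the attempt-pruning rule discussed above. */
class AttemptPruner {

  /**
   * Drops attempts that finished before the failure-validity window.
   * An attempt with finishTime < (now - attemptFailuresValidityInterval)
   * no longer counts towards maxAppAttempts, so removing it cannot change
   * the (numberOfFailure < maxAppAttempts) decision or the app state.
   */
  static void pruneOldAttempts(Map<String, Long> attemptFinishTimes,
                               long attemptFailuresValidityInterval) {
    long now = System.currentTimeMillis();
    Iterator<Map.Entry<String, Long>> it =
        attemptFinishTimes.entrySet().iterator();
    while (it.hasNext()) {
      long finishTime = it.next().getValue();
      // Only attempts that fell outside the validity window are removed.
      if (finishTime < now - attemptFailuresValidityInterval) {
        it.remove();
      }
    }
  }
}
{code}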
[jira] [Commented] (YARN-9957) The first container we recover may not be the AM
[ https://issues.apache.org/jira/browse/YARN-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969942#comment-16969942 ] Hadoop QA commented on YARN-9957: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 15s{color} | {color:red} Docker failed to build yetus/hadoop:ef54f78530d. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-9957 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985329/YARN-9957-branch-2.9.1.002.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/25123/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > The first container we recover may not be the AM > > > Key: YARN-9957 > URL: https://issues.apache.org/jira/browse/YARN-9957 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.9.1 >Reporter: Xianghao Lu >Assignee: Xianghao Lu >Priority: Major > Fix For: 2.9.1 > > Attachments: 1.jpg, 2.jpg, YARN-9957-branch-2.9.1.001.patch, > YARN-9957-branch-2.9.1.002.patch > > > YARN-7382 says that if not running unmanaged, the first container we recover > is always the AM; however, this is not always the case in practice, which can > lead to wrong AM resource usage after RM recovery. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9957) The first container we recover may not be the AM
[ https://issues.apache.org/jira/browse/YARN-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969940#comment-16969940 ] Xianghao Lu edited comment on YARN-9957 at 11/8/19 8:30 AM: IMO, the root cause of the following case in YARN-7382 is [app.getPendingDemand|https://github.com/apache/hadoop/blob/e30710aea4e6e55e69372929106cf119af06fd0e/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java#L373], which we rely on to get the apps with pending resources. We know that demand = pending + usage. When the maps get to 100%, usage becomes 0 once the map containers complete, but demand is not updated immediately, so the scheduler mistakenly thinks there are still pending resource requests.
{code:java}
While running an MR job (e.g. sleep) and an RM failover occurs, once the maps gets to 100%, the now active RM will crash
{code}
Here is my test log:
{quote}
2019-11-08 14:39:58,640 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: lxh debug app application_1573179594570_0001 demand resouceusage
2019-11-08 14:39:59,643 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: lxh debug app application_1573179594570_0001 demand resouceusage
2019-11-08 14:40:00,439 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e2053_1573179594570_0001_01_03 Container Transitioned from RUNNING to COMPLETED
2019-11-08 14:40:00,439 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1573179594570_0001 CONTAINERID=container_e2053_1573179594570_0001_01_03 RESOURCE=
2019-11-08 14:40:00,440 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: app application_1573179594570_0001 demand resouceusage
2019-11-08 14:40:00,440 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: app application_1573179594570_0001 demand resouceusage
2019-11-08 14:40:00,440 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: app application_1573179594570_0001 pending demand schedulerKey []
2019-11-08 14:40:00,442 FATAL org.apache.hadoop.yarn.event.EventDispatcher: Error in handling event type NODE_UPDATE to the Event Dispatcher
java.util.NoSuchElementException
at java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2053)
at java.util.concurrent.ConcurrentSkipListSet.first(ConcurrentSkipListSet.java:396)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.getNextPendingAsk(AppSchedulingInfo.java:372)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.isOverAMShareLimit(FSAppAttempt.java:934)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:1359)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:346)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:207)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1034)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:902)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1119)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:129)
at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
at java.lang.Thread.run(Thread.java:748)
2019-11-08 14:40:00,443 INFO org.apache.hadoop.yarn.event.EventDispatcher: Exiting, bbye..
{quote}
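The NoSuchElementException in the log comes from calling first() on an empty ConcurrentSkipListSet of scheduler keys. A minimal, self-contained reproduction of that failure mode, with a defensive emptiness check; the names are illustrative, not the actual AppSchedulingInfo code:
{code:java}
import java.util.concurrent.ConcurrentSkipListSet;

/** Minimal reproduction of the failure mode in the stack trace above. */
public class PendingAskDemo {
  public static void main(String[] args) {
    ConcurrentSkipListSet<Integer> schedulerKeys = new ConcurrentSkipListSet<>();

    // When the last map container completes, its scheduler key is removed,
    // but the stale demand still makes the scheduler look for a pending ask.
    schedulerKeys.add(1);
    schedulerKeys.remove(1);

    // Calling first() directly on the now-empty set would throw
    // NoSuchElementException, exactly the FATAL error in the NODE_UPDATE
    // handler above; checking isEmpty() first avoids the crash.
    Integer next = schedulerKeys.isEmpty() ? null : schedulerKeys.first();
    System.out.println("next pending scheduler key: " + next);
  }
}
{code}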
[jira] [Updated] (YARN-9957) The first container we recover may not be the AM
[ https://issues.apache.org/jira/browse/YARN-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianghao Lu updated YARN-9957: -- Attachment: YARN-9957-branch-2.9.1.002.patch > The first container we recover may not be the AM > > > Key: YARN-9957 > URL: https://issues.apache.org/jira/browse/YARN-9957 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.9.1 >Reporter: Xianghao Lu >Assignee: Xianghao Lu >Priority: Major > Fix For: 2.9.1 > > Attachments: 1.jpg, 2.jpg, YARN-9957-branch-2.9.1.001.patch, > YARN-9957-branch-2.9.1.002.patch > > > YARN-7382 says that if not running unmanaged, the first container we recover > is always the AM; however, this is not always the case in practice, which can > lead to wrong AM resource usage after RM recovery. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9957) The first container we recover may not be the AM
[ https://issues.apache.org/jira/browse/YARN-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969940#comment-16969940 ] Xianghao Lu commented on YARN-9957: - IMO, the root cause of the following case in YARN-7382 is [app.getPendingDemand|https://github.com/apache/hadoop/blob/e30710aea4e6e55e69372929106cf119af06fd0e/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java#L373], which we rely on to get the apps with pending resources. We know that demand = pending + usage. When the maps get to 100%, usage becomes 0 once the map containers complete, but demand is not updated immediately, so the scheduler mistakenly thinks there are still pending resource requests.
{code:java}
While running an MR job (e.g. sleep) and an RM failover occurs, once the maps gets to 100%, the now active RM will crash
{code}
Here is my test log:
{quote}
2019-11-08 14:39:58,640 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: lxh debug app application_1573179594570_0001 demand resouceusage
2019-11-08 14:39:59,643 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: lxh debug app application_1573179594570_0001 demand resouceusage
2019-11-08 14:40:00,439 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e2053_1573179594570_0001_01_03 Container Transitioned from RUNNING to COMPLETED
2019-11-08 14:40:00,439 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1573179594570_0001 CONTAINERID=container_e2053_1573179594570_0001_01_03 RESOURCE=
2019-11-08 14:40:00,440 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: app application_1573179594570_0001 demand resouceusage
2019-11-08 14:40:00,440 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt: app application_1573179594570_0001 demand resouceusage
2019-11-08 14:40:00,440 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue: app application_1573179594570_0001 pending demand schedulerKey []
2019-11-08 14:40:00,442 FATAL org.apache.hadoop.yarn.event.EventDispatcher: Error in handling event type NODE_UPDATE to the Event Dispatcher
java.util.NoSuchElementException
at java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2053)
at java.util.concurrent.ConcurrentSkipListSet.first(ConcurrentSkipListSet.java:396)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.getNextPendingAsk(AppSchedulingInfo.java:372)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.isOverAMShareLimit(FSAppAttempt.java:934)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:1359)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:346)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:207)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1034)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:902)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1119)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:129)
at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
at java.lang.Thread.run(Thread.java:748)
2019-11-08 14:40:00,443 INFO org.apache.hadoop.yarn.event.EventDispatcher: Exiting, bbye..
{quote}
> The first container we recover may not be the AM > > > Key: YARN-9957 > URL: https://issues.apache.org/jira/browse/YARN-9957 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.9.1 >Reporter: Xianghao Lu >Assignee: Xianghao Lu >Priority: Major > Fix For: 2.9.1 > > Attachments: 1.jpg, 2.jpg, YARN-9957-branch-2.9.1.001.patch > > > YARN-7382 says that if not running unmanaged, the first container we recover > is always the AM; however, this is not always the case in practice, which can > lead to wrong AM resource usage after RM recovery. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
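The direction the issue summary suggests is to identify the AM container explicitly during recovery rather than assuming the first recovered container is the AM. A hypothetical sketch of such a check (the actual patch's code differs):
{code:java}
import org.apache.hadoop.yarn.api.records.ContainerId;

/** Hypothetical helper: is this recovered container the attempt's AM? */
class RecoveryHelper {

  /**
   * Instead of treating the first recovered container as the AM, compare
   * its id against the attempt's known master container id, so AM resource
   * usage is charged correctly after RM recovery.
   */
  static boolean isAmContainer(ContainerId recovered,
                               ContainerId masterContainer) {
    return masterContainer != null && masterContainer.equals(recovered);
  }
}
{code}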
[jira] [Commented] (YARN-9865) Capacity scheduler: add support for combined %user + %secondary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969938#comment-16969938 ] Szilard Nemeth commented on YARN-9865: -- Hi [~maniraj...@gmail.com]! Can you ask someone else, please? I'm busy with other things for the next few days. > Capacity scheduler: add support for combined %user + %secondary_group mapping > - > > Key: YARN-9865 > URL: https://issues.apache.org/jira/browse/YARN-9865 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9865-005.patch, YARN-9865.001.patch, > YARN-9865.002.patch, YARN-9865.003.patch, YARN-9865.004.patch > > > Similar to YARN-9841, but for the secondary group. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
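For context, a sketch of the queue-mapping configuration this sub-task extends; the combined placeholder shown is what the issue title proposes, so treat the mapping value as illustrative rather than the final syntax:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class QueueMappingExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Proposed combined mapping: place each user's apps in a leaf queue
    // named after the user, nested under a queue named after the user's
    // secondary group. (Placeholder syntax illustrates the proposal only.)
    conf.set("yarn.scheduler.capacity.queue-mappings",
        "u:%user:%secondary_group.%user");
    System.out.println(conf.get("yarn.scheduler.capacity.queue-mappings"));
  }
}
{code}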
[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969933#comment-16969933 ] zhoukang commented on YARN-9537: New patch added, [~yufeigu]. > Add configuration to disable AM preemption > -- > > Key: YARN-9537 > URL: https://issues.apache.org/jira/browse/YARN-9537 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.2.0, 3.1.2 >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: YARN-9537-002.patch, YARN-9537.001.patch, > YARN-9537.003.patch, YARN-9537.004.patch > > > In this issue, I will add a configuration to support disabling AM preemption. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
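As a rough illustration of what such a switch could look like, here is a hypothetical sketch; the property key and classes are invented for illustration and are not the actual patch:
{code:java}
import org.apache.hadoop.conf.Configuration;

/** Hypothetical config-gated AM-preemption check. */
class AmPreemptionPolicy {

  // Illustrative property key; the real key is defined by the patch.
  static final String DISABLE_AM_PREEMPTION =
      "yarn.scheduler.fair.am-preemption.disabled";

  private final boolean amPreemptionDisabled;

  AmPreemptionPolicy(Configuration conf) {
    amPreemptionDisabled = conf.getBoolean(DISABLE_AM_PREEMPTION, false);
  }

  /** Returns true if the given container may be preempted. */
  boolean isPreemptable(boolean isAmContainer) {
    // With the flag set, AM containers are never chosen for preemption.
    return !(amPreemptionDisabled && isAmContainer);
  }
}
{code}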
[jira] [Updated] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhoukang updated YARN-9537: --- Attachment: YARN-9537.004.patch > Add configuration to disable AM preemption > -- > > Key: YARN-9537 > URL: https://issues.apache.org/jira/browse/YARN-9537 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.2.0, 3.1.2 >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Attachments: YARN-9537-002.patch, YARN-9537.001.patch, > YARN-9537.003.patch, YARN-9537.004.patch > > > In this issue, I will add a configuration to support disabling AM preemption. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org