[jira] [Created] (YARN-5665) Documentation does not mention package name requirement for yarn.resourcemanager.scheduler.class
Miklos Szegedi created YARN-5665:

Summary: Documentation does not mention package name requirement for yarn.resourcemanager.scheduler.class
Key: YARN-5665
URL: https://issues.apache.org/jira/browse/YARN-5665
Project: Hadoop YARN
Issue Type: Bug
Components: documentation
Affects Versions: 3.0.0-alpha1
Reporter: Miklos Szegedi
Priority: Trivial

http://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-project-dist/hadoop-common/ClusterSetup.html refers to FairScheduler when it documents the setting yarn.resourcemanager.scheduler.class. What it does not mention is that the user has to specify the fully qualified class name, org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler; otherwise the system throws java.lang.ClassNotFoundException: FairScheduler. It would be nice if the documentation specified the fully qualified name, so that the user does not need to look it up.
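For illustration, the yarn-site.xml entry with the fully qualified name would look like this (a suggested example for the docs, not a quote from the current page):

{code}
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
{code}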
[jira] [Created] (YARN-5686) DefaultContainerExecutor random working dir algorithm skews results
Miklos Szegedi created YARN-5686:

Summary: DefaultContainerExecutor random working dir algorithm skews results
Key: YARN-5686
URL: https://issues.apache.org/jira/browse/YARN-5686
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Priority: Minor

{code}
long randomPosition = RandomUtils.nextLong() % totalAvailable;
...
while (randomPosition > availableOnDisk[dir]) {
  randomPosition -= availableOnDisk[dir++];
}
{code}

The code above selects a disk based on a random number weighted by the free space on each disk. For example, if I have two disks with 100 bytes each, totalAvailable is 200 and randomPosition is in 0..199. To be fair, 0..99 should select the first disk and 100..199 should select the second disk, but right now a randomPosition of 100 still selects the first disk. We need to use

{code}
while (randomPosition >= availableOnDisk[dir]) {
{code}

instead of

{code}
while (randomPosition > availableOnDisk[dir]) {
{code}
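A minimal self-contained sketch of the corrected selection (the method and parameter names are illustrative, not the actual DefaultContainerExecutor code):

{code}
// Picks a disk index weighted by free space. The >= comparison makes the
// boundaries fair: with two disks of 100 bytes each and randomPosition in
// 0..199, values 0..99 pick disk 0 and values 100..199 pick disk 1.
static int pickDisk(long[] availableOnDisk, long randomPosition) {
  int dir = 0;
  // randomPosition < sum(availableOnDisk), so the loop always terminates.
  while (randomPosition >= availableOnDisk[dir]) {
    randomPosition -= availableOnDisk[dir++];
  }
  return dir;
}
{code}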
[jira] [Created] (YARN-5725) Test uncaught exception in TestContainersMonitorResourceChange.testContainersResourceChange when setting IP and host
Miklos Szegedi created YARN-5725:

Summary: Test uncaught exception in TestContainersMonitorResourceChange.testContainersResourceChange when setting IP and host
Key: YARN-5725
URL: https://issues.apache.org/jira/browse/YARN-5725
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Minor

The issue surfaces as a warning, but it prevents the container monitor from continuing:

2016-10-12 14:38:23,280 WARN [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(594)) - Uncaught exception in ContainersMonitorImpl while monitoring resource of container_123456_0001_01_01
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:455)
2016-10-12 14:38:23,281 WARN [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(613)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting.
[jira] [Created] (YARN-5726) Test exception in TestContainersMonitorResourceChange.testContainersResourceChange when trying to get NMTimelinePublisher
Miklos Szegedi created YARN-5726:

Summary: Test exception in TestContainersMonitorResourceChange.testContainersResourceChange when trying to get NMTimelinePublisher
Key: YARN-5726
URL: https://issues.apache.org/jira/browse/YARN-5726
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Trivial

2016-10-12 14:38:39,970 WARN [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(594)) - Uncaught exception in ContainersMonitorImpl while monitoring resource of container_123456_0001_01_01
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:587)
[jira] [Created] (YARN-5736) YARN container executor config does not handle white space
Miklos Szegedi created YARN-5736:

Summary: YARN container executor config does not handle white space
Key: YARN-5736
URL: https://issues.apache.org/jira/browse/YARN-5736
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Trivial

The container executor configuration reader does not handle white space or malformed key-value pairs in the config file correctly or gracefully. As an example, take the following key-value line, which is part of the configuration (note: << is used as a marker to show the extra trailing space):

yarn.nodemanager.linux-container-executor.group=yarn <<

This is a valid line, but when you run the check over the file:

[root@test]# ./container-executor --checksetup
Can't get group information for yarn - Success.
[root@test]#

it fails to find the yarn group, because it really tries to find the "yarn " group. There is no trimming anywhere while processing the lines, so a space before or after the = sign would also cause a failure. A minor nit is that the failure is still logged as "Success".
[jira] [Created] (YARN-5757) RM Cluster Node API documentation is not up to date
Miklos Szegedi created YARN-5757:

Summary: RM Cluster Node API documentation is not up to date
Key: YARN-5757
URL: https://issues.apache.org/jira/browse/YARN-5757
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Trivial

For an example, please refer to this documented field, which has not existed since YARN-686:

healthStatus (string): The health status of the node - Healthy or Unhealthy
[jira] [Created] (YARN-5776) Checkstyle: MonitoringThread.run method length is too long
Miklos Szegedi created YARN-5776:

Summary: Checkstyle: MonitoringThread.run method length is too long
Key: YARN-5776
URL: https://issues.apache.org/jira/browse/YARN-5776
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Trivial

YARN-5725 had a checkstyle violation that should be resolved by refactoring the function. Details:

ContainersMonitorImpl.java:395 (MonitoringThread.run) @Override:5: Method length is 233 lines (max allowed is 150).
[jira] [Created] (YARN-5834) TestNodeStatusUpdater.testNMRMConnectionConf compares nodemanager wait time to the incorrect value
Miklos Szegedi created YARN-5834:

Summary: TestNodeStatusUpdater.testNMRMConnectionConf compares nodemanager wait time to the incorrect value
Key: YARN-5834
URL: https://issues.apache.org/jira/browse/YARN-5834
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Priority: Minor

The function is TestNodeStatusUpdater#testNMRMConnectionConf(). I believe the connectionWaitMs references below (marked with asterisks) were meant to be nmRmConnectionWaitMs.

{code}
conf.setLong(YarnConfiguration.NM_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS,
    nmRmConnectionWaitMs);
conf.setLong(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS,
    connectionWaitMs);
...
long t = System.currentTimeMillis();
long duration = t - waitStartTime;
boolean waitTimeValid = (duration >= nmRmConnectionWaitMs)
    && (duration < (*connectionWaitMs* + delta));
if (!waitTimeValid) {
  // throw exception if NM doesn't retry long enough
  throw new Exception("NM should have tried re-connecting to RM during " +
      "period of at least " + *connectionWaitMs* + " ms, but " +
      "stopped retrying within " + (*connectionWaitMs* + delta) +
      " ms: " + e, e);
}
{code}
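For clarity, this is how the check would read with the intended variable, a sketch based only on the snippet above:

{code}
boolean waitTimeValid = (duration >= nmRmConnectionWaitMs)
    && (duration < (nmRmConnectionWaitMs + delta));
if (!waitTimeValid) {
  // throw exception if NM doesn't retry long enough
  throw new Exception("NM should have tried re-connecting to RM during "
      + "period of at least " + nmRmConnectionWaitMs + " ms, but "
      + "stopped retrying within " + (nmRmConnectionWaitMs + delta)
      + " ms: " + e, e);
}
{code}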
[jira] [Created] (YARN-5849) Automatically create YARN control group for pre-mounted cgroups
Miklos Szegedi created YARN-5849:

Summary: Automatically create YARN control group for pre-mounted cgroups
Key: YARN-5849
URL: https://issues.apache.org/jira/browse/YARN-5849
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 3.0.0-alpha1, 2.7.3, 3.0.0-alpha2
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Minor

YARN can be launched with linux-container-executor.cgroups.mount set to false. It will search for the cgroup mount paths set up by the administrator by parsing the /etc/mtab file. You can also specify resource.percentage-physical-cpu-limit to limit the CPU resources assigned to containers. linux-container-executor.cgroups.hierarchy is the root of the settings of all YARN containers. If this hierarchy is specified but has not been created, YARN fails at startup:

Caused by: java.io.FileNotFoundException: /cgroups/cpu/hadoop-yarn/cpu.cfs_period_us (Permission denied)
	at org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.updateCgroup(CgroupsLCEResourcesHandler.java:263)

This JIRA is about automatically creating the YARN control group in the case above. It reduces the cost of administration.
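A minimal sketch of the proposed startup behavior, assuming a pre-mounted controller (the class, method, and error message are illustrative, not the eventual patch):

{code}
import java.io.File;
import java.io.IOException;

public class CGroupHierarchySetup {
  // Create the configured YARN hierarchy (e.g. /cgroups/cpu/hadoop-yarn)
  // under the discovered controller mount instead of failing at startup.
  public static void ensureYarnHierarchy(String controllerMount,
      String hierarchy) throws IOException {
    File yarnCgroup = new File(controllerMount, hierarchy);
    if (!yarnCgroup.isDirectory() && !yarnCgroup.mkdirs()) {
      throw new IOException("Could not create cgroup " + yarnCgroup
          + "; create it manually and make it writable by the NM user");
    }
  }
}
{code}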
[jira] [Created] (YARN-5927) BaseContainerManagerTest::waitForNMContainerState timeout accounting is not accurate
Miklos Szegedi created YARN-5927:

Summary: BaseContainerManagerTest::waitForNMContainerState timeout accounting is not accurate
Key: YARN-5927
URL: https://issues.apache.org/jira/browse/YARN-5927
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Priority: Trivial

See below: timeoutSecs is incremented twice per iteration (once in the body and once in the loop condition). We also sleep right away, before even checking the observed value.

{code}
do {
  Thread.sleep(2000);
  ...
  timeoutSecs += 2;
} while (!finalStates.contains(currentState)
    && timeoutSecs++ < timeOutMax);
{code}
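A sketch of a corrected loop, using the names from the snippet above (getContainerState() is a hypothetical stand-in for the state lookup in the real test):

{code}
int timeoutSecs = 0;
ContainerState currentState = getContainerState();
// Check the state before sleeping, and count each 2-second wait once.
while (!finalStates.contains(currentState) && timeoutSecs < timeOutMax) {
  Thread.sleep(2000);
  timeoutSecs += 2;
  currentState = getContainerState();
}
{code}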
[jira] [Created] (YARN-5986) Adding YARN configuration entries to ContainerLaunchContext
Miklos Szegedi created YARN-5986:

Summary: Adding YARN configuration entries to ContainerLaunchContext
Key: YARN-5986
URL: https://issues.apache.org/jira/browse/YARN-5986
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi

Currently ContainerLaunchContext is defined as

message ContainerLaunchContextProto {
  repeated StringLocalResourceMapProto localResources = 1;
  optional bytes tokens = 2;
  repeated StringBytesMapProto service_data = 3;
  repeated StringStringMapProto environment = 4;
  repeated string command = 5;
  repeated ApplicationACLMapProto application_ACLs = 6;
  optional ContainerRetryContextProto container_retry_context = 7;
}

It would be nice to have an additional parameter "configuration" to support cases like YARN-5600, where we want to pass a parameter to YARN and not to the application or container:

  repeated StringStringMapProto configuration = 8;
[jira] [Created] (YARN-5987) NM configured command to collect heap dump of preempted container
Miklos Szegedi created YARN-5987:

Summary: NM configured command to collect heap dump of preempted container
Key: YARN-5987
URL: https://issues.apache.org/jira/browse/YARN-5987
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi

The node manager can kill a container if it exceeds its assigned memory limits. It would be nice to have a configuration entry to set up a command that can collect additional debug information, if needed. The collected information can be used for root cause analysis.
[jira] [Created] (YARN-6060) Linux container executor fails to run container on directories mounted as noexec
Miklos Szegedi created YARN-6060:

Summary: Linux container executor fails to run container on directories mounted as noexec
Key: YARN-6060
URL: https://issues.apache.org/jira/browse/YARN-6060
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager, yarn
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi

If node manager directories are mounted as noexec, LCE fails with the following error:

Launching container...
Couldn't execute the container launch file /tmp/hadoop-/nm-local-dir/usercache//appcache/application_1483656052575_0001/container_1483656052575_0001_02_01/launch_container.sh - Permission denied
[jira] [Resolved] (YARN-6060) Linux container executor fails to run container on directories mounted as noexec
[ https://issues.apache.org/jira/browse/YARN-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miklos Szegedi resolved YARN-6060.
----------------------------------
Resolution: Won't Fix
Assignee: (was: Miklos Szegedi)

> Linux container executor fails to run container on directories mounted as noexec
> ---------------------------------------------------------------------------------
>
> Key: YARN-6060
> URL: https://issues.apache.org/jira/browse/YARN-6060
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager, yarn
> Reporter: Miklos Szegedi
> Attachments: YARN-6060.000.patch, YARN-6060.001.patch
>
> If node manager directories are mounted as noexec, LCE fails with the following error:
> Launching container...
> Couldn't execute the container launch file /tmp/hadoop-/nm-local-dir/usercache//appcache/application_1483656052575_0001/container_1483656052575_0001_02_01/launch_container.sh - Permission denied
[jira] [Created] (YARN-6077) /bin/bash path is hardcoded in node manager
Miklos Szegedi created YARN-6077:

Summary: /bin/bash path is hardcoded in node manager
Key: YARN-6077
URL: https://issues.apache.org/jira/browse/YARN-6077
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi

There should be a configuration similar to MRJobConfig.MAPRED_ADMIN_USER_SHELL.
[jira] [Created] (YARN-6144) Fair share calculation error
Miklos Szegedi created YARN-6144:

Summary: Fair share calculation error
Key: YARN-6144
URL: https://issues.apache.org/jira/browse/YARN-6144
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler, resourcemanager
Affects Versions: 3.0.0-alpha2
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Blocker

{{preemptContainers()}} calls {{trackContainerForPreemption()}} to collect the list of containers and resources that were preempted for an application. Later the list is reduced when {{containerCompleted()}} calls {{untrackContainerForPreemption()}}. The bug is that the resource variable {{preemptedResources}} is subtracted not just when the container was preempted but also when it completed successfully. As a result, {{getResourceUsage()}} returns an incorrect value.
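A sketch of the intended accounting, assuming a set of preempted container ids is tracked alongside {{preemptedResources}} (the set name and getAllocatedResource() are assumptions, not the actual scheduler fields):

{code}
// Subtract from preemptedResources only when the completed container was
// actually preempted; a normal completion leaves the counter untouched.
void untrackContainerForPreemption(RMContainer container) {
  if (containersToPreempt.remove(container.getContainerId())) {
    Resources.subtractFrom(preemptedResources,
        container.getAllocatedResource());
  }
}
{code}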
[jira] [Created] (YARN-6158) FairScheduler: app usage can go to negative
Miklos Szegedi created YARN-6158:

Summary: FairScheduler: app usage can go to negative
Key: YARN-6158
URL: https://issues.apache.org/jira/browse/YARN-6158
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler, resourcemanager
Affects Versions: 3.0.0-alpha2
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi

FiCaSchedulerApp.containerCompleted checks whether the container being completed is in the active list:

{code}
// Remove from the list of containers
if (null == liveContainers.remove(containerId)) {
  return false;
}
{code}

The fair scheduler should do the same; otherwise multiple close events for the same container leave the application with negative resource usage in {{queue.getMetrics().releaseResources}} and {{attemptResourceUsage.decUsed}}.
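Sketched on the fair scheduler side, the analogous guard would make a duplicate completion event a no-op (placement in the fair scheduler's containerCompleted is assumed):

{code}
// Only the first removal from liveContainers may decrement the queue
// metrics and the attempt resource usage; later duplicates return early.
if (null == liveContainers.remove(containerId)) {
  return false;
}
{code}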
[jira] [Created] (YARN-6169) container-executor message on empty configuration file can be improved
Miklos Szegedi created YARN-6169:

Summary: container-executor message on empty configuration file can be improved
Key: YARN-6169
URL: https://issues.apache.org/jira/browse/YARN-6169
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Miklos Szegedi
Priority: Trivial

If the configuration file is empty, we get the following error message:

{{Invalid configuration provided in /root/etc/hadoop/container-executor.cfg}}

This does not provide enough detail to figure out the issue at first glance. We should use something like 'Empty configuration file provided...'.

{code}
if (cfg->size == 0) {
  fprintf(ERRORFILE, "Invalid configuration provided in %s\n", file_name);
  exit(INVALID_CONFIG_FILE);
}
{code}
[jira] [Created] (YARN-6171) ConcurrentModificationException in ApplicationMasterService.allocate
Miklos Szegedi created YARN-6171:

Summary: ConcurrentModificationException in ApplicationMasterService.allocate
Key: YARN-6171
URL: https://issues.apache.org/jira/browse/YARN-6171
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler
Affects Versions: 3.0.0-alpha2
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi

I have noticed an exception that occasionally closes the Application Master when using the fair scheduler.

{code}
Caused by: org.apache.hadoop.ipc.RemoteException(java.util.ConcurrentModificationException): java.util.ConcurrentModificationException
	at java.util.HashMap$HashIterator.nextEntry(HashMap.java:922)
	at java.util.HashMap$KeyIterator.next(HashMap.java:956)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:1005)
	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:532)
	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
{code}
[jira] [Created] (YARN-6218) TestAMRMClient fails with fair scheduler
Miklos Szegedi created YARN-6218:

Summary: TestAMRMClient fails with fair scheduler
Key: YARN-6218
URL: https://issues.apache.org/jira/browse/YARN-6218
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Minor

We ran into this issue on v2. Allocation does not happen in the specified amount of time.

Error Message:
expected:<2> but was:<0>

Stacktrace:
java.lang.AssertionError: expected:<2> but was:<0>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:555)
	at org.junit.Assert.assertEquals(Assert.java:542)
	at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientMatchStorage(TestAMRMClient.java:495)
[jira] [Created] (YARN-6230) Failing unit test TestFairScheduler.testMoveWouldViolateMaxResourcesConstraints
Miklos Szegedi created YARN-6230:

Summary: Failing unit test TestFairScheduler.testMoveWouldViolateMaxResourcesConstraints
Key: YARN-6230
URL: https://issues.apache.org/jira/browse/YARN-6230
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Priority: Minor

I have run into this in one of the job runs:

{code}
Tests run: 92, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 14.683 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
testMoveWouldViolateMaxResourcesConstraints(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)  Time elapsed: 0.115 sec  <<< ERROR!
java.lang.Exception: Unexpected exception, expected but was
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:144)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testMoveWouldViolateMaxResourcesConstraints(TestFairScheduler.java:4533)
{code}
[jira] [Created] (YARN-6302) Fail the node, if Linux Container Executor is not configured properly
Miklos Szegedi created YARN-6302:

Summary: Fail the node, if Linux Container Executor is not configured properly
Key: YARN-6302
URL: https://issues.apache.org/jira/browse/YARN-6302
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Minor

We have a cluster that has one node with a misconfigured Linux Container Executor. Every AM or regular container launched on that node fails. The node still has resources available, so it keeps failing apps until the administrator notices the issue and decommissions the node. AM blacklisting only helps if the application is already running.

As a possible improvement, when the LCE is used on the cluster and an NM gets certain errors back from the LCE, like error 24 (configuration not found), we should not try to allocate anything on the node anymore, or shut down the node entirely. That kind of problem normally does not fix itself, and it means that nothing can really run on that node.

{code}
Application application_1488920587909_0010 failed 2 times due to AM Container for appattempt_1488920587909_0010_02 exited with exitCode: -1000
Failing this attempt.Diagnostics: Application application_1488920587909_0010 initialization failed (exitCode=24) with output:
For more detailed output, check the application tracking page: http://node-1.domain.com:8088/cluster/app/application_1488920587909_0010 Then click on links to logs of each attempt.
. Failing the application.
{code}
[jira] [Created] (YARN-6361) FSLeafQueue.fetchAppsWithDemand CPU usage is high with big queues
Miklos Szegedi created YARN-6361:

Summary: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big queues
Key: YARN-6361
URL: https://issues.apache.org/jira/browse/YARN-6361
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Priority: Minor

FSLeafQueue.fetchAppsWithDemand sorts the applications by the current policy. Most of the time is spent in FairShareComparator.compare. We could improve this by doing the calculations once outside the sort (O(n)) and sorting by the precomputed key instead, so the O(n log n) comparisons stay cheap.
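A generic decorate-sort-undecorate sketch of the idea (the types and key function are illustrative, not the FairShareComparator internals):

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

class AppSorter {
  // Evaluate the expensive sort key once per app (O(n)) and sort by the
  // cached value, so each of the O(n log n) comparisons stays cheap.
  static <T> void sortByCachedKey(List<T> apps, ToDoubleFunction<T> key) {
    final class Keyed {
      final T app;
      final double k;
      Keyed(T a) { app = a; k = key.applyAsDouble(a); }
    }
    List<Keyed> keyed = new ArrayList<>(apps.size());
    for (T a : apps) {
      keyed.add(new Keyed(a));
    }
    keyed.sort((x, y) -> Double.compare(x.k, y.k));
    for (int i = 0; i < keyed.size(); i++) {
      apps.set(i, keyed.get(i).app);
    }
  }
}
{code}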
[jira] [Created] (YARN-6368) Decommissioning an NM results in a -1 exit code
Miklos Szegedi created YARN-6368:

Summary: Decommissioning an NM results in a -1 exit code
Key: YARN-6368
URL: https://issues.apache.org/jira/browse/YARN-6368
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Minor

In NodeManager.java we should exit normally in case the RM shuts down the node:

{code}
} finally {
  if (shouldExitOnShutdownEvent
      && !ShutdownHookManager.get().isShutdownInProgress()) {
    ExitUtil.terminate(-1);
  }
}
{code}
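A sketch of the proposed change; rmRequestedShutdown is a hypothetical flag that would be set when the SHUTDOWN event originated from the RM (decommission):

{code}
} finally {
  if (shouldExitOnShutdownEvent
      && !ShutdownHookManager.get().isShutdownInProgress()) {
    // Exit with 0 for an RM-requested decommission, -1 otherwise.
    ExitUtil.terminate(rmRequestedShutdown ? 0 : -1);
  }
}
{code}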
[jira] [Created] (YARN-6412) aux-services classpath not documented
Miklos Szegedi created YARN-6412:

Summary: aux-services classpath not documented
Key: YARN-6412
URL: https://issues.apache.org/jira/browse/YARN-6412
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Priority: Minor

YARN-4577 introduced two new configuration entries, yarn.nodemanager.aux-services.%s.classpath and yarn.nodemanager.aux-services.%s.system-classes. These are not documented in hadoop-yarn-common/.../yarn-default.xml.
[jira] [Created] (YARN-6416) SIGNAL_CMD argument number is wrong
Miklos Szegedi created YARN-6416:

Summary: SIGNAL_CMD argument number is wrong
Key: YARN-6416
URL: https://issues.apache.org/jira/browse/YARN-6416
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Priority: Minor

The yarn application signal command has two arguments, so the number below should be 2, I think.

{code}
opts.getOption(SIGNAL_CMD).setArgs(3);
{code}
[jira] [Created] (YARN-6417) TestContainerManager.testContainerLaunchAndStop disabled
Miklos Szegedi created YARN-6417:

Summary: TestContainerManager.testContainerLaunchAndStop disabled
Key: YARN-6417
URL: https://issues.apache.org/jira/browse/YARN-6417
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Miklos Szegedi
Priority: Minor

TestContainerManager.testContainerLaunchAndStop was disabled in YARN-1897, but it is passing.
[jira] [Created] (YARN-6432) FS preemption should reserve a node before considering containers on it for preemption
Miklos Szegedi created YARN-6432:

Summary: FS preemption should reserve a node before considering containers on it for preemption
Key: YARN-6432
URL: https://issues.apache.org/jira/browse/YARN-6432
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
[jira] [Created] (YARN-6433) Only accessible cgroup mount directories should be selected for a controller
Miklos Szegedi created YARN-6433:

Summary: Only accessible cgroup mount directories should be selected for a controller
Key: YARN-6433
URL: https://issues.apache.org/jira/browse/YARN-6433
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 3.0.0-alpha3
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi

I have an Ubuntu 16 box that returns the following error with pre-mounted cgroups on the latest trunk:

{code}
2017-04-03 19:42:18,511 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl: Cgroups not accessible /run/lxcfs/controllers/cpu,cpuacct
{code}

The version is:

{code}
$ uname -a
Linux mybox 4.4.0-24-generic #43-Ubuntu SMP Wed Jun 8 19:27:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
{code}

The following cpu cgroup filesystems are mounted:

{code}
$ grep cpuacct /etc/mtab
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct,nsroot=/ 0 0
cpu,cpuacct /run/lxcfs/controllers/cpu,cpuacct cgroup rw,relatime,cpu,cpuacct,nsroot=/ 0 0
{code}

/sys/fs/cgroup is accessible to my yarn user, so it should be selected instead of /run/lxcfs/controllers.
[jira] [Created] (YARN-6438) Code can be improved in ContainersMonitorImpl.java
Miklos Szegedi created YARN-6438:

Summary: Code can be improved in ContainersMonitorImpl.java
Key: YARN-6438
URL: https://issues.apache.org/jira/browse/YARN-6438
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Miklos Szegedi
Priority: Minor

I noticed two code blocks that can be improved in ContainersMonitorImpl.java: cpuUsagePercentPerCoreByAllContainers and cpuUsageTotalCoresByAllContainers track the same value, and CHANGE_MONITORING_CONTAINER_RESOURCE is checked twice, along with two calls to changeContainerResource.
[jira] [Created] (YARN-6442) Inaccurate javadoc in NodeManagerHardwareUtils.getContainerMemoryMB
Miklos Szegedi created YARN-6442:

Summary: Inaccurate javadoc in NodeManagerHardwareUtils.getContainerMemoryMB
Key: YARN-6442
URL: https://issues.apache.org/jira/browse/YARN-6442
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Priority: Minor

NodeManagerHardwareUtils.getContainerMemoryMB has the following javadoc:

{code}
"If the OS has a ResourceCalculatorPlugin implemented, the calculation is
0.8 * (RAM - 2 * JVM-memory) i.e. use 80% of the memory after accounting
for memory used by the DataNode and the NodeManager. If the number is less
than 1GB, log a warning message."
{code}

I think the accurate expression is 0.8 * (RAM - 2 * JVM) - systemreserved. I also do not see the 1GB cap in the code.
[jira] [Created] (YARN-6456) Isolation of Docker containers In LinuxContainerExecutor
Miklos Szegedi created YARN-6456:

Summary: Isolation of Docker containers In LinuxContainerExecutor
Key: YARN-6456
URL: https://issues.apache.org/jira/browse/YARN-6456
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Miklos Szegedi

One reason to use Docker containers is to be able to isolate different workloads, even if they run as the same user. I have noticed some issues in the current design:

1. DockerLinuxContainerRuntime mounts containerLocalDirs {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see and modify the files of another container. I think the application file cache directory should be enough for the container to run in most cases.
2. The whole cgroups directory is mounted. Would the container directory be enough?
3. There is no way to enforce exclusive use of Docker for all containers. There should be an option so that it is not the user but the admin who requires the use of Docker.
[jira] [Created] (YARN-6472) Possible Java sandbox improvements
Miklos Szegedi created YARN-6472:

Summary: Possible Java sandbox improvements
Key: YARN-6472
URL: https://issues.apache.org/jira/browse/YARN-6472
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Greg Phillips

I set the sandbox to enforcing mode. Unfortunately, I was able to break out of the sandbox and run native code with the following command:

{code}
cmd = "$JAVA_HOME/bin/java %s -Xmx825955249 org.apache.hadoop.yarn.applications.helloworld.HelloWorld `touch ../../helloworld`" + \
      " 1>/AppMaster.stdout 2>/AppMaster.stderr"

$ ls .../nm-local-dir/usercache/root/appcache/
helloworld
{code}

Also, if I am not using sandboxes, could we create the (empty) nm-sandbox-policies directory lazily?
[jira] [Created] (YARN-6474) CGroupsHandlerImpl.java has a few checkstyle issues left to be fixed after YARN-5301
Miklos Szegedi created YARN-6474:

Summary: CGroupsHandlerImpl.java has a few checkstyle issues left to be fixed after YARN-5301
Key: YARN-6474
URL: https://issues.apache.org/jira/browse/YARN-6474
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Priority: Minor

The main issue is a throw inside a finally block.
[jira] [Created] (YARN-6475) Fix some long function checkstyle issues
Miklos Szegedi created YARN-6475:

Summary: Fix some long function checkstyle issues
Key: YARN-6475
URL: https://issues.apache.org/jira/browse/YARN-6475
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Priority: Trivial

I am talking about these two:

{code}
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java:441: @Override:3: Method length is 176 lines (max allowed is 150). [MethodLength]
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java:159: @Override:3: Method length is 158 lines (max allowed is 150). [MethodLength]
{code}
[jira] [Created] (YARN-6500) Do not mount inaccessible cgroups directories in CgroupsLCEResourcesHandler
Miklos Szegedi created YARN-6500:

Summary: Do not mount inaccessible cgroups directories in CgroupsLCEResourcesHandler
Key: YARN-6500
URL: https://issues.apache.org/jira/browse/YARN-6500
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi

Port the YARN-6433 findControllerInMtab change from CGroupsHandlerImpl to CgroupsLCEResourcesHandler.
[jira] [Created] (YARN-6525) Linux container executor should not propagate application errors
Miklos Szegedi created YARN-6525:

Summary: Linux container executor should not propagate application errors
Key: YARN-6525
URL: https://issues.apache.org/jira/browse/YARN-6525
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0-alpha2
Reporter: Miklos Szegedi

wait_and_get_exit_code currently returns the application error code as the LCE error code. This may overlap with LCE errors. Instead, LCE should return a fixed "application failed" error code and print the application error into the logs.
[jira] [Created] (YARN-6732) Could not find artifact org.apache.hadoop:hadoop-azure-datalake:jar
Miklos Szegedi created YARN-6732:

Summary: Could not find artifact org.apache.hadoop:hadoop-azure-datalake:jar
Key: YARN-6732
URL: https://issues.apache.org/jira/browse/YARN-6732
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi

I get the following build error when resolving dependencies:

{code}
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 04:03 min
[INFO] Finished at: 2017-06-22T18:34:55+00:00
[INFO] Final Memory: 69M/167M
[INFO]
[ERROR] Failed to execute goal on project hadoop-tools-dist: Could not resolve dependencies for project org.apache.hadoop:hadoop-tools-dist:jar:2.9.0-SNAPSHOT: Could not find artifact org.apache.hadoop:hadoop-azure-datalake:jar:2.9.0-SNAPSHOT in apache.snapshots.https (https://repository.apache.org/content/repositories/snapshots) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn -rf :hadoop-tools-dist
The command '/bin/sh -c mvn dependency:resolve' returned a non-zero code: 1
{code}
[jira] [Created] (YARN-6757) Refactor the setting yarn.nodemanager.linux-container-executor.cgroups.mount-path
Miklos Szegedi created YARN-6757:

Summary: Refactor the setting yarn.nodemanager.linux-container-executor.cgroups.mount-path
Key: YARN-6757
URL: https://issues.apache.org/jira/browse/YARN-6757
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 3.0.0-alpha4
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Minor

We should add the ability to specify a custom cgroup path. This is how the documentation of linux-container-executor.cgroups.mount-path would look:

{code}
Requested cgroup mount path. YARN has built-in functionality to discover
the system cgroup mount paths, so use this setting only if the discovery
does not work. This path must exist before the NodeManager is launched.
The location can vary depending on the Linux distribution in use. Common
locations include /sys/fs/cgroup and /cgroup. If cgroups are not mounted,
set yarn.nodemanager.linux-container-executor.cgroups.mount to true. In
this case it specifies where the LCE should attempt to mount cgroups if
not found. If cgroups is accessible through lxcfs or some other file
system, then set this path and
yarn.nodemanager.linux-container-executor.cgroups.mount to false. YARN
tries to use this path first, before any cgroup mount point discovery.
If it cannot find this directory, it falls back to searching for cgroup
mount points in the system. Only used when the LCE resources handler is
set to the CgroupsLCEResourcesHandler.
{code}
[jira] [Created] (YARN-6895) Preemption reservation may cause regular reservation leaks
Miklos Szegedi created YARN-6895:

Summary: Preemption reservation may cause regular reservation leaks
Key: YARN-6895
URL: https://issues.apache.org/jira/browse/YARN-6895
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Blocker

We found a limitation in the implementation of YARN-6432. If the container released is smaller than the preemption request, a node reservation is created that is never deleted.
[jira] [Created] (YARN-6913) Some cgroup settings are not documented in yarn-default.xml
Miklos Szegedi created YARN-6913:

Summary: Some cgroup settings are not documented in yarn-default.xml
Key: YARN-6913
URL: https://issues.apache.org/jira/browse/YARN-6913
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Miklos Szegedi
Priority: Minor

The following settings are not documented in yarn-default.xml:

yarn.nodemanager.resource.memory.cgroups.swappiness
yarn.nodemanager.resource.memory.cgroups.soft-limit-percentage
yarn.nodemanager.resource.network.outbound-bandwidth-mbit
yarn.nodemanager.resource.network.outbound-bandwidth-yarn-mbit
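For illustration, a yarn-default.xml entry for the first of these could look like the following (the description and default value are placeholders for the patch to fill in):

{code}
<property>
  <description>Placeholder: document the memory cgroups swappiness here.</description>
  <name>yarn.nodemanager.resource.memory.cgroups.swappiness</name>
  <value>0</value>
</property>
{code}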
[jira] [Created] (YARN-6925) FSSchedulerNode could be simplified extracting preemption fields into a class
Miklos Szegedi created YARN-6925:

Summary: FSSchedulerNode could be simplified extracting preemption fields into a class
Key: YARN-6925
URL: https://issues.apache.org/jira/browse/YARN-6925
Project: Hadoop YARN
Issue Type: Improvement
Components: fairscheduler
Reporter: Miklos Szegedi
Assignee: Yufei Gu
Priority: Minor
[jira] [Created] (YARN-6926) FSSchedulerNode reservation conflict
Miklos Szegedi created YARN-6926:

Summary: FSSchedulerNode reservation conflict
Key: YARN-6926
URL: https://issues.apache.org/jira/browse/YARN-6926
Project: Hadoop YARN
Issue Type: Improvement
Components: fairscheduler
Reporter: Miklos Szegedi
Assignee: Yufei Gu

FSSchedulerNode reserves space for preemptor apps, but others may reserve on the node normally if there is not enough free space. This causes double accounting and double reservation.
[jira] [Created] (YARN-6943) Update Yarn to YARN in documentation
Miklos Szegedi created YARN-6943:

Summary: Update Yarn to YARN in documentation
Key: YARN-6943
URL: https://issues.apache.org/jira/browse/YARN-6943
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Priority: Minor

Based on the discussion with [~templedf] in YARN-6757, the official casing of YARN is "YARN", not "Yarn", so we should update all the md files.
[jira] [Created] (YARN-6968) Hard coded reference to an absolute pathname in org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(ContainerRuntimeContext)
Miklos Szegedi created YARN-6968:

Summary: Hard coded reference to an absolute pathname in org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(ContainerRuntimeContext)
Key: YARN-6968
URL: https://issues.apache.org/jira/browse/YARN-6968
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi

This could be done after YARN-6757 is checked in.
[jira] [Created] (YARN-6997) org.apache.hadoop.yarn.client.SCMAdmin wrong package name
Miklos Szegedi created YARN-6997:

Summary: org.apache.hadoop.yarn.client.SCMAdmin wrong package name
Key: YARN-6997
URL: https://issues.apache.org/jira/browse/YARN-6997
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Miklos Szegedi
Priority: Critical

It should be org.apache.hadoop.yarn.client.cli.SCMAdmin to follow the current naming standard. This may cause appcompat issues in the future.
[jira] [Created] (YARN-6998) Dead code in SCMAdmin
Miklos Szegedi created YARN-6998:

Summary: Dead code in SCMAdmin
Key: YARN-6998
URL: https://issues.apache.org/jira/browse/YARN-6998
Project: Hadoop YARN
Issue Type: Bug
Components: client
Reporter: Miklos Szegedi
Priority: Minor

printHelp is always called with "".
[jira] [Created] (YARN-7001) If shared cache upload is terminated in the middle, the temp file will never be deleted
Miklos Szegedi created YARN-7001:

Summary: If shared cache upload is terminated in the middle, the temp file will never be deleted
Key: YARN-7001
URL: https://issues.apache.org/jira/browse/YARN-7001
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi

There is a missing deleteTempFile(tempPath) on the failure path:

{code}
tempPath = new Path(directoryPath, getTemporaryFileName(actualPath));
if (!uploadFile(actualPath, tempPath)) {
  LOG.warn("Could not copy the file to the shared cache at " + tempPath);
  return false;
}
{code}
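One way to guarantee the cleanup is a try/finally around the upload, a simplified sketch using the names from the snippet above:

{code}
Path tempPath = new Path(directoryPath, getTemporaryFileName(actualPath));
boolean uploaded = false;
try {
  uploaded = uploadFile(actualPath, tempPath);
  if (!uploaded) {
    LOG.warn("Could not copy the file to the shared cache at " + tempPath);
  }
  return uploaded;
} finally {
  // Covers both the failed-copy branch and exceptions thrown mid-upload.
  if (!uploaded) {
    deleteTempFile(tempPath);
  }
}
{code}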
[jira] [Created] (YARN-7009) TestNMClient.testNMClientNoCleanupOnStop is flaky by design
Miklos Szegedi created YARN-7009:

Summary: TestNMClient.testNMClientNoCleanupOnStop is flaky by design
Key: YARN-7009
URL: https://issues.apache.org/jira/browse/YARN-7009
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi

The sleeps that wait for a transition to reinit and then back to running are not long enough; the test can miss the reinit event.

{code}
java.lang.AssertionError: Exception is not expected: org.apache.hadoop.yarn.exceptions.YarnException: Cannot perform RE_INIT on [container_1502735389852_0001_01_01]. Current state is [REINITIALIZING, isReInitializing=true].
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.preReInitializeOrLocalizeCheck(ContainerManagerImpl.java:1772)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.reInitializeContainer(ContainerManagerImpl.java:1697)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.reInitializeContainer(ContainerManagerImpl.java:1668)
	at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.reInitializeContainer(ContainerManagementProtocolPBServiceImpl.java:214)
	at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:237)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
	at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testReInitializeContainer(TestNMClient.java:567)
	at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:405)
	at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClientNoCleanupOnStop(TestNMClient.java:214)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Cannot perform RE_INIT on [container_1502735389852_0001_01_01]. Current state is [REINITIALIZING, isReInitializing=true].
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.preReInitializeOrLocalizeCheck(ContainerManagerImpl.java:1772)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.reInitializeContainer(ContainerManagerImpl.java:1697)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.reInitializeContainer(ContainerManagerImpl.java:1668)
	at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.reInitializeContainer(ContainerManagementProtocolPBServiceImpl.java:214)
	at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:237)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccess
{code}
[jira] [Created] (YARN-7034) DefaultLinuxContainerRuntime and DockerLinuxContainerRuntime send client environment variables to container-executor
Miklos Szegedi created YARN-7034:

Summary: DefaultLinuxContainerRuntime and DockerLinuxContainerRuntime send client environment variables to container-executor
Key: YARN-7034
URL: https://issues.apache.org/jira/browse/YARN-7034
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Miklos Szegedi
Priority: Critical

This behavior is unnecessary, since nothing from the environment is used right now. One option is to whitelist these variables before passing them. Are there any known use cases that would justify this?
[jira] [Created] (YARN-7064) Use cgroup to get container resource utilization
Miklos Szegedi created YARN-7064:

Summary: Use cgroup to get container resource utilization
Key: YARN-7064
URL: https://issues.apache.org/jira/browse/YARN-7064
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi

This is an addendum to YARN-6668. What happens is that that jira always rebases patches against YARN-1011 instead of trunk.
[jira] [Created] (YARN-7099) ResourceHandlerModule.parseConfiguredCGroupPath only works for privileged yarn users.
Miklos Szegedi created YARN-7099:

Summary: ResourceHandlerModule.parseConfiguredCGroupPath only works for privileged yarn users.
Key: YARN-7099
URL: https://issues.apache.org/jira/browse/YARN-7099
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Minor

The canWrite() check fails:

{code}
if (candidate.isDirectory() && candidate.canWrite()) {
  pathSubsystemMappings.put(candidate.getAbsolutePath(), cgroupList);
} else {
  LOG.warn("The following cgroup is not a directory or it is not"
      + " writable" + candidate.getAbsolutePath());
}
{code}
[jira] [Created] (YARN-7145) Identify potential flaky unit tests
Miklos Szegedi created YARN-7145:

Summary: Identify potential flaky unit tests
Key: YARN-7145
URL: https://issues.apache.org/jira/browse/YARN-7145
Project: Hadoop YARN
Issue Type: Test
Components: nodemanager, resourcemanager
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Minor

I intend to add a 200 millisecond sleep into AsyncDispatcher and run the job to identify the tests that are potentially flaky.
[jira] [Created] (YARN-7181) CPUTimeTracker.updateElapsedJiffies can report negative usage
Miklos Szegedi created YARN-7181:

Summary: CPUTimeTracker.updateElapsedJiffies can report negative usage
Key: YARN-7181
URL: https://issues.apache.org/jira/browse/YARN-7181
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi

It happens when the process has exited and elapsedJiffies becomes 0 again.

{code}
public void updateElapsedJiffies(BigInteger elapsedJiffies, long newTime) {
  cumulativeCpuTime = elapsedJiffies.multiply(jiffyLengthInMillis);
  sampleTime = newTime;
}
{code}
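A defensive sketch of the update, assuming lastCumulativeCpuTime is the baseline the percentage calculation subtracts from (the field is an assumption; the other names follow the snippet):

{code}
public void updateElapsedJiffies(BigInteger elapsedJiffies, long newTime) {
  BigInteger newCpuTime = elapsedJiffies.multiply(jiffyLengthInMillis);
  if (newCpuTime.compareTo(cumulativeCpuTime) < 0) {
    // The counter went backwards: the tracked process exited and a new
    // one started. Re-baseline instead of reporting negative usage.
    lastCumulativeCpuTime = newCpuTime;
  }
  cumulativeCpuTime = newCpuTime;
  sampleTime = newTime;
}
{code}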
[jira] [Created] (YARN-7232) Consider /sys/fs/cgroup as the default CGroup mount path
Miklos Szegedi created YARN-7232:

Summary: Consider /sys/fs/cgroup as the default CGroup mount path
Key: YARN-7232
URL: https://issues.apache.org/jira/browse/YARN-7232
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Miklos Szegedi

YARN-6968 fixed the findbugs issue due to the hard-coded /sys/fs/cgroup mount path for Docker containers. On the other hand, it removed the default value. This jira is a follow-up to make sure the admin does not have to set the value every time.
[jira] [Created] (YARN-7233) Make the cgroup mount into Docker containers configurable
Miklos Szegedi created YARN-7233:

Summary: Make the cgroup mount into Docker containers configurable
Key: YARN-7233
URL: https://issues.apache.org/jira/browse/YARN-7233
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Miklos Szegedi

Not all containers need this mount. There should be an option to opt for lxcfs.
[jira] [Created] (YARN-7239) Possible launch/cleanup race condition in ContainersLauncher
Miklos Szegedi created YARN-7239: Summary: Possible launch/cleanup race condition in ContainersLauncher Key: YARN-7239 URL: https://issues.apache.org/jira/browse/YARN-7239 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi ContainersLauncher.handle() submits the launch job and only then adds it to the running collection, risking that the cleanup will miss it and return early. These two statements should be in reversed order in all 3 instances: {code} containerLauncher.submit(launch); running.put(containerId, launch); {code} The cleanup code that the code above races with: {code} ContainerLaunch runningContainer = running.get(containerId); if (runningContainer == null) { // Container not launched. So nothing needs to be done. LOG.info("Container " + containerId + " not running, nothing to signal."); return; } ... {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
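A sketch of the proposed reordering in context, with simplified stand-in types for the real ContainerLaunch plumbing: publish the launch into the running map before submitting it, so a concurrent cleanup can always find it.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LaunchOrdering {
  private final Map<String, Runnable> running = new ConcurrentHashMap<>();
  private final ExecutorService containerLauncher =
      Executors.newCachedThreadPool();

  void handleLaunch(String containerId, Runnable launch) {
    running.put(containerId, launch); // visible to cleanup first
    containerLauncher.submit(launch); // then actually start it
  }
}
{code}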
[jira] [Created] (YARN-7289) TestApplicationLifetimeMonitor.testApplicationLifetimeMonitor times out
Miklos Szegedi created YARN-7289: Summary: TestApplicationLifetimeMonitor.testApplicationLifetimeMonitor times out Key: YARN-7289 URL: https://issues.apache.org/jira/browse/YARN-7289 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Miklos Szegedi Assignee: Miklos Szegedi -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7293) ContainerLaunch.cleanupContainer may miss a starting node
Miklos Szegedi created YARN-7293: Summary: ContainerLaunch.cleanupContainer may miss a starting node Key: YARN-7293 URL: https://issues.apache.org/jira/browse/YARN-7293 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi The relevant part of YARN-7009 needs to be backported. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7294) TestSignalContainer#testSignalRequestDeliveryToNM fails intermittently with Fair scheduler
Miklos Szegedi created YARN-7294: Summary: TestSignalContainer#testSignalRequestDeliveryToNM fails intermittently with Fair scheduler Key: YARN-7294 URL: https://issues.apache.org/jira/browse/YARN-7294 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi Assignee: Miklos Szegedi This issue exists because Fair Scheduler needs an update after allocation, plus additional node updates, before all the requests are fulfilled. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Resolved] (YARN-4943) Add support to collect actual resource usage from cgroups
[ https://issues.apache.org/jira/browse/YARN-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi resolved YARN-4943. -- Resolution: Duplicate > Add support to collect actual resource usage from cgroups > - > > Key: YARN-4943 > URL: https://issues.apache.org/jira/browse/YARN-4943 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Varun Vasudev > > We should add support to collect actual resource usage from Cgroups(if > they're enabled) - it's more accurate and it can give you more detailed > information. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7349) TestSignalContainer.testSignalRequestDeliveryToNM fails with attempt state is not correct
Miklos Szegedi created YARN-7349: Summary: TestSignalContainer.testSignalRequestDeliveryToNM fails with attempt state is not correct Key: YARN-7349 URL: https://issues.apache.org/jira/browse/YARN-7349 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi java.lang.AssertionError: Attempt state is not correct (timeout). Expected :ALLOCATED Actual :SCHEDULED at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:358) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:317) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:298) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:955) at org.apache.hadoop.yarn.server.resourcemanager.TestSignalContainer.testSignalRequestDeliveryToNM(TestSignalContainer.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runner.JUnitCore.run(JUnitCore.java:160) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7350) TestSignalContainer should check both FS and CS
Miklos Szegedi created YARN-7350: Summary: TestSignalContainer should check both FS and CS Key: YARN-7350 URL: https://issues.apache.org/jira/browse/YARN-7350 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7354) Fair scheduler should support application lifetime monitor
Miklos Szegedi created YARN-7354: Summary: Fair scheduler should support application lifetime monitor Key: YARN-7354 URL: https://issues.apache.org/jira/browse/YARN-7354 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi For details, see the Fair Scheduler-specific code in TestApplicationLifetimeMonitor.testApplicationLifetimeMonitor added by YARN-7289. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7387) org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer fails intermittently
Miklos Szegedi created YARN-7387: Summary: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer fails intermittently Key: YARN-7387 URL: https://issues.apache.org/jira/browse/YARN-7387 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi {code} Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 52.481 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer testDecreaseAfterIncreaseWithAllocationExpiration(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer) Time elapsed: 13.292 sec <<< FAILURE! java.lang.AssertionError: expected:<3072> but was:<4096> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer.testDecreaseAfterIncreaseWithAllocationExpiration(TestIncreaseAllocationExpirer.java:459) {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7506) Overhaul the design of the Linux container-executor regarding Docker and future runtimes
Miklos Szegedi created YARN-7506: Summary: Overhaul the design of the Linux container-executor regarding Docker and future runtimes Key: YARN-7506 URL: https://issues.apache.org/jira/browse/YARN-7506 Project: Hadoop YARN Issue Type: Wish Components: nodemanager Reporter: Miklos Szegedi I raise this topic to discuss a potential improvement of the container-executor tool in the node manager. container-executor has two main purposes. It executes Linux *system calls not available from Java*, and it executes tasks *available to root that are not available to the yarn user*. Historically container-executor did both by doing impersonation. The yarn user is separated from root because it runs network services, so *the yarn user should be restricted* by design. Because of this, it has its own config file, container-executor.cfg, writable only by root, that specifies which actions are allowed for the yarn user. However, the requirements have changed with Docker, and that raises the following questions: 1. The Docker feature of YARN requires root permissions to *access the Docker socket* but it does not run any system calls, so could the Docker-related code in container-executor be *refactored into a separate Java process run as root*? Java would make the development much faster and more secure. 2. The Docker feature only needs the Docker Unix socket. It is not a good idea to let the yarn user directly access the socket, since that would elevate its privileges to root. However, the Java tool running as root mentioned in the previous question could act as a *proxy on the Docker socket* operating directly on the Docker REST API, *eliminating the need to use the Docker CLI*. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
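To illustrate the second point, a minimal sketch (an assumption, not YARN code) of talking to the Docker REST API directly over the Unix socket from Java, which is what such a root-owned proxy would do; this requires Java 16+ for Unix domain socket support, and the socket path and endpoint are the Docker defaults:
{code:java}
import java.io.IOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

public class DockerSocketProbe {
  public static void main(String[] args) throws IOException {
    UnixDomainSocketAddress addr =
        UnixDomainSocketAddress.of("/var/run/docker.sock");
    try (SocketChannel ch = SocketChannel.open(StandardProtocolFamily.UNIX)) {
      ch.connect(addr);
      // A minimal HTTP/1.0 request against the Docker REST API;
      // no Docker CLI involved.
      String req = "GET /containers/json HTTP/1.0\r\nHost: docker\r\n\r\n";
      ch.write(ByteBuffer.wrap(req.getBytes(StandardCharsets.UTF_8)));
      ByteBuffer buf = ByteBuffer.allocate(8192);
      StringBuilder resp = new StringBuilder();
      while (ch.read(buf) != -1) { // HTTP/1.0: server closes at the end
        buf.flip();
        resp.append(StandardCharsets.UTF_8.decode(buf));
        buf.clear();
      }
      System.out.println(resp);
    }
  }
}
{code}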
[jira] [Created] (YARN-7553) TestShellBasedUnixGroupsMapping.testFiniteGroupResolutionTime flaky
Miklos Szegedi created YARN-7553: Summary: TestShellBasedUnixGroupsMapping.testFiniteGroupResolutionTime flaky Key: YARN-7553 URL: https://issues.apache.org/jira/browse/YARN-7553 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi Assignee: Miklos Szegedi {code} [ERROR] testFiniteGroupResolutionTime(org.apache.hadoop.security.TestShellBasedUnixGroupsMapping) Time elapsed: 61.975 s <<< FAILURE! java.lang.AssertionError: Expected the logs to carry a message about command timeout but was: 2017-11-22 00:10:57,523 WARN security.ShellBasedUnixGroupsMapping (ShellBasedUnixGroupsMapping.java:getUnixGroups(181)) - unable to return groups for user foobarnonexistinguser PartialGroupNameException The user name 'foobarnonexistinguser' is not found. at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.resolvePartialGroupNames(ShellBasedUnixGroupsMapping.java:275) at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:178) at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:97) at org.apache.hadoop.security.TestShellBasedUnixGroupsMapping.testFiniteGroupResolutionTime(TestShellBasedUnixGroupsMapping.java:278) {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7554) TestCryptoStreamsWithOpensslAesCtrCryptoCodec fails on Debian 9
Miklos Szegedi created YARN-7554: Summary: TestCryptoStreamsWithOpensslAesCtrCryptoCodec fails on Debian 9 Key: YARN-7554 URL: https://issues.apache.org/jira/browse/YARN-7554 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi Assignee: Miklos Szegedi {code} [ERROR] org.apache.hadoop.crypto.TestCryptoStreamsWithOpensslAesCtrCryptoCodec Time elapsed: 0.478 s <<< FAILURE! java.lang.AssertionError: Unable to instantiate codec org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec, is the required version of OpenSSL installed? at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertNotNull(Assert.java:621) at org.apache.hadoop.crypto.TestCryptoStreamsWithOpensslAesCtrCryptoCodec.init(TestCryptoStreamsWithOpensslAesCtrCryptoCodec.java:43) {code} This happened due to the following openssl change: https://github.com/openssl/openssl/commit/ff4b7fafb315df5f8374e9b50c302460e068f188 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7577) Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart
Miklos Szegedi created YARN-7577: Summary: Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart Key: YARN-7577 URL: https://issues.apache.org/jira/browse/YARN-7577 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi Assignee: Miklos Szegedi This happens if Fair Scheduler is the default. The test should run with both schedulers. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7712) Add ability to ignore timestamps in localized files
Miklos Szegedi created YARN-7712: Summary: Add ability to ignore timestamps in localized files Key: YARN-7712 URL: https://issues.apache.org/jira/browse/YARN-7712 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Miklos Szegedi Assignee: Miklos Szegedi YARN currently requires and checks the timestamp of localized files and fails if the file on HDFS does not match the one requested. This jira adds the ability to ignore the timestamp when the client requests it. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
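For context, a sketch of how a client sets up a localized resource today with the stock YARN client APIs (as in Hadoop 2.x); the HDFS modification time is the timestamp that gets checked at localization, and the proposed ignore option does not exist yet:
{code:java}
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class ResourceSetup {
  static LocalResource jarResource(FileSystem fs, Path jar) throws Exception {
    FileStatus status = fs.getFileStatus(jar);
    return LocalResource.newInstance(
        ConverterUtils.getYarnUrlFromPath(jar),
        LocalResourceType.FILE,
        LocalResourceVisibility.APPLICATION,
        status.getLen(),
        status.getModificationTime()); // the timestamp that must match
  }
}
{code}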
[jira] [Created] (YARN-7713) Add parallel copying of directories into
Miklos Szegedi created YARN-7713: Summary: Add parallel copying of directories into Key: YARN-7713 URL: https://issues.apache.org/jira/browse/YARN-7713 Project: Hadoop YARN Issue Type: Improvement Reporter: Miklos Szegedi Assignee: Miklos Szegedi YARN currently copies directories sequentially when localizing. This could be done in parallel instead, since the source blocks normally reside on different nodes. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
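A minimal sketch of the technique with a thread pool; the copyFile() helper and the pool size are illustrative assumptions, not the actual FSDownload code:
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelDirCopy {
  // Illustrative stand-in for the real per-file copy logic.
  static void copyFile(String src, String dst) {
    System.out.println("copy " + src + " -> " + dst);
  }

  public static void copyAll(List<String> files, String dstDir)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (String f : files) {
        futures.add(pool.submit(() -> copyFile(f, dstDir + "/" + f)));
      }
      for (Future<?> fut : futures) {
        fut.get(); // propagate the first copy failure, if any
      }
    } finally {
      pool.shutdown();
    }
  }
}
{code}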
[jira] [Created] (YARN-7734) YARN-5418 breaks TestContainerLogsPage.testContainerLogPageAccess
Miklos Szegedi created YARN-7734: Summary: YARN-5418 breaks TestContainerLogsPage.testContainerLogPageAccess Key: YARN-7734 URL: https://issues.apache.org/jira/browse/YARN-7734 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi Assignee: Xuan Gong YARN-5418 adds a call to LogAggregationFileControllerFactory, but in the unit test the mocked context is not filled in with the configuration, so the constructor throws a NullPointerException. {code} [ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.492 s <<< FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.webapp.TestContainerLogsPage [ERROR] testContainerLogPageAccess(org.apache.hadoop.yarn.server.nodemanager.webapp.TestContainerLogsPage) Time elapsed: 0.208 s <<< ERROR! java.lang.NullPointerException at org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileControllerFactory.<init>(LogAggregationFileControllerFactory.java:68) at org.apache.hadoop.yarn.server.nodemanager.webapp.ContainerLogsPage$ContainersLogsBlock.<init>(ContainerLogsPage.java:100) at org.apache.hadoop.yarn.server.nodemanager.webapp.TestContainerLogsPage.testContainerLogPageAccess(TestContainerLogsPage.java:268) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7758) Add an additional check to the validity of container and application ids passed to container-executor
Miklos Szegedi created YARN-7758: Summary: Add an additional check to the validity of container and application ids passed to container-executor Key: YARN-7758 URL: https://issues.apache.org/jira/browse/YARN-7758 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi Assignee: Yufei Gu I would make sure that they contain only the characters a-z, 0-9, _ and - (underscore and dash). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
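A sketch of such a whitelist check in Java; the exact pattern is an assumption (container-executor itself is C, so the real check would be equivalent C code):
{code:java}
import java.util.regex.Pattern;

public class IdValidator {
  // Accept only lowercase letters, digits, underscore and dash.
  private static final Pattern SAFE_ID = Pattern.compile("[a-z0-9_-]+");

  static boolean isValidId(String id) {
    return id != null && SAFE_ID.matcher(id).matches();
  }

  public static void main(String[] args) {
    System.out.println(isValidId("container_1525380749719_0001_01_000001")); // true
    System.out.println(isValidId("../../etc/passwd")); // false
  }
}
{code}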
[jira] [Created] (YARN-7775) Unit test fail: Testing resolve_config_path in 2.7
Miklos Szegedi created YARN-7775: Summary: Unit test fail: Testing resolve_config_path in 2.7 Key: YARN-7775 URL: https://issues.apache.org/jira/browse/YARN-7775 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi I see this in the latest branch-2.7 running test-container-executor: {code} Testing resolve_config_path FAIL: failed to resolve config_name on an absolute path name: /bin/ls {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7803) TestZKFailoverController occasionally fails in trunk
Miklos Szegedi created YARN-7803: Summary: TestZKFailoverController occasionally fails in trunk Key: YARN-7803 URL: https://issues.apache.org/jira/browse/YARN-7803 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.1.0 Reporter: Miklos Szegedi [ERROR] testGracefulFailoverMultipleZKfcs(org.apache.hadoop.ha.TestZKFailoverController) Time elapsed: 70.35 s <<< ERROR! org.apache.hadoop.ha.ServiceFailedException: Unable to become active. Local node did not get an opportunity to do so from ZooKeeper, or the local node took too long to transition to active. at org.apache.hadoop.ha.ZKFailoverController.doGracefulFailover(ZKFailoverController.java:692) at org.apache.hadoop.ha.ZKFailoverController.access$400(ZKFailoverController.java:60) at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:609) at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:606) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965) at org.apache.hadoop.ha.ZKFailoverController.gracefulFailoverToYou(ZKFailoverController.java:606) at org.apache.hadoop.ha.ZKFCRpcServer.gracefulFailover(ZKFCRpcServer.java:94) at org.apache.hadoop.ha.TestZKFailoverController.testGracefulFailoverMultipleZKfcs(TestZKFailoverController.java:586) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:55) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:53) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-7876) Workaround ZipInputStream limitation for YARN-2185
Miklos Szegedi created YARN-7876: Summary: Workaround ZipInputStream limitation for YARN-2185 Key: YARN-7876 URL: https://issues.apache.org/jira/browse/YARN-7876 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi Assignee: Miklos Szegedi YARN-2185 added the ability to localize jar files as a stream instead of copying them to local disk and then extracting. ZipInputStream does not read to the end of the file, so let's read the remainder out. This helps when an additional TeeInputStream is attached to the input. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
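A minimal sketch of the draining idea, assuming a TeeInputStream wraps the stream passed in; reading the leftover bytes to EOF ensures the tee copies the whole file even though ZipInputStream stopped early:
{code:java}
import java.io.IOException;
import java.io.InputStream;

public class StreamDrainer {
  static void drain(InputStream in) throws IOException {
    byte[] buf = new byte[8192];
    while (in.read(buf) != -1) {
      // Discard the bytes; the TeeInputStream wrapped around 'in'
      // still copies them to its branch output.
    }
  }
}
{code}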
[jira] [Created] (YARN-8039) Clean up log dir configuration in TestLinuxContainerExecutorWithMocks.testStartLocalizer
Miklos Szegedi created YARN-8039: Summary: Clean up log dir configuration in TestLinuxContainerExecutorWithMocks.testStartLocalizer Key: YARN-8039 URL: https://issues.apache.org/jira/browse/YARN-8039 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi Assignee: Miklos Szegedi -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8090) Race conditions in FadvisedChunkedFile
Miklos Szegedi created YARN-8090: Summary: Race conditions in FadvisedChunkedFile Key: YARN-8090 URL: https://issues.apache.org/jira/browse/YARN-8090 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.1.0 Reporter: Miklos Szegedi Assignee: Miklos Szegedi {code:java} 11:04:33.605 AM WARNFadvisedChunkedFile Failed to manage OS cache for /var/run/100/yarn/nm/usercache/systest/appcache/application_1521665017379_0062/output/attempt_1521665017379_0062_m_012797_0/file.out EBADF: Bad file descriptor at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method) at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267) at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146) at org.apache.hadoop.mapred.FadvisedChunkedFile.close(FadvisedChunkedFile.java:76) at org.jboss.netty.handler.stream.ChunkedWriteHandler.closeInput(ChunkedWriteHandler.java:303) at org.jboss.netty.handler.stream.ChunkedWriteHandler.discard(ChunkedWriteHandler.java:163) at org.jboss.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:192) at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:137) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelClosed(SimpleChannelUpstreamHandler.java:225) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.cleanup(ReplayingDecoder.java:570) at org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) at org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493) at org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371) at org.jboss.netty.handler.ssl.SslHandler.channelClosed(SslHandler.java:1667) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) at org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468) at org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:375) at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93) at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8244) ContainersLauncher.ContainerLaunch can throw ConcurrentModificationException
Miklos Szegedi created YARN-8244: Summary: ContainersLauncher.ContainerLaunch can throw ConcurrentModificationException Key: YARN-8244 URL: https://issues.apache.org/jira/browse/YARN-8244 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi {code:java} 2018-05-03 17:31:35,028 WARN [ContainersLauncher #1] launcher.ContainerLaunch (ContainerLaunch.java:call(329)) - Failed to launch container. java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437) at java.util.HashMap$EntryIterator.next(HashMap.java:1471) at java.util.HashMap$EntryIterator.next(HashMap.java:1469) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch$ShellScriptBuilder.orderEnvByDependencies(ContainerLaunch.java:1311) at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.writeLaunchEnv(ContainerExecutor.java:388) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:290) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:101) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8262) get_executable in container-executor should provide meaningful error codes
Miklos Szegedi created YARN-8262: Summary: get_executable in container-executor should provide meaningful error codes Key: YARN-8262 URL: https://issues.apache.org/jira/browse/YARN-8262 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi Currently it calls exit(-1), which makes it difficult to debug without stderr. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Resolved] (YARN-1014) Configure OOM Killer to kill OPPORTUNISTIC containers first
[ https://issues.apache.org/jira/browse/YARN-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi resolved YARN-1014. -- Resolution: Won't Fix > Configure OOM Killer to kill OPPORTUNISTIC containers first > --- > > Key: YARN-1014 > URL: https://issues.apache.org/jira/browse/YARN-1014 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.0.0-alpha1 >Reporter: Arun C Murthy >Priority: Major > Attachments: YARN-1014.00.patch, YARN-1014.01.patch, > YARN-1014.02.patch > > > YARN-2882 introduces the notion of OPPORTUNISTIC containers. These containers > should be killed first should the system run out of memory. > - > Previous description: > Once RM allocates 'speculative containers' we need to get LCE to schedule > them at lower priorities via cgroups. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8437) Build oom-listener on older versions
Miklos Szegedi created YARN-8437: Summary: Build oom-listener on older versions Key: YARN-8437 URL: https://issues.apache.org/jira/browse/YARN-8437 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi Assignee: Miklos Szegedi oom-listener was introduced in YARN-4599. We have seen some build issues on CentOS 6. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-8452) FairScheduler.update can take long time if yarn.scheduler.fair.sizebasedweight is on
Miklos Szegedi created YARN-8452: Summary: FairScheduler.update can take long time if yarn.scheduler.fair.sizebasedweight is on Key: YARN-8452 URL: https://issues.apache.org/jira/browse/YARN-8452 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Miklos Szegedi Assignee: Szilard Nemeth Basically, we recalculate the weight every time, even if the inputs did not change. This causes high CPU usage if the cluster has lots of apps. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
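A sketch of the caching idea: recompute the size-based weight only when its input changes. The field names and the exact weight formula are illustrative assumptions, not the FairScheduler code:
{code:java}
public class CachedWeight {
  private long lastDemand = -1;
  private double cachedWeight;

  double getWeight(long demand) {
    if (demand != lastDemand) {
      // A log2-shaped weight, roughly the form of the size-based
      // weight calculation; recomputed only on a changed input.
      cachedWeight = Math.log1p(demand) / Math.log(2);
      lastDemand = demand;
    }
    return cachedWeight;
  }
}
{code}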
[jira] [Created] (YARN-8470) Fair scheduler exception with SLS
Miklos Szegedi created YARN-8470: Summary: Fair scheduler exception with SLS Key: YARN-8470 URL: https://issues.apache.org/jira/browse/YARN-8470 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi I ran into the following exception with SLS: {code} 2018-06-26 13:34:04,358 ERROR resourcemanager.ResourceManager: Received RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, FSPreemptionThread, that exited unexpectedly: java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptOnNode(FSPreemptionThread.java:207) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptForOneContainer(FSPreemptionThread.java:161) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreempt(FSPreemptionThread.java:121) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.run(FSPreemptionThread.java:81) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Created] (YARN-10050) NodeManagerCGroupsMemory.md does not show up in the official documentation
Miklos Szegedi created YARN-10050: Summary: NodeManagerCGroupsMemory.md does not show up in the official documentation Key: YARN-10050 URL: https://issues.apache.org/jira/browse/YARN-10050 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi I looked at this doc: [https://github.com/apache/hadoop/blob/9636fe4114eed9035cdc80108a026c657cd196d9/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerCGroupsMemory.md] It does not show up here: [https://hadoop.apache.org/docs/stable/] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org