[jira] [Created] (YARN-8707) It's not reasonable to decide whether app is starved by fairShare
Zhaohui Xin created YARN-8707:
---------------------------------

             Summary: It's not reasonable to decide whether app is starved by fairShare
                 Key: YARN-8707
                 URL: https://issues.apache.org/jira/browse/YARN-8707
             Project: Hadoop YARN
          Issue Type: Bug
          Components: fairscheduler
    Affects Versions: 3.0.0-alpha3
            Reporter: Zhaohui Xin
            Assignee: Zhaohui Xin

When an app's usage has reached its demand, it is still considered fair-share starved. Obviously, that is not reasonable.

{code:java}
boolean isStarvedForFairShare() {
  return isUsageBelowShare(getResourceUsage(), getFairShare());
}
{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
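The check above only compares usage against fair share. A minimal sketch of the proposed fix, with resources simplified to longs (the real FSAppAttempt compares Resource objects, and the helper below is a hypothetical stand-in for isUsageBelowShare):

```java
// Sketch of the proposed fix: an app should not be considered fair-share
// starved once its usage has reached its demand. Resources are simplified
// to longs here; the real scheduler works with Resource objects.
public class StarvationCheck {

    // Simplified stand-in for isUsageBelowShare(usage, fairShare).
    static boolean isUsageBelow(long usage, long share) {
        return usage < share;
    }

    // Proposed condition: starved only if usage is below BOTH the fair
    // share and the app's own demand.
    static boolean isStarvedForFairShare(long usage, long fairShare, long demand) {
        return isUsageBelow(usage, fairShare) && isUsageBelow(usage, demand);
    }
}
```

With this shape, an app whose usage equals its demand is never reported as starved, even when that usage is below its fair share.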
[jira] [Resolved] (YARN-8704) Improve the error message for an invalid docker rw mount to be more informative
[ https://issues.apache.org/jira/browse/YARN-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weiwei Yang resolved YARN-8704.
-------------------------------
    Resolution: Invalid

> Improve the error message for an invalid docker rw mount to be more informative
> -------------------------------------------------------------------------------
>
>                 Key: YARN-8704
>                 URL: https://issues.apache.org/jira/browse/YARN-8704
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 3.2.0
>            Reporter: Weiwei Yang
>            Priority: Minor
>
> Seeing the following error message while starting a privileged docker container:
> {noformat}
> Error constructing docker command, docker error code=14, error message='Invalid docker read-write mount'
> {noformat}
> It would be good if it told us which mount is invalid and how to fix it.
[jira] [Closed] (YARN-8704) Improve the error message for an invalid docker rw mount to be more informative
[ https://issues.apache.org/jira/browse/YARN-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weiwei Yang closed YARN-8704.
-----------------------------

> Improve the error message for an invalid docker rw mount to be more informative
[jira] [Created] (YARN-8706) DelayedProcessKiller is executed for Docker containers even though docker stop sends a KILL signal after the specified grace period
Chandni Singh created YARN-8706:
-----------------------------------

             Summary: DelayedProcessKiller is executed for Docker containers even though docker stop sends a KILL signal after the specified grace period
                 Key: YARN-8706
                 URL: https://issues.apache.org/jira/browse/YARN-8706
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Chandni Singh
            Assignee: Chandni Singh

{{DockerStopCommand}} adds a grace period of 10 seconds. 10 seconds is also the default grace period used by docker stop: [https://docs.docker.com/engine/reference/commandline/stop/]

Documentation of docker stop:
{quote}the main process inside the container will receive {{SIGTERM}}, and after a grace period, {{SIGKILL}}.
{quote}
There is a {{DelayedProcessKiller}} in {{ContainerExecutor}} which executes for all containers after a delay when {{sleepDelayBeforeSigKill>0}}. By default this is set to {{250 milliseconds}}, so irrespective of the container type it will always be executed.

For a docker container, {{docker stop}} takes care of sending a {{SIGKILL}} after the grace period, so having {{DelayedProcessKiller}} seems redundant.
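One possible shape of the fix described above, sketched with hypothetical names (the actual ContainerExecutor plumbing differs): skip scheduling the NM-side delayed SIGKILL when the container is Docker-managed, since docker stop already escalates to SIGKILL on its own.

```java
// Sketch (hypothetical names): decide whether the NodeManager should
// schedule its own delayed SIGKILL after sending the stop signal.
public class SignalPolicy {
    enum ContainerRuntime { DEFAULT, DOCKER }

    // sleepDelayMs mirrors yarn.nodemanager.sleep-delay-before-sigkill.ms,
    // which defaults to 250 ms.
    static boolean shouldScheduleDelayedKill(ContainerRuntime runtime, long sleepDelayMs) {
        if (runtime == ContainerRuntime.DOCKER) {
            // docker stop already sends SIGTERM, waits out the grace
            // period, then sends SIGKILL; a second killer is redundant.
            return false;
        }
        return sleepDelayMs > 0;
    }
}
```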
[jira] [Created] (YARN-8705) Refactor in preparation for YARN-8696
Botong Huang created YARN-8705:
----------------------------------

             Summary: Refactor in preparation for YARN-8696
                 Key: YARN-8705
                 URL: https://issues.apache.org/jira/browse/YARN-8705
             Project: Hadoop YARN
          Issue Type: Task
            Reporter: Botong Huang
            Assignee: Botong Huang

Refactor the UAM heartbeat thread as well as the callback method, in preparation for the YARN-8696 FederationInterceptor upgrade.
[jira] [Created] (YARN-8704) Improve the error message for an invalid docker rw mount to be more informative
Weiwei Yang created YARN-8704:
---------------------------------

             Summary: Improve the error message for an invalid docker rw mount to be more informative
                 Key: YARN-8704
                 URL: https://issues.apache.org/jira/browse/YARN-8704
             Project: Hadoop YARN
          Issue Type: Improvement
    Affects Versions: 3.2.0
            Reporter: Weiwei Yang

Seeing the following error message while starting a privileged docker container:
{noformat}
Error constructing docker command, docker error code=14, error message='Invalid docker read-write mount'
{noformat}
It would be good if it told us which mount is invalid and how to fix it.
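A sketch of what a more informative message could look like. The helper below is hypothetical; docker.allowed.rw-mounts is the container-executor.cfg setting that governs which paths may be mounted read-write:

```java
// Sketch (hypothetical helper): build an error message that names the
// offending mount and points at the relevant configuration, instead of
// only saying "Invalid docker read-write mount".
public class MountErrorMessage {
    static String invalidRwMount(String mount) {
        return "Invalid docker read-write mount '" + mount + "': "
            + "the path is not in the allowed read-write mount list. "
            + "Add it to docker.allowed.rw-mounts in container-executor.cfg "
            + "or mount it read-only.";
    }
}
```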
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/

[Aug 22, 2018 3:43:40 AM] (yqlin) HDFS-13821. RBF: Add dfs.federation.router.mount-table.cache.enable so
[Aug 22, 2018 5:04:15 PM] (hanishakoneru) HDDS-265. Move numPendingDeletionBlocks and deleteTransactionId from
[Aug 22, 2018 5:54:10 PM] (xyao) HDDS-350. ContainerMapping#flushContainerInfo doesn't set containerId.
[Aug 22, 2018 9:48:22 PM] (aengineer) HDDS-342. Add example byteman script to print out hadoop rpc traffic.
[Aug 23, 2018 1:55:14 AM] (aengineer) HDDS-356. Support ColumnFamily based RockDBStore and TableStore.

-1 overall

The following subsystems voted -1:
    asflicense findbugs pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

XML : Parsing Error(s):
    hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml

FindBugs : module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine
    Unread field: FSBasedSubmarineStorageImpl.java:[line 39]
    Found reliance on default encoding in org.apache.hadoop.yarn.submarine.runtimes.yarnservice.YarnServiceJobSubmitter.generateCommandLaunchScript(RunJobParameters, TaskType, Component): new java.io.FileWriter(File) At YarnServiceJobSubmitter.java:[line 192]
    org.apache.hadoop.yarn.submarine.runtimes.yarnservice.YarnServiceJobSubmitter.generateCommandLaunchScript(RunJobParameters, TaskType, Component) may fail to clean up java.io.Writer on checked exception; obligation to clean up resource created at YarnServiceJobSubmitter.java:[line 192] is not discharged
    org.apache.hadoop.yarn.submarine.runtimes.yarnservice.YarnServiceUtils.getComponentArrayJson(String, int, String) concatenates strings using + in a loop At YarnServiceUtils.java:[line 72]

Failed CTEST tests:
    test_test_libhdfs_threaded_hdfs_static
    test_libhdfs_threaded_hdfspp_test_shim_static

Failed junit tests:
    hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
    hadoop.hdfs.server.balancer.TestBalancer
    hadoop.hdfs.web.TestWebHdfsTimeouts
    hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy
    hadoop.yarn.applications.distributedshell.TestDistributedShell
    hadoop.mapred.TestMRTimelineEventHandling

cc: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/diff-compile-cc-root.txt [4.0K]
javac: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/diff-compile-javac-root.txt [328K]
checkstyle: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/diff-checkstyle-root.txt [17M]
pathlen: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/pathlen.txt [12K]
pylint: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/diff-patch-pylint.txt [24K]
shellcheck: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/diff-patch-shellcheck.txt [20K]
shelldocs: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/diff-patch-shelldocs.txt [16K]
whitespace: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/whitespace-eol.txt [9.4M]
            https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/whitespace-tabs.txt [1.1M]
xml: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/xml.txt [4.0K]
findbugs: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-submarine-warnings.html [12K]
          https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/branch-findbugs-hadoop-hdds_client.txt [68K]
          https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/branch-findbugs-hadoop-hdds_container-service.txt [60K]
          https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/branch-findbugs-hadoop-hdds_framework.txt [8.0K]
          https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/branch-findbugs-hadoop-hdds_server-scm.txt [60K]
          http
[jira] [Created] (YARN-8703) Localized resource may leak on disk if container is killed while localizing
Jason Lowe created YARN-8703:
--------------------------------

             Summary: Localized resource may leak on disk if container is killed while localizing
                 Key: YARN-8703
                 URL: https://issues.apache.org/jira/browse/YARN-8703
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
            Reporter: Jason Lowe

If a container is killed while localizing then it releases all of its resources. If the resource count goes to zero and it is in the DOWNLOADING state then the resource bookkeeping is removed in the resource tracker. Shortly afterwards the localizer could heartbeat in and report the successful localization of the resource that was just removed. When the LocalResourcesTrackerImpl receives the LOCALIZED event but does not find the corresponding LocalResource for the event then it simply logs a "localized without a location" warning. At that point I think the localized resource has been leaked on the disk since the NM has removed bookkeeping for the resource without removing it on disk.
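One way to avoid the leak described above, sketched with a hypothetical handler (the real LocalResourcesTrackerImpl deals in LocalizedResource objects and dispatcher events, and the NM uses its DeletionService rather than direct file deletion): when a LOCALIZED report arrives for a resource that is no longer tracked, delete the downloaded files instead of only logging the warning.

```java
import java.io.File;

// Sketch (hypothetical): reclaim disk space when a localizer reports a
// completed download that the tracker has already forgotten about.
public class OrphanLocalizedResource {

    // Called when a LOCALIZED event has no matching LocalResource entry.
    // Returns true if nothing is left on disk afterwards.
    static boolean handleOrphanLocalized(File localPath) {
        if (localPath == null || !localPath.exists()) {
            return true; // nothing on disk, nothing leaked
        }
        return deleteRecursively(localPath);
    }

    // Depth-first delete: children first, then the directory itself.
    static boolean deleteRecursively(File f) {
        File[] children = f.listFiles();
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        return f.delete();
    }
}
```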
Apache Hadoop qbt Report: trunk+JDK8 on Windows/x64
For more details, see https://builds.apache.org/job/hadoop-trunk-win/567/

[Aug 22, 2018 11:18:55 PM] (aw) YETUS-660. checkstyle should report when it fails to execute
[Aug 22, 2018 11:19:40 PM] (aw) YETUS-611. xml test should specfically say which files are broken
[Aug 22, 2018 11:25:05 PM] (aw) YETUS-668. EOL 0.4.0 and 0.5.0
[Aug 22, 2018 5:04:15 PM] (hanishakoneru) HDDS-265. Move numPendingDeletionBlocks and deleteTransactionId from
[Aug 22, 2018 5:54:10 PM] (xyao) HDDS-350. ContainerMapping#flushContainerInfo doesn't set containerId.
[Aug 22, 2018 9:48:22 PM] (aengineer) HDDS-342. Add example byteman script to print out hadoop rpc traffic.
[Aug 23, 2018 1:55:14 AM] (aengineer) HDDS-356. Support ColumnFamily based RockDBStore and TableStore.
[Aug 23, 2018 4:35:43 AM] (sunilg) YARN-8015. Support all types of placement constraint support for

ERROR: File 'out/email-report.txt' does not exist
[jira] [Created] (YARN-8702) TestContainerSchedulerQueuing.testKillOnlyRequiredOpportunisticContainers() failing randomly
Rakesh Shah created YARN-8702:
---------------------------------

             Summary: TestContainerSchedulerQueuing.testKillOnlyRequiredOpportunisticContainers() failing randomly
                 Key: YARN-8702
                 URL: https://issues.apache.org/jira/browse/YARN-8702
             Project: Hadoop YARN
          Issue Type: Bug
          Components: container-queuing
    Affects Versions: 3.1.1, 3.1.0, 2.8.3
            Reporter: Rakesh Shah
             Fix For: 3.1.1

This unit test fails because the container status is not obtained correctly.
[jira] [Resolved] (YARN-8691) AMRMClient unregisterApplicationMaster Api's appMessage should have a maximum size
[ https://issues.apache.org/jira/browse/YARN-8691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yicong Cai resolved YARN-8691.
------------------------------
          Resolution: Duplicate
       Fix Version/s: (was: 2.7.7)
                      3.0.0-alpha4
    Target Version/s: (was: 2.7.7)

> AMRMClient unregisterApplicationMaster Api's appMessage should have a maximum size
> ----------------------------------------------------------------------------------
>
>                 Key: YARN-8691
>                 URL: https://issues.apache.org/jira/browse/YARN-8691
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.3
>            Reporter: Yicong Cai
>            Assignee: Yicong Cai
>            Priority: Critical
>             Fix For: 3.0.0-alpha4
>
> A SparkSQL AM hits a Codegen ERROR, then calls the unregister-AM API and sends the error message to the RM; the RM receives the AM state and updates the RMStateStore. The Codegen error message may be huge (in our case, about 200MB). If the RMStateStore is ZKRMStateStore, this causes the same exception as YARN-6125, but YARN-6125 doesn't cover truncating the unregisterApplicationMaster message.
>
> The SparkSQL Codegen error message is shown below:
> 18/08/18 08:34:54 ERROR codegen.CodeGenerator: failed to compile: org.codehaus.janino.JaninoRuntimeException: Constant pool has grown past JVM limit of 0x
> /* 001 */ public java.lang.Object generate(Object[] references) { /* 002 */ return new SpecificSafeProjection(references); /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificSafeProjection extends org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection {
> ..
> about 2 million lines.
> ..
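A sketch of the kind of cap the summary asks for, with an illustrative limit (the constant name and the 64 KB value are assumptions; the real limit would have to be chosen to fit what ZKRMStateStore can persist):

```java
// Sketch (hypothetical): cap the diagnostics string an AM passes to
// unregisterApplicationMaster before it reaches the RM state store.
public class DiagnosticsTruncator {
    // Illustrative cap; ZooKeeper znodes are limited to about 1 MB by
    // default (jute.maxbuffer), so the real limit must stay well under that.
    static final int MAX_DIAG_CHARS = 64 * 1024;

    static String truncate(String appMessage) {
        if (appMessage == null || appMessage.length() <= MAX_DIAG_CHARS) {
            return appMessage;
        }
        String marker = "... [truncated]";
        // Keep the head of the message and append a visible marker so the
        // result is exactly MAX_DIAG_CHARS characters long.
        return appMessage.substring(0, MAX_DIAG_CHARS - marker.length()) + marker;
    }
}
```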