[jira] [Created] (YARN-8707) It's not reasonable to decide whether app is starved by fairShare

2018-08-23 Thread Zhaohui Xin (JIRA)
Zhaohui Xin created YARN-8707:
-

 Summary: It's not reasonable to decide whether app is starved by 
fairShare
 Key: YARN-8707
 URL: https://issues.apache.org/jira/browse/YARN-8707
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 3.0.0-alpha3
Reporter: Zhaohui Xin
Assignee: Zhaohui Xin


When an app's usage has reached its demand, it is still considered fairShare 
starved. Obviously, that's not reasonable!
{code:java}
boolean isStarvedForFairShare() {
  return isUsageBelowShare(getResourceUsage(), getFairShare());
}
{code}
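One possible direction, sketched here as an assumption rather than the 
committed fix (it reuses {{Resources.fitsIn}} from 
{{org.apache.hadoop.yarn.util.resource.Resources}}): only report fair-share 
starvation while usage has not yet reached demand.
{code:java}
boolean isStarvedForFairShare() {
  // Sketch: an app whose usage already covers its demand has nothing left
  // to schedule, so it should not count as fair-share starved even when
  // its usage is below its fair share.
  return isUsageBelowShare(getResourceUsage(), getFairShare())
      && !Resources.fitsIn(getDemand(), getResourceUsage());
}
{code}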






[jira] [Resolved] (YARN-8704) Improve the error message for an invalid docker rw mount to be more informative

2018-08-23 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved YARN-8704.
---
Resolution: Invalid

> Improve the error message for an invalid docker rw mount to be more 
> informative
> ---
>
> Key: YARN-8704
> URL: https://issues.apache.org/jira/browse/YARN-8704
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: Weiwei Yang
>Priority: Minor
>
> Seeing the following error message while starting a privileged docker 
> container:
> {noformat}
> Error constructing docker command, docker error code=14, error 
> message='Invalid docker read-write mount'
> {noformat}
> It would be good if the message told us which mount is invalid and how to fix 
> it.
>  






[jira] [Closed] (YARN-8704) Improve the error message for an invalid docker rw mount to be more informative

2018-08-23 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang closed YARN-8704.
-

> Improve the error message for an invalid docker rw mount to be more 
> informative
> ---
>
> Key: YARN-8704
> URL: https://issues.apache.org/jira/browse/YARN-8704
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: Weiwei Yang
>Priority: Minor
>
> Seeing the following error message while starting a privileged docker 
> container:
> {noformat}
> Error constructing docker command, docker error code=14, error 
> message='Invalid docker read-write mount'
> {noformat}
> It would be good if the message told us which mount is invalid and how to fix 
> it.
>  






[jira] [Created] (YARN-8706) DelayedProcessKiller is executed for Docker containers even though docker stop sends a KILL signal after the specified grace period

2018-08-23 Thread Chandni Singh (JIRA)
Chandni Singh created YARN-8706:
---

 Summary: DelayedProcessKiller is executed for Docker containers 
even though docker stop sends a KILL signal after the specified grace period
 Key: YARN-8706
 URL: https://issues.apache.org/jira/browse/YARN-8706
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chandni Singh
Assignee: Chandni Singh


{{DockerStopCommand}} adds a grace period of 10 seconds.

10 seconds is also the default grace period used by docker stop:
 [https://docs.docker.com/engine/reference/commandline/stop/]

From the docker stop documentation:
{quote}the main process inside the container will receive {{SIGTERM}}, and 
after a grace period, {{SIGKILL}}.
{quote}
There is a {{DelayedProcessKiller}} in {{ContainerExecutor}} which executes 
for all containers after a delay when {{sleepDelayBeforeSigKill > 0}}. By 
default this is set to {{250 milliseconds}}, so irrespective of the container 
type, it always gets executed.
 
For a docker container, {{docker stop}} takes care of sending a {{SIGKILL}} 
after the grace period, so having {{DelayedProcessKiller}} seems redundant.
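A minimal sketch of the idea (hypothetical names, not a patch; the real 
{{ContainerExecutor}}/{{DelayedProcessKiller}} wiring differs in detail): only 
arm the delayed {{SIGKILL}} for non-Docker containers.
{code:java}
// Sketch only: ContainerHandle, dockerStop, signal and signalAfterDelay
// are hypothetical stand-ins introduced for illustration.
void stopContainer(ContainerHandle container, long sleepDelayBeforeSigKill) {
  if (container.isDockerContainer()) {
    // docker stop sends SIGTERM and, after its grace period (10 seconds
    // by default), SIGKILL itself - a second delayed kill is redundant.
    container.dockerStop(10);
    return;
  }
  // Plain process-tree containers still need the two-step kill.
  container.signal(Signal.TERM);
  signalAfterDelay(container, Signal.KILL, sleepDelayBeforeSigKill);
}
{code}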






[jira] [Created] (YARN-8705) Refactor in preparation for YARN-8696

2018-08-23 Thread Botong Huang (JIRA)
Botong Huang created YARN-8705:
--

 Summary: Refactor in preparation for YARN-8696
 Key: YARN-8705
 URL: https://issues.apache.org/jira/browse/YARN-8705
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Botong Huang
Assignee: Botong Huang


Refactor the UAM heartbeat thread as well as the callback method in preparation 
for the YARN-8696 FederationInterceptor upgrade.






[jira] [Created] (YARN-8704) Improve the error message for an invalid docker rw mount to be more informative

2018-08-23 Thread Weiwei Yang (JIRA)
Weiwei Yang created YARN-8704:
-

 Summary: Improve the error message for an invalid docker rw mount 
to be more informative
 Key: YARN-8704
 URL: https://issues.apache.org/jira/browse/YARN-8704
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.2.0
Reporter: Weiwei Yang


Seeing the following error message while starting a privileged docker 
container:

{noformat}
Error constructing docker command, docker error code=14, error message='Invalid 
docker read-write mount'
{noformat}

It would be good if the message told us which mount is invalid and how to fix 
it.
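For illustration only, a more informative variant might look like the following 
(the mount path and the {{docker.allowed.rw-mounts}} hint are hypothetical 
examples, not output of the current code):

{noformat}
Error constructing docker command, docker error code=14, error message='Invalid 
docker read-write mount /sys/fs/cgroup:/sys/fs/cgroup:rw: not listed in 
docker.allowed.rw-mounts in container-executor.cfg'
{noformat}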






Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2018-08-23 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/

[Aug 22, 2018 3:43:40 AM] (yqlin) HDFS-13821. RBF: Add 
dfs.federation.router.mount-table.cache.enable so
[Aug 22, 2018 5:04:15 PM] (hanishakoneru) HDDS-265. Move 
numPendingDeletionBlocks and deleteTransactionId from
[Aug 22, 2018 5:54:10 PM] (xyao) HDDS-350. ContainerMapping#flushContainerInfo 
doesn't set containerId.
[Aug 22, 2018 9:48:22 PM] (aengineer) HDDS-342. Add example byteman script to 
print out hadoop rpc traffic.
[Aug 23, 2018 1:55:14 AM] (aengineer) HDDS-356. Support ColumnFamily based 
RockDBStore and TableStore.




-1 overall


The following subsystems voted -1:
asflicense findbugs pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml
 

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine

   Unread field: FSBasedSubmarineStorageImpl.java:[line 39]
   Found reliance on default encoding in org.apache.hadoop.yarn.submarine.runtimes.yarnservice.YarnServiceJobSubmitter.generateCommandLaunchScript(RunJobParameters, TaskType, Component): new java.io.FileWriter(File) at YarnServiceJobSubmitter.java:[line 192]
   org.apache.hadoop.yarn.submarine.runtimes.yarnservice.YarnServiceJobSubmitter.generateCommandLaunchScript(RunJobParameters, TaskType, Component) may fail to clean up java.io.Writer on checked exception; the obligation to clean up the resource created at YarnServiceJobSubmitter.java:[line 192] is not discharged
   org.apache.hadoop.yarn.submarine.runtimes.yarnservice.YarnServiceUtils.getComponentArrayJson(String, int, String) concatenates strings using + in a loop at YarnServiceUtils.java:[line 72]
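
These map to standard FindBugs patterns (DM_DEFAULT_ENCODING, 
OBL_UNSATISFIED_OBLIGATION, SBSC_USE_STRINGBUFFER_CONCATENATION); below is a 
generic sketch of the usual remedies, not the actual submarine code.

{code:java}
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class FindbugsRemediesSketch {
  // Default-encoding + unclosed-Writer warnings: use an explicit charset
  // and try-with-resources so the Writer is closed on every exit path.
  static void writeLaunchScript(Path script, String body) throws IOException {
    try (BufferedWriter w =
        Files.newBufferedWriter(script, StandardCharsets.UTF_8)) {
      w.write(body);
    }
  }

  // String-concatenation-in-a-loop warning: accumulate in a StringBuilder.
  static String componentArrayJson(List<String> components) {
    StringBuilder sb = new StringBuilder("[");
    for (int i = 0; i < components.size(); i++) {
      if (i > 0) {
        sb.append(',');
      }
      sb.append('"').append(components.get(i)).append('"');
    }
    return sb.append(']').toString();
  }
}
{code}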

Failed CTEST tests :

   test_test_libhdfs_threaded_hdfs_static 
   test_libhdfs_threaded_hdfspp_test_shim_static 

Failed junit tests :

   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure 
   hadoop.hdfs.server.balancer.TestBalancer 
   hadoop.hdfs.web.TestWebHdfsTimeouts 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy 
   hadoop.yarn.applications.distributedshell.TestDistributedShell 
   hadoop.mapred.TestMRTimelineEventHandling 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/diff-compile-javac-root.txt
  [328K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/diff-checkstyle-root.txt
  [17M]

   pathlen:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/diff-patch-pylint.txt
  [24K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/diff-patch-shellcheck.txt
  [20K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/diff-patch-shelldocs.txt
  [16K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/whitespace-eol.txt
  [9.4M]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/whitespace-tabs.txt
  [1.1M]

   xml:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/xml.txt
  [4.0K]

   findbugs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-submarine-warnings.html
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/branch-findbugs-hadoop-hdds_client.txt
  [68K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/branch-findbugs-hadoop-hdds_container-service.txt
  [60K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/branch-findbugs-hadoop-hdds_framework.txt
  [8.0K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/877/artifact/out/branch-findbugs-hadoop-hdds_server-scm.txt
  [60K]
   
http

[jira] [Created] (YARN-8703) Localized resource may leak on disk if container is killed while localizing

2018-08-23 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-8703:


 Summary: Localized resource may leak on disk if container is 
killed while localizing
 Key: YARN-8703
 URL: https://issues.apache.org/jira/browse/YARN-8703
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Jason Lowe


If a container is killed while localizing then it releases all of its 
resources.  If the resource count goes to zero and it is in the DOWNLOADING 
state then the resource bookkeeping is removed in the resource tracker.  
Shortly afterwards the localizer could heartbeat in and report the successful 
localization of the resource that was just removed.  When the 
LocalResourcesTrackerImpl receives the LOCALIZED event but does not find the 
corresponding LocalResource for the event then it simply logs a "localized 
without a location" warning.  At that point I think the localized resource has 
been leaked on the disk since the NM has removed bookkeeping for the resource 
without removing it on disk.
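
A hedged sketch of one way to close the race (method shape and field names are 
assumed for illustration, not the actual LocalResourcesTrackerImpl internals): 
when the LOCALIZED event finds no matching resource, hand the reported location 
to the deletion service instead of only warning.
{code:java}
// Sketch only: handleLocalizedEvent, localrsrc, deletionService and user
// are assumed stand-ins for the real LocalResourcesTrackerImpl code.
void handleLocalizedEvent(LocalResourceRequest req, Path location, long size) {
  LocalizedResource rsrc = localrsrc.get(req);
  if (rsrc == null) {
    // The container was killed while localizing and the bookkeeping is
    // already gone; remove the freshly downloaded files as well so they
    // are not leaked on disk.
    LOG.warn("Localized " + location + " for " + req
        + " without a location; deleting it");
    deletionService.delete(user, location);
    return;
  }
  rsrc.handle(new ResourceLocalizedEvent(req, location, size));
}
{code}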






Apache Hadoop qbt Report: trunk+JDK8 on Windows/x64

2018-08-23 Thread Apache Jenkins Server
For more details, see https://builds.apache.org/job/hadoop-trunk-win/567/

[Aug 22, 2018 11:18:55 PM] (aw) YETUS-660. checkstyle should report when it 
fails to execute
[Aug 22, 2018 11:19:40 PM] (aw) YETUS-611. xml test should specfically say 
which files are broken
[Aug 22, 2018 11:25:05 PM] (aw) YETUS-668. EOL 0.4.0 and 0.5.0
[Aug 22, 2018 5:04:15 PM] (hanishakoneru) HDDS-265. Move 
numPendingDeletionBlocks and deleteTransactionId from
[Aug 22, 2018 5:54:10 PM] (xyao) HDDS-350. ContainerMapping#flushContainerInfo 
doesn't set containerId.
[Aug 22, 2018 9:48:22 PM] (aengineer) HDDS-342. Add example byteman script to 
print out hadoop rpc traffic.
[Aug 23, 2018 1:55:14 AM] (aengineer) HDDS-356. Support ColumnFamily based 
RockDBStore and TableStore.
[Aug 23, 2018 4:35:43 AM] (sunilg) YARN-8015. Support all types of placement 
constraint support for


ERROR: File 'out/email-report.txt' does not exist


[jira] [Created] (YARN-8702) TestContainerSchedulerQueuing.testKillOnlyRequiredOpportunisticContainers() failing randomly

2018-08-23 Thread Rakesh Shah (JIRA)
Rakesh Shah created YARN-8702:
-

 Summary: 
TestContainerSchedulerQueuing.testKillOnlyRequiredOpportunisticContainers() 
failing randomly
 Key: YARN-8702
 URL: https://issues.apache.org/jira/browse/YARN-8702
 Project: Hadoop YARN
  Issue Type: Bug
  Components: container-queuing
Affects Versions: 3.1.1, 3.1.0, 2.8.3
Reporter: Rakesh Shah
 Fix For: 3.1.1


This UT fails randomly because the container status is not obtained correctly.






[jira] [Resolved] (YARN-8691) AMRMClient unregisterApplicationMaster Api's appMessage should have a maximum size

2018-08-23 Thread Yicong Cai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yicong Cai resolved YARN-8691.
--
  Resolution: Duplicate
   Fix Version/s: (was: 2.7.7)
  3.0.0-alpha4
Target Version/s:   (was: 2.7.7)

> AMRMClient unregisterApplicationMaster Api's appMessage should have a maximum 
> size
> --
>
> Key: YARN-8691
> URL: https://issues.apache.org/jira/browse/YARN-8691
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.3
>Reporter: Yicong Cai
>Assignee: Yicong Cai
>Priority: Critical
> Fix For: 3.0.0-alpha4
>
>
> When SparkSQL AM codegen fails, the AM calls the unregister AM API and sends 
> the error message to the RM; the RM receives the AM state and persists it to 
> the RMStateStore. The codegen error message can be huge (about 200MB in our 
> case). If the RMStateStore is ZKRMStateStore, this causes the same exception 
> as YARN-6125, but YARN-6125 does not cover truncating the 
> unregisterApplicationMaster message.
>  
> SparkSQL Codegen error message show below:
> 18/08/18 08:34:54 ERROR codegen.CodeGenerator: failed to compile: 
> org.codehaus.janino.JaninoRuntimeException: Constant pool has grown past JVM 
> limit of 0xFFFF
>  /* 001 */ public java.lang.Object generate(Object[] references)
> { /* 002 */ return new SpecificSafeProjection(references); /* 003 */ }
> /* 004 */
>  /* 005 */ class SpecificSafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection {
>  ..
> about 2 million lines.
> ..
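
A minimal client-side illustration of the requested cap (a sketch only; the 
64 KB limit is an assumed value, and the real fix would more likely live in 
{{AMRMClient}} or on the RM side):
{code:java}
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.client.api.AMRMClient;

class UnregisterWithCappedMessage {
  // Assumed cap; the real limit would track the RMStateStore/ZK znode size.
  private static final int MAX_APP_MESSAGE_CHARS = 64 * 1024;

  // Truncate a huge diagnostic (e.g. a ~200MB codegen dump) before it is
  // persisted into the RMStateStore via unregisterApplicationMaster.
  static void unregister(AMRMClient<?> amRMClient, String diagnostics)
      throws Exception {
    String msg = diagnostics == null ? "" : diagnostics;
    if (msg.length() > MAX_APP_MESSAGE_CHARS) {
      msg = msg.substring(0, MAX_APP_MESSAGE_CHARS) + "...(truncated)";
    }
    amRMClient.unregisterApplicationMaster(
        FinalApplicationStatus.FAILED, msg, null);
  }
}
{code}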


