[jira] [Resolved] (YARN-9050) [Umbrella] Usability improvements for scheduler activities

2020-03-13 Thread Weiwei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved YARN-9050.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

> [Umbrella] Usability improvements for scheduler activities
> --
>
> Key: YARN-9050
> URL: https://issues.apache.org/jira/browse/YARN-9050
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: image-2018-11-23-16-46-38-138.png
>
>
> We have did some usability improvements for scheduler activities based on 
> YARN3.1 in our cluster as follows:
>  1. Not available for multi-thread asynchronous scheduling. App and node 
> activities maybe confused when multiple scheduling threads record activities 
> of different allocation processes in the same variables like appsAllocation 
> and recordingNodesAllocation in ActivitiesManager. I think these variables 
> should be thread-local to make activities clear among multiple threads.
>  2. Incomplete activities for multi-node lookup mechanism, since 
> ActivitiesLogger will skip recording through \{{if (node == null || 
> activitiesManager == null) }} when node is null which represents this 
> allocation is for multi-nodes. We need support recording activities for 
> multi-node lookup mechanism.
>  3. Current app activities can not meet requirements of diagnostics, for 
> example, we can know that node doesn't match request but hard to know why, 
> especially when using placement constraints, it's difficult to make a 
> detailed diagnosis manually. So I propose to improve the diagnoses of 
> activities, add diagnosis for placement constraints check, update 
> insufficient resource diagnosis with detailed info (like 'insufficient 
> resource names:[memory-mb]') and so on.
>  4. Add more useful fields for app activities, in some scenarios we need to 
> distinguish different requests but can't locate requests based on app 
> activities info, there are some other fields can help to filter what we want 
> such as allocation tags. We have added containerPriority, allocationRequestId 
> and allocationTags fields in AppAllocation.
>  5. Filter app activities by key fields, sometimes the results of app 
> activities is massive, it's hard to find what we want. We have support filter 
> by allocation-tags to meet requirements from some apps, more over, we can 
> take container-priority and allocation-request-id as candidates if necessary.
>  6. Aggregate app activities by diagnoses. For a single allocation process, 
> activities still can be massive in a large cluster, we frequently want to 
> know why request can't be allocated in cluster, it's hard to check every node 
> manually in a large cluster, so that aggregation for app activities by 
> diagnoses is necessary to solve this trouble. We have added groupingType 
> parameter for app-activities REST API for this, supports grouping by 
> diagnostics.
> I think we can have a discuss about these points, useful improvements which 
> can be accepted will be added into the patch. Thanks.
> Running design doc is attached 
> [here|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.2jnaobmmfne5].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: [VOTE] Apache Hadoop Ozone 0.5.0-beta RC1

2020-03-13 Thread Arpit Agarwal
HDDS-3116 is now fixed in the ozone-0.5.0 branch.

Folks - any more potential blockers before Dinesh spins RC2? I don’t see 
anything in Jira at the moment:

https://issues.apache.org/jira/issues/?jql=project%20in%20(%22HDDS%22)%20and%20%22Target%20Version%2Fs%22%20in%20(0.5.0)%20and%20resolution%20in%20(Unresolved)%20and%20priority%20in%20(Blocker)
 


Thanks,
Arpit


> On Mar 9, 2020, at 6:15 PM, Shashikant Banerjee 
>  wrote:
> 
> I think https://issues.apache.org/jira/browse/HDDS-3116 
>  is a blocker for
> the release. Because of this, datanodes fail to communicate with SCM and
> marked dead and don't seem to recover.
> This has been observed in multiple test setups.
> 
> Thanks
> Shashi
> 
> On Mon, Mar 9, 2020 at 9:20 PM Attila Doroszlai  >
> wrote:
> 
>> +1
>> 
>> * Verified GPG signature and SHA512 checksum
>> * Compiled sources
>> * Ran ozone smoke test against both binary and locally compiled versions
>> 
>> Thanks Dinesh for RC1.
>> 
>> -Attila
>> 
>> On Sun, Mar 8, 2020 at 2:34 AM Arpit Agarwal
>>  wrote:
>>> 
>>> +1 (binding)
>>> Verified mds, sha512
>>> Verified signatures
>>> Built from source
>>> Deployed to 3 node cluster
>>> Tried a few ozone shell and filesystem commands
>>> Ran freon load generator
>>> Thanks Dinesh for putting the RC1 together.
>>> 
>>> 
>>> 
 On Mar 6, 2020, at 4:46 PM, Dinesh Chitlangia 
>> wrote:
 
 Hi Folks,
 
 We have put together RC1 for Apache Hadoop Ozone 0.5.0-beta.
 
 The RC artifacts are at:
 https://home.apache.org/~dineshc/ozone-0.5.0-rc1/
 
 The public key used for signing the artifacts can be found at:
 https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
 
 The maven artifacts are staged at:
 
>> https://repository.apache.org/content/repositories/orgapachehadoop-1260
 
 The RC tag in git is at:
 https://github.com/apache/hadoop-ozone/tree/ozone-0.5.0-beta-RC1
 
 This release contains 800+ fixes/improvements [1].
 Thanks to everyone who put in the effort to make this happen.
 
 *The vote will run for 7 days, ending on March 13th 2020 at 11:59 pm
>> PST.*
 
 Note: This release is beta quality, it’s not recommended to use in
 production but we believe that it’s stable enough to try out the
>> feature
 set and collect feedback.
 
 
 [1] https://s.apache.org/ozone-0.5.0-fixed-issues
 
 Thanks,
 Dinesh Chitlangia
>>> 
>> 
>> -
>> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org 
>> 
>> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org 
>> 


Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2020-03-13 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1437/

[Mar 12, 2020 9:33:12 AM] (sunilg) YARN-10191. FS-CS converter: call 
System.exit function call for every
[Mar 12, 2020 11:18:37 AM] (snemeth) YARN-10193. FS-CS converter: fix incorrect 
capacity conversion.
[Mar 12, 2020 11:29:03 AM] (snemeth) YARN-9997. Code cleanup in 
ZKConfigurationStore. Contributed by Andras
[Mar 12, 2020 1:29:17 PM] (surendralilhore) HDFS-14442. Disagreement between 
HAUtil.getAddressOfActive and
[Mar 12, 2020 2:13:55 PM] (github) HADOOP-15430. hadoop fs -mkdir -p 
path-ending-with-slash/ fails with
[Mar 13, 2020 12:22:11 AM] (inigoiri) HDFS-14612. SlowDiskReport won't update 
when SlowDisks is always empty




-1 overall


The following subsystems voted -1:
asflicense findbugs pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml
 

FindBugs :

   module:hadoop-cloud-storage-project/hadoop-cos 
   Redundant nullcheck of dir, which is known to be non-null in 
org.apache.hadoop.fs.cosn.BufferPool.createDir(String) Redundant null check at 
BufferPool.java:is known to be non-null in 
org.apache.hadoop.fs.cosn.BufferPool.createDir(String) Redundant null check at 
BufferPool.java:[line 66] 
   org.apache.hadoop.fs.cosn.CosNInputStream$ReadBuffer.getBuffer() may 
expose internal representation by returning CosNInputStream$ReadBuffer.buffer 
At CosNInputStream.java:by returning CosNInputStream$ReadBuffer.buffer At 
CosNInputStream.java:[line 87] 
   Found reliance on default encoding in 
org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFile(String, File, 
byte[]):in org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFile(String, 
File, byte[]): new String(byte[]) At CosNativeFileSystemStore.java:[line 199] 
   Found reliance on default encoding in 
org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFileWithRetry(String, 
InputStream, byte[], long):in 
org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.storeFileWithRetry(String, 
InputStream, byte[], long): new String(byte[]) At 
CosNativeFileSystemStore.java:[line 178] 
   org.apache.hadoop.fs.cosn.CosNativeFileSystemStore.uploadPart(File, 
String, String, int) may fail to clean up java.io.InputStream Obligation to 
clean up resource created at CosNativeFileSystemStore.java:fail to clean up 
java.io.InputStream Obligation to clean up resource created at 
CosNativeFileSystemStore.java:[line 252] is not discharged 

Failed junit tests :

   hadoop.hdfs.server.balancer.TestBalancer 
   hadoop.hdfs.server.namenode.TestNameNodeMXBean 
   hadoop.yarn.applications.distributedshell.TestDistributedShell 
   hadoop.yarn.sls.TestSLSStreamAMSynth 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1437/artifact/out/diff-compile-cc-root.txt
  [8.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1437/artifact/out/diff-compile-javac-root.txt
  [424K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1437/artifact/out/diff-checkstyle-root.txt
  [16M]

   pathlen:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1437/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1437/artifact/out/diff-patch-pylint.txt
  [24K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1437/artifact/out/diff-patch-shellcheck.txt
  [16K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1437/artifact/out/diff-patch-shelldocs.txt
  [44K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1437/artifact/out/whitespace-eol.txt
  [9.9M]
   

[jira] [Created] (YARN-10198) [managedParent].%primary_group placement doesn't work after YARN-9868

2020-03-13 Thread Peter Bacsko (Jira)
Peter Bacsko created YARN-10198:
---

 Summary: [managedParent].%primary_group placement doesn't work 
after YARN-9868
 Key: YARN-10198
 URL: https://issues.apache.org/jira/browse/YARN-10198
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Reporter: Peter Bacsko
Assignee: Peter Bacsko


YARN-9868 introduced an unnecessary check if we have the following placement 
rule:

[managedParentQueue].%primary_group

Here, {{%primary_group}} is expected to be created if it doesn't exist. 
However, there is this validation code which is not necessary:

{noformat}
  } else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) {
if (this.queueManager
.getQueue(groups.getGroups(user).get(0)) != null) {
  return getPlacementContext(mapping,
  groups.getGroups(user).get(0));
} else {
  return null;
}
{noformat}

We should revert this part to the original version:
{noformat}
  } else if (mapping.queue.equals(PRIMARY_GROUP_MAPPING)) {
return getPlacementContext(mapping, groups.getGroups(user).get(0));
}
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Apache Hadoop qbt Report: branch2.10+JDK7 on Linux/x86

2020-03-13 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/623/

[Mar 12, 2020 3:49:38 AM] (iwasakims) HDFS-15077. Fix intermittent failure of




-1 overall


The following subsystems voted -1:
asflicense findbugs hadolint pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
 
   hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml 
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml
 

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
 
   Boxed value is unboxed and then immediately reboxed in 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result,
 byte[], byte[], KeyConverter, ValueConverter, boolean) At 
ColumnRWHelper.java:then immediately reboxed in 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result,
 byte[], byte[], KeyConverter, ValueConverter, boolean) At 
ColumnRWHelper.java:[line 335] 

Failed junit tests :

   hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys 
   hadoop.hdfs.server.datanode.TestFsDatasetCache 
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   hadoop.registry.secure.TestSecureLogins 
   hadoop.yarn.client.api.impl.TestAMRMProxy 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/623/artifact/out/diff-compile-cc-root-jdk1.7.0_95.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/623/artifact/out/diff-compile-javac-root-jdk1.7.0_95.txt
  [328K]

   cc:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/623/artifact/out/diff-compile-cc-root-jdk1.8.0_242.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/623/artifact/out/diff-compile-javac-root-jdk1.8.0_242.txt
  [308K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/623/artifact/out/diff-checkstyle-root.txt
  [16M]

   hadolint:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/623/artifact/out/diff-patch-hadolint.txt
  [4.0K]

   pathlen:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/623/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/623/artifact/out/diff-patch-pylint.txt
  [24K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/623/artifact/out/diff-patch-shellcheck.txt
  [56K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/623/artifact/out/diff-patch-shelldocs.txt
  [8.0K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/623/artifact/out/whitespace-eol.txt
  [12M]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/623/artifact/out/whitespace-tabs.txt
  [1.3M]

   xml:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/623/artifact/out/xml.txt
  [12K]

   findbugs:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/623/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-hbase_hadoop-yarn-server-timelineservice-hbase-client-warnings.html
  [8.0K]

   javadoc:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/623/artifact/out/diff-javadoc-javadoc-root-jdk1.7.0_95.txt
  [16K]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/623/artifact/out/diff-javadoc-javadoc-root-jdk1.8.0_242.txt
  [1.1M]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/623/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [284K]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/623/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/623/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-registry.txt
  [12K]
   

[jira] [Created] (YARN-10197) FS-CS converter: fix emitted ordering policy string and max-am-resource percent value

2020-03-13 Thread Peter Bacsko (Jira)
Peter Bacsko created YARN-10197:
---

 Summary: FS-CS converter: fix emitted ordering policy string and 
max-am-resource percent value
 Key: YARN-10197
 URL: https://issues.apache.org/jira/browse/YARN-10197
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Peter Bacsko
Assignee: Peter Bacsko






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-10196) destroying app leaks zookeeper connection

2020-03-13 Thread kyungwan nam (Jira)
kyungwan nam created YARN-10196:
---

 Summary: destroying app leaks zookeeper connection
 Key: YARN-10196
 URL: https://issues.apache.org/jira/browse/YARN-10196
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: kyungwan nam
Assignee: kyungwan nam


when destroying app, curatorClient in ServiceClient is started. but It is never 
closed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org