Re: [VOTE] Release Apache Hadoop 3.1.4 (RC4)
+1 (binding).

**TEST STEPS**
1. Build from sources (see Maven / Java and OS details below)
2. Distribute Hadoop to all nodes
3. Start HDFS services + YARN services on the nodes
4. Run MapReduce pi job (QuasiMonteCarlo)
5. Verified that the application was successful through the YARN RM Web UI
6. Verified the version of the Hadoop release from the YARN RM Web UI

**OS version**
$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

**Maven version**
$ mvn -v
Apache Maven 3.0.5 (Red Hat 3.0.5-17)
Maven home: /usr/share/maven

**Java version**
Java version: 1.8.0_191, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-0.el7_5.x86_64/jre
Default locale: en_US, platform encoding: ANSI_X3.4-1968
OS name: "linux", version: "3.10.0-1062.el7.x86_64", arch: "amd64", family: "unix"

**Maven command to build from sources**
mvn clean package -Pdist -DskipTests -Dmaven.javadoc.skip=true

**OTHER NOTES**
1. Had to manually install Maven in order to compile Hadoop, based on these steps:
https://gist.github.com/miroslavtamas/cdca97f2eafdd6c28b844434eaa3b631
2. Had to manually install protoc and other required libraries with the following commands (in this particular order):
sudo yum install -y protobuf-devel
sudo yum install -y gcc gcc-c++ make
sudo yum install -y openssl-devel
sudo yum install -y libgsasl

Thanks,
Szilard

On Thu, Jul 23, 2020 at 4:05 PM Masatake Iwasaki <iwasak...@oss.nttdata.co.jp> wrote:

> +1 (binding).
>
> * verified the checksum and signature of the source tarball.
> * built from the source tarball with the native profile on CentOS 7 and OpenJDK 8.
> * built the documentation and skimmed the contents.
> * ran example jobs on a 3-node Docker cluster with NN-HA and RM-HA enabled.
> * launched a pseudo-distributed cluster with Kerberos and SSL enabled, ran
>   basic EZ operations, ran example MR jobs.
> * followed the reproduction steps reported in HDFS-15313 to see if the
>   fix works.
>
> Thanks,
> Masatake Iwasaki
>
> On 2020/07/21 21:50, Gabor Bota wrote:
> > Hi folks,
> >
> > I have put together a release candidate (RC4) for Hadoop 3.1.4.
> >
> > The RC includes, in addition to the previous ones:
> > * fix for HDFS-15313. Ensure inodes in active filesystem are not
> >   deleted during snapshot delete
> > * fix for YARN-10347. Fix double locking in
> >   CapacityScheduler#reinitialize in branch-3.1
> >   (https://issues.apache.org/jira/browse/YARN-10347)
> > * the revert of HDFS-14941, as it caused
> >   HDFS-15421. IBR leak causes standby NN to be stuck in safe mode.
> >   (https://issues.apache.org/jira/browse/HDFS-15421)
> > * HDFS-15323, as requested.
> >   (https://issues.apache.org/jira/browse/HDFS-15323)
> >
> > The RC is available at: http://people.apache.org/~gabota/hadoop-3.1.4-RC4/
> > The RC tag in git is here:
> > https://github.com/apache/hadoop/releases/tag/release-3.1.4-RC4
> > The maven artifacts are staged at
> > https://repository.apache.org/content/repositories/orgapachehadoop-1275/
> >
> > You can find my public key at:
> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> > and http://keys.gnupg.net/pks/lookup?op=get&search=0xB86249D83539B38C
> >
> > Please try the release and vote. The vote will run for 8 weekdays,
> > until July 31, 2020, 23:00 CET.
> >
> > Thanks,
> > Gabor
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
Re: [E] Re: [DISCUSS] Change project style guidelines to allow line length 100
Thanks for this initiative, Sean. +1 for increasing the length to 100 characters.
I can't see a VOTE thread regarding this subject. Am I missing something?

Best,
Szilard

On Mon, May 24, 2021 at 11:49 PM Jonathan Eagles wrote:

> In Apache Tez, the formal line length is 120 characters. So, I recommend 120+.
>
> On Mon, May 24, 2021 at 4:46 PM Kihwal Lee wrote:
>
> > +1 for the 100 char limit.
> > But I would have liked 132 columns more. :)
> >
> > Kihwal
> >
> > On Mon, May 24, 2021 at 1:46 PM Sean Busbey wrote:
> >
> > > Hi folks!
> > >
> > > The consensus seems pretty strongly in favor of increasing the line length
> > > limit. Do folks still want to see a formal VOTE thread?
> > >
> > > > On May 19, 2021, at 4:22 PM, Sean Busbey wrote:
> > > >
> > > > Hello!
> > > >
> > > > What do folks think about changing our line length guidelines to allow
> > > > for 100 character width?
> > > >
> > > > Currently, we tell folks to follow the Sun style guide with some
> > > > exceptions unrelated to line length. That guide says a width of 80 is the
> > > > standard, and our current checkstyle rules act as enforcement.
> > > >
> > > > Looking at the current trunk codebase, our nightly build shows a total of
> > > > ~15k line length violations; that's about 18% of identified checkstyle issues.
> > > >
> > > > The vast majority of those line length violations are <= 100 characters
> > > > long. 100 characters happens to be the limit in the Google Java Style
> > > > Guide, another commonly adopted style guide for Java projects, so I suspect
> > > > these longer lines leaking past the checkstyle precommit warning might be a
> > > > reflection of committers working across multiple Java codebases.
> > > >
> > > > I don't feel strongly about lines being longer, but I would like to move
> > > > towards more consistent style enforcement as a project. Updating our
> > > > project guidance to allow for 100 character lines would reduce the
> > > > likelihood that folks bringing in new contributions need a precommit test
> > > > cycle to get the formatting correct.
> > > >
> > > > Does anyone feel strongly about keeping the line length limit at 80
> > > > characters?
> > > >
> > > > Does anyone feel strongly about contributions coming in that clear up
> > > > line length violations?
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> > > For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
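Sean's numbers (violations at the 80-column limit, most of which would disappear at 100) are easy to reproduce locally. A small sketch of that counting, under the assumption that checkstyle's LineLength check simply compares per-line character counts (it also supports an `ignorePattern`, which this sketch ignores); the function name is mine:

```python
def line_length_report(source, limits=(80, 100)):
    """For each candidate line-length limit, count the lines in `source`
    that exceed it. A rough stand-in for checkstyle's LineLength check:
    it just compares character counts per line."""
    lines = source.splitlines()
    return {limit: sum(1 for line in lines if len(line) > limit)
            for limit in limits}
```

Running something like this over the Java sources in trunk would show how many of the current 80-column violations would survive a 100-column limit.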
Re: [HELP] Review request for YARN-9214 and YARN-9401
Hi,

Sure, I can do that tomorrow.

Szilard

On Tue, Mar 26, 2019 at 3:41 PM Wanqiang Ji wrote:

> Hi folks,
>
> Can someone help to review YARN-9214 and YARN-9401?
> YARN-9214. Add AbstractYarnScheduler#getValidQueues method to resolve
> duplicate code
> YARN-9401. Fix `yarn version` printing the same version info as `hadoop
> version`
>
> -Wanqiang Ji
ResourceManager & Resource Types topic to discuss: SafeMode
Hi,

This could be interesting for anyone working with RM / Resource Types. I filed a jira recently:
https://issues.apache.org/jira/browse/YARN-9421 (Implement SafeMode for ResourceManager by defining a resource threshold).

The issue in one sentence: If an app is submitted while the RM still hasn't received all registration requests from the NMs, and the demand of the app contains any custom resource (e.g. GPU), it can happen that the app is quickly rejected with an InvalidResourceRequestException. The same app, submitted later once the NMs are registered (most likely a couple of seconds later), could be accepted. In this sense, the behavior of the RM is not consistent.

Please read through the jira; I think the issue is well described there!

Thanks a lot,
Szilard
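The proposal boils down to a threshold check before the RM starts rejecting requests for not-yet-registered resource types. A minimal sketch of such a predicate; the names and the 0.9 default are my illustration, not the jira's actual design:

```python
def rm_in_safe_mode(registered_resource, expected_resource, threshold=0.9):
    """Proposed SafeMode predicate: stay in SafeMode (i.e. defer app
    rejection decisions) until at least `threshold` of the expected
    cluster resource has been registered by NodeManagers."""
    if expected_resource <= 0:
        return True  # nothing known yet about the cluster: defer decisions
    return registered_resource / expected_resource < threshold
```

While this returns True, an app demanding a custom resource (e.g. GPU) would be queued or retried rather than rejected with InvalidResourceRequestException.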
Tool to view current status (applicability) of patches
Hi,

I recently developed a tool that shows the status of patches and checks whether they can be applied to trunk (or any specified branch).

The motivation for the project was quite simple: it's very cumbersome to keep track of the status of all the pending patches of our YARN team. We already have a sheet to track the pending patches of our upstream work, so the idea was: let's write a script that checks whether the patches still apply to trunk (or any specified branches). The project currently has jira and Google Sheets integration (read/write).

Longer term, I'm planning to provide a JavaScript snippet that would place red/green status lights next to the patches, indicating their applicability to trunk. I would also pay attention to minimizing the requests sent to jira, so I'm planning to introduce some caching and provide a "force-refresh status" button to get the current status of a patch.

Do you think this is a good idea and worth spending some more time on? It would require a moderate amount of work, but my main concern is where to host the service. Is there an Apache server (or any other infra) that could host this application? The memory / CPU footprint is quite moderate; it requires some network bandwidth, though.

Here's the link to the git repo of the project:
https://github.com/szilard-nemeth/hadoop-reviewsync
The project is still in a PoC phase, so the code is not the cleanest; I'm planning to improve it in the near future.

Please feel free to share your thoughts, feedback, ideas, anything!

Thanks,
Szilard

Example output:

Row | Issue     | Patch apply | Owner                  | Patch file          | Branch            | Explicit | Result          | Conflicted files | Overall result
----+-----------+-------------+------------------------+---------------------+-------------------+----------+-----------------+------------------+-----------------
1   | YARN-8553 | 1           | Szilard Nemeth         | YARN-8553.003.patch | origin/trunk      | Yes      | CONFLICT        | 1                | origin/trunk: CONFLICT, origin/branch-3.2: OK, origin/branch-3.1: CONFLICT
2   | YARN-8553 | 2           | Szilard Nemeth         | YARN-8553.003.patch | origin/branch-3.2 | No       | APPLIES CLEANLY | N/A              | origin/trunk: CONFLICT, origin/branch-3.2: OK, origin/branch-3.1: CONFLICT
3   | YARN-8553 | 3           | Szilard Nemeth         | YARN-8553.003.patch | origin/branch-3.1 | No       | CONFLICT        | 1                | origin/trunk: CONFLICT, origin/branch-3.2: OK, origin/branch-3.1: CONFLICT
4   | YARN-5464 | 1           | Antal Bálint Steinbach | YARN-5464.005.patch | origin/trunk      | Yes      | CONFLICT        | 3                | origin/trunk: CONFLICT, origin/branch-3.2: CONFLICT, origin/branch-3.1: CONFLICT
5   | YARN-5464 | 2           | Antal Bálint Steinbach | YARN-5464.005.patch | origin/branch-3.2 | No       | CONFLICT        | 12               | origin/trunk: CONFLICT, origin/branc
Re: [VOTE] Release Apache Hadoop Submarine 0.2.0 - RC0
+1 (non-binding)

On Fri, Jun 21, 2019, 09:09 Weiwei Yang wrote:

> +1 (binding)
>
> Thanks
> Weiwei
>
> On Jun 21, 2019, 5:33 AM +0800, Wangda Tan wrote:
> +1 Binding. Tested in local cluster and reviewed docs.
>
> Thanks!
>
> On Wed, Jun 19, 2019 at 3:20 AM Sunil Govindan wrote:
>
> +1 binding
>
> - tested in local cluster.
> - tried TonY runtime as well
> - doc seems fine now.
>
> - Sunil
>
> On Thu, Jun 6, 2019 at 6:53 PM Zhankun Tang wrote:
>
> Hi folks,
>
> Thanks to all of you who have contributed to this Submarine 0.2.0 release.
> We now have a release candidate (RC0) for Apache Hadoop Submarine 0.2.0.
>
> The artifacts for this Submarine 0.2.0 RC0 are available here:
> https://home.apache.org/~ztang/submarine-0.2.0-rc0/
>
> Its RC tag in git is "submarine-0.2.0-RC0".
>
> The maven artifacts are available via repository.apache.org at
> https://repository.apache.org/content/repositories/orgapachehadoop-1221/
>
> This vote will run 7 days (5 weekdays), ending on 13th June at 11:59 pm PST.
>
> The highlights of this release:
> 1. LinkedIn's TonY runtime support in Submarine
> 2. PyTorch enabled in Submarine with both the YARN native service runtime
>    (single node) and the TonY runtime
> 3. Support for an uber jar of Submarine to submit the job
> 4. A YAML file to describe a job
> 5. Notebook support (via the Apache Zeppelin Submarine interpreter)
>
> Thanks to Sunil, Wangda, Xun, Zac, Keqiu, and Szilard for helping me
> prepare the release.
>
> I have done some testing with my pseudo cluster. My +1 (non-binding) to start.
>
> Regards,
> Zhankun
Re: Any thoughts making Submarine a separate Apache project?
+1, this is a great idea. Since the Hadoop repository has already grown huge and contains many projects, I think it's generally a good idea to separate projects in their early phase.

On Wed, Jul 17, 2019, 08:50 runlin zhang wrote:

> +1, that will be great!
>
> > On Jul 10, 2019, at 3:34 PM, Xun Liu wrote:
> >
> > Hi all,
> >
> > This is Xun Liu, contributing to the Submarine project for deep learning
> > workloads running together with big data workloads on Hadoop clusters.
> >
> > There are a bunch of integrations of Submarine with other projects,
> > finished or ongoing, such as Apache Zeppelin, TonY, and Azkaban. The next step
> > for Submarine is to integrate with more projects like Apache Arrow,
> > Redis, MLflow, etc., to be able to handle end-to-end machine learning use
> > cases like model serving, notebook management, advanced training
> > optimizations (like auto parameter tuning, memory cache optimizations for
> > large training datasets, etc.), and to make it run on other platforms like
> > Kubernetes or natively on Cloud. LinkedIn also wants to donate the TonY project
> > to Apache so we can put Submarine and TonY together in the same codebase
> > (Page #30:
> > https://www.slideshare.net/xkrogen/hadoop-meetup-jan-2019-tony-tensorflow-on-yarn-and-beyond#30
> > ).
> >
> > This expands the scope of the original Submarine project in exciting new
> > ways. Toward that end, would it make sense to create a separate Submarine
> > project at Apache? This could lead to faster adoption of Submarine and allow
> > Submarine to grow into a full-blown machine learning platform.
> >
> > There will be lots of technical details to work out, but any initial
> > thoughts on this?
> >
> > Best Regards,
> > Xun Liu
Re: yarn nm builds breaking on 3.1 and 3.2
Hi Steve!

Reverted the YARN-9128 commits on branch-3.2 / branch-3.1.
Sorry for the hassle!

Best,
Szilard

On Wed, Oct 9, 2019 at 7:46 PM Sunil Govindan wrote:

> YARN-9128 caused this; Szilard is checking it.
>
> Thanks
> Sunil
>
> On Wed, Oct 9, 2019 at 10:51 PM Steve Loughran wrote:
>
> > I'm seeing the YARN branch-3.1 and branch-3.2 builds breaking right now; one of
> > the patches that has gone in during the last 24 hours has done it.
> >
> > [INFO] Finished at: 2019-10-09T18:18:31+01:00
> > [INFO] ------------------------------------------------------------------------
> > [ERROR] Failed to execute goal
> > org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile
> > (default-testCompile) on project hadoop-yarn-server-nodemanager:
> > Compilation failure: Compilation failure:
> > [ERROR]
> > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestResourceMappings.java:[22,66]
> > package org.apache.hadoop.yarn.server.nodemanager.api.deviceplugin does not
> > exist
> > [ERROR]
> > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestResourceMappings.java:[41,15]
> > package Device does not exist
> > [ERROR]
> > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestResourceMappings.java:[49,15]
> > package Device does not exist
> > [ERROR] -> [Help 1]
> > [ERROR]
> >
> > If I revert either branch to "HADOOP-16491. Upgrade jetty version to 9.3.27",
> > then everything works again.
> >
> > I don't want to blindly revert all three; can someone take a look and fix
> > these up, or, if there's no easy fix, pull the commit at fault.
> >
> > thanks
> >
> > Steve
[jira] [Created] (YARN-10264) Add container launch related env / classpath debug info to container logs when a container fails
Szilard Nemeth created YARN-10264:
-------------------------------------
Summary: Add container launch related env / classpath debug info to container logs when a container fails
Key: YARN-10264
URL: https://issues.apache.org/jira/browse/YARN-10264
Project: Hadoop YARN
Issue Type: Task
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

Sometimes when a container fails to launch, it can be pretty hard to figure out why it failed.

Similar to YARN-4309, we can add a switch to control whether the environment variables and the Java classpath should be printed. As a bonus, [jdeps|https://docs.oracle.com/javase/8/docs/technotes/tools/unix/jdeps.html] could also be utilized to print some verbose info about the classpath. When log aggregation occurs, all this information will automatically be collected, making debugging such container launch failures much easier.

Below is an example output when the user faces a classpath configuration issue:

{code:java}
End of LogType:prelaunch.err
**
2020-04-19 05:49:12,145 DEBUG:app_info:Diagnostics of the failed app
2020-04-19 05:49:12,145 DEBUG:app_info:Application application_1587300264561_0001 failed 2 times due to AM Container for appattempt_1587300264561_0001_02 exited with exitCode: 1
Failing this attempt.Diagnostics: [2020-04-19 12:45:01.955]Exception from container-launch.
Container id: container_e60_1587300264561_0001_02_01
Exit code: 1
Exception message: Launch container failed
Shell output: main : command provided 1
main : run as user is systest
main : requested yarn user is systest
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /dataroot/ycloud/yarn/nm/nmPrivate/application_1587300264561_0001/container_e60_1587300264561_0001_02_01/container_e60_1587300264561_0001_02_01.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...
[2020-04-19 12:45:01.984]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:
yarn.app.mapreduce.am.env
HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}
mapreduce.map.env
HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}
mapreduce.reduce.env
HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}

[2020-04-19 12:45:01.985]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:
yarn.app.mapreduce.am.env
HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}
mapreduce.map.env
HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}
mapreduce.reduce.env
HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}

For more detailed output, check the application tracking page: http://quasar-plnefj-2.quasar-plnefj.root.hwx.site:8088/cluster/app/application_1587300264561_0001 Then click on links to logs of each attempt.
...
2020-04-19 05:49:12,148 INFO:util:* End test_app_API (yarn.suite.YarnAPITests) *
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
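The kind of diagnostics this jira proposes to append to a failed container's logs (environment variables plus classpath entries) could look roughly like this. All names and the output layout are illustrative; this is not the actual NodeManager implementation:

```python
import os

def container_launch_debug_report(env):
    """Render the environment variables and CLASSPATH entries that would be
    appended to a failed container's logs when the debug switch is on.
    `env` is a mapping like a container's launch environment."""
    lines = ["--- container environment ---"]
    lines += [f"{key}={value}" for key, value in sorted(env.items())]
    lines.append("--- classpath entries ---")
    classpath = env.get("CLASSPATH", "")
    # Split on the platform path separator (':' on Linux), dropping empties.
    lines += [entry for entry in classpath.split(os.pathsep) if entry]
    return "\n".join(lines)
```

With log aggregation enabled, a report like this would be collected alongside prelaunch.err, which is exactly the point of the jira: the MRAppMaster failure above would immediately show whether HADOOP_MAPRED_HOME was set and what the classpath contained.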
[jira] [Created] (YARN-10321) Break down TestUserGroupMappingPlacementRule#testMapping into test scenarios
Szilard Nemeth created YARN-10321:
-------------------------------------
Summary: Break down TestUserGroupMappingPlacementRule#testMapping into test scenarios
Key: YARN-10321
URL: https://issues.apache.org/jira/browse/YARN-10321
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth
[jira] [Created] (YARN-10323) [Umbrella] YARN Debuggability and Supportability Improvements
Szilard Nemeth created YARN-10323:
-------------------------------------
Summary: [Umbrella] YARN Debuggability and Supportability Improvements
Key: YARN-10323
URL: https://issues.apache.org/jira/browse/YARN-10323
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth

Troubleshooting YARN problems can be difficult in a production environment. Collecting data before problems occur, or actively collecting data on an on-demand basis, could truly help track down issues.

Some examples:
1. If an application is hanging, application logs along with RM / NM logs could be collected, plus a jstack of either the YARN daemons or the application container.
2. Similarly, when an application fails, we may collect data.
3. Scheduler issues are quite common, so good tooling that helps to spot issues would be crucial.

A design document will be added later.
[jira] [Created] (YARN-10488) Several typos in package: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair
Szilard Nemeth created YARN-10488:
-------------------------------------
Summary: Several typos in package: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair
Key: YARN-10488
URL: https://issues.apache.org/jira/browse/YARN-10488
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth

1. Typo in field name: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.VisitedResourceRequestTracker.TrackerPerPriorityResource#racksVisted
2. Typo in method: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager#setChildResourceLimits
There's a comment: "... max reource ...", a typo in the word 'resource'.
3. Typo in the javadoc of method: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt#reserve
"bookeeping" -> "bookkeeping"
4. There's a local variable in the method org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt#updateAMDiagnosticMsg called diagnosticMessageBldr. It's an abbreviation, but could be changed to something more meaningful.
5. Typo in the javadoc of method: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.MaxRunningAppsEnforcer#updateRunnabilityOnReload
"reinitilized" --> "reinitialized"
6. And last but not least, a funny typo in the method name of org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.DominantResourceFairnessPolicy.DominantResourceFairnessComparator#compareAttribrutes
[jira] [Created] (YARN-10547) Decouple job parsing logic from SLSRunner
Szilard Nemeth created YARN-10547:
-------------------------------------
Summary: Decouple job parsing logic from SLSRunner
Key: YARN-10547
URL: https://issues.apache.org/jira/browse/YARN-10547
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

SLSRunner has too many responsibilities. One of them is to parse the job details from the SLS input formats and launch the AMs and task containers. As a first step, the job parsing logic could be decoupled from this class.

There are 3 types of inputs:
- SLS trace
- Synth
- Rumen

Their job parsing methods are:
- SLS trace: https://github.com/apache/hadoop/blob/005b854f6bad66defafae0abf95dabc6c36ca8b1/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java#L479-L526
- Synth: https://github.com/apache/hadoop/blob/005b854f6bad66defafae0abf95dabc6c36ca8b1/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java#L722-L790
- Rumen: https://github.com/apache/hadoop/blob/005b854f6bad66defafae0abf95dabc6c36ca8b1/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java#L651-L716
[jira] [Created] (YARN-10548) CLONE - Decouple job parsing logic from SLSRunner
Szilard Nemeth created YARN-10548:
-------------------------------------
Summary: CLONE - Decouple job parsing logic from SLSRunner
Key: YARN-10548
URL: https://issues.apache.org/jira/browse/YARN-10548
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

SLSRunner has too many responsibilities. One of them is to parse the job details from the SLS input formats and launch the AMs and task containers. As a first step, the job parsing logic could be decoupled from this class.

There are 3 types of inputs:
- SLS trace
- Synth
- Rumen

Their job parsing methods are:
- SLS trace: https://github.com/apache/hadoop/blob/005b854f6bad66defafae0abf95dabc6c36ca8b1/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java#L479-L526
- Synth: https://github.com/apache/hadoop/blob/005b854f6bad66defafae0abf95dabc6c36ca8b1/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java#L722-L790
- Rumen: https://github.com/apache/hadoop/blob/005b854f6bad66defafae0abf95dabc6c36ca8b1/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java#L651-L716
[jira] [Created] (YARN-10549) Decouple RM runner logic from SLSRunner
Szilard Nemeth created YARN-10549:
-------------------------------------
Summary: Decouple RM runner logic from SLSRunner
Key: YARN-10549
URL: https://issues.apache.org/jira/browse/YARN-10549
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

SLSRunner has too many responsibilities. One of them is to parse the job details from the SLS input formats and launch the AMs and task containers. The RM runner logic could be decoupled.
[jira] [Created] (YARN-10550) Decouple NM runner logic from SLSRunner
Szilard Nemeth created YARN-10550:
-------------------------------------
Summary: Decouple NM runner logic from SLSRunner
Key: YARN-10550
URL: https://issues.apache.org/jira/browse/YARN-10550
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

SLSRunner has too many responsibilities. One of them is to parse the job details from the SLS input formats and launch the AMs and task containers. The NM runner logic could be decoupled.
[jira] [Created] (YARN-10552) Eliminate code duplication in SLSCapacityScheduler and SLSFairScheduler
Szilard Nemeth created YARN-10552:
-------------------------------------
Summary: Eliminate code duplication in SLSCapacityScheduler and SLSFairScheduler
Key: YARN-10552
URL: https://issues.apache.org/jira/browse/YARN-10552
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth
[jira] [Created] (YARN-10579) CLONE - CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include mode of operation for CS
Szilard Nemeth created YARN-10579:
-------------------------------------
Summary: CLONE - CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include mode of operation for CS
Key: YARN-10579
URL: https://issues.apache.org/jira/browse/YARN-10579
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Benjamin Teke
Assignee: Szilard Nemeth

Under the YARN-10496 umbrella, weight mode has been implemented for CS with YARN-10504. We would like to expose the mode of operation through the RM's /scheduler REST endpoint. The field name will be 'mode'. All queue representations in the response will uniformly hold one of the mode values: "percentage", "absolute", or "weight".
[jira] [Created] (YARN-10580) Fix some issues in TestRMWebServicesCapacitySchedDynamicConfig
Szilard Nemeth created YARN-10580:
-------------------------------------
Summary: Fix some issues in TestRMWebServicesCapacitySchedDynamicConfig
Key: YARN-10580
URL: https://issues.apache.org/jira/browse/YARN-10580
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth

YARN-10512 introduced some changes that could be improved; [~pbacsko] highlighted the issues in a comment. Pasting the contents of the comment as a reference:

#1 In TestRMWebServicesCapacitySchedDynamicConfig:
{noformat}
config.set(YarnConfiguration.SCHEDULER_CONFIGURATION_STORE_CLASS,
    YarnConfiguration.MEMORY_CONFIGURATION_STORE);
{noformat}
This call is repeated multiple times; it could be set somewhere else.

#2 In TestRMWebServicesCapacitySchedDynamicConfig:
{noformat}
validateSchedulerInfo(json, "weight", "root.default", "root.test1", "root.test2");
{noformat}
"root.default", "root.test1" and "root.test2" are the same in all cases, so you might want to drop them.

#3 In TestRMWebServicesCapacitySchedDynamicConfig:
{noformat}
@Before
@Override
public void setUp() throws Exception {
  super.setUp();
}
{noformat}
This method does nothing and can be removed.
[jira] [Created] (YARN-10581) CLONE - CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include weight values for queues
Szilard Nemeth created YARN-10581:
-------------------------------------
Summary: CLONE - CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include weight values for queues
Key: YARN-10581
URL: https://issues.apache.org/jira/browse/YARN-10581
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

Under the YARN-10496 umbrella, weight mode has been implemented for CS with YARN-10504. We would like to expose the weight values for all queues through the RM's /scheduler REST endpoint.
[jira] [Created] (YARN-10672) All testcases in TestReservations are flaky
Szilard Nemeth created YARN-10672:
-------------------------------------
Summary: All testcases in TestReservations are flaky
Key: YARN-10672
URL: https://issues.apache.org/jira/browse/YARN-10672
Project: Hadoop YARN
Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

Running any particular test in TestReservations 100 times never yields 100 passes. For example, let's run testReservationNoContinueLook 100 times. For me, it produced 39 failed and 61 passed results. A screenshot is attached.

Stacktrace:
{code}
java.lang.AssertionError:
Expected :2048
Actual   :0
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:633)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations.testReservationNoContinueLook(TestReservations.java:642)
{code}

The test fails here:
{code}
// Start testing...
// Only AM
TestUtils.applyResourceCommitRequest(clusterResource,
    a.assignContainers(clusterResource, node_0,
        new ResourceLimits(clusterResource),
        SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY), nodes, apps);
assertEquals(2 * GB, a.getUsedResources().getMemorySize());
{code}

With some debugging (patch attached), I realized that sometimes there are no registered nodes, so the AM can't be allocated.
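The "run it 100 times" measurement quoted above is easy to script. A small harness sketch; the Maven invocation shown in the docstring is my assumption of how one might drive it, and the run function is injectable so the tally logic can be exercised without Maven:

```python
import subprocess

def rerun_test(cmd, runs=100, run_once=None):
    """Run a test command `runs` times and return (passed, failed).
    By default each run invokes `cmd` via subprocess, e.g. something like
    ["mvn", "-q", "test", "-Dtest=TestReservations#testReservationNoContinueLook"].
    `run_once` can be injected to test the tally logic without Maven."""
    if run_once is None:
        run_once = lambda: subprocess.run(cmd).returncode == 0
    passed = sum(1 for _ in range(runs) if run_once())
    return passed, runs - passed
```

A stable test should report (100, 0); the 61/39 split observed here is what marks the test as flaky.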
[jira] [Created] (YARN-10675) Consolidate YARN-10672 and YARN-10447
Szilard Nemeth created YARN-10675:
-------------------------------------

Summary: Consolidate YARN-10672 and YARN-10447
Key: YARN-10675
URL: https://issues.apache.org/jira/browse/YARN-10675
Project: Hadoop YARN
Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

Let's consolidate the solution applied for YARN-10672 and apply it to the code changes introduced with YARN-10447.

Quoting [~pbacsko]:
{quote}
The solution is much straightforward than mine in YARN-10447. Actually we might consider applying this to TestLeafQueue with undoing my changes, because that's more complicated (I had no patience to go deeper with Mockito internal behavior, I just thought well, disable that thread and that's enough).
{quote}
[jira] [Created] (YARN-10676) Improve code quality in TestTimelineAuthenticationFilterForV1
Szilard Nemeth created YARN-10676:
-------------------------------------

Summary: Improve code quality in TestTimelineAuthenticationFilterForV1
Key: YARN-10676
URL: https://issues.apache.org/jira/browse/YARN-10676
Project: Hadoop YARN
Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth
[jira] [Created] (YARN-10677) Logger of SLSFairScheduler is provided with the wrong class
Szilard Nemeth created YARN-10677:
-------------------------------------

Summary: Logger of SLSFairScheduler is provided with the wrong class
Key: YARN-10677
URL: https://issues.apache.org/jira/browse/YARN-10677
Project: Hadoop YARN
Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

In SLSFairScheduler, the Logger definition looks like:
https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L69
We need to fix this.
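The bug class here is a copy-paste mistake where the logger is created for a different class than the one that uses it, so every log line is attributed to the wrong source. A minimal sketch of the pattern and its fix follows; Hadoop itself uses SLF4J, but java.util.logging stands in so the sketch runs against the JDK alone, and the class names are illustrative, not taken from SLS:

```java
import java.util.logging.Logger;

public class LoggerSketch {
    // Hypothetical stand-in for the class whose name was copy-pasted.
    static class OtherScheduler { }

    // Anti-pattern: logger bound to a different class, so log lines from
    // this class show up under OtherScheduler's name.
    static final Logger WRONG_LOG =
        Logger.getLogger(OtherScheduler.class.getName());

    // Fix: always pass the enclosing class to the logger factory.
    static final Logger LOG =
        Logger.getLogger(LoggerSketch.class.getName());

    public static void main(String[] args) {
        System.out.println(WRONG_LOG.getName()); // prints LoggerSketch$OtherScheduler
        System.out.println(LOG.getName());       // prints LoggerSketch
    }
}
```

With SLF4J the shape of the fix is the same: `LoggerFactory.getLogger(SLSFairScheduler.class)` instead of some other class literal.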
[jira] [Created] (YARN-10678) Try blocks without catch blocks in SLS scheduler classes can swallow other exceptions
Szilard Nemeth created YARN-10678:
-------------------------------------

Summary: Try blocks without catch blocks in SLS scheduler classes can swallow other exceptions
Key: YARN-10678
URL: https://issues.apache.org/jira/browse/YARN-10678
Project: Hadoop YARN
Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth
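The failure mode the summary describes can be reproduced in plain Java: in a try/finally without a catch, an exception thrown from the finally block silently discards the in-flight exception from the try body. This is a hypothetical minimal sketch, not code from the SLS classes:

```java
public class SwallowDemo {
    // A try/finally without a catch: if the finally block itself throws,
    // the original exception from the try body is silently replaced.
    static String observedFailure() {
        try {
            try {
                throw new IllegalStateException("original failure");
            } finally {
                // Cleanup that throws: this exception supersedes the
                // IllegalStateException already in flight.
                throw new RuntimeException("cleanup failure");
            }
        } catch (RuntimeException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The root cause ("original failure") is lost entirely.
        System.out.println(observedFailure()); // prints "cleanup failure"
    }
}
```

The usual remedies are to add a catch that logs the original exception before cleanup runs, or to guard the finally body so it cannot throw.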
[jira] [Created] (YARN-10679) Better logging of uncaught exceptions throughout SLS
Szilard Nemeth created YARN-10679:
-------------------------------------

Summary: Better logging of uncaught exceptions throughout SLS
Key: YARN-10679
URL: https://issues.apache.org/jira/browse/YARN-10679
Project: Hadoop YARN
Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth
[jira] [Created] (YARN-10680) CLONE - Better logging of uncaught exceptions throughout SLS
Szilard Nemeth created YARN-10680:
-------------------------------------

Summary: CLONE - Better logging of uncaught exceptions throughout SLS
Key: YARN-10680
URL: https://issues.apache.org/jira/browse/YARN-10680
Project: Hadoop YARN
Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

In our internal environment, there was a test failure while running SLS tests with Jenkins. It's difficult to align the uncaught exceptions (in this case an NPE) with the log itself, as the exception is logged with {{e.printStackTrace()}}.
This jira is to replace printStackTrace calls in SLS with {{LOG.error("msg", exception)}}.
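The before/after can be sketched as follows. Hadoop uses SLF4J (`LOG.error("msg", e)`); java.util.logging stands in here so the sketch runs with the JDK alone, and the message text is illustrative:

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class LogVsPrintDemo {
    static LogRecord last; // captures the most recent record for inspection

    public static void main(String[] args) {
        Logger log = Logger.getLogger(LogVsPrintDemo.class.getName());
        log.setUseParentHandlers(false);
        log.addHandler(new Handler() {
            @Override public void publish(LogRecord r) { last = r; }
            @Override public void flush() { }
            @Override public void close() { }
        });

        Exception e = new NullPointerException("boom");

        // Before (what SLS does today): e.printStackTrace() writes to raw
        // stderr, detached from the log stream, its timestamps, and its
        // formatting, which is why failures are hard to align with the log.

        // After: pass the throwable to the logger, so the message and the
        // stack trace travel through the logging framework together.
        log.log(Level.SEVERE, "Operation failed", e);

        System.out.println(last.getMessage() + " / " + last.getThrown().getMessage());
    }
}
```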
[jira] [Created] (YARN-10681) Fix assertion failure message in BaseSLSRunnerTest
Szilard Nemeth created YARN-10681:
-------------------------------------

Summary: Fix assertion failure message in BaseSLSRunnerTest
Key: YARN-10681
URL: https://issues.apache.org/jira/browse/YARN-10681
Project: Hadoop YARN
Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

There is this failure message:
https://github.com/apache/hadoop/blob/a89ca56a1b0eb949f56e7c6c5c25fdf87914a02f/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/BaseSLSRunnerTest.java#L129-L130
"catched" should be replaced with "caught".
[jira] [Resolved] (YARN-10736) Fix GetApplicationsRequest JavaDoc
[ https://issues.apache.org/jira/browse/YARN-10736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10736.
-----------------------------------
Fix Version/s: 3.4.0
Hadoop Flags: Reviewed
Resolution: Fixed

> Fix GetApplicationsRequest JavaDoc
> ----------------------------------
>
> Key: YARN-10736
> URL: https://issues.apache.org/jira/browse/YARN-10736
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Miklos Gergely
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> getName and setName javadoc comments are mixed up
[jira] [Resolved] (YARN-10766) [UI2] Bump moment-timezone to 0.5.33
[ https://issues.apache.org/jira/browse/YARN-10766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10766.
-----------------------------------
Hadoop Flags: Reviewed
Resolution: Fixed

> [UI2] Bump moment-timezone to 0.5.33
> ------------------------------------
>
> Key: YARN-10766
> URL: https://issues.apache.org/jira/browse/YARN-10766
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn, yarn-ui-v2
> Reporter: Andras Gyori
> Assignee: Andras Gyori
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: UI2_Correct_Timezone_After_Bump.png, UI2_Wrong_Timezone_Before_Bump.png, YARN-10766.001.patch
>
> A handful of timezone related fixes were added into the 0.5.33 release of moment-timezone. An example of a scenario in which the current UI2 behaviour is not correct is a user from Australia, where the submission time shown on UI2 is one hour ahead of the actual time.
> Unfortunately moment-timezone data range files have been renamed, which is a breaking change from the point of view of emberjs. Including all timezones will increase the overall size of UI2 by an additional ~6 KB.
[jira] [Created] (YARN-10787) Queue submit ACL check is wrong when CS queue is ambiguous
Szilard Nemeth created YARN-10787:
-------------------------------------

Summary: Queue submit ACL check is wrong when CS queue is ambiguous
Key: YARN-10787
URL: https://issues.apache.org/jira/browse/YARN-10787
Project: Hadoop YARN
Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Gergely Pollak

Let's suppose we have a Capacity Scheduler configuration with 2 or more leaf queues with the same name in the queue hierarchy. That's what we call an ambiguous queue name.
Let's also enable ACL checks and define the acl_submit_applications / acl_administer_queue configs with the correct value, adding the username to the ACL value there.

Here's a minimalistic YARN + CS config:

1. YARN config snippet:
{code}
yarn.acl.enable = true
{code}

2. CS config snippet:
{code}
yarn.scheduler.capacity.root.someparent1.queues = anyotherqueue1,somequeue,anyotherqueue2
yarn.scheduler.capacity.root.someparent2.queues = anyotherqueue3,somequeue,anyotherqueue4
yarn.scheduler.capacity.root.someparent1.somequeue.acl_submit_applications = someuser1
yarn.scheduler.capacity.root.someparent2.somequeue.acl_submit_applications = someuser1
yarn.scheduler.capacity.root.someparent1.somequeue.acl_administer_queue = someuser1
yarn.scheduler.capacity.root.someparent2.somequeue.acl_administer_queue = someuser1
{code}

So in this case, we have an ambiguous queue named "somequeue" under 2 different paths:
- root.someparent1.somequeue
- root.someparent2.somequeue

When a user submits an application correctly with the full queue path, e.g. root.someparent1.somequeue, YARN will still fail to place the application in that queue and will use the short name instead.

3. LOG SNIPPET
{code}
2021-05-20 22:04:32,031 DEBUG org.apache.hadoop.yarn.server.resourcemanager.placement.CSMappingPlacementRule: Placement final result 'root.someparent1.somequeue' for application 'application_1621540945412_0001'
2021-05-20 22:04:32,031 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Placed application with ID application_1621540945412_0001 in queue: somequeue, original submission queue was: root.someparent1.somequeue
2021-05-20 22:04:32,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Ambiguous queue reference: somequeue please use full queue path instead.
2021-05-20 22:04:32,031 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application 'application_1621540945412_0001' is submitted without priority hence considering default queue/cluster priority: 0
2021-05-20 22:04:32,032 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Priority '0' is acceptable in queue : somequeue for application: application_1621540945412_0001
2021-05-20 22:04:32,993 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Exception in submitting application_1621540945412_0001
org.apache.hadoop.yarn.exceptions.YarnException: org.apache.hadoop.security.AccessControlException: User someuser1 does not have permission to submit application_1621540945412_0001 to queue somequeue
	at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
{code}

4. FULL STACKTRACE:
{code}
org.apache.hadoop.yarn.exceptions.YarnException: org.apache.hadoop.security.AccessControlException: User someuser1 does not have permission to submit application_1621540945412_0001 to queue somequeue
	at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:433)
	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:330)
	at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:650)
	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
	at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
{code}
[jira] [Created] (YARN-10797) Logging parameter issues in scheduler package
Szilard Nemeth created YARN-10797:
-------------------------------------

Summary: Logging parameter issues in scheduler package
Key: YARN-10797
URL: https://issues.apache.org/jira/browse/YARN-10797
Project: Hadoop YARN
Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

1. There is a LOG.error call in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy#editSchedule that provides a logging argument without a placeholder in the message:
{code}
LOG.error("Failed to reload capacity scheduler config file - " +
    "will use existing conf.", e.getMessage());
{code}

2. There is a LOG.debug call in org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp#moveReservation that has a placeholder in the logging message, but the argument is an instance of Throwable, so the message does not require a placeholder:
{code}
} catch (IllegalStateException e) {
  LOG.debug("Reserve on target node failed, e={}", e);
  return false;
}
{code}
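Why case 1 is a bug can be shown with a minimal stand-in for SLF4J's `{}` substitution (illustration only, not the real SLF4J code; one placeholder, one argument):

```java
public class PlaceholderDemo {
    // Minimal stand-in for SLF4J-style "{}" substitution: if the message
    // contains no placeholder, the argument never reaches the output.
    static String format(String msg, Object arg) {
        int i = msg.indexOf("{}");
        return i < 0 ? msg : msg.substring(0, i) + arg + msg.substring(i + 2);
    }

    public static void main(String[] args) {
        // Case 1 from the report: an argument but no placeholder, so the
        // cause (e.getMessage()) is silently dropped from the rendered line.
        System.out.println(format(
            "Failed to reload capacity scheduler config file - will use existing conf.",
            "some cause"));

        // With a placeholder, the argument shows up as intended.
        System.out.println(format("Failed to reload config, cause: {}", "some cause"));
        // prints "Failed to reload config, cause: some cause"
    }
}
```

For case 2, the idiomatic fix is simply to drop the placeholder: SLF4J treats a trailing Throwable argument specially and prints its stack trace, so `LOG.debug("Reserve on target node failed", e)` logs both the message and the full trace.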
[jira] [Created] (YARN-10798) Enhancements in RMAppManager: createAndPopulateNewRMApp and copyPlacementQueueToSubmissionContext
Szilard Nemeth created YARN-10798:
-------------------------------------

Summary: Enhancements in RMAppManager: createAndPopulateNewRMApp and copyPlacementQueueToSubmissionContext
Key: YARN-10798
URL: https://issues.apache.org/jira/browse/YARN-10798
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

As a follow-up of YARN-10787, we need to do the following:

1. Rename RMAppManager#copyPlacementQueueToSubmissionContext: this method does not really copy anything, it simply overrides the queue value.

2. Add a debug log to print the csqueue object before the authorization code: [Code block|https://github.com/apache/hadoop/blob/2541efa496ba0e7e096ee5ec3c08d64b62036402/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java#L459-L475]

3. Fix log messages: as 'copyPlacementQueueToSubmissionContext' overrides (does not copy) the original queue name with the queue name from the PlacementContext, all calls to submissionContext.getQueue() will return the short queue name. This results in very misleading log messages as well, including the exception message itself:
{code}
org.apache.hadoop.yarn.exceptions.YarnException: org.apache.hadoop.security.AccessControlException: User someuser1 does not have permission to submit application_1621540945412_0001 to queue somequeue
{code}
All log messages should print the original submission queue, if possible.
[jira] [Created] (YARN-10799) Follow up of YARN-10787: Eliminate queue name replacement in ApplicationSubmissionContext based on placement context
Szilard Nemeth created YARN-10799:
-------------------------------------

Summary: Follow up of YARN-10787: Eliminate queue name replacement in ApplicationSubmissionContext based on placement context
Key: YARN-10799
URL: https://issues.apache.org/jira/browse/YARN-10799
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

This is the long-term fix for YARN-10787: the task is to investigate whether it's possible to eliminate RMAppManager#copyPlacementQueueToSubmissionContext. This could introduce nasty backward-incompatible issues with recovery, so it should be thought through really carefully.
[jira] [Created] (YARN-10849) Clarify testcase documentation for TestServiceAM#testContainersReleasedWhenPreLaunchFails
Szilard Nemeth created YARN-10849:
-------------------------------------

Summary: Clarify testcase documentation for TestServiceAM#testContainersReleasedWhenPreLaunchFails
Key: YARN-10849
URL: https://issues.apache.org/jira/browse/YARN-10849
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

There's a small comment added to the testcase org.apache.hadoop.yarn.service.TestServiceAM#testContainersReleasedWhenPreLaunchFails:
{code}
// Test to verify that the containers are released and the
// component instance is added to the pending queue when building the launch
// context fails.
{code}
However, it was not clear to me why building the "launch context" would fail. While the test passes, it throws an exception that tells the story:
{code}
2021-07-06 18:31:04,438 ERROR [pool-275-thread-1] containerlaunch.ContainerLaunchService (ContainerLaunchService.java:run(122)) - [COMPINSTANCE compa-0 : container_1625589063422_0001_01_01]: Failed to launch container.
java.lang.IllegalArgumentException: Can not create a Path from a null string
	at org.apache.hadoop.fs.Path.checkPathArg(Path.java:164)
	at org.apache.hadoop.fs.Path.<init>(Path.java:180)
	at org.apache.hadoop.yarn.service.provider.tarball.TarballProviderService.processArtifact(TarballProviderService.java:39)
	at org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:144)
	at org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:107)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
{code}
This exception is thrown because the id of the Artifact object is unset (null); TarballProviderService.processArtifact verifies it and does not allow such artifacts. The aim of this jira is to add a clarifying comment or javadoc to this method.
[jira] [Created] (YARN-10853) Add more tests to TestUsersManager
Szilard Nemeth created YARN-10853:
-------------------------------------

Summary: Add more tests to TestUsersManager
Key: YARN-10853
URL: https://issues.apache.org/jira/browse/YARN-10853
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth
Attachments: UsersManager.html

Running TestUsersManager with code coverage measurements gives only 18% line coverage for the class "UsersManager". This value is pretty low. See the attached coverage report for that class.
[jira] [Resolved] (YARN-9551) TestTimelineClientV2Impl.testSyncCall fails intermittently
[ https://issues.apache.org/jira/browse/YARN-9551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-9551.
----------------------------------
Hadoop Flags: Reviewed
Resolution: Fixed

> TestTimelineClientV2Impl.testSyncCall fails intermittently
> ----------------------------------------------------------
>
> Key: YARN-9551
> URL: https://issues.apache.org/jira/browse/YARN-9551
> Project: Hadoop YARN
> Issue Type: Bug
> Components: ATSv2, test
> Affects Versions: 3.3.0
> Reporter: Prabhu Joseph
> Assignee: Andras Gyori
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.1.5
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> TestTimelineClientV2Impl.testSyncCall fails intermittently.
> {code:java}
> Failed
> org.apache.hadoop.yarn.client.api.impl.TestTimelineClientV2Impl.testSyncCall
> Failing for the past 1 build (Since #24083 )
> Took 1.5 sec.
> Error Message
> TimelineEntities not published as desired expected:<3> but was:<4>
> Stacktrace
> java.lang.AssertionError: TimelineEntities not published as desired expected:<3> but was:<4>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:834)
> 	at org.junit.Assert.assertEquals(Assert.java:645)
> 	at org.apache.hadoop.yarn.client.api.impl.TestTimelineClientV2Impl.testSyncCall(TestTimelineClientV2Impl.java:251)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> 	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> 	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> 	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> 	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> 	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> Standard Output
> 2019-05-13 15:33:46,596 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(60)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2019-05-13 15:33:47,763 INFO [main] impl.TestTimelineClientV2Impl (TestTimelineClientV2Impl.java:printReceivedEntities(413)) - Entities Published @ index 0 : 1,
> 2019-05-13 15:33:47,764 INFO [main] impl.TestTimelineClientV2Impl (TestTimelineClientV2Impl.java:printReceivedEntities(413)) - Entities Published @ index 1 : 2,
> 2019-05-13 15:33:47,764 INFO [main] impl.Te
> {code}
[jira] [Resolved] (YARN-6221) Entities missing from ATS when summary log file info got returned to the ATS before the domain log
[ https://issues.apache.org/jira/browse/YARN-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-6221.
----------------------------------
Hadoop Flags: Reviewed
Resolution: Fixed

> Entities missing from ATS when summary log file info got returned to the ATS before the domain log
> --------------------------------------------------------------------------------------------------
>
> Key: YARN-6221
> URL: https://issues.apache.org/jira/browse/YARN-6221
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Sushmitha Sreenivasan
> Assignee: Xiaomin Zhang
> Priority: Critical
> Fix For: 3.4.0, 3.2.3, 3.3.2, 3.1.5
>
> Attachments: YARN-6221.02.patch, YARN-6221.02.patch, YARN-6221.branch-3.1.001.patch, YARN-6221.branch-3.2.001.patch, YARN-6221.branch-3.3.001.patch, YARN-6221.branch-3.3.002.patch, YARN-6221.patch, YARN-6221.patch
>
> Events data missing for the following entities:
> REQUEST:
> {code:java}
> curl -k --negotiate -u: http://:8188/ws/v1/timeline/TEZ_APPLICATION_ATTEMPT/tez_appattempt_1487706062210_0012_01
> {code}
> RESPONSE:
> {code:java}
> {"events":[],"entitytype":"TEZ_APPLICATION_ATTEMPT","entity":"tez_appattempt_1487706062210_0012_01","starttime":1487711606077,"domain":"Tez_ATS_application_1487706062210_0012","relatedentities":{"TEZ_DAG_ID":["dag_1487706062210_0012_2","dag_1487706062210_0012_1"]},"primaryfilters":{},"otherinfo":{}}
> {code}
> LOGS:
> {code:title=Timeline Server log entry}
> WARN timeline.TimelineDataManager (TimelineDataManager.java:doPostEntities(366)) - Skip the timeline entity: { id: tez_application_1487706062210_0012, type: TEZ_APPLICATION }
> org.apache.hadoop.yarn.exceptions.YarnException: Domain information of the timeline entity { id: tez_application_1487706062210_0012, type: TEZ_APPLICATION } doesn't exist.
> 	at org.apache.hadoop.yarn.server.timeline.security.TimelineACLsManager.checkAccess(TimelineACLsManager.java:122)
> 	at org.apache.hadoop.yarn.server.timeline.TimelineDataManager.doPostEntities(TimelineDataManager.java:356)
> 	at org.apache.hadoop.yarn.server.timeline.TimelineDataManager.postEntities(TimelineDataManager.java:316)
> 	at org.apache.hadoop.yarn.server.timeline.EntityLogInfo.doParse(LogInfo.java:204)
> 	at org.apache.hadoop.yarn.server.timeline.LogInfo.parsePath(LogInfo.java:156)
> 	at org.apache.hadoop.yarn.server.timeline.LogInfo.parseForStore(LogInfo.java:113)
> 	at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore$AppLogs.parseSummaryLogs(EntityGroupFSTimelineStore.java:682)
> 	at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore$AppLogs.parseSummaryLogs(EntityGroupFSTimelineStore.java:657)
> 	at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore$ActiveLogParser.run(EntityGroupFSTimelineStore.java:870)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
[jira] [Resolved] (YARN-10874) Refactor NM ContainerLaunch#getEnvDependencies's unit tests
[ https://issues.apache.org/jira/browse/YARN-10874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10874.
-----------------------------------
Hadoop Flags: Reviewed
Resolution: Fixed

> Refactor NM ContainerLaunch#getEnvDependencies's unit tests
> -----------------------------------------------------------
>
> Key: YARN-10874
> URL: https://issues.apache.org/jira/browse/YARN-10874
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Reporter: Tamas Domok
> Assignee: Tamas Domok
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> The YARN-10355 ticket states that the unit tests contain repeated code and the test methods are too long. We decided to split that ticket into two parts. YARN-10355 will contain only the production code change (for the windows variant; the linux variant refactor is not feasible with regex, and the original code is not the nicest, but it does its thing).
>
> Acceptance criteria:
> * refactor the unit tests (e.g.: parameterised tests)
> * extend the tests with extra checks
[jira] [Created] (YARN-10877) SLSSchedulerCommons: Consider using application map from AbstractYarnScheduler and make event handling more consistent
Szilard Nemeth created YARN-10877:
-------------------------------------

Summary: SLSSchedulerCommons: Consider using application map from AbstractYarnScheduler and make event handling more consistent
Key: YARN-10877
URL: https://issues.apache.org/jira/browse/YARN-10877
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

This is a follow-up of YARN-10552.
The improvements and things to check are coming from [this comment|https://issues.apache.org/jira/browse/YARN-10552?focusedCommentId=17277991&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17277991].

{quote}
appQueueMap was not present in SLSFairScheduler before (it was in SLSCapacityScheduler), however from https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L163, it seems that the super class of the schedulers - https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java#L159 has this already. As such, do we really need to define a new map as a common map at all in SLSSchedulerCommons or can we somehow reuse the super class's map? It might need some code updates though.

In regards to the above point, considering SLSFairScheduler did not previously have any of the following code in handle() method:
{quote}
[jira] [Resolved] (YARN-10882) Fix branch-3.1 build: zstd library is missing from the Dockerfile
[ https://issues.apache.org/jira/browse/YARN-10882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10882.
-----------------------------------
Hadoop Flags: Reviewed
Resolution: Fixed

> Fix branch-3.1 build: zstd library is missing from the Dockerfile
> -----------------------------------------------------------------
>
> Key: YARN-10882
> URL: https://issues.apache.org/jira/browse/YARN-10882
> Project: Hadoop YARN
> Issue Type: Bug
> Components: build
> Reporter: Tamas Domok
> Assignee: Tamas Domok
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> branch-3.1 did not build on the Jenkins slave, because zstd is missing from the Dockerfile.
>
> {code:java}
> [INFO] --- hadoop-maven-plugins:3.1.5-SNAPSHOT:cmake-compile (cmake-compile) @ hadoop-common ---
> [INFO] Running cmake /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-3286/src/hadoop-common-project/hadoop-common/src -DGENERATED_JAVAH=/home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-3286/src/hadoop-common-project/hadoop-common/target/native/javah -DJVM_ARCH_DATA_MODEL=64 -DREQUIRE_BZIP2=false -DREQUIRE_ISAL=false -DREQUIRE_OPENSSL=true -DREQUIRE_SNAPPY=true -DREQUIRE_ZSTD=true -G Unix Makefiles
> [INFO] with extra environment variables {}
> [WARNING] -- The C compiler identification is GNU 7.5.0
> [WARNING] -- The CXX compiler identification is GNU 7.5.0
> [WARNING] -- Check for working C compiler: /usr/bin/cc
> [WARNING] -- Check for working C compiler: /usr/bin/cc -- works
> [WARNING] -- Detecting C compiler ABI info
> [WARNING] -- Detecting C compiler ABI info - done
> [WARNING] -- Detecting C compile features
> [WARNING] -- Detecting C compile features - done
> [WARNING] -- Check for working CXX compiler: /usr/bin/c++
> [WARNING] -- Check for working CXX compiler: /usr/bin/c++ -- works
> [WARNING] -- Detecting CXX compiler ABI info
> [WARNING] -- Detecting CXX compiler ABI info - done
> [WARNING] -- Detecting CXX compile features
> [WARNING] -- Detecting CXX compile features - done
> [WARNING] -- Looking for pthread.h
> [WARNING] -- Looking for pthread.h - found
> [WARNING] -- Looking for pthread_create
> [WARNING] -- Looking for pthread_create - not found
> [WARNING] -- Looking for pthread_create in pthreads
> [WARNING] -- Looking for pthread_create in pthreads - not found
> [WARNING] -- Looking for pthread_create in pthread
> [WARNING] -- Looking for pthread_create in pthread - found
> [WARNING] -- Found Threads: TRUE
> [WARNING] JAVA_HOME=, JAVA_JVM_LIBRARY=/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> [WARNING] JAVA_INCLUDE_PATH=/usr/lib/jvm/java-8-openjdk-amd64/include, JAVA_INCLUDE_PATH2=/usr/lib/jvm/java-8-openjdk-amd64/include/linux
> [WARNING] Located all JNI components successfully.
> [WARNING] -- Found JNI: /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libjawt.so
> [WARNING] -- Found ZLIB: /lib/x86_64-linux-gnu/libz.so.1 (found version "1.2.11")
> [WARNING] -- Found Snappy: /usr/lib/x86_64-linux-gnu/libsnappy.so.1
> [WARNING] CMake Error at CMakeLists.txt:120 (MESSAGE):
> [WARNING] Required zstandard library could not be found.
> [WARNING] ZSTD_LIBRARY=/usr/lib/x86_64-linux-gnu/libzstd.so.1, ZSTD_INCLUDE_DIR=,
> [WARNING] CUSTOM_ZSTD_INCLUDE_DIR=, CUSTOM_ZSTD_PREFIX=, CUSTOM_ZSTD_INCLUDE=
> {code}
[jira] [Created] (YARN-10886) Cluster based and parent based max capacity in Capacity Scheduler
Szilard Nemeth created YARN-10886:
-------------------------------------
Summary: Cluster based and parent based max capacity in Capacity Scheduler
Key: YARN-10886
URL: https://issues.apache.org/jira/browse/YARN-10886
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth

We want to introduce percentage modes relative to the cluster, not the parent, i.e. the property root.users.maximum-capacity will mean one of the following things:

*Either Parent Percentage:* maximum capacity relative to its parent. If it's set to 50, the capacity is capped with respect to the parent. This can be covered by the current format, no change there.

*Or Cluster Percentage:* maximum capacity expressed as a percentage of the overall cluster capacity. This is the new scenario, for example:
yarn.scheduler.capacity.root.users.max-capacity = c:50%
yarn.scheduler.capacity.root.users.max-capacity = c:50%, c:30%
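Spelled out as a capacity-scheduler configuration sketch, the two modes proposed above would sit side by side as follows. Property names and the c: prefix follow the issue's own examples; the values are arbitrary illustrations, not a finalized syntax:

{code:java}
# Parent Percentage (existing format): root.users is capped at 50% of its parent
yarn.scheduler.capacity.root.users.maximum-capacity = 50

# Cluster Percentage (proposed format): root.users is capped at 50% of the whole cluster
yarn.scheduler.capacity.root.users.max-capacity = c:50%

# Proposed per-resource variant of the cluster mode
yarn.scheduler.capacity.root.users.max-capacity = c:50%, c:30%
{code}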
[jira] [Created] (YARN-10887) Investigation: Decouple capacity and max-capacity modes
Szilard Nemeth created YARN-10887:
-------------------------------------
Summary: Investigation: Decouple capacity and max-capacity modes
Key: YARN-10887
URL: https://issues.apache.org/jira/browse/YARN-10887
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth

Currently Fair Scheduler supports the following 3 kinds of settings:
* Single percentage (relative to parent), i.e. "X%"
* A set of percentages (relative to parent), i.e. "X% cpu, Y% memory"
* Absolute resources, i.e. "X mb, Y vcores"

Please note that the new, recommended format does not support the single percentage mode, only the last 2, like: "vcores=X, memory-mb=Y" or "vcores=X%, memory-mb=Y%" respectively.

It is recommended that all three formats are supported for maximum-capacity in CS after introducing weight mode.
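For reference, the three Fair Scheduler formats listed above might appear in an allocation file roughly like this. The <maxResources> element and the concrete numbers are illustrative assumptions for the sketch, not a prescription:

{code:java}
<!-- Single percentage, relative to parent (legacy format only) -->
<maxResources>50%</maxResources>

<!-- Set of percentages per resource, relative to parent (new format) -->
<maxResources>vcores=50%, memory-mb=40%</maxResources>

<!-- Absolute resources (new format) -->
<maxResources>vcores=8, memory-mb=8192</maxResources>
{code}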
[jira] [Created] (YARN-10888) [Umbrella] New capacity modes for CS
Szilard Nemeth created YARN-10888:
-------------------------------------
Summary: [Umbrella] New capacity modes for CS
Key: YARN-10888
URL: https://issues.apache.org/jira/browse/YARN-10888
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
[jira] [Created] (YARN-10889) [Umbrella] Flexible Auto Queue Creation in Capacity Scheduler - Tech debts
Szilard Nemeth created YARN-10889:
-------------------------------------
Summary: [Umbrella] Flexible Auto Queue Creation in Capacity Scheduler - Tech debts
Key: YARN-10889
URL: https://issues.apache.org/jira/browse/YARN-10889
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
[jira] [Resolved] (YARN-10505) Extend the maximum-capacity property to support Fair Scheduler migration
[ https://issues.apache.org/jira/browse/YARN-10505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10505.
-----------------------------------
Resolution: Duplicate

> Extend the maximum-capacity property to support Fair Scheduler migration
> ------------------------------------------------------------------------
>
> Key: YARN-10505
> URL: https://issues.apache.org/jira/browse/YARN-10505
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Reporter: Benjamin Teke
> Assignee: Benjamin Teke
> Priority: Major
>
> Currently Fair Scheduler supports the following 3 kinds of settings:
> * Single percentage (relative to parent), i.e. "X%"
> * A set of percentages (relative to parent), i.e. "X% cpu, Y% memory"
> * Absolute resources, i.e. "X mb, Y vcores"
>
> Please note that the new, recommended format does not support the single
> percentage mode, only the last 2, like: "vcores=X, memory-mb=Y" or
> "vcores=X%, memory-mb=Y%" respectively.
>
> Tasks to accomplish:
> # It is recommended that all three formats are supported for
> maximum-capacity in CS after introducing weight mode.
> # Also we want to introduce percentage modes relative to the cluster,
> not the parent, i.e. the property root.users.maximum-capacity will mean one of
> the following things:
> ## Either Parent Percentage: maximum capacity relative to its parent. If
> it's set to 50, the capacity is capped with respect to the parent.
> This can be covered by the current format, no change there.
> ## Or Cluster Percentage: maximum capacity expressed as a percentage of the
> overall cluster capacity. This case is the new scenario, for example:
> {{yarn.scheduler.capacity.root.users.max-capacity = c:50%}}
> {{yarn.scheduler.capacity.root.users.max-capacity = c:50%, c:30%}}
[jira] [Resolved] (YARN-9904) Investigate how resource allocation configuration could be more consistent in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-9904.
----------------------------------
Resolution: Duplicate

Duplicate of YARN-10888

> Investigate how resource allocation configuration could be more consistent in
> CapacityScheduler
> -----------------------------------------------------------------------------
>
> Key: YARN-9904
> URL: https://issues.apache.org/jira/browse/YARN-9904
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Reporter: Gergely Pollák
> Priority: Major
>
> It would be nice if every place where a capacity can be defined accepted the
> same formats:
> * With fixed amounts (e.g. 1 GB memory, 8 vcores, 3 GPU)
> * With percentages
> ** Percentage of all resources (e.g. 10% of all memory, vcore, GPU)
> ** Percentage per resource type (e.g. 10% memory, 25% vcore, 50% GPU)
>
> We need to determine all configuration options where capacities can be
> defined, and see if it is possible to extend the configuration, or if it
> makes sense in that case.
> The outcome is a proposal for all the configurations which could/should be
> changed.
[jira] [Resolved] (YARN-10891) Extend QueueInfo with max-parallel-apps in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-10891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10891.
-----------------------------------
Hadoop Flags: Reviewed
Resolution: Fixed

> Extend QueueInfo with max-parallel-apps in CapacityScheduler
> ------------------------------------------------------------
>
> Key: YARN-10891
> URL: https://issues.apache.org/jira/browse/YARN-10891
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler
> Reporter: Tamas Domok
> Assignee: Tamas Domok
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> Add max-parallel-apps to the Cluster Scheduler API's
> [response|https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API]
> and extend the Yarn-API's QueueInfoProto with the max-parallel-apps property.
>
> The REST api can be tested with:
> {code:java}
> curl "http://localhost:8088/ws/v1/cluster/scheduler" | jq {code}
>
> The protobuf api can be tested with the yarn client:
> {code:java}
> yarn queue --status root.queue.foo
> Queue Information :
> Queue Name : foo
> Queue Path : root.queue.foo
> State : RUNNING
> Capacity : 75.00%
> Current Capacity : .00%
> Maximum Capacity : 100.00%
> Weight : -1.00
> Maximum Parallel Apps : 9
> Default Node Label expression :
> Accessible Node Labels : *
> Preemption : disabled
> Intra-queue Preemption : disabled {code}
>
> About the max-parallel-apps:
> Maximum number of applications that can run at the same time. Unlike
> {{maximum-applications}}, application submissions are _not_ rejected when
> this limit is reached. Instead they stay in {{ACCEPTED}} state until they are
> eligible to run. This can be set for all queues with
> {{yarn.scheduler.capacity.max-parallel-apps}} and can also be overridden on a
> per queue basis by setting
> {{yarn.scheduler.capacity.<queue-path>.max-parallel-apps}}. Integer value is
> expected. By default, there is no limit.
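The configuration side described in the last paragraph of the quoted issue can be sketched like this. The value 128 and the queue path root.queue.foo are arbitrary examples:

{code:java}
# Default limit for all queues
yarn.scheduler.capacity.max-parallel-apps = 128

# Per-queue override: the queue path is inserted into the property name
yarn.scheduler.capacity.root.queue.foo.max-parallel-apps = 9
{code}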
[jira] [Created] (YARN-10904) Investigate: Remove unnecessary fields from AbstractCSQueue
Szilard Nemeth created YARN-10904:
-------------------------------------
Summary: Investigate: Remove unnecessary fields from AbstractCSQueue
Key: YARN-10904
URL: https://issues.apache.org/jira/browse/YARN-10904
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth
[jira] [Created] (YARN-10905) Investigate if AbstractCSQueue#configuredNodeLabels vs. QueueCapacities#getExistingNodeLabels holds the same data
Szilard Nemeth created YARN-10905:
-------------------------------------
Summary: Investigate if AbstractCSQueue#configuredNodeLabels vs. QueueCapacities#getExistingNodeLabels holds the same data
Key: YARN-10905
URL: https://issues.apache.org/jira/browse/YARN-10905
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

The task is to investigate whether the field AbstractCSQueue#configuredNodeLabels holds the same data as QueueCapacities#getExistingNodeLabels. Obviously, we don't want double-entry bookkeeping, so if the data is the same, we can remove one of them.
[jira] [Created] (YARN-10906) Create QueueConfig object for generic queue-specific fields
Szilard Nemeth created YARN-10906:
-------------------------------------
Summary: Create QueueConfig object for generic queue-specific fields
Key: YARN-10906
URL: https://issues.apache.org/jira/browse/YARN-10906
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

This is about config fields in AbstractCSQueue. Document whether a config only comes from the Configuration object or is altered or used for other purposes. Also, restrict the visibility and surface of modification from subclasses as much as we can.
[jira] [Created] (YARN-10907) Investigate: Minimize usages of AbstractCSQueue#csContext
Szilard Nemeth created YARN-10907:
-------------------------------------
Summary: Investigate: Minimize usages of AbstractCSQueue#csContext
Key: YARN-10907
URL: https://issues.apache.org/jira/browse/YARN-10907
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

Context objects can be a sign of a code smell, as they can contain many, possibly loosely related references to other objects. CapacitySchedulerContext seems to be one of these. This task is to investigate how the field AbstractCSQueue#csContext is being used from this class, and possibly keep the usage of this context class to the bare minimum.

Related article: https://wiki.c2.com/?ContextObjectsAreEvil
[jira] [Created] (YARN-10908) Investigate: Why AbstractCSQueue#authorizer is constructed for each queue
Szilard Nemeth created YARN-10908:
-------------------------------------
Summary: Investigate: Why AbstractCSQueue#authorizer is constructed for each queue
Key: YARN-10908
URL: https://issues.apache.org/jira/browse/YARN-10908
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

AbstractCSQueue#hasAccess checks if a certain user with an ACL has permission to submit an app to the queue. The permission check itself is performed by calling ConfiguredYarnAuthorizer#checkPermission. Interestingly, all queue objects hold a reference to a YarnAuthorizationProvider instance. What looks weird is how the authorizer is initialized:
https://github.com/apache/hadoop/blob/ac0a4e7f589e7280268013c56339b3b257d332a0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java#L428

It just calls YarnAuthorizationProvider.getInstance with the Configuration object as an argument, so all queue objects effectively hold an instance constructed with the same configuration. The getInstance method does not read any queue-specific configuration value from the object, so this is a waste of memory.
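The behaviour the issue describes, every queue calling a getInstance-style factory and ending up with an equivalent object, is the classic case for a lazily created, process-wide instance. A minimal sketch of that pattern follows; the class name SharedProvider and the Supplier-based factory are illustrative assumptions, not Hadoop's actual YarnAuthorizationProvider API:

```java
import java.util.function.Supplier;

/**
 * Sketch of the lazily-initialized, process-wide instance pattern that
 * getInstance-style factories typically implement (double-checked locking).
 * Holding one shared object instead of a per-queue reference avoids
 * constructing equivalent authorizers for every queue. All names here are
 * illustrative, not Hadoop's API.
 */
final class SharedProvider {
  private static volatile Object instance;

  private SharedProvider() {
  }

  static Object getInstance(Supplier<Object> factory) {
    if (instance == null) {                  // first check, without a lock
      synchronized (SharedProvider.class) {
        if (instance == null) {              // second check, under the lock
          instance = factory.get();
        }
      }
    }
    return instance;                         // every caller sees the same object
  }
}
```

Every queue that calls getInstance then shares one object, which is the behaviour the issue suggests should make a per-queue field unnecessary.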
[jira] [Created] (YARN-10909) AbstractCSQueue: Check for methods added for test code but not annotated with VisibleForTesting
Szilard Nemeth created YARN-10909:
-------------------------------------
Summary: AbstractCSQueue: Check for methods added for test code but not annotated with VisibleForTesting
Key: YARN-10909
URL: https://issues.apache.org/jira/browse/YARN-10909
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

For example, AbstractCSQueue#setMaxCapacity(float) is only used for testing, but not annotated. There can be other methods like this in this class.
[jira] [Created] (YARN-10910) AbstractCSQueue#setupQueueConfigs: Separate validation logic from initialization logic
Szilard Nemeth created YARN-10910:
-------------------------------------
Summary: AbstractCSQueue#setupQueueConfigs: Separate validation logic from initialization logic
Key: YARN-10910
URL: https://issues.apache.org/jira/browse/YARN-10910
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

AbstractCSQueue#setupQueueConfigs contains initialization + validation logic. The task is to factor out the validation logic from this method into a separate method.
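One way to structure the split the issue asks for is sketched below: a pure validation pass runs first and mutates nothing, then an initialization pass assigns state. The class, method, and field names are placeholders for the sketch, not the actual AbstractCSQueue members:

```java
/**
 * Sketch of the refactoring direction: the setup entry point first runs a
 * side-effect-free validation pass, then an initialization pass, instead of
 * interleaving the two. Names are placeholders, not Hadoop's code.
 */
final class QueueSetup {
  private float capacity;
  private float maxCapacity;

  void setupQueueConfigs(float capacity, float maxCapacity) {
    validateQueueConfigs(capacity, maxCapacity);   // fail fast, mutate nothing
    initializeQueueConfigs(capacity, maxCapacity); // only runs on valid input
  }

  private void validateQueueConfigs(float capacity, float maxCapacity) {
    if (capacity < 0 || capacity > maxCapacity) {
      throw new IllegalArgumentException(
          "capacity " + capacity + " must be within [0, " + maxCapacity + "]");
    }
  }

  private void initializeQueueConfigs(float capacity, float maxCapacity) {
    this.capacity = capacity;
    this.maxCapacity = maxCapacity;
  }

  float getCapacity() {
    return capacity;
  }
}
```

The benefit is that an invalid configuration can never leave the object half-initialized, since validation throws before any field is touched.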
[jira] [Created] (YARN-10911) AbstractCSQueue: Create a separate class for usernames and weights that are travelling in a Map
Szilard Nemeth created YARN-10911:
-------------------------------------
Summary: AbstractCSQueue: Create a separate class for usernames and weights that are travelling in a Map
Key: YARN-10911
URL: https://issues.apache.org/jira/browse/YARN-10911
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

Related methods that are using the Map:
AbstractCSQueue#getUserWeightsFromHierarchy
CapacitySchedulerConfiguration#getAllUserWeightsForQueue
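A dedicated type replacing the raw map of usernames to weights could look like this minimal sketch. The class name UserWeights and the default weight of 1.0 are assumptions for the sketch, not the eventual API:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of a small value class wrapping the username-to-weight map that
 * currently travels between getUserWeightsFromHierarchy and
 * getAllUserWeightsForQueue. Wrapping it gives the map a name, makes it
 * immutable, and centralizes the default-weight rule. Names and the
 * default are assumptions for the sketch.
 */
final class UserWeights {
  private static final float DEFAULT_WEIGHT = 1.0f;

  private final Map<String, Float> weights;

  private UserWeights(Map<String, Float> weights) {
    // Defensive copy so callers cannot mutate the wrapped map afterwards.
    this.weights = Collections.unmodifiableMap(new HashMap<>(weights));
  }

  static UserWeights of(Map<String, Float> weights) {
    return new UserWeights(weights);
  }

  /** Returns the configured weight for the user, or the default if none is set. */
  float getFor(String user) {
    return weights.getOrDefault(user, DEFAULT_WEIGHT);
  }
}
```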
[jira] [Created] (YARN-10912) AbstractCSQueue#updateConfigurableResourceRequirement: Separate validation logic from initialization logic
Szilard Nemeth created YARN-10912:
-------------------------------------
Summary: AbstractCSQueue#updateConfigurableResourceRequirement: Separate validation logic from initialization logic
Key: YARN-10912
URL: https://issues.apache.org/jira/browse/YARN-10912
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

AbstractCSQueue#updateConfigurableResourceRequirement contains initialization + validation logic. The task is to factor out the validation logic from this method into a separate method.
[jira] [Created] (YARN-10913) AbstractCSQueue: Group preemption methods and fields into a separate class
Szilard Nemeth created YARN-10913:
-------------------------------------
Summary: AbstractCSQueue: Group preemption methods and fields into a separate class
Key: YARN-10913
URL: https://issues.apache.org/jira/browse/YARN-10913
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

Relevant methods: isQueueHierarchyPreemptionDisabled, isIntraQueueHierarchyPreemptionDisabled, getTotalKillableResource, getKillableContainers
[jira] [Created] (YARN-10914) Simplify duplicated code for tracking ResourceUsage
Szilard Nemeth created YARN-10914:
-------------------------------------
Summary: Simplify duplicated code for tracking ResourceUsage
Key: YARN-10914
URL: https://issues.apache.org/jira/browse/YARN-10914
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

Relevant methods: incReservedResource, decReservedResource, incPendingResource, decPendingResource, incUsedResource, decUsedResource

Alternatively, those could be moved to some computation class, too.
[jira] [Created] (YARN-10915) AbstractCSQueue: Simplify complex logic in methods: deriveCapacityFromAbsoluteConfigurations and updateEffectiveResources
Szilard Nemeth created YARN-10915:
-------------------------------------
Summary: AbstractCSQueue: Simplify complex logic in methods: deriveCapacityFromAbsoluteConfigurations and updateEffectiveResources
Key: YARN-10915
URL: https://issues.apache.org/jira/browse/YARN-10915
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth
[jira] [Created] (YARN-10917) Investigate and simplify CapacitySchedulerConfigValidator#validateQueueHierarchy
Szilard Nemeth created YARN-10917:
-------------------------------------
Summary: Investigate and simplify CapacitySchedulerConfigValidator#validateQueueHierarchy
Key: YARN-10917
URL: https://issues.apache.org/jira/browse/YARN-10917
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth
[jira] [Created] (YARN-10916) Investigate and simplify GuaranteedOrZeroCapacityOverTimePolicy#computeQueueManagementChanges
Szilard Nemeth created YARN-10916:
-------------------------------------
Summary: Investigate and simplify GuaranteedOrZeroCapacityOverTimePolicy#computeQueueManagementChanges
Key: YARN-10916
URL: https://issues.apache.org/jira/browse/YARN-10916
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth
[jira] [Created] (YARN-10918) Simplify code of method: CapacitySchedulerQueueManager#parseQueue
Szilard Nemeth created YARN-10918:
-------------------------------------
Summary: Simplify code of method: CapacitySchedulerQueueManager#parseQueue
Key: YARN-10918
URL: https://issues.apache.org/jira/browse/YARN-10918
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth
[jira] [Created] (YARN-10919) Remove LeafQueue#scheduler field
Szilard Nemeth created YARN-10919:
-------------------------------------
Summary: Remove LeafQueue#scheduler field
Key: YARN-10919
URL: https://issues.apache.org/jira/browse/YARN-10919
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

It is the same object as AbstractCSQueue#csContext (from the parent class).
[jira] [Created] (YARN-10920) Created a dedicated class for Node Labels
Szilard Nemeth created YARN-10920:
-------------------------------------
Summary: Created a dedicated class for Node Labels
Key: YARN-10920
URL: https://issues.apache.org/jira/browse/YARN-10920
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

In the current codebase, Node labels are simple strings. Using Strings is very error-prone, as they can contain basically anything. Moreover, it's easier to keep track of all usages if we have a dedicated class for Node labels.
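A dedicated value type along the lines the issue proposes might look like this minimal sketch. The class name NodeLabel and its validation rule are illustrative assumptions, not YARN's eventual API:

```java
import java.util.Objects;
import java.util.regex.Pattern;

/**
 * Illustrative sketch of a dedicated Node Label value type. Wrapping the
 * raw String gives one place for validation and makes every usage easy to
 * find. The validation pattern below is an assumption for the sketch, not
 * YARN's actual node label naming rule.
 */
final class NodeLabel {
  private static final Pattern VALID = Pattern.compile("[A-Za-z0-9_-]*");

  private final String name;

  private NodeLabel(String name) {
    this.name = name;
  }

  /** Factory that rejects null or malformed labels instead of letting them spread. */
  static NodeLabel of(String name) {
    Objects.requireNonNull(name, "node label must not be null");
    if (!VALID.matcher(name).matches()) {
      throw new IllegalArgumentException("Invalid node label: " + name);
    }
    return new NodeLabel(name);
  }

  String getName() {
    return name;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof NodeLabel && name.equals(((NodeLabel) o).name);
  }

  @Override
  public int hashCode() {
    return name.hashCode();
  }

  @Override
  public String toString() {
    return name;
  }
}
```

With value semantics in place, labels can be used safely as map keys and set members wherever the raw Strings are used today.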
[jira] [Created] (YARN-10921) AbstractCSQueue: Node Labels logic is scattered and iteration logic is repeated all over the place
Szilard Nemeth created YARN-10921:
-------------------------------------
Summary: AbstractCSQueue: Node Labels logic is scattered and iteration logic is repeated all over the place
Key: YARN-10921
URL: https://issues.apache.org/jira/browse/YARN-10921
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

TODO items:
- Check original Node labels epic / jiras?
- Think about ways to improve repetitive iteration on configuredNodeLabels
- Search for: "String label" in code

Code blocks to handle Node labels:
- AbstractCSQueue#setupQueueConfigs
- AbstractCSQueue#getQueueConfigurations
- AbstractCSQueue#accessibleToPartition
- AbstractCSQueue#getNodeLabelsForQueue
- AbstractCSQueue#updateAbsoluteCapacities
- AbstractCSQueue#updateConfigurableResourceRequirement
- CSQueueUtils#loadCapacitiesByLabelsFromConf
- AutoCreatedLeafQueue
[jira] [Created] (YARN-10922) Investigation: Verify if legacy AQC works as documented
Szilard Nemeth created YARN-10922:
-------------------------------------
Summary: Investigation: Verify if legacy AQC works as documented
Key: YARN-10922
URL: https://issues.apache.org/jira/browse/YARN-10922
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

Quoting from the Capacity Scheduler documentation:
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
Section: "Dynamic Auto-Creation and Management of Leaf Queues"

The task is to verify if legacy AQC works like this:
{quote}
The parent queue which has been enabled for auto leaf queue creation supports the configuration of template parameters for automatic configuration of the auto-created leaf queues. The auto-created queues support all of the leaf queue configuration parameters except for Queue ACL and Absolute Resource configurations. Queue ACLs are currently inherited from the parent queue, i.e. they are not configurable on the leaf queue template.
{quote}
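For context, a legacy AQC parent queue with the template parameters mentioned in the quote is configured roughly as follows. This is a sketch based on the same documentation page; the queue path root.parent and the numeric values are illustrative:

{code:java}
# Enable legacy auto leaf queue creation under root.parent
yarn.scheduler.capacity.root.parent.auto-create-child-queue.enabled = true

# Template parameters applied to every auto-created leaf queue
yarn.scheduler.capacity.root.parent.leaf-queue-template.capacity = 10
yarn.scheduler.capacity.root.parent.leaf-queue-template.maximum-capacity = 100
{code}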
[jira] [Created] (YARN-10923) Investigate if creating separate classes for Dynamic Leaf / Dynamic Parent queues makes sense
Szilard Nemeth created YARN-10923:
-------------------------------------
Summary: Investigate if creating separate classes for Dynamic Leaf / Dynamic Parent queues makes sense
Key: YARN-10923
URL: https://issues.apache.org/jira/browse/YARN-10923
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

First, create 2 new classes: DynamicLeaf / DynamicParent. Then, gradually move AQC functionality from ManagedParentQueue / AutoCreatedLeafQueue. Revisit if AbstractManagedParentQueue makes sense at all.

ManagedParent / Parent: Is there an actual need for the two classes?
- Currently the two different parents can cause confusion and chaos
- Can be a "back to the drawing board" task

The ultimate goal is to have a common class for AQC-enabled parents and investigate if a separate class for AutoCreatedLeafQueue is required.
[jira] [Created] (YARN-10924) Clean up CapacityScheduler#initScheduler
Szilard Nemeth created YARN-10924:
-------------------------------------
Summary: Clean up CapacityScheduler#initScheduler
Key: YARN-10924
URL: https://issues.apache.org/jira/browse/YARN-10924
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

The task is to define methods that initialize related fields together, and call these methods from initScheduler.
[jira] [Created] (YARN-10925) Simplify AbstractCSQueue#setupQueueConfigs
Szilard Nemeth created YARN-10925:
-------------------------------------
Summary: Simplify AbstractCSQueue#setupQueueConfigs
Key: YARN-10925
URL: https://issues.apache.org/jira/browse/YARN-10925
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth
[jira] [Created] (YARN-10926) Test validation after YARN-10504 and YARN-10506: Check if modified test expectations are correct or not
Szilard Nemeth created YARN-10926:
-------------------------------------
Summary: Test validation after YARN-10504 and YARN-10506: Check if modified test expectations are correct or not
Key: YARN-10926
URL: https://issues.apache.org/jira/browse/YARN-10926
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

YARN-10504 and YARN-10506 modified some test expectations. The task is to verify if those expectations are correct.
[jira] [Created] (YARN-10927) Explain assertion literals in testcases of CapacityScheduler and related test classes
Szilard Nemeth created YARN-10927:
-------------------------------------
Summary: Explain assertion literals in testcases of CapacityScheduler and related test classes
Key: YARN-10927
URL: https://issues.apache.org/jira/browse/YARN-10927
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

In the existing tests the assertion literals could be explained for easier understanding. As there are too many test classes, we can tackle this more easily in a feature-by-feature fashion.
[jira] [Created] (YARN-10929) Refrain from creating new Configuration object in AbstractManagedParentQueue#initializeLeafQueueConfigs
Szilard Nemeth created YARN-10929:
-------------------------------------
Summary: Refrain from creating new Configuration object in AbstractManagedParentQueue#initializeLeafQueueConfigs
Key: YARN-10929
URL: https://issues.apache.org/jira/browse/YARN-10929
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

AbstractManagedParentQueue#initializeLeafQueueConfigs creates a new CapacitySchedulerConfiguration with templated configs only. We should stop doing this. Also, there is a sorting of config keys in this method, but in the end the configs are added to the Configuration object, which is an enhanced Map, so the sorted order is not preserved anyway.
[jira] [Resolved] (YARN-10901) Permission checking error on an existing directory in LogAggregationFileController#verifyAndCreateRemoteLogDir
[ https://issues.apache.org/jira/browse/YARN-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10901.
-----------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

> Permission checking error on an existing directory in LogAggregationFileController#verifyAndCreateRemoteLogDir
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10901
>                 URL: https://issues.apache.org/jira/browse/YARN-10901
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.2.2, 3.3.1
>            Reporter: Tamas Domok
>            Assignee: Tamas Domok
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> *LogAggregationFileController.verifyAndCreateRemoteLogDir* tries to check
> whether the remote file system has set/modify permissions on the
> _yarn.nodemanager.remote-app-log-dir_:
>
> {code:java}
> //Check if FS has capability to set/modify permissions
> try {
>   remoteFS.setPermission(qualified, new FsPermission(TLDIR_PERMISSIONS));
> } catch (UnsupportedOperationException use) {
>   LOG.info("Unable to set permissions for configured filesystem since"
>       + " it does not support this", remoteFS.getScheme());
>   fsSupportsChmod = false;
> } catch (IOException e) {
>   LOG.warn("Failed to check if FileSystem suppports permissions on "
>       + "remoteLogDir [" + remoteRootLogDir + "]", e);
> }
> {code}
>
> But it will fail if the _yarn.nodemanager.remote-app-log-dir_'s owner is not
> the same as the NodeManager's user.
>
> Example error:
> {code:java}
> 2021-08-27 11:33:21,649 WARN org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileController: Failed to check if FileSystem suppports permissions on remoteLogDir [/tmp/logs]
> org.apache.hadoop.security.AccessControlException: Permission denied. user=yarn is not the owner of inode=/tmp/logs
> 	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkOwner(FSPermissionChecker.java:464)
> 	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:407)
> 	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermissionWithContext(FSPermissionChecker.java:417)
> 	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:297)
> 	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1950)
> 	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1931)
> 	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1876)
> 	at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setPermission(FSDirAttrOp.java:64)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1976)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:858)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:548)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.la
> {code}
[jira] [Resolved] (YARN-10852) Optimise CSConfiguration getAllUserWeightsForQueue
[ https://issues.apache.org/jira/browse/YARN-10852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10852.
-----------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

> Optimise CSConfiguration getAllUserWeightsForQueue
> --------------------------------------------------
>
>                 Key: YARN-10852
>                 URL: https://issues.apache.org/jira/browse/YARN-10852
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>            Reporter: Andras Gyori
>            Assignee: Andras Gyori
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> CapacitySchedulerConfiguration#getAllUsersWeightsForQueue is called in an
> O(n^2) fashion in AbstractCSQueue#setupQueueConfigs. This could be optimised
> by incorporating the ConfigurationProperties introduced in YARN-10838.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
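The O(n^2) pattern arises because the per-queue lookup rescans every configuration key for each queue. A minimal sketch of the single-pass indexing idea behind ConfigurationProperties from YARN-10838; the key layout `<queue-path>.user-settings.<user>.weight` follows Capacity Scheduler naming, but the class and method below are illustrative stand-ins, not the actual Hadoop API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: index user weights by queue path in ONE pass over the
// configuration, so each queue's lookup is a map get instead of a full rescan.
public class UserWeightIndex {
    static final String MARKER = ".user-settings.";
    static final String SUFFIX = ".weight";

    // One O(n) pass over all properties, instead of an O(n) scan per queue.
    static Map<String, Map<String, Float>> index(Map<String, String> props) {
        Map<String, Map<String, Float>> byQueue = new HashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            String key = e.getKey();
            int m = key.indexOf(MARKER);
            if (m < 0 || !key.endsWith(SUFFIX)) {
                continue; // not a user-weight property
            }
            String queuePath = key.substring(0, m);
            String user = key.substring(m + MARKER.length(),
                    key.length() - SUFFIX.length());
            byQueue.computeIfAbsent(queuePath, q -> new HashMap<>())
                   .put(user, Float.parseFloat(e.getValue()));
        }
        return byQueue;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("root.a.user-settings.alice.weight", "2.0");
        props.put("root.b.user-settings.bob.weight", "0.5");
        System.out.println(index(props).get("root.a")); // {alice=2.0}
    }
}
```

With such an index built once in setupQueueConfigs, each queue's user weights come from a single map lookup.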
[jira] [Resolved] (YARN-10872) Replace getPropsWithPrefix calls in AutoCreatedQueueTemplate
[ https://issues.apache.org/jira/browse/YARN-10872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10872.
-----------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

> Replace getPropsWithPrefix calls in AutoCreatedQueueTemplate
> ------------------------------------------------------------
>
>                 Key: YARN-10872
>                 URL: https://issues.apache.org/jira/browse/YARN-10872
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>            Reporter: Andras Gyori
>            Assignee: Benjamin Teke
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> With the introduction of YARN-10838, it is now possible to optimise
> AutoCreatedQueueTemplate and replace calls of getPropsWithPrefix.
[jira] [Resolved] (YARN-10908) Investigate: Why AbstractCSQueue#authorizer is constructed for each queue
[ https://issues.apache.org/jira/browse/YARN-10908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10908.
-----------------------------------
      Resolution: Invalid

> Investigate: Why AbstractCSQueue#authorizer is constructed for each queue
> -------------------------------------------------------------------------
>
>                 Key: YARN-10908
>                 URL: https://issues.apache.org/jira/browse/YARN-10908
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Minor
>
> AbstractCSQueue#hasAccess checks whether a certain user with an ACL has
> permission to submit an app to the queue.
> The permission check itself is performed by calling
> ConfiguredYarnAuthorizer#checkPermission.
> Interestingly, all queue objects hold a reference to a
> YarnAuthorizationProvider instance.
> What looks weird is how the authorizer is initialized:
> https://github.com/apache/hadoop/blob/ac0a4e7f589e7280268013c56339b3b257d332a0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java#L428
> It just calls YarnAuthorizationProvider.getInstance with the Configuration
> object as an argument, so all queue objects hold an instance constructed
> with the same configuration. The getInstance method does not read any
> queue-specific configuration value from the object, so this is a waste of
> memory.
[jira] [Created] (YARN-10942) CLONE - Investigate: Remove unnecessary fields from AbstractCSQueue or group fields by feature if possible
Szilard Nemeth created YARN-10942:
-------------------------------------

             Summary: CLONE - Investigate: Remove unnecessary fields from AbstractCSQueue or group fields by feature if possible
                 Key: YARN-10942
                 URL: https://issues.apache.org/jira/browse/YARN-10942
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth
            Assignee: Szilard Nemeth
[jira] [Resolved] (YARN-10912) AbstractCSQueue#updateConfigurableResourceRequirement: Separate validation logic from initialization logic
[ https://issues.apache.org/jira/browse/YARN-10912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10912.
-----------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

> AbstractCSQueue#updateConfigurableResourceRequirement: Separate validation
> logic from initialization logic
> --------------------------------------------------------------------------
>
>                 Key: YARN-10912
>                 URL: https://issues.apache.org/jira/browse/YARN-10912
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Szilard Nemeth
>            Assignee: Tamas Domok
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> AbstractCSQueue#updateConfigurableResourceRequirement contains initialization
> + validation logic. The task is to factor out the validation logic from this
> method into a separate method.
[jira] [Resolved] (YARN-10870) Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM Scheduler page
[ https://issues.apache.org/jira/browse/YARN-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10870.
-----------------------------------
      Resolution: Fixed

> Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM
> Scheduler page
> -----------------------------------------------------------------------------
>
>                 Key: YARN-10870
>                 URL: https://issues.apache.org/jira/browse/YARN-10870
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Siddharth Ahuja
>            Assignee: Gergely Pollák
>            Priority: Major
>             Fix For: 3.4.0, 3.3.2, 3.2.4
>
>         Attachments: YARN-10870.001.patch, YARN-10870.002.patch,
> YARN-10870.branch-3.1.002.patch, YARN-10870.branch-3.2.002.patch,
> YARN-10870.branch-3.3.002.patch
>
> Non-permissible users are (incorrectly) able to view applications submitted
> by another user on the RM's Scheduler UI (not the Applications UI), where
> _non-permissible users_ are non-application-owners who are present neither
> in the application ACL (mapreduce.job.acl-view-job) nor in the Queue ACL as
> a Queue admin of the queue this job was submitted to (see [1], where both
> the filter setting introduced by YARN-8319 and the ACL checks are performed).
> The issue can be reproduced easily by setting
> {{yarn.webapp.filter-entity-list-by-user}} to true in yarn-site.xml.
> The above disallows non-permissible users from viewing another user's
> applications on the Applications page, but not on the Scheduler page.
> The filter setting seems to be checked only in the getApps() call, but not
> while rendering the apps information on the Scheduler page. This seems to be
> a "missed" feature from YARN-8319.
> The following prerequisites are needed to reproduce the issue:
> * Kerberized cluster
> * SPNEGO enabled for HDFS & YARN
> * Add test users systest and user1 on all nodes
> * Add Kerberos princs for the above users
> * Create HDFS user dirs for the above users and chown them appropriately
> * Run a sample MR Sleep job and test
>
> Steps to reproduce the issue:
> * kinit as "systest" and run a sample MR sleep job from one of the nodes in
> the cluster:
> {code}
> yarn jar sleep -m 1 -mt 360
> {code}
> * kinit as "user1" from a Mac, for example (this assumes you have already
> copied /etc/krb5.conf from the cluster to your Mac's /private/etc folder for
> SPNEGO auth).
> * Open the Applications page. user1 cannot view the job being run by
> systest. This is correct.
> * Open the Scheduler page. user1 *CAN* view the job being run by systest.
> This is *INCORRECT*.
> [1] https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L676
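The filter discussed above is toggled by a single property. A sketch of the yarn-site.xml form, as the reproduction steps describe it (the property name comes from YARN-8319 and this report):

```xml
<!-- yarn-site.xml: restrict application listings (and, once this bug is
     fixed, the Scheduler page) to apps the requesting user may view -->
<property>
  <name>yarn.webapp.filter-entity-list-by-user</name>
  <value>true</value>
</property>
```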
[jira] [Created] (YARN-10943) AbstractCSQueue: Create separate class for encapsulating Min / Max Resource
Szilard Nemeth created YARN-10943:
-------------------------------------

             Summary: AbstractCSQueue: Create separate class for encapsulating Min / Max Resource
                 Key: YARN-10943
                 URL: https://issues.apache.org/jira/browse/YARN-10943
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth
            Assignee: Szilard Nemeth

There are certain methods where the min and max Resources are used in tandem. Some examples of these kinds of methods:
- getMinimumAbsoluteResource / getMaximumAbsoluteResource
- updateConfigurableResourceLimits: it invokes setConfiguredMinResource / setConfiguredMaxResource on QueueResourceQuotas. That object could define a single method that receives the MinMaxResource alone.
- Validator methods also receive min/max resources as separate parameters, which could be tied together.
- updateEffectiveResources: it performs operations with effective min/max resources.

Alternatively, two classes could be created:
- one for EffectiveMinMaxResource
- and another for AbsoluteMinMaxResource
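A minimal sketch of the proposed value class. The name MinMaxResource comes from the issue text; the generic parameter stands in for Hadoop's Resource type, so this is illustrative, not the actual API:

```java
// Illustrative value class tying a queue's min and max resource together,
// so method signatures take one MinMaxResource instead of two Resources.
public final class MinMaxResource<R> {
    private final R min;
    private final R max;

    public MinMaxResource(R min, R max) {
        this.min = min;
        this.max = max;
    }

    public R getMin() { return min; }
    public R getMax() { return max; }

    public static void main(String[] args) {
        // With this, setConfiguredMinResource/setConfiguredMaxResource on
        // QueueResourceQuotas could collapse into one setter taking the pair.
        MinMaxResource<String> limits =
                new MinMaxResource<>("memory=1024,vcores=1", "memory=4096,vcores=4");
        System.out.println(limits.getMin() + " .. " + limits.getMax());
    }
}
```

The EffectiveMinMaxResource / AbsoluteMinMaxResource alternative mentioned above would simply be two named instantiations of this shape.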
[jira] [Created] (YARN-10944) AbstractCSQueue: Eliminate code duplication in overloaded versions of setMaxCapacity
Szilard Nemeth created YARN-10944:
-------------------------------------

             Summary: AbstractCSQueue: Eliminate code duplication in overloaded versions of setMaxCapacity
                 Key: YARN-10944
                 URL: https://issues.apache.org/jira/browse/YARN-10944
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth

Methods are:
- AbstractCSQueue#setMaxCapacity(float)
- AbstractCSQueue#setMaxCapacity(java.lang.String, float)
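The usual way to remove this kind of duplication is to make the label-less overload delegate to the labeled one with a default label. A hedged sketch; the field, the validation, and the NO_LABEL constant are illustrative, not the actual AbstractCSQueue code:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative dedup of two overloads: all logic lives in the labeled
// variant; the other overload just delegates with a default label.
public class MaxCapacitySketch {
    static final String NO_LABEL = "";
    private final Map<String, Float> maxCapacityByLabel = new HashMap<>();

    void setMaxCapacity(float maxCapacity) {
        setMaxCapacity(NO_LABEL, maxCapacity); // delegate, do not duplicate
    }

    void setMaxCapacity(String label, float maxCapacity) {
        if (maxCapacity < 0f || maxCapacity > 1f) {
            throw new IllegalArgumentException("illegal capacity: " + maxCapacity);
        }
        maxCapacityByLabel.put(label, maxCapacity);
    }

    float getMaxCapacity(String label) {
        return maxCapacityByLabel.getOrDefault(label, 0f);
    }

    public static void main(String[] args) {
        MaxCapacitySketch q = new MaxCapacitySketch();
        q.setMaxCapacity(0.5f);
        System.out.println(q.getMaxCapacity(NO_LABEL)); // 0.5
    }
}
```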
[jira] [Created] (YARN-10945) Add javadoc to all methods of AbstractCSQueue
Szilard Nemeth created YARN-10945:
-------------------------------------

             Summary: Add javadoc to all methods of AbstractCSQueue
                 Key: YARN-10945
                 URL: https://issues.apache.org/jira/browse/YARN-10945
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth
[jira] [Created] (YARN-10946) AbstractCSQueue: Create separate class for constructing Queue API objects
Szilard Nemeth created YARN-10946:
-------------------------------------

             Summary: AbstractCSQueue: Create separate class for constructing Queue API objects
                 Key: YARN-10946
                 URL: https://issues.apache.org/jira/browse/YARN-10946
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth

Relevant methods are:
- org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueConfigurations
- org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueInfo
- org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueStatistics
[jira] [Created] (YARN-10948) Rename SchedulerQueue#activeQueue to activateQueue
Szilard Nemeth created YARN-10948:
-------------------------------------

             Summary: Rename SchedulerQueue#activeQueue to activateQueue
                 Key: YARN-10948
                 URL: https://issues.apache.org/jira/browse/YARN-10948
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth
[jira] [Created] (YARN-10947) Simplify AbstractCSQueue#initializeQueueState
Szilard Nemeth created YARN-10947:
-------------------------------------

             Summary: Simplify AbstractCSQueue#initializeQueueState
                 Key: YARN-10947
                 URL: https://issues.apache.org/jira/browse/YARN-10947
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth
[jira] [Created] (YARN-10949) Simplify AbstractCSQueue#updateMaxAppRelatedField and find a more meaningful name for this method
Szilard Nemeth created YARN-10949:
-------------------------------------

             Summary: Simplify AbstractCSQueue#updateMaxAppRelatedField and find a more meaningful name for this method
                 Key: YARN-10949
                 URL: https://issues.apache.org/jira/browse/YARN-10949
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth
[jira] [Created] (YARN-10950) Code cleanup in QueueCapacities
Szilard Nemeth created YARN-10950:
-------------------------------------

             Summary: Code cleanup in QueueCapacities
                 Key: YARN-10950
                 URL: https://issues.apache.org/jira/browse/YARN-10950
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth

- Make fields final: capacitiesMap, readLock, writeLock
- Remove explicit type arguments, e.g. new HashMap();
- Remove abbreviations and avoid string concatenation in QueueCapacities.Capacities#toString
- Remove unnecessary comments, e.g. "/* Used Capacity Getter and Setter */" & "/* Absolute Used Capacity Getter and Setter */"
- And probably many more...
[jira] [Created] (YARN-10951) CapacityScheduler: Move all fields and initializer code that belongs to async scheduling to a new class
Szilard Nemeth created YARN-10951:
-------------------------------------

             Summary: CapacityScheduler: Move all fields and initializer code that belongs to async scheduling to a new class
                 Key: YARN-10951
                 URL: https://issues.apache.org/jira/browse/YARN-10951
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth

There are certain if-statements that control whether to initialize some async-scheduling related fields, based on the value of the field 'scheduleAsynchronously'. We could move these fields to a separate class for clarity.
[jira] [Created] (YARN-10952) Move CapacityScheduler#updatePlacementRules elsewhere
Szilard Nemeth created YARN-10952:
-------------------------------------

             Summary: Move CapacityScheduler#updatePlacementRules elsewhere
                 Key: YARN-10952
                 URL: https://issues.apache.org/jira/browse/YARN-10952
             Project: Hadoop YARN
          Issue Type: Sub-task
         Environment: This method does not belong strongly to this class, as it is technically just a parser for MappingRules based on the provided Configuration object. The method could be static and should also receive rmContext.getQueuePlacementManager() along with the Configuration. The updateRules method of PlacementManager is already public.
            Reporter: Szilard Nemeth
[jira] [Created] (YARN-10953) Make CapacityScheduler#getOrCreateQueueFromPlacementContext more easy to comprehend
Szilard Nemeth created YARN-10953:
-------------------------------------

             Summary: Make CapacityScheduler#getOrCreateQueueFromPlacementContext more easy to comprehend
                 Key: YARN-10953
                 URL: https://issues.apache.org/jira/browse/YARN-10953
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth

1. Most of the method body is wrapped in an if-statement that checks whether the queue is null. We could negate this and return immediately if the queue != null, so the large if-statement is not needed.
2. Similarly, inside that large if-body there is a check for fallbackContext.hasParentQueue(); if it is true, yet another large if-body follows. We should also negate this condition and return immediately if it is false.
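The two negations described above turn the nested if-bodies into guard clauses. A toy sketch of the control-flow change only, using stand-in String types; the real method works on CSQueue and an ApplicationPlacementContext:

```java
// Illustrative guard-clause version of the control flow described above:
// each early return replaces one level of if-nesting.
public class GuardClauseSketch {
    static String getOrCreateQueue(String existingQueue, String fallbackParent) {
        if (existingQueue != null) {
            return existingQueue;   // was: the outer "if (queue == null)" body
        }
        if (fallbackParent == null) {
            return null;            // was: the inner "if (hasParentQueue())" body
        }
        // main auto-creation path, now at the top nesting level
        return fallbackParent + ".auto-created";
    }

    public static void main(String[] args) {
        System.out.println(getOrCreateQueue(null, "root.users"));
    }
}
```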
[jira] [Created] (YARN-10954) Remove commented code block from CSQueueUtils#loadCapacitiesByLabelsFromConf
Szilard Nemeth created YARN-10954:
-------------------------------------

             Summary: Remove commented code block from CSQueueUtils#loadCapacitiesByLabelsFromConf
                 Key: YARN-10954
                 URL: https://issues.apache.org/jira/browse/YARN-10954
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth
[jira] [Created] (YARN-10959) CLONE - AbstractCSQueue: Group preemption methods and fields into a separate class
Szilard Nemeth created YARN-10959:
-------------------------------------

             Summary: CLONE - AbstractCSQueue: Group preemption methods and fields into a separate class
                 Key: YARN-10959
                 URL: https://issues.apache.org/jira/browse/YARN-10959
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth
            Assignee: Szilard Nemeth

Relevant methods: isQueueHierarchyPreemptionDisabled, isIntraQueueHierarchyPreemptionDisabled, getTotalKillableResource, getKillableContainers
[jira] [Resolved] (YARN-10937) Fix log message arguments in LogAggregationFileController
[ https://issues.apache.org/jira/browse/YARN-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10937.
-----------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

> Fix log message arguments in LogAggregationFileController
> ---------------------------------------------------------
>
>                 Key: YARN-10937
>                 URL: https://issues.apache.org/jira/browse/YARN-10937
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.3.1
>            Reporter: Tamas Domok
>            Assignee: Tibor Kovács
>            Priority: Trivial
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> More arguments provided (1) than placeholders specified (0) in the following
> log message:
> {code:java}
> LOG.info("Unable to set permissions for configured filesystem since"
>     + " it does not support this", remoteFS.getScheme());{code}
> This is logged in two places; both of them are affected.
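SLF4J-style loggers substitute arguments only where a `{}` placeholder appears in the message, so the extra argument above is silently dropped. A self-contained demo using a tiny stand-in for the substitution (the real fix is simply to add a `{}` to the message string):

```java
public class LogPlaceholderDemo {
    // Minimal stand-in for SLF4J's "{}" substitution: without a placeholder
    // in the message, the argument never appears in the output.
    static String format(String msg, Object arg) {
        int i = msg.indexOf("{}");
        return i < 0 ? msg : msg.substring(0, i) + arg + msg.substring(i + 2);
    }

    public static void main(String[] args) {
        // Broken call from the issue: one argument, zero placeholders.
        String broken = format("Unable to set permissions for configured filesystem since"
                + " it does not support this", "hdfs");
        // Fixed call: placeholder added, so the scheme shows up in the log.
        String fixed = format("Unable to set permissions for configured filesystem {}"
                + " since it does not support this", "hdfs");
        System.out.println(broken.contains("hdfs")); // false: argument dropped
        System.out.println(fixed.contains("hdfs"));  // true: argument logged
    }
}
```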
[jira] [Created] (YARN-10983) Follow-up changes for YARN-10904
Szilard Nemeth created YARN-10983:
-------------------------------------

             Summary: Follow-up changes for YARN-10904
                 Key: YARN-10983
                 URL: https://issues.apache.org/jira/browse/YARN-10983
             Project: Hadoop YARN
          Issue Type: Sub-task
          Components: capacityscheduler
            Reporter: Szilard Nemeth
            Assignee: Szilard Nemeth

Links to Github comments from [~gandras]:
- https://github.com/apache/hadoop/pull/3551#discussion_r730728783
- https://github.com/apache/hadoop/pull/3551#discussion_r730729218
- https://github.com/apache/hadoop/pull/3551#discussion_r730729717
- https://github.com/apache/hadoop/pull/3551#discussion_r730736115
- https://github.com/apache/hadoop/pull/3551#discussion_r730741596

The required changes are the following:
- QueueNodeLabelsSettings: Incorporate QueuePath
- QueueAppLifetimeAndLimitSettings: Simplify parentQueue null check
- QueueAllocationSettings: Remove comment starting with: "/* YARN-10869: When using AutoCreatedLeafQueues, the passed configuration" - only if YARN-10929 got merged.
[jira] [Created] (YARN-10984) Add tests to CapacitySchedulerConfiguration
Szilard Nemeth created YARN-10984:
-------------------------------------

             Summary: Add tests to CapacitySchedulerConfiguration
                 Key: YARN-10984
                 URL: https://issues.apache.org/jira/browse/YARN-10984
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth
[jira] [Created] (YARN-10985) CLONE - Add tests to CapacitySchedulerConfiguration
Szilard Nemeth created YARN-10985:
-------------------------------------

             Summary: CLONE - Add tests to CapacitySchedulerConfiguration
                 Key: YARN-10985
                 URL: https://issues.apache.org/jira/browse/YARN-10985
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth
[jira] [Resolved] (YARN-10930) Introduce universal configured capacity vector
[ https://issues.apache.org/jira/browse/YARN-10930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10930.
-----------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

> Introduce universal configured capacity vector
> ----------------------------------------------
>
>                 Key: YARN-10930
>                 URL: https://issues.apache.org/jira/browse/YARN-10930
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>            Reporter: Andras Gyori
>            Assignee: Andras Gyori
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>         Attachments: capacity_scheduler_queue_capacity.html
>
>          Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> The proposal is to introduce a capacity resource vector that is universally
> parsed for every queue. CapacityResourceVector is a way to unite the current
> capacity modes (weight, percentage, absolute), while maintaining flexibility
> and extensibility.
> CapacityResourceVector is a good fit for the existing capacity configs, for
> example:
> * percentage mode: root.example.capacity 50 is syntactic sugar for
> [memory=50%, vcores=50%, ]
> * absolute mode: root.example.capacity [memory=1024, vcores=2] is a natural
> fit for the vector; there is no need for additional settings
> CapacityResourceVector will be used in a future refactor to unify the
> resource calculation and lift the limitations imposed on the queue hierarchy
> capacity settings (e.g. one cannot use both absolute resources and
> percentages in the same hierarchy, etc.)
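A toy parser for the bracketed vector syntax quoted above, e.g. `[memory=50%, vcores=2]`. This is illustrative only; the real CapacityResourceVector parsing lives in the Capacity Scheduler and performs far more validation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative parser: "[memory=50%, vcores=2]" -> {memory=50%, vcores=2}.
// A bare number like "50" is the syntactic sugar for an all-resource
// percentage and is deliberately not expanded here.
public class CapacityVectorSketch {
    static Map<String, String> parse(String spec) {
        String body = spec.trim();
        if (body.startsWith("[") && body.endsWith("]")) {
            body = body.substring(1, body.length() - 1); // strip brackets
        }
        Map<String, String> vector = new LinkedHashMap<>();
        for (String entry : body.split(",")) {
            String[] kv = entry.trim().split("=", 2);
            if (kv.length == 2 && !kv[0].trim().isEmpty()) {
                vector.put(kv[0].trim(), kv[1].trim());
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        System.out.println(parse("[memory=50%, vcores=2]")); // {memory=50%, vcores=2}
    }
}
```

Parsing every queue's capacity into such a vector first is what lets percentage, weight, and absolute values coexist in one representation.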
[jira] [Resolved] (YARN-10758) Mixed mode: Allow relative and absolute mode in the same queue hierarchy
[ https://issues.apache.org/jira/browse/YARN-10758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10758.
-----------------------------------
      Resolution: Duplicate

> Mixed mode: Allow relative and absolute mode in the same queue hierarchy
> ------------------------------------------------------------------------
>
>                 Key: YARN-10758
>                 URL: https://issues.apache.org/jira/browse/YARN-10758
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>            Reporter: Andras Gyori
>            Assignee: Andras Gyori
>            Priority: Major
>
> Fair Scheduler supports mixed mode for shares (the FS equivalent of
> capacity). An example of such a configuration:
> {noformat}
> root.a.capacity [memory-mb=7268, vcores=8]{noformat}
> {noformat}
> root.a.a1.capacity 50{noformat}
> {noformat}
> root.a.a2.capacity 50{noformat}
> The above scenario is not supported in CS today: although CS already permits
> using weight mode and relative/percentage mode in the same hierarchy,
> absolute mode and relative mode are mutually exclusive.
> This improvement is a natural extension of CS to lift this limitation.
[jira] [Resolved] (YARN-9936) Support vector of capacity percentages in Capacity Scheduler configuration
[ https://issues.apache.org/jira/browse/YARN-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-9936.
----------------------------------
      Resolution: Invalid

> Support vector of capacity percentages in Capacity Scheduler configuration
> --------------------------------------------------------------------------
>
>                 Key: YARN-9936
>                 URL: https://issues.apache.org/jira/browse/YARN-9936
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>            Reporter: Zoltan Siegl
>            Assignee: Andras Gyori
>            Priority: Major
>         Attachments: Capacity Scheduler support of “vector of resources
> percentage”.pdf
>
> Currently, the Capacity Scheduler queue configuration supports two ways to
> set queue capacity:
> * As a percentage of all available resources, given as a float (e.g. 25.0),
> meaning 25% of the resources of its parent queue for all resource types
> equally (e.g. 25% of all memory, 25% of all CPU cores, and 25% of all
> available GPUs in the cluster). The percentages of all queues have to add up
> to 100%.
> * As an absolute amount of resources (e.g.
> memory=4GB,vcores=20,yarn.io/gpu=4). The amount of all resources in the
> queues has to be less than or equal to all resources in the cluster.
> {color:#de350b}Actually, the above is not fully supported; we only support
> memory and vcores in absolute mode now, and we should extend this in {color}
> YARN-10503.
> Apart from these two existing ways, there is a demand to set the capacity
> percentage of each available resource type separately (e.g.
> {{memory=20%,vcores=40%,yarn.io/gpu=100%}}).
> At the same time, a similar concept should be included for queues'
> maximum-capacity as well.