Re: [VOTE] Release Apache Hadoop 3.1.4 (RC4)
+1 (binding).

**TEST STEPS**
1. Build from sources (see Maven / Java and OS details below)
2. Distribute Hadoop to all nodes
3. Start HDFS services + YARN services on the nodes
4. Run MapReduce pi job (QuasiMonteCarlo)
5. Verified that the application was successful through the YARN RM Web UI
6. Verified the version of the Hadoop release from the YARN RM Web UI

**OS version**
$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

**Maven version**
$ mvn -v
Apache Maven 3.0.5 (Red Hat 3.0.5-17)
Maven home: /usr/share/maven

**Java version**
Java version: 1.8.0_191, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-0.el7_5.x86_64/jre
Default locale: en_US, platform encoding: ANSI_X3.4-1968
OS name: "linux", version: "3.10.0-1062.el7.x86_64", arch: "amd64", family: "unix"

**Maven command to build from sources**
mvn clean package -Pdist -DskipTests -Dmaven.javadoc.skip=true

**OTHER NOTES**
1. Had to manually install Maven in order to compile Hadoop, based on these steps:
https://gist.github.com/miroslavtamas/cdca97f2eafdd6c28b844434eaa3b631
2. Had to manually install protoc and other required libraries with the following commands (in this particular order):
sudo yum install -y protobuf-devel
sudo yum install -y gcc gcc-c++ make
sudo yum install -y openssl-devel
sudo yum install -y libgsasl

Thanks,
Szilard

On Thu, Jul 23, 2020 at 4:05 PM Masatake Iwasaki <iwasak...@oss.nttdata.co.jp> wrote:

> +1 (binding).
>
> * verified the checksum and signature of the source tarball.
> * built from the source tarball with the native profile on CentOS 7 and OpenJDK 8.
> * built the documentation and skimmed the contents.
> * ran example jobs on a 3-node Docker cluster with NN-HA and RM-HA enabled.
> * launched a pseudo-distributed cluster with Kerberos and SSL enabled, ran
>   basic EZ operations, ran example MR jobs.
> * followed the reproduction steps reported in HDFS-15313 to see if the
>   fix works.
>
> Thanks,
> Masatake Iwasaki
>
> On 2020/07/21 21:50, Gabor Bota wrote:
> > Hi folks,
> >
> > I have put together a release candidate (RC4) for Hadoop 3.1.4.
> >
> > The RC includes, in addition to the previous ones:
> > * fix for HDFS-15313. Ensure inodes in active filesystem are not
> >   deleted during snapshot delete
> > * fix for YARN-10347. Fix double locking in
> >   CapacityScheduler#reinitialize in branch-3.1
> >   (https://issues.apache.org/jira/browse/YARN-10347)
> > * the revert of HDFS-14941, as it caused
> >   HDFS-15421. IBR leak causes standby NN to be stuck in safe mode.
> >   (https://issues.apache.org/jira/browse/HDFS-15421)
> > * HDFS-15323, as requested.
> >   (https://issues.apache.org/jira/browse/HDFS-15323)
> >
> > The RC is available at: http://people.apache.org/~gabota/hadoop-3.1.4-RC4/
> > The RC tag in git is here:
> > https://github.com/apache/hadoop/releases/tag/release-3.1.4-RC4
> > The maven artifacts are staged at
> > https://repository.apache.org/content/repositories/orgapachehadoop-1275/
> >
> > You can find my public key at:
> > https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> > and http://keys.gnupg.net/pks/lookup?op=get&search=0xB86249D83539B38C
> >
> > Please try the release and vote. The vote will run for 8 weekdays,
> > until July 31, 2020, 23:00 CET.
> >
> > Thanks,
> > Gabor
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
Re: [E] Re: [DISCUSS] Change project style guidelines to allow line length 100
Thanks for this initiative, Sean. +1 for increasing the length to 100 characters.
I can't see a VOTE thread regarding this subject. Am I missing something?

Best,
Szilard

On Mon, May 24, 2021 at 11:49 PM Jonathan Eagles wrote:

> In Apache Tez, the formal line length is 120 characters. So, I recommend 120+.
>
> On Mon, May 24, 2021 at 4:46 PM Kihwal Lee wrote:
>
> > +1 for the 100 char limit.
> > But I would have liked 132 columns more. :)
> >
> > Kihwal
> >
> > On Mon, May 24, 2021 at 1:46 PM Sean Busbey wrote:
> >
> > > Hi folks!
> > >
> > > The consensus seems pretty strongly in favor of increasing the line length
> > > limit. Do folks still want to see a formal VOTE thread?
> > >
> > > > On May 19, 2021, at 4:22 PM, Sean Busbey wrote:
> > > >
> > > > Hello!
> > > >
> > > > What do folks think about changing our line length guidelines to allow
> > > > for 100 character width?
> > > >
> > > > Currently, we tell folks to follow the Sun style guide with some
> > > > exceptions unrelated to line length. That guide says a width of 80 is the
> > > > standard, and our current checkstyle rules act as enforcement.
> > > >
> > > > Looking at the current trunk codebase, our nightly build shows a total of
> > > > ~15k line length violations; that's about 18% of identified checkstyle issues.
> > > >
> > > > The vast majority of those line length violations are <= 100 characters
> > > > long. 100 characters happens to be the limit in the Google Java Style
> > > > Guide, another commonly adopted style guide for Java projects, so I suspect
> > > > these longer lines leaking past the checkstyle precommit warning might be a
> > > > reflection of committers working across multiple Java codebases.
> > > >
> > > > I don't feel strongly about lines being longer, but I would like to move
> > > > towards more consistent style enforcement as a project. Updating our
> > > > project guidance to allow for 100 character lines would reduce the
> > > > likelihood that folks bringing in new contributions need a precommit test
> > > > cycle to get the formatting correct.
> > > >
> > > > Does anyone feel strongly about keeping the line length limit at 80
> > > > characters?
> > > >
> > > > Does anyone feel strongly about contributions coming in that clear up
> > > > line length violations?
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> > > For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
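Sean's numbers (violations at the 80-column limit, most of which would disappear at 100) are easy to reproduce locally. A small sketch of that counting, under the assumption that checkstyle's LineLength check simply compares per-line character counts (it also supports an `ignorePattern`, which this sketch ignores); the function name is mine:

```python
def line_length_report(source, limits=(80, 100)):
    """For each candidate line-length limit, count the lines in `source`
    that exceed it. A rough stand-in for checkstyle's LineLength check:
    it just compares character counts per line."""
    lines = source.splitlines()
    return {limit: sum(1 for line in lines if len(line) > limit)
            for limit in limits}
```

Running something like this over the Java sources in trunk would show how many of the current 80-column violations would survive a 100-column limit.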
Re: [HELP] Review request for YARN-9214 and YARN-9401
Hi,

Sure, I can do that tomorrow.

Szilard

On Tue, Mar 26, 2019 at 3:41 PM Wanqiang Ji wrote:

> Hi folks,
>
> Can someone help to review YARN-9214 and YARN-9401?
> YARN-9214. Add AbstractYarnScheduler#getValidQueues method to resolve
> duplicate code
> YARN-9401. Fix `yarn version` printing the same version info as `hadoop
> version`
>
> -Wanqiang Ji
ResourceManager & Resource Types topic to discuss: SafeMode
Hi,

This could be interesting for anyone working with RM / Resource Types. I filed a jira recently:
https://issues.apache.org/jira/browse/YARN-9421 (Implement SafeMode for ResourceManager by defining a resource threshold).

The issue in one sentence: If an app is submitted while the RM still hasn't received all registration requests from the NMs, and the demand of the app contains any custom resource (e.g. GPU), it can happen that the app is quickly rejected with an InvalidResourceRequestException. The same app, submitted later once the NMs are registered (most likely a couple of seconds later), could be accepted. In this sense, the behavior of the RM is not consistent.

Please read through the jira; I think the issue is well described there!

Thanks a lot,
Szilard
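The proposal boils down to a threshold check before the RM starts rejecting requests for not-yet-registered resource types. A minimal sketch of such a predicate; the names and the 0.9 default are my illustration, not the jira's actual design:

```python
def rm_in_safe_mode(registered_resource, expected_resource, threshold=0.9):
    """Proposed SafeMode predicate: stay in SafeMode (i.e. defer app
    rejection decisions) until at least `threshold` of the expected
    cluster resource has been registered by NodeManagers."""
    if expected_resource <= 0:
        return True  # nothing known yet about the cluster: defer decisions
    return registered_resource / expected_resource < threshold
```

While this returns True, an app demanding a custom resource (e.g. GPU) would be queued or retried rather than rejected with InvalidResourceRequestException.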
Tool to view current status (applicability) of patches
Hi,

I recently developed a tool that shows the status of patches and checks whether they can be applied to trunk (or any specified branch).

The motivation for the project was quite simple: it's very cumbersome to keep track of the status of all the pending patches of our YARN team. We already have a sheet to track the pending patches of our upstream work, so the idea was: let's write a script that checks whether the patches still apply to trunk (or any specified branches). The project currently has jira and Google Sheets integration (read/write).

Longer term, I'm planning to provide a JavaScript snippet that would place red/green status lights next to the patches, indicating their applicability to trunk. I would also pay attention to minimizing the requests sent to jira, so I'm planning to introduce some caching and provide a "force-refresh status" button to get the current status of a patch.

Do you think this is a good idea and worth spending some more time on? It would require a moderate amount of work, but my main concern is where to host the service. Is there an Apache server (or any other infra) that could host this application? The memory / CPU footprint is quite moderate; it requires some network bandwidth, though.

Here's the link to the git repo of the project:
https://github.com/szilard-nemeth/hadoop-reviewsync
The project is still in a PoC phase, so the code is not the cleanest; I'm planning to improve it in the near future.

Please feel free to share your thoughts, feedback, ideas, anything!

Thanks,
Szilard

Example output:

Row | Issue     | Patch apply | Owner                  | Patch file          | Branch            | Explicit | Result          | Conflicted files | Overall result
----+-----------+-------------+------------------------+---------------------+-------------------+----------+-----------------+------------------+-----------------
1   | YARN-8553 | 1           | Szilard Nemeth         | YARN-8553.003.patch | origin/trunk      | Yes      | CONFLICT        | 1                | origin/trunk: CONFLICT, origin/branch-3.2: OK, origin/branch-3.1: CONFLICT
2   | YARN-8553 | 2           | Szilard Nemeth         | YARN-8553.003.patch | origin/branch-3.2 | No       | APPLIES CLEANLY | N/A              | origin/trunk: CONFLICT, origin/branch-3.2: OK, origin/branch-3.1: CONFLICT
3   | YARN-8553 | 3           | Szilard Nemeth         | YARN-8553.003.patch | origin/branch-3.1 | No       | CONFLICT        | 1                | origin/trunk: CONFLICT, origin/branch-3.2: OK, origin/branch-3.1: CONFLICT
4   | YARN-5464 | 1           | Antal Bálint Steinbach | YARN-5464.005.patch | origin/trunk      | Yes      | CONFLICT        | 3                | origin/trunk: CONFLICT, origin/branch-3.2: CONFLICT, origin/branch-3.1: CONFLICT
5   | YARN-5464 | 2           | Antal Bálint Steinbach | YARN-5464.005.patch | origin/branch-3.2 | No       | CONFLICT        | 12               | origin/trunk: CONFLICT, origin/branc
Re: [VOTE] Release Apache Hadoop Submarine 0.2.0 - RC0
+1 (non-binding)

On Fri, Jun 21, 2019, 09:09 Weiwei Yang wrote:

> +1 (binding)
>
> Thanks
> Weiwei
>
> On Jun 21, 2019, 5:33 AM +0800, Wangda Tan wrote:
> +1 Binding. Tested in local cluster and reviewed docs.
>
> Thanks!
>
> On Wed, Jun 19, 2019 at 3:20 AM Sunil Govindan wrote:
>
> +1 binding
>
> - tested in local cluster.
> - tried TonY runtime as well
> - doc seems fine now.
>
> - Sunil
>
> On Thu, Jun 6, 2019 at 6:53 PM Zhankun Tang wrote:
>
> Hi folks,
>
> Thanks to all of you who have contributed to this Submarine 0.2.0 release.
> We now have a release candidate (RC0) for Apache Hadoop Submarine 0.2.0.
>
> The artifacts for this Submarine 0.2.0 RC0 are available here:
> https://home.apache.org/~ztang/submarine-0.2.0-rc0/
>
> Its RC tag in git is "submarine-0.2.0-RC0".
>
> The maven artifacts are available via repository.apache.org at
> https://repository.apache.org/content/repositories/orgapachehadoop-1221/
>
> This vote will run 7 days (5 weekdays), ending on 13th June at 11:59 pm PST.
>
> The highlights of this release:
> 1. LinkedIn's TonY runtime support in Submarine
> 2. PyTorch enabled in Submarine with both the YARN native service runtime
>    (single node) and the TonY runtime
> 3. Support for an uber jar of Submarine to submit the job
> 4. A YAML file to describe a job
> 5. Notebook support (via the Apache Zeppelin Submarine interpreter)
>
> Thanks to Sunil, Wangda, Xun, Zac, Keqiu, and Szilard for helping me
> prepare the release.
>
> I have done some testing with my pseudo cluster. My +1 (non-binding) to start.
>
> Regards,
> Zhankun
Re: Any thoughts making Submarine a separate Apache project?
+1, this is a great idea. Since the Hadoop repository has already grown huge and contains many projects, I think it's generally a good idea to separate projects in their early phase.

On Wed, Jul 17, 2019, 08:50 runlin zhang wrote:

> +1, that will be great!
>
> > On Jul 10, 2019, at 3:34 PM, Xun Liu wrote:
> >
> > Hi all,
> >
> > This is Xun Liu, contributing to the Submarine project for deep learning
> > workloads running together with big data workloads on Hadoop clusters.
> >
> > There are a bunch of integrations of Submarine with other projects,
> > finished or ongoing, such as Apache Zeppelin, TonY, and Azkaban. The next step
> > for Submarine is to integrate with more projects like Apache Arrow,
> > Redis, MLflow, etc., to be able to handle end-to-end machine learning use
> > cases like model serving, notebook management, advanced training
> > optimizations (like auto parameter tuning, memory cache optimizations for
> > large training datasets, etc.), and to make it run on other platforms like
> > Kubernetes or natively on Cloud. LinkedIn also wants to donate the TonY project
> > to Apache so we can put Submarine and TonY together in the same codebase
> > (Page #30:
> > https://www.slideshare.net/xkrogen/hadoop-meetup-jan-2019-tony-tensorflow-on-yarn-and-beyond#30
> > ).
> >
> > This expands the scope of the original Submarine project in exciting new
> > ways. Toward that end, would it make sense to create a separate Submarine
> > project at Apache? This could lead to faster adoption of Submarine and allow
> > Submarine to grow into a full-blown machine learning platform.
> >
> > There will be lots of technical details to work out, but any initial
> > thoughts on this?
> >
> > Best Regards,
> > Xun Liu
Re: yarn nm builds breaking on 3.1 and 3.2
Hi Steve!

Reverted the YARN-9128 commits on branch-3.2 / branch-3.1.
Sorry for the hassle!

Best,
Szilard

On Wed, Oct 9, 2019 at 7:46 PM Sunil Govindan wrote:

> YARN-9128 caused this; Szilard is checking it.
>
> Thanks
> Sunil
>
> On Wed, Oct 9, 2019 at 10:51 PM Steve Loughran wrote:
>
> > I'm seeing the YARN branch-3.1 and branch-3.2 builds breaking right now; one of
> > the patches that has gone in during the last 24 hours has done it.
> >
> > [INFO] Finished at: 2019-10-09T18:18:31+01:00
> > [INFO] ------------------------------------------------------------------------
> > [ERROR] Failed to execute goal
> > org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile
> > (default-testCompile) on project hadoop-yarn-server-nodemanager:
> > Compilation failure: Compilation failure:
> > [ERROR]
> > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestResourceMappings.java:[22,66]
> > package org.apache.hadoop.yarn.server.nodemanager.api.deviceplugin does not
> > exist
> > [ERROR]
> > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestResourceMappings.java:[41,15]
> > package Device does not exist
> > [ERROR]
> > hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestResourceMappings.java:[49,15]
> > package Device does not exist
> > [ERROR] -> [Help 1]
> > [ERROR]
> >
> > If I revert either branch to "HADOOP-16491. Upgrade jetty version to 9.3.27",
> > then everything works again.
> >
> > I don't want to blindly revert all three; can someone take a look and fix
> > these up, or, if there's no easy fix, pull the commit at fault.
> >
> > thanks
> >
> > Steve
[jira] [Created] (YARN-10264) Add container launch related env / classpath debug info to container logs when a container fails
Szilard Nemeth created YARN-10264:
-------------------------------------
Summary: Add container launch related env / classpath debug info to container logs when a container fails
Key: YARN-10264
URL: https://issues.apache.org/jira/browse/YARN-10264
Project: Hadoop YARN
Issue Type: Task
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

Sometimes when a container fails to launch, it can be pretty hard to figure out why it failed.

Similar to YARN-4309, we can add a switch to control whether the environment variables and the Java classpath should be printed. As a bonus, [jdeps|https://docs.oracle.com/javase/8/docs/technotes/tools/unix/jdeps.html] could also be utilized to print some verbose info about the classpath. When log aggregation occurs, all this information will automatically be collected, making debugging such container launch failures much easier.

Below is an example output when the user faces a classpath configuration issue:

{code:java}
End of LogType:prelaunch.err
**
2020-04-19 05:49:12,145 DEBUG:app_info:Diagnostics of the failed app
2020-04-19 05:49:12,145 DEBUG:app_info:Application application_1587300264561_0001 failed 2 times due to AM Container for appattempt_1587300264561_0001_02 exited with exitCode: 1
Failing this attempt.Diagnostics: [2020-04-19 12:45:01.955]Exception from container-launch.
Container id: container_e60_1587300264561_0001_02_01
Exit code: 1
Exception message: Launch container failed
Shell output: main : command provided 1
main : run as user is systest
main : requested yarn user is systest
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /dataroot/ycloud/yarn/nm/nmPrivate/application_1587300264561_0001/container_e60_1587300264561_0001_02_01/container_e60_1587300264561_0001_02_01.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...
[2020-04-19 12:45:01.984]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:
yarn.app.mapreduce.am.env
HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}
mapreduce.map.env
HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}
mapreduce.reduce.env
HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}

[2020-04-19 12:45:01.985]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:
yarn.app.mapreduce.am.env
HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}
mapreduce.map.env
HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}
mapreduce.reduce.env
HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}

For more detailed output, check the application tracking page: http://quasar-plnefj-2.quasar-plnefj.root.hwx.site:8088/cluster/app/application_1587300264561_0001 Then click on links to logs of each attempt.
...
2020-04-19 05:49:12,148 INFO:util:* End test_app_API (yarn.suite.YarnAPITests) *
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
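The kind of diagnostics this jira proposes to append to a failed container's logs (environment variables plus classpath entries) could look roughly like this. All names and the output layout are illustrative; this is not the actual NodeManager implementation:

```python
import os

def container_launch_debug_report(env):
    """Render the environment variables and CLASSPATH entries that would be
    appended to a failed container's logs when the debug switch is on.
    `env` is a mapping like a container's launch environment."""
    lines = ["--- container environment ---"]
    lines += [f"{key}={value}" for key, value in sorted(env.items())]
    lines.append("--- classpath entries ---")
    classpath = env.get("CLASSPATH", "")
    # Split on the platform path separator (':' on Linux), dropping empties.
    lines += [entry for entry in classpath.split(os.pathsep) if entry]
    return "\n".join(lines)
```

With log aggregation enabled, a report like this would be collected alongside prelaunch.err, which is exactly the point of the jira: the MRAppMaster failure above would immediately show whether HADOOP_MAPRED_HOME was set and what the classpath contained.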
[jira] [Created] (YARN-10321) Break down TestUserGroupMappingPlacementRule#testMapping into test scenarios
Szilard Nemeth created YARN-10321:
-------------------------------------
Summary: Break down TestUserGroupMappingPlacementRule#testMapping into test scenarios
Key: YARN-10321
URL: https://issues.apache.org/jira/browse/YARN-10321
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth
[jira] [Created] (YARN-10323) [Umbrella] YARN Debuggability and Supportability Improvements
Szilard Nemeth created YARN-10323:
-------------------------------------
Summary: [Umbrella] YARN Debuggability and Supportability Improvements
Key: YARN-10323
URL: https://issues.apache.org/jira/browse/YARN-10323
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth

Troubleshooting YARN problems can be difficult in a production environment. Collecting data before problems occur, or actively collecting data on an on-demand basis, could truly help track down issues.

Some examples:
1. If an application is hanging, application logs along with RM / NM logs could be collected, plus a jstack of either the YARN daemons or the application container.
2. Similarly, when an application fails, we may collect data.
3. Scheduler issues are quite common, so good tooling that helps to spot issues would be crucial.

A design document will be added later.
[jira] [Created] (YARN-10488) Several typos in package: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair
Szilard Nemeth created YARN-10488:
-------------------------------------
Summary: Several typos in package: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair
Key: YARN-10488
URL: https://issues.apache.org/jira/browse/YARN-10488
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth

1. Typo in field name: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.VisitedResourceRequestTracker.TrackerPerPriorityResource#racksVisted
2. Typo in method: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager#setChildResourceLimits
There's a comment: "... max reource ...", a typo in the word 'resource'.
3. Typo in the javadoc of method: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt#reserve
"bookeeping" -> "bookkeeping"
4. There's a local variable in the method org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt#updateAMDiagnosticMsg called diagnosticMessageBldr. It's an abbreviation, but could be changed to something more meaningful.
5. Typo in the javadoc of method: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.MaxRunningAppsEnforcer#updateRunnabilityOnReload
"reinitilized" --> "reinitialized"
6. And last but not least, a funny typo in the method name of org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.DominantResourceFairnessPolicy.DominantResourceFairnessComparator#compareAttribrutes
[jira] [Created] (YARN-10547) Decouple job parsing logic from SLSRunner
Szilard Nemeth created YARN-10547:
-------------------------------------
Summary: Decouple job parsing logic from SLSRunner
Key: YARN-10547
URL: https://issues.apache.org/jira/browse/YARN-10547
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

SLSRunner has too many responsibilities. One of them is to parse the job details from the SLS input formats and launch the AMs and task containers. As a first step, the job parsing logic could be decoupled from this class.

There are 3 types of inputs:
- SLS trace
- Synth
- Rumen

Their job parsing methods are:
- SLS trace: https://github.com/apache/hadoop/blob/005b854f6bad66defafae0abf95dabc6c36ca8b1/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java#L479-L526
- Synth: https://github.com/apache/hadoop/blob/005b854f6bad66defafae0abf95dabc6c36ca8b1/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java#L722-L790
- Rumen: https://github.com/apache/hadoop/blob/005b854f6bad66defafae0abf95dabc6c36ca8b1/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java#L651-L716
[jira] [Created] (YARN-10548) CLONE - Decouple job parsing logic from SLSRunner
Szilard Nemeth created YARN-10548:
-------------------------------------
Summary: CLONE - Decouple job parsing logic from SLSRunner
Key: YARN-10548
URL: https://issues.apache.org/jira/browse/YARN-10548
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

SLSRunner has too many responsibilities. One of them is to parse the job details from the SLS input formats and launch the AMs and task containers. As a first step, the job parsing logic could be decoupled from this class.

There are 3 types of inputs:
- SLS trace
- Synth
- Rumen

Their job parsing methods are:
- SLS trace: https://github.com/apache/hadoop/blob/005b854f6bad66defafae0abf95dabc6c36ca8b1/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java#L479-L526
- Synth: https://github.com/apache/hadoop/blob/005b854f6bad66defafae0abf95dabc6c36ca8b1/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java#L722-L790
- Rumen: https://github.com/apache/hadoop/blob/005b854f6bad66defafae0abf95dabc6c36ca8b1/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java#L651-L716
[jira] [Created] (YARN-10549) Decouple RM runner logic from SLSRunner
Szilard Nemeth created YARN-10549:
-------------------------------------
Summary: Decouple RM runner logic from SLSRunner
Key: YARN-10549
URL: https://issues.apache.org/jira/browse/YARN-10549
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

SLSRunner has too many responsibilities. One of them is to parse the job details from the SLS input formats and launch the AMs and task containers. The RM runner logic could be decoupled.
[jira] [Created] (YARN-10550) Decouple NM runner logic from SLSRunner
Szilard Nemeth created YARN-10550:
-------------------------------------
Summary: Decouple NM runner logic from SLSRunner
Key: YARN-10550
URL: https://issues.apache.org/jira/browse/YARN-10550
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

SLSRunner has too many responsibilities. One of them is to parse the job details from the SLS input formats and launch the AMs and task containers. The NM runner logic could be decoupled.
[jira] [Created] (YARN-10552) Eliminate code duplication in SLSCapacityScheduler and SLSFairScheduler
Szilard Nemeth created YARN-10552:
-------------------------------------
Summary: Eliminate code duplication in SLSCapacityScheduler and SLSFairScheduler
Key: YARN-10552
URL: https://issues.apache.org/jira/browse/YARN-10552
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth
[jira] [Created] (YARN-10579) CLONE - CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include mode of operation for CS
Szilard Nemeth created YARN-10579:
-------------------------------------
Summary: CLONE - CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include mode of operation for CS
Key: YARN-10579
URL: https://issues.apache.org/jira/browse/YARN-10579
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Benjamin Teke
Assignee: Szilard Nemeth

Under the YARN-10496 umbrella, weight mode has been implemented for CS with YARN-10504. We would like to expose the mode of operation through the RM's /scheduler REST endpoint. The field name will be 'mode'. All queue representations in the response will uniformly hold one of the mode values: "percentage", "absolute", or "weight".
[jira] [Created] (YARN-10580) Fix some issues in TestRMWebServicesCapacitySchedDynamicConfig
Szilard Nemeth created YARN-10580:
-------------------------------------
Summary: Fix some issues in TestRMWebServicesCapacitySchedDynamicConfig
Key: YARN-10580
URL: https://issues.apache.org/jira/browse/YARN-10580
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth

YARN-10512 introduced some changes that could be improved; [~pbacsko] highlighted the issues in a comment. Pasting the contents of the comment as a reference:

#1 In TestRMWebServicesCapacitySchedDynamicConfig:
{noformat}
config.set(YarnConfiguration.SCHEDULER_CONFIGURATION_STORE_CLASS,
    YarnConfiguration.MEMORY_CONFIGURATION_STORE);
{noformat}
This call is repeated multiple times; it could be set somewhere else.

#2 In TestRMWebServicesCapacitySchedDynamicConfig:
{noformat}
validateSchedulerInfo(json, "weight", "root.default", "root.test1", "root.test2");
{noformat}
"root.default", "root.test1" and "root.test2" are the same in all cases, so you might want to drop them.

#3 In TestRMWebServicesCapacitySchedDynamicConfig:
{noformat}
@Before
@Override
public void setUp() throws Exception {
  super.setUp();
}
{noformat}
This method does nothing and can be removed.
[jira] [Created] (YARN-10581) CLONE - CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include weight values for queues
Szilard Nemeth created YARN-10581:
-------------------------------------
Summary: CLONE - CS Flexible Auto Queue Creation: Modify RM /scheduler endpoint to include weight values for queues
Key: YARN-10581
URL: https://issues.apache.org/jira/browse/YARN-10581
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

Under the YARN-10496 umbrella, weight mode has been implemented for CS with YARN-10504. We would like to expose the weight values for all queues through the RM's /scheduler REST endpoint.
[jira] [Created] (YARN-10672) All testcases in TestReservations are flaky
Szilard Nemeth created YARN-10672:
-------------------------------------
Summary: All testcases in TestReservations are flaky
Key: YARN-10672
URL: https://issues.apache.org/jira/browse/YARN-10672
Project: Hadoop YARN
Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

Running any particular test in TestReservations 100 times never yields 100 passes. For example, let's run testReservationNoContinueLook 100 times. For me, it produced 39 failed and 61 passed results. A screenshot is attached.

Stacktrace:
{code}
java.lang.AssertionError:
Expected :2048
Actual   :0
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:633)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations.testReservationNoContinueLook(TestReservations.java:642)
{code}

The test fails here:
{code}
// Start testing...
// Only AM
TestUtils.applyResourceCommitRequest(clusterResource,
    a.assignContainers(clusterResource, node_0,
        new ResourceLimits(clusterResource),
        SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY), nodes, apps);
assertEquals(2 * GB, a.getUsedResources().getMemorySize());
{code}

With some debugging (patch attached), I realized that sometimes there are no registered nodes, so the AM can't be allocated.
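The "run it 100 times" measurement quoted above is easy to script. A small harness sketch; the Maven invocation shown in the docstring is my assumption of how one might drive it, and the run function is injectable so the tally logic can be exercised without Maven:

```python
import subprocess

def rerun_test(cmd, runs=100, run_once=None):
    """Run a test command `runs` times and return (passed, failed).
    By default each run invokes `cmd` via subprocess, e.g. something like
    ["mvn", "-q", "test", "-Dtest=TestReservations#testReservationNoContinueLook"].
    `run_once` can be injected to test the tally logic without Maven."""
    if run_once is None:
        run_once = lambda: subprocess.run(cmd).returncode == 0
    passed = sum(1 for _ in range(runs) if run_once())
    return passed, runs - passed
```

A stable test should report (100, 0); the 61/39 split observed here is what marks the test as flaky.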
[jira] [Created] (YARN-10675) Consolidate YARN-10672 and YARN-10447
Szilard Nemeth created YARN-10675:
-------------------------------------

Summary: Consolidate YARN-10672 and YARN-10447
Key: YARN-10675
URL: https://issues.apache.org/jira/browse/YARN-10675
Project: Hadoop YARN
Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

Let's consolidate the solution applied for YARN-10672 and apply it to the code changes introduced with YARN-10447.

Quoting [~pbacsko]:
{quote}
The solution is much straightforward than mine in YARN-10447. Actually we might consider applying this to TestLeafQueue with undoing my changes, because that's more complicated (I had no patience to go deeper with Mockito internal behavior, I just thought well, disable that thread and that's enough).
{quote}
[jira] [Created] (YARN-10676) Improve code quality in TestTimelineAuthenticationFilterForV1
Szilard Nemeth created YARN-10676:
-------------------------------------

Summary: Improve code quality in TestTimelineAuthenticationFilterForV1
Key: YARN-10676
URL: https://issues.apache.org/jira/browse/YARN-10676
Project: Hadoop YARN
Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth
[jira] [Created] (YARN-10677) Logger of SLSFairScheduler is provided with the wrong class
Szilard Nemeth created YARN-10677:
-------------------------------------

Summary: Logger of SLSFairScheduler is provided with the wrong class
Key: YARN-10677
URL: https://issues.apache.org/jira/browse/YARN-10677
Project: Hadoop YARN
Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

In SLSFairScheduler, the Logger definition looks like:
https://github.com/apache/hadoop/blob/9cb51bf106802c78b1400fba9f1d1c7e772dd5e7/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L69
We need to fix this.
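The bug class here is a copy-paste mistake where the logger is created for a different class than the one that uses it, so every log line is attributed to the wrong source. A minimal sketch of the pattern and its fix follows; Hadoop itself uses SLF4J, but java.util.logging stands in so the sketch runs against the JDK alone, and the class names are illustrative, not taken from SLS:

```java
import java.util.logging.Logger;

public class LoggerSketch {
    // Hypothetical stand-in for the class whose name was copy-pasted.
    static class OtherScheduler { }

    // Anti-pattern: logger bound to a different class, so log lines from
    // this class show up under OtherScheduler's name.
    static final Logger WRONG_LOG =
        Logger.getLogger(OtherScheduler.class.getName());

    // Fix: always pass the enclosing class to the logger factory.
    static final Logger LOG =
        Logger.getLogger(LoggerSketch.class.getName());

    public static void main(String[] args) {
        System.out.println(WRONG_LOG.getName()); // prints LoggerSketch$OtherScheduler
        System.out.println(LOG.getName());       // prints LoggerSketch
    }
}
```

With SLF4J the shape of the fix is the same: `LoggerFactory.getLogger(SLSFairScheduler.class)` instead of some other class literal.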
[jira] [Created] (YARN-10678) Try blocks without catch blocks in SLS scheduler classes can swallow other exceptions
Szilard Nemeth created YARN-10678:
-------------------------------------

Summary: Try blocks without catch blocks in SLS scheduler classes can swallow other exceptions
Key: YARN-10678
URL: https://issues.apache.org/jira/browse/YARN-10678
Project: Hadoop YARN
Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth
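The failure mode the summary describes can be reproduced in plain Java: in a try/finally without a catch, an exception thrown from the finally block silently discards the in-flight exception from the try body. This is a hypothetical minimal sketch, not code from the SLS classes:

```java
public class SwallowDemo {
    // A try/finally without a catch: if the finally block itself throws,
    // the original exception from the try body is silently replaced.
    static String observedFailure() {
        try {
            try {
                throw new IllegalStateException("original failure");
            } finally {
                // Cleanup that throws: this exception supersedes the
                // IllegalStateException already in flight.
                throw new RuntimeException("cleanup failure");
            }
        } catch (RuntimeException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The root cause ("original failure") is lost entirely.
        System.out.println(observedFailure()); // prints "cleanup failure"
    }
}
```

The usual remedies are to add a catch that logs the original exception before cleanup runs, or to guard the finally body so it cannot throw.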
[jira] [Created] (YARN-10679) Better logging of uncaught exceptions throughout SLS
Szilard Nemeth created YARN-10679:
-------------------------------------

Summary: Better logging of uncaught exceptions throughout SLS
Key: YARN-10679
URL: https://issues.apache.org/jira/browse/YARN-10679
Project: Hadoop YARN
Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth
[jira] [Created] (YARN-10680) CLONE - Better logging of uncaught exceptions throughout SLS
Szilard Nemeth created YARN-10680:
-------------------------------------

Summary: CLONE - Better logging of uncaught exceptions throughout SLS
Key: YARN-10680
URL: https://issues.apache.org/jira/browse/YARN-10680
Project: Hadoop YARN
Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

In our internal environment, there was a test failure while running SLS tests with Jenkins. It's difficult to align the uncaught exceptions (in this case an NPE) with the log itself, as the exception is logged with {{e.printStackTrace()}}.
This jira is to replace printStackTrace calls in SLS with {{LOG.error("msg", exception)}}.
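The before/after can be sketched as follows. Hadoop uses SLF4J (`LOG.error("msg", e)`); java.util.logging stands in here so the sketch runs with the JDK alone, and the message text is illustrative:

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class LogVsPrintDemo {
    static LogRecord last; // captures the most recent record for inspection

    public static void main(String[] args) {
        Logger log = Logger.getLogger(LogVsPrintDemo.class.getName());
        log.setUseParentHandlers(false);
        log.addHandler(new Handler() {
            @Override public void publish(LogRecord r) { last = r; }
            @Override public void flush() { }
            @Override public void close() { }
        });

        Exception e = new NullPointerException("boom");

        // Before (what SLS does today): e.printStackTrace() writes to raw
        // stderr, detached from the log stream, its timestamps, and its
        // formatting, which is why failures are hard to align with the log.

        // After: pass the throwable to the logger, so the message and the
        // stack trace travel through the logging framework together.
        log.log(Level.SEVERE, "Operation failed", e);

        System.out.println(last.getMessage() + " / " + last.getThrown().getMessage());
    }
}
```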
[jira] [Created] (YARN-10681) Fix assertion failure message in BaseSLSRunnerTest
Szilard Nemeth created YARN-10681:
-------------------------------------

Summary: Fix assertion failure message in BaseSLSRunnerTest
Key: YARN-10681
URL: https://issues.apache.org/jira/browse/YARN-10681
Project: Hadoop YARN
Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

There is this failure message:
https://github.com/apache/hadoop/blob/a89ca56a1b0eb949f56e7c6c5c25fdf87914a02f/hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/BaseSLSRunnerTest.java#L129-L130
"catched" should be replaced with "caught".
[jira] [Resolved] (YARN-10736) Fix GetApplicationsRequest JavaDoc
[ https://issues.apache.org/jira/browse/YARN-10736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10736.
-----------------------------------
Fix Version/s: 3.4.0
Hadoop Flags: Reviewed
Resolution: Fixed

> Fix GetApplicationsRequest JavaDoc
> ----------------------------------
>
> Key: YARN-10736
> URL: https://issues.apache.org/jira/browse/YARN-10736
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Miklos Gergely
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> getName and setName javadoc comments are mixed up
[jira] [Resolved] (YARN-10766) [UI2] Bump moment-timezone to 0.5.33
[ https://issues.apache.org/jira/browse/YARN-10766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10766.
-----------------------------------
Hadoop Flags: Reviewed
Resolution: Fixed

> [UI2] Bump moment-timezone to 0.5.33
> ------------------------------------
>
> Key: YARN-10766
> URL: https://issues.apache.org/jira/browse/YARN-10766
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn, yarn-ui-v2
> Reporter: Andras Gyori
> Assignee: Andras Gyori
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: UI2_Correct_Timezone_After_Bump.png, UI2_Wrong_Timezone_Before_Bump.png, YARN-10766.001.patch
>
> A handful of timezone related fixes were added into the 0.5.33 release of moment-timezone. An example of a scenario in which the current UI2 behaviour is not correct is a user from Australia, where the submission time shown on UI2 is one hour ahead of the actual time.
> Unfortunately moment-timezone data range files have been renamed, which is a breaking change from the point of view of emberjs. Including all timezones will increase the overall size of UI2 by an additional ~6 KB.
[jira] [Created] (YARN-10787) Queue submit ACL check is wrong when CS queue is ambiguous
Szilard Nemeth created YARN-10787:
-------------------------------------

Summary: Queue submit ACL check is wrong when CS queue is ambiguous
Key: YARN-10787
URL: https://issues.apache.org/jira/browse/YARN-10787
Project: Hadoop YARN
Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Gergely Pollak

Let's suppose we have a Capacity Scheduler configuration with 2 or more leaf queues with the same name in the queue hierarchy. That's what we call an ambiguous queue name.
Let's also enable ACL checks and define the acl_submit_applications / acl_administer_queue configs with the correct value, adding the username to the ACL value there.

Here's a minimalistic YARN + CS config:

1. YARN config snippet:
{code}
yarn.acl.enable = true
{code}

2. CS config snippet:
{code}
yarn.scheduler.capacity.root.someparent1.queues = anyotherqueue1,somequeue,anyotherqueue2
yarn.scheduler.capacity.root.someparent2.queues = anyotherqueue3,somequeue,anyotherqueue4
yarn.scheduler.capacity.root.someparent1.somequeue.acl_submit_applications = someuser1
yarn.scheduler.capacity.root.someparent2.somequeue.acl_submit_applications = someuser1
yarn.scheduler.capacity.root.someparent1.somequeue.acl_administer_queue = someuser1
yarn.scheduler.capacity.root.someparent2.somequeue.acl_administer_queue = someuser1
{code}

So in this case, we have an ambiguous queue named "somequeue" under 2 different paths:
- root.someparent1.somequeue
- root.someparent2.somequeue

When a user submits an application correctly with the full queue path, e.g. root.someparent1.somequeue, YARN will still fail to place the application in that queue and will use the short name instead.

3. LOG SNIPPET
{code}
2021-05-20 22:04:32,031 DEBUG org.apache.hadoop.yarn.server.resourcemanager.placement.CSMappingPlacementRule: Placement final result 'root.someparent1.somequeue' for application 'application_1621540945412_0001'
2021-05-20 22:04:32,031 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Placed application with ID application_1621540945412_0001 in queue: somequeue, original submission queue was: root.someparent1.somequeue
2021-05-20 22:04:32,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Ambiguous queue reference: somequeue please use full queue path instead.
2021-05-20 22:04:32,031 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application 'application_1621540945412_0001' is submitted without priority hence considering default queue/cluster priority: 0
2021-05-20 22:04:32,032 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Priority '0' is acceptable in queue : somequeue for application: application_1621540945412_0001
2021-05-20 22:04:32,993 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Exception in submitting application_1621540945412_0001
org.apache.hadoop.yarn.exceptions.YarnException: org.apache.hadoop.security.AccessControlException: User someuser1 does not have permission to submit application_1621540945412_0001 to queue somequeue
	at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
{code}

4. FULL STACKTRACE:
{code}
org.apache.hadoop.yarn.exceptions.YarnException: org.apache.hadoop.security.AccessControlException: User someuser1 does not have permission to submit application_1621540945412_0001 to queue somequeue
	at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:433)
	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:330)
	at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:650)
	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
	at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
{code}
[jira] [Created] (YARN-10797) Logging parameter issues in scheduler package
Szilard Nemeth created YARN-10797:
-------------------------------------

Summary: Logging parameter issues in scheduler package
Key: YARN-10797
URL: https://issues.apache.org/jira/browse/YARN-10797
Project: Hadoop YARN
Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

1. There is a LOG.error call in org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueConfigurationAutoRefreshPolicy#editSchedule that provides a logging argument without a placeholder in the message:
{code}
LOG.error("Failed to reload capacity scheduler config file - " +
    "will use existing conf.", e.getMessage());
{code}

2. There is a LOG.debug call in org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp#moveReservation that has a placeholder in the logging message, but the argument is an instance of Throwable, so the message does not require a placeholder:
{code}
} catch (IllegalStateException e) {
  LOG.debug("Reserve on target node failed, e={}", e);
  return false;
}
{code}
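Why case 1 is a bug can be shown with a minimal stand-in for SLF4J's `{}` substitution (illustration only, not the real SLF4J code; one placeholder, one argument):

```java
public class PlaceholderDemo {
    // Minimal stand-in for SLF4J-style "{}" substitution: if the message
    // contains no placeholder, the argument never reaches the output.
    static String format(String msg, Object arg) {
        int i = msg.indexOf("{}");
        return i < 0 ? msg : msg.substring(0, i) + arg + msg.substring(i + 2);
    }

    public static void main(String[] args) {
        // Case 1 from the report: an argument but no placeholder, so the
        // cause (e.getMessage()) is silently dropped from the rendered line.
        System.out.println(format(
            "Failed to reload capacity scheduler config file - will use existing conf.",
            "some cause"));

        // With a placeholder, the argument shows up as intended.
        System.out.println(format("Failed to reload config, cause: {}", "some cause"));
        // prints "Failed to reload config, cause: some cause"
    }
}
```

For case 2, the idiomatic fix is simply to drop the placeholder: SLF4J treats a trailing Throwable argument specially and prints its stack trace, so `LOG.debug("Reserve on target node failed", e)` logs both the message and the full trace.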
[jira] [Created] (YARN-10798) Enhancements in RMAppManager: createAndPopulateNewRMApp and copyPlacementQueueToSubmissionContext
Szilard Nemeth created YARN-10798:
-------------------------------------

Summary: Enhancements in RMAppManager: createAndPopulateNewRMApp and copyPlacementQueueToSubmissionContext
Key: YARN-10798
URL: https://issues.apache.org/jira/browse/YARN-10798
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

As a follow-up of YARN-10787, we need to do the following:

1. Rename RMAppManager#copyPlacementQueueToSubmissionContext: this method does not really copy anything, it simply overrides the queue value.

2. Add a debug log to print the csqueue object before the authorization code: [Code block|https://github.com/apache/hadoop/blob/2541efa496ba0e7e096ee5ec3c08d64b62036402/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java#L459-L475]

3. Fix log messages: as 'copyPlacementQueueToSubmissionContext' overrides (does not copy) the original queue name with the queue name from the PlacementContext, all calls to submissionContext.getQueue() will return the short queue name. This results in very misleading log messages as well, including the exception message itself:
{code}
org.apache.hadoop.yarn.exceptions.YarnException: org.apache.hadoop.security.AccessControlException: User someuser1 does not have permission to submit application_1621540945412_0001 to queue somequeue
{code}
All log messages should print the original submission queue, if possible.
[jira] [Created] (YARN-10799) Follow up of YARN-10787: Eliminate queue name replacement in ApplicationSubmissionContext based on placement context
Szilard Nemeth created YARN-10799:
-------------------------------------

Summary: Follow up of YARN-10787: Eliminate queue name replacement in ApplicationSubmissionContext based on placement context
Key: YARN-10799
URL: https://issues.apache.org/jira/browse/YARN-10799
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

This is the long-term fix for YARN-10787: the task is to investigate whether it's possible to eliminate RMAppManager#copyPlacementQueueToSubmissionContext. This could introduce nasty backward-incompatible issues with recovery, so it should be thought through really carefully.
[jira] [Created] (YARN-10849) Clarify testcase documentation for TestServiceAM#testContainersReleasedWhenPreLaunchFails
Szilard Nemeth created YARN-10849:
-------------------------------------

Summary: Clarify testcase documentation for TestServiceAM#testContainersReleasedWhenPreLaunchFails
Key: YARN-10849
URL: https://issues.apache.org/jira/browse/YARN-10849
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

There's a small comment added to the testcase org.apache.hadoop.yarn.service.TestServiceAM#testContainersReleasedWhenPreLaunchFails:
{code}
// Test to verify that the containers are released and the
// component instance is added to the pending queue when building the launch
// context fails.
{code}
However, it was not clear to me why building the "launch context" would fail. While the test passes, it throws an exception that tells the story:
{code}
2021-07-06 18:31:04,438 ERROR [pool-275-thread-1] containerlaunch.ContainerLaunchService (ContainerLaunchService.java:run(122)) - [COMPINSTANCE compa-0 : container_1625589063422_0001_01_01]: Failed to launch container.
java.lang.IllegalArgumentException: Can not create a Path from a null string
	at org.apache.hadoop.fs.Path.checkPathArg(Path.java:164)
	at org.apache.hadoop.fs.Path.<init>(Path.java:180)
	at org.apache.hadoop.yarn.service.provider.tarball.TarballProviderService.processArtifact(TarballProviderService.java:39)
	at org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:144)
	at org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:107)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
{code}
This exception is thrown because the id of the Artifact object is unset (null); TarballProviderService.processArtifact verifies it and does not allow such artifacts. The aim of this jira is to add a clarifying comment or javadoc to this method.
[jira] [Created] (YARN-10853) Add more tests to TestUsersManager
Szilard Nemeth created YARN-10853:
-------------------------------------

Summary: Add more tests to TestUsersManager
Key: YARN-10853
URL: https://issues.apache.org/jira/browse/YARN-10853
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth
Attachments: UsersManager.html

Running TestUsersManager with code coverage measurements gives only 18% line coverage for the class "UsersManager". This value is pretty low. See the attached coverage report for that class.
[jira] [Resolved] (YARN-9551) TestTimelineClientV2Impl.testSyncCall fails intermittently
[ https://issues.apache.org/jira/browse/YARN-9551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-9551.
----------------------------------
Hadoop Flags: Reviewed
Resolution: Fixed

> TestTimelineClientV2Impl.testSyncCall fails intermittently
> ----------------------------------------------------------
>
> Key: YARN-9551
> URL: https://issues.apache.org/jira/browse/YARN-9551
> Project: Hadoop YARN
> Issue Type: Bug
> Components: ATSv2, test
> Affects Versions: 3.3.0
> Reporter: Prabhu Joseph
> Assignee: Andras Gyori
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.1.5
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> TestTimelineClientV2Impl.testSyncCall fails intermittently.
> {code:java}
> Failed
> org.apache.hadoop.yarn.client.api.impl.TestTimelineClientV2Impl.testSyncCall
> Failing for the past 1 build (Since #24083 )
> Took 1.5 sec.
> Error Message
> TimelineEntities not published as desired expected:<3> but was:<4>
> Stacktrace
> java.lang.AssertionError: TimelineEntities not published as desired expected:<3> but was:<4>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:834)
> 	at org.junit.Assert.assertEquals(Assert.java:645)
> 	at org.apache.hadoop.yarn.client.api.impl.TestTimelineClientV2Impl.testSyncCall(TestTimelineClientV2Impl.java:251)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> 	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> 	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> 	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> 	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> 	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> Standard Output
> 2019-05-13 15:33:46,596 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(60)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2019-05-13 15:33:47,763 INFO [main] impl.TestTimelineClientV2Impl (TestTimelineClientV2Impl.java:printReceivedEntities(413)) - Entities Published @ index 0 : 1,
> 2019-05-13 15:33:47,764 INFO [main] impl.TestTimelineClientV2Impl (TestTimelineClientV2Impl.java:printReceivedEntities(413)) - Entities Published @ index 1 : 2,
> 2019-05-13 15:33:47,764 INFO [main] impl.Te
> {code}
[jira] [Resolved] (YARN-6221) Entities missing from ATS when summary log file info got returned to the ATS before the domain log
[ https://issues.apache.org/jira/browse/YARN-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-6221.
----------------------------------
Hadoop Flags: Reviewed
Resolution: Fixed

> Entities missing from ATS when summary log file info got returned to the ATS before the domain log
> --------------------------------------------------------------------------------------------------
>
> Key: YARN-6221
> URL: https://issues.apache.org/jira/browse/YARN-6221
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Sushmitha Sreenivasan
> Assignee: Xiaomin Zhang
> Priority: Critical
> Fix For: 3.4.0, 3.2.3, 3.3.2, 3.1.5
>
> Attachments: YARN-6221.02.patch, YARN-6221.02.patch, YARN-6221.branch-3.1.001.patch, YARN-6221.branch-3.2.001.patch, YARN-6221.branch-3.3.001.patch, YARN-6221.branch-3.3.002.patch, YARN-6221.patch, YARN-6221.patch
>
> Events data missing for the following entities:
> REQUEST:
> {code:java}
> curl -k --negotiate -u: http://:8188/ws/v1/timeline/TEZ_APPLICATION_ATTEMPT/tez_appattempt_1487706062210_0012_01
> {code}
> RESPONSE:
> {code:java}
> {"events":[],"entitytype":"TEZ_APPLICATION_ATTEMPT","entity":"tez_appattempt_1487706062210_0012_01","starttime":1487711606077,"domain":"Tez_ATS_application_1487706062210_0012","relatedentities":{"TEZ_DAG_ID":["dag_1487706062210_0012_2","dag_1487706062210_0012_1"]},"primaryfilters":{},"otherinfo":{}}
> {code}
> LOGS:
> {code:title=Timeline Server log entry}
> WARN timeline.TimelineDataManager (TimelineDataManager.java:doPostEntities(366)) - Skip the timeline entity: { id: tez_application_1487706062210_0012, type: TEZ_APPLICATION }
> org.apache.hadoop.yarn.exceptions.YarnException: Domain information of the timeline entity { id: tez_application_1487706062210_0012, type: TEZ_APPLICATION } doesn't exist.
> 	at org.apache.hadoop.yarn.server.timeline.security.TimelineACLsManager.checkAccess(TimelineACLsManager.java:122)
> 	at org.apache.hadoop.yarn.server.timeline.TimelineDataManager.doPostEntities(TimelineDataManager.java:356)
> 	at org.apache.hadoop.yarn.server.timeline.TimelineDataManager.postEntities(TimelineDataManager.java:316)
> 	at org.apache.hadoop.yarn.server.timeline.EntityLogInfo.doParse(LogInfo.java:204)
> 	at org.apache.hadoop.yarn.server.timeline.LogInfo.parsePath(LogInfo.java:156)
> 	at org.apache.hadoop.yarn.server.timeline.LogInfo.parseForStore(LogInfo.java:113)
> 	at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore$AppLogs.parseSummaryLogs(EntityGroupFSTimelineStore.java:682)
> 	at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore$AppLogs.parseSummaryLogs(EntityGroupFSTimelineStore.java:657)
> 	at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore$ActiveLogParser.run(EntityGroupFSTimelineStore.java:870)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
[jira] [Resolved] (YARN-10874) Refactor NM ContainerLaunch#getEnvDependencies's unit tests
[ https://issues.apache.org/jira/browse/YARN-10874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10874.
-----------------------------------
Hadoop Flags: Reviewed
Resolution: Fixed

> Refactor NM ContainerLaunch#getEnvDependencies's unit tests
> -----------------------------------------------------------
>
> Key: YARN-10874
> URL: https://issues.apache.org/jira/browse/YARN-10874
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Reporter: Tamas Domok
> Assignee: Tamas Domok
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> The YARN-10355 ticket states that the unit tests contain repeated code and the test methods are too long. We decided to split that ticket into two parts. YARN-10355 will contain only the production code change (for the windows variant; the linux variant refactor is not feasible with regex, and the original code is not the nicest, but it does its thing).
>
> Acceptance criteria:
> * refactor the unit tests (e.g.: parameterised tests)
> * extend the tests with extra checks
[jira] [Created] (YARN-10877) SLSSchedulerCommons: Consider using application map from AbstractYarnScheduler and make event handling more consistent
Szilard Nemeth created YARN-10877:
-------------------------------------

Summary: SLSSchedulerCommons: Consider using application map from AbstractYarnScheduler and make event handling more consistent
Key: YARN-10877
URL: https://issues.apache.org/jira/browse/YARN-10877
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

This is a follow-up of YARN-10552.
The improvements and things to check are coming from [this comment|https://issues.apache.org/jira/browse/YARN-10552?focusedCommentId=17277991&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17277991].

{quote}
appQueueMap was not present in SLSFairScheduler before (it was in SLSCapacityScheduler), however from https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java#L163, it seems that the super class of the schedulers - https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java#L159 has this already. As such, do we really need to define a new map as a common map at all in SLSSchedulerCommons or can we somehow reuse the super class's map? It might need some code updates though.

In regards to the above point, considering SLSFairScheduler did not previously have any of the following code in handle() method:
{quote}
[jira] [Resolved] (YARN-10882) Fix branch-3.1 build: zstd library is missing from the Dockerfile
[ https://issues.apache.org/jira/browse/YARN-10882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10882.
-----------------------------------
Hadoop Flags: Reviewed
Resolution: Fixed

> Fix branch-3.1 build: zstd library is missing from the Dockerfile
> -----------------------------------------------------------------
>
> Key: YARN-10882
> URL: https://issues.apache.org/jira/browse/YARN-10882
> Project: Hadoop YARN
> Issue Type: Bug
> Components: build
> Reporter: Tamas Domok
> Assignee: Tamas Domok
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> branch-3.1 did not build on the Jenkins slave, because zstd is missing from the Dockerfile.
>
> {code:java}
> [INFO] --- hadoop-maven-plugins:3.1.5-SNAPSHOT:cmake-compile (cmake-compile) @ hadoop-common ---
> [INFO] Running cmake /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-3286/src/hadoop-common-project/hadoop-common/src -DGENERATED_JAVAH=/home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-3286/src/hadoop-common-project/hadoop-common/target/native/javah -DJVM_ARCH_DATA_MODEL=64 -DREQUIRE_BZIP2=false -DREQUIRE_ISAL=false -DREQUIRE_OPENSSL=true -DREQUIRE_SNAPPY=true -DREQUIRE_ZSTD=true -G Unix Makefiles
> [INFO] with extra environment variables {}
> [WARNING] -- The C compiler identification is GNU 7.5.0
> [WARNING] -- The CXX compiler identification is GNU 7.5.0
> [WARNING] -- Check for working C compiler: /usr/bin/cc
> [WARNING] -- Check for working C compiler: /usr/bin/cc -- works
> [WARNING] -- Detecting C compiler ABI info
> [WARNING] -- Detecting C compiler ABI info - done
> [WARNING] -- Detecting C compile features
> [WARNING] -- Detecting C compile features - done
> [WARNING] -- Check for working CXX compiler: /usr/bin/c++
> [WARNING] -- Check for working CXX compiler: /usr/bin/c++ -- works
> [WARNING] -- Detecting CXX compiler ABI info
> [WARNING] -- Detecting CXX compiler ABI info - done
> [WARNING] -- Detecting CXX compile features
> [WARNING] -- Detecting CXX compile features - done
> [WARNING] -- Looking for pthread.h
> [WARNING] -- Looking for pthread.h - found
> [WARNING] -- Looking for pthread_create
> [WARNING] -- Looking for pthread_create - not found
> [WARNING] -- Looking for pthread_create in pthreads
> [WARNING] -- Looking for pthread_create in pthreads - not found
> [WARNING] -- Looking for pthread_create in pthread
> [WARNING] -- Looking for pthread_create in pthread - found
> [WARNING] -- Found Threads: TRUE
> [WARNING] JAVA_HOME=, JAVA_JVM_LIBRARY=/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
> [WARNING] JAVA_INCLUDE_PATH=/usr/lib/jvm/java-8-openjdk-amd64/include, JAVA_INCLUDE_PATH2=/usr/lib/jvm/java-8-openjdk-amd64/include/linux
> [WARNING] Located all JNI components successfully.
> [WARNING] -- Found JNI: /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libjawt.so
> [WARNING] -- Found ZLIB: /lib/x86_64-linux-gnu/libz.so.1 (found version "1.2.11")
> [WARNING] -- Found Snappy: /usr/lib/x86_64-linux-gnu/libsnappy.so.1
> [WARNING] CMake Error at CMakeLists.txt:120 (MESSAGE):
> [WARNING] Required zstandard library could not be found.
> [WARNING] ZSTD_LIBRARY=/usr/lib/x86_64-linux-gnu/libzstd.so.1, ZSTD_INCLUDE_DIR=,
> [WARNING] CUSTOM_ZSTD_INCLUDE_DIR=, CUSTOM_ZSTD_PREFIX=, CUSTOM_ZSTD_INCLUDE=
> {code}
[jira] [Created] (YARN-10886) Cluster based and parent based max capacity in Capacity Scheduler
Szilard Nemeth created YARN-10886:
-------------------------------------
Summary: Cluster based and parent based max capacity in Capacity Scheduler
Key: YARN-10886
URL: https://issues.apache.org/jira/browse/YARN-10886
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth

We want to introduce percentage modes relative to the cluster, not the parent, i.e. the property root.users.maximum-capacity will mean one of the following things:

*Either Parent Percentage:* maximum capacity relative to its parent. If it's set to 50, the capacity is capped with respect to the parent. This can be covered by the current format, no change there.

*Or Cluster Percentage:* maximum capacity expressed as a percentage of the overall cluster capacity. This is the new scenario, for example:
yarn.scheduler.capacity.root.users.max-capacity = c:50%
yarn.scheduler.capacity.root.users.max-capacity = c:50%, c:30%
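Spelled out as a capacity-scheduler configuration sketch, the two modes proposed above would sit side by side as follows. Property names and the c: prefix follow the issue's own examples; the values are arbitrary illustrations, not a finalized syntax:

{code:java}
# Parent Percentage (existing format): root.users is capped at 50% of its parent
yarn.scheduler.capacity.root.users.maximum-capacity = 50

# Cluster Percentage (proposed format): root.users is capped at 50% of the whole cluster
yarn.scheduler.capacity.root.users.max-capacity = c:50%

# Proposed per-resource variant of the cluster mode
yarn.scheduler.capacity.root.users.max-capacity = c:50%, c:30%
{code}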
[jira] [Created] (YARN-10887) Investigation: Decouple capacity and max-capacity modes
Szilard Nemeth created YARN-10887:
-------------------------------------
Summary: Investigation: Decouple capacity and max-capacity modes
Key: YARN-10887
URL: https://issues.apache.org/jira/browse/YARN-10887
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth

Currently Fair Scheduler supports the following 3 kinds of settings:
* Single percentage (relative to parent), i.e. "X%"
* A set of percentages (relative to parent), i.e. "X% cpu, Y% memory"
* Absolute resources, i.e. "X mb, Y vcores"

Please note that the new, recommended format does not support the single percentage mode, only the last 2, like: "vcores=X, memory-mb=Y" or "vcores=X%, memory-mb=Y%" respectively.

It is recommended that all three formats are supported for maximum-capacity in CS after introducing weight mode.
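For reference, the three Fair Scheduler formats listed above might appear in an allocation file roughly like this. The <maxResources> element and the concrete numbers are illustrative assumptions for the sketch, not a prescription:

{code:java}
<!-- Single percentage, relative to parent (legacy format only) -->
<maxResources>50%</maxResources>

<!-- Set of percentages per resource, relative to parent (new format) -->
<maxResources>vcores=50%, memory-mb=40%</maxResources>

<!-- Absolute resources (new format) -->
<maxResources>vcores=8, memory-mb=8192</maxResources>
{code}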
[jira] [Created] (YARN-10888) [Umbrella] New capacity modes for CS
Szilard Nemeth created YARN-10888:
-------------------------------------
Summary: [Umbrella] New capacity modes for CS
Key: YARN-10888
URL: https://issues.apache.org/jira/browse/YARN-10888
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
[jira] [Created] (YARN-10889) [Umbrella] Flexible Auto Queue Creation in Capacity Scheduler - Tech debts
Szilard Nemeth created YARN-10889:
-------------------------------------
Summary: [Umbrella] Flexible Auto Queue Creation in Capacity Scheduler - Tech debts
Key: YARN-10889
URL: https://issues.apache.org/jira/browse/YARN-10889
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Szilard Nemeth
[jira] [Resolved] (YARN-10505) Extend the maximum-capacity property to support Fair Scheduler migration
[ https://issues.apache.org/jira/browse/YARN-10505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10505.
-----------------------------------
Resolution: Duplicate

> Extend the maximum-capacity property to support Fair Scheduler migration
> ------------------------------------------------------------------------
>
> Key: YARN-10505
> URL: https://issues.apache.org/jira/browse/YARN-10505
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Reporter: Benjamin Teke
> Assignee: Benjamin Teke
> Priority: Major
>
> Currently Fair Scheduler supports the following 3 kinds of settings:
> * Single percentage (relative to parent), i.e. "X%"
> * A set of percentages (relative to parent), i.e. "X% cpu, Y% memory"
> * Absolute resources, i.e. "X mb, Y vcores"
>
> Please note that the new, recommended format does not support the single
> percentage mode, only the last 2, like: "vcores=X, memory-mb=Y" or
> "vcores=X%, memory-mb=Y%" respectively.
>
> Tasks to accomplish:
> # It is recommended that all three formats are supported for
> maximum-capacity in CS after introducing weight mode.
> # Also we want to introduce percentage modes relative to the cluster,
> not the parent, i.e. the property root.users.maximum-capacity will mean one of
> the following things:
> ## Either Parent Percentage: maximum capacity relative to its parent. If
> it's set to 50, the capacity is capped with respect to the parent.
> This can be covered by the current format, no change there.
> ## Or Cluster Percentage: maximum capacity expressed as a percentage of the
> overall cluster capacity. This case is the new scenario, for example:
> {{yarn.scheduler.capacity.root.users.max-capacity = c:50%}}
> {{yarn.scheduler.capacity.root.users.max-capacity = c:50%, c:30%}}
[jira] [Resolved] (YARN-9904) Investigate how resource allocation configuration could be more consistent in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-9904.
----------------------------------
Resolution: Duplicate

Duplicate of YARN-10888

> Investigate how resource allocation configuration could be more consistent in
> CapacityScheduler
> -----------------------------------------------------------------------------
>
> Key: YARN-9904
> URL: https://issues.apache.org/jira/browse/YARN-9904
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Reporter: Gergely Pollák
> Priority: Major
>
> It would be nice if every place where a capacity can be defined accepted the
> same formats:
> * With fixed amounts (e.g. 1 GB memory, 8 vcores, 3 GPU)
> * With percentages
> ** Percentage of all resources (e.g. 10% of all memory, vcore, GPU)
> ** Percentage per resource type (e.g. 10% memory, 25% vcore, 50% GPU)
>
> We need to determine all configuration options where capacities can be
> defined, and see if it is possible to extend the configuration, or if it
> makes sense in that case.
> The outcome is a proposal for all the configurations which could/should be
> changed.
[jira] [Resolved] (YARN-10891) Extend QueueInfo with max-parallel-apps in CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-10891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10891.
-----------------------------------
Hadoop Flags: Reviewed
Resolution: Fixed

> Extend QueueInfo with max-parallel-apps in CapacityScheduler
> ------------------------------------------------------------
>
> Key: YARN-10891
> URL: https://issues.apache.org/jira/browse/YARN-10891
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler
> Reporter: Tamas Domok
> Assignee: Tamas Domok
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> Add max-parallel-apps to the Cluster Scheduler API's
> [response|https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API]
> and extend the Yarn-API's QueueInfoProto with the max-parallel-apps property.
>
> The REST api can be tested with:
> {code:java}
> curl "http://localhost:8088/ws/v1/cluster/scheduler" | jq {code}
>
> The protobuf api can be tested with the yarn client:
> {code:java}
> yarn queue --status root.queue.foo
> Queue Information :
> Queue Name : foo
> Queue Path : root.queue.foo
> State : RUNNING
> Capacity : 75.00%
> Current Capacity : .00%
> Maximum Capacity : 100.00%
> Weight : -1.00
> Maximum Parallel Apps : 9
> Default Node Label expression :
> Accessible Node Labels : *
> Preemption : disabled
> Intra-queue Preemption : disabled {code}
>
> About the max-parallel-apps:
> Maximum number of applications that can run at the same time. Unlike
> {{maximum-applications}}, application submissions are _not_ rejected when
> this limit is reached. Instead they stay in {{ACCEPTED}} state until they are
> eligible to run. This can be set for all queues with
> {{yarn.scheduler.capacity.max-parallel-apps}} and can also be overridden on a
> per queue basis by setting
> {{yarn.scheduler.capacity.<queue-path>.max-parallel-apps}}. Integer value is
> expected. By default, there is no limit.
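The configuration side described in the last paragraph of the quoted issue can be sketched like this. The value 128 and the queue path root.queue.foo are arbitrary examples:

{code:java}
# Default limit for all queues
yarn.scheduler.capacity.max-parallel-apps = 128

# Per-queue override: the queue path is inserted into the property name
yarn.scheduler.capacity.root.queue.foo.max-parallel-apps = 9
{code}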
[jira] [Created] (YARN-10904) Investigate: Remove unnecessary fields from AbstractCSQueue
Szilard Nemeth created YARN-10904:
-------------------------------------
Summary: Investigate: Remove unnecessary fields from AbstractCSQueue
Key: YARN-10904
URL: https://issues.apache.org/jira/browse/YARN-10904
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth
[jira] [Created] (YARN-10905) Investigate if AbstractCSQueue#configuredNodeLabels vs. QueueCapacities#getExistingNodeLabels holds the same data
Szilard Nemeth created YARN-10905:
-------------------------------------
Summary: Investigate if AbstractCSQueue#configuredNodeLabels vs. QueueCapacities#getExistingNodeLabels holds the same data
Key: YARN-10905
URL: https://issues.apache.org/jira/browse/YARN-10905
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

The task is to investigate whether the field AbstractCSQueue#configuredNodeLabels holds the same data as QueueCapacities#getExistingNodeLabels. Obviously, we don't want double-entry bookkeeping, so if the data is the same, we can remove one of them.
[jira] [Created] (YARN-10906) Create QueueConfig object for generic queue-specific fields
Szilard Nemeth created YARN-10906:
-------------------------------------
Summary: Create QueueConfig object for generic queue-specific fields
Key: YARN-10906
URL: https://issues.apache.org/jira/browse/YARN-10906
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

This is about config fields in AbstractCSQueue. Document whether a config only comes from the Configuration object or is altered or used for other purposes. Also, restrict the visibility and surface of modification from subclasses as much as we can.
[jira] [Created] (YARN-10907) Investigate: Minimize usages of AbstractCSQueue#csContext
Szilard Nemeth created YARN-10907:
-------------------------------------
Summary: Investigate: Minimize usages of AbstractCSQueue#csContext
Key: YARN-10907
URL: https://issues.apache.org/jira/browse/YARN-10907
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

Context objects can be a sign of a code smell, as they can contain many, possibly loosely related references to other objects. CapacitySchedulerContext seems to be one of these. This task is to investigate how the field AbstractCSQueue#csContext is being used from this class, and possibly keep the usage of this context class to the bare minimum.

Related article: https://wiki.c2.com/?ContextObjectsAreEvil
[jira] [Created] (YARN-10908) Investigate: Why AbstractCSQueue#authorizer is constructed for each queue
Szilard Nemeth created YARN-10908:
-------------------------------------
Summary: Investigate: Why AbstractCSQueue#authorizer is constructed for each queue
Key: YARN-10908
URL: https://issues.apache.org/jira/browse/YARN-10908
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

AbstractCSQueue#hasAccess checks if a certain user with an ACL has permission to submit an app to the queue. The permission check itself is performed by calling ConfiguredYarnAuthorizer#checkPermission. Interestingly, all queue objects hold a reference to a YarnAuthorizationProvider instance. What looks weird is how the authorizer is initialized:
https://github.com/apache/hadoop/blob/ac0a4e7f589e7280268013c56339b3b257d332a0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java#L428

It just calls YarnAuthorizationProvider.getInstance with the Configuration object as an argument, so all queue objects effectively hold an instance constructed with the same configuration. The getInstance method does not read any queue-specific configuration value from the object, so this is a waste of memory.
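The behaviour the issue describes, every queue calling a getInstance-style factory and ending up with an equivalent object, is the classic case for a lazily created, process-wide instance. A minimal sketch of that pattern follows; the class name SharedProvider and the Supplier-based factory are illustrative assumptions, not Hadoop's actual YarnAuthorizationProvider API:

```java
import java.util.function.Supplier;

/**
 * Sketch of the lazily-initialized, process-wide instance pattern that
 * getInstance-style factories typically implement (double-checked locking).
 * Holding one shared object instead of a per-queue reference avoids
 * constructing equivalent authorizers for every queue. All names here are
 * illustrative, not Hadoop's API.
 */
final class SharedProvider {
  private static volatile Object instance;

  private SharedProvider() {
  }

  static Object getInstance(Supplier<Object> factory) {
    if (instance == null) {                  // first check, without a lock
      synchronized (SharedProvider.class) {
        if (instance == null) {              // second check, under the lock
          instance = factory.get();
        }
      }
    }
    return instance;                         // every caller sees the same object
  }
}
```

Every queue that calls getInstance then shares one object, which is the behaviour the issue suggests should make a per-queue field unnecessary.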
[jira] [Created] (YARN-10909) AbstractCSQueue: Check for methods added for test code but not annotated with VisibleForTesting
Szilard Nemeth created YARN-10909:
-------------------------------------
Summary: AbstractCSQueue: Check for methods added for test code but not annotated with VisibleForTesting
Key: YARN-10909
URL: https://issues.apache.org/jira/browse/YARN-10909
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

For example, AbstractCSQueue#setMaxCapacity(float) is only used for testing, but not annotated. There can be other methods like this in this class.
[jira] [Created] (YARN-10910) AbstractCSQueue#setupQueueConfigs: Separate validation logic from initialization logic
Szilard Nemeth created YARN-10910:
-------------------------------------
Summary: AbstractCSQueue#setupQueueConfigs: Separate validation logic from initialization logic
Key: YARN-10910
URL: https://issues.apache.org/jira/browse/YARN-10910
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

AbstractCSQueue#setupQueueConfigs contains initialization + validation logic. The task is to factor out the validation logic from this method into a separate method.
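One way to structure the split the issue asks for is sketched below: a pure validation pass runs first and mutates nothing, then an initialization pass assigns state. The class, method, and field names are placeholders for the sketch, not the actual AbstractCSQueue members:

```java
/**
 * Sketch of the refactoring direction: the setup entry point first runs a
 * side-effect-free validation pass, then an initialization pass, instead of
 * interleaving the two. Names are placeholders, not Hadoop's code.
 */
final class QueueSetup {
  private float capacity;
  private float maxCapacity;

  void setupQueueConfigs(float capacity, float maxCapacity) {
    validateQueueConfigs(capacity, maxCapacity);   // fail fast, mutate nothing
    initializeQueueConfigs(capacity, maxCapacity); // only runs on valid input
  }

  private void validateQueueConfigs(float capacity, float maxCapacity) {
    if (capacity < 0 || capacity > maxCapacity) {
      throw new IllegalArgumentException(
          "capacity " + capacity + " must be within [0, " + maxCapacity + "]");
    }
  }

  private void initializeQueueConfigs(float capacity, float maxCapacity) {
    this.capacity = capacity;
    this.maxCapacity = maxCapacity;
  }

  float getCapacity() {
    return capacity;
  }
}
```

The benefit is that an invalid configuration can never leave the object half-initialized, since validation throws before any field is touched.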
[jira] [Created] (YARN-10911) AbstractCSQueue: Create a separate class for usernames and weights that are travelling in a Map
Szilard Nemeth created YARN-10911:
-------------------------------------
Summary: AbstractCSQueue: Create a separate class for usernames and weights that are travelling in a Map
Key: YARN-10911
URL: https://issues.apache.org/jira/browse/YARN-10911
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

Related methods that are using the Map:
AbstractCSQueue#getUserWeightsFromHierarchy
CapacitySchedulerConfiguration#getAllUserWeightsForQueue
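A dedicated type replacing the raw map of usernames to weights could look like this minimal sketch. The class name UserWeights and the default weight of 1.0 are assumptions for the sketch, not the eventual API:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of a small value class wrapping the username-to-weight map that
 * currently travels between getUserWeightsFromHierarchy and
 * getAllUserWeightsForQueue. Wrapping it gives the map a name, makes it
 * immutable, and centralizes the default-weight rule. Names and the
 * default are assumptions for the sketch.
 */
final class UserWeights {
  private static final float DEFAULT_WEIGHT = 1.0f;

  private final Map<String, Float> weights;

  private UserWeights(Map<String, Float> weights) {
    // Defensive copy so callers cannot mutate the wrapped map afterwards.
    this.weights = Collections.unmodifiableMap(new HashMap<>(weights));
  }

  static UserWeights of(Map<String, Float> weights) {
    return new UserWeights(weights);
  }

  /** Returns the configured weight for the user, or the default if none is set. */
  float getFor(String user) {
    return weights.getOrDefault(user, DEFAULT_WEIGHT);
  }
}
```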
[jira] [Created] (YARN-10912) AbstractCSQueue#updateConfigurableResourceRequirement: Separate validation logic from initialization logic
Szilard Nemeth created YARN-10912:
-------------------------------------
Summary: AbstractCSQueue#updateConfigurableResourceRequirement: Separate validation logic from initialization logic
Key: YARN-10912
URL: https://issues.apache.org/jira/browse/YARN-10912
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

AbstractCSQueue#updateConfigurableResourceRequirement contains initialization + validation logic. The task is to factor out the validation logic from this method into a separate method.
[jira] [Created] (YARN-10913) AbstractCSQueue: Group preemption methods and fields into a separate class
Szilard Nemeth created YARN-10913:
-------------------------------------
Summary: AbstractCSQueue: Group preemption methods and fields into a separate class
Key: YARN-10913
URL: https://issues.apache.org/jira/browse/YARN-10913
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

Relevant methods: isQueueHierarchyPreemptionDisabled, isIntraQueueHierarchyPreemptionDisabled, getTotalKillableResource, getKillableContainers
[jira] [Created] (YARN-10914) Simplify duplicated code for tracking ResourceUsage
Szilard Nemeth created YARN-10914:
-------------------------------------
Summary: Simplify duplicated code for tracking ResourceUsage
Key: YARN-10914
URL: https://issues.apache.org/jira/browse/YARN-10914
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

Relevant methods: incReservedResource, decReservedResource, incPendingResource, decPendingResource, incUsedResource, decUsedResource

Alternatively, those could be moved to some computation class, too.
[jira] [Created] (YARN-10915) AbstractCSQueue: Simplify complex logic in methods: deriveCapacityFromAbsoluteConfigurations and updateEffectiveResources
Szilard Nemeth created YARN-10915:
-------------------------------------
Summary: AbstractCSQueue: Simplify complex logic in methods: deriveCapacityFromAbsoluteConfigurations and updateEffectiveResources
Key: YARN-10915
URL: https://issues.apache.org/jira/browse/YARN-10915
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth
[jira] [Created] (YARN-10917) Investigate and simplify CapacitySchedulerConfigValidator#validateQueueHierarchy
Szilard Nemeth created YARN-10917:
-------------------------------------
Summary: Investigate and simplify CapacitySchedulerConfigValidator#validateQueueHierarchy
Key: YARN-10917
URL: https://issues.apache.org/jira/browse/YARN-10917
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth
[jira] [Created] (YARN-10916) Investigate and simplify GuaranteedOrZeroCapacityOverTimePolicy#computeQueueManagementChanges
Szilard Nemeth created YARN-10916:
-------------------------------------
Summary: Investigate and simplify GuaranteedOrZeroCapacityOverTimePolicy#computeQueueManagementChanges
Key: YARN-10916
URL: https://issues.apache.org/jira/browse/YARN-10916
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth
[jira] [Created] (YARN-10918) Simplify code of method: CapacitySchedulerQueueManager#parseQueue
Szilard Nemeth created YARN-10918:
-------------------------------------
Summary: Simplify code of method: CapacitySchedulerQueueManager#parseQueue
Key: YARN-10918
URL: https://issues.apache.org/jira/browse/YARN-10918
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth
[jira] [Created] (YARN-10919) Remove LeafQueue#scheduler field
Szilard Nemeth created YARN-10919:
-------------------------------------
Summary: Remove LeafQueue#scheduler field
Key: YARN-10919
URL: https://issues.apache.org/jira/browse/YARN-10919
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

It is the same object as AbstractCSQueue#csContext (from the parent class).
[jira] [Created] (YARN-10920) Created a dedicated class for Node Labels
Szilard Nemeth created YARN-10920:
-------------------------------------
Summary: Created a dedicated class for Node Labels
Key: YARN-10920
URL: https://issues.apache.org/jira/browse/YARN-10920
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

In the current codebase, Node labels are simple strings. Using Strings is very error-prone, as they can contain basically anything. Moreover, it's easier to keep track of all usages if we have a dedicated class for Node labels.
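A dedicated value type along the lines the issue proposes might look like this minimal sketch. The class name NodeLabel and its validation rule are illustrative assumptions, not YARN's eventual API:

```java
import java.util.Objects;
import java.util.regex.Pattern;

/**
 * Illustrative sketch of a dedicated Node Label value type. Wrapping the
 * raw String gives one place for validation and makes every usage easy to
 * find. The validation pattern below is an assumption for the sketch, not
 * YARN's actual node label naming rule.
 */
final class NodeLabel {
  private static final Pattern VALID = Pattern.compile("[A-Za-z0-9_-]*");

  private final String name;

  private NodeLabel(String name) {
    this.name = name;
  }

  /** Factory that rejects null or malformed labels instead of letting them spread. */
  static NodeLabel of(String name) {
    Objects.requireNonNull(name, "node label must not be null");
    if (!VALID.matcher(name).matches()) {
      throw new IllegalArgumentException("Invalid node label: " + name);
    }
    return new NodeLabel(name);
  }

  String getName() {
    return name;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof NodeLabel && name.equals(((NodeLabel) o).name);
  }

  @Override
  public int hashCode() {
    return name.hashCode();
  }

  @Override
  public String toString() {
    return name;
  }
}
```

With value semantics in place, labels can be used safely as map keys and set members wherever the raw Strings are used today.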
[jira] [Created] (YARN-10921) AbstractCSQueue: Node Labels logic is scattered and iteration logic is repeated all over the place
Szilard Nemeth created YARN-10921:
-------------------------------------
Summary: AbstractCSQueue: Node Labels logic is scattered and iteration logic is repeated all over the place
Key: YARN-10921
URL: https://issues.apache.org/jira/browse/YARN-10921
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth

TODO items:
- Check original Node labels epic / jiras?
- Think about ways to improve repetitive iteration on configuredNodeLabels
- Search for: "String label" in code

Code blocks to handle Node labels:
- AbstractCSQueue#setupQueueConfigs
- AbstractCSQueue#getQueueConfigurations
- AbstractCSQueue#accessibleToPartition
- AbstractCSQueue#getNodeLabelsForQueue
- AbstractCSQueue#updateAbsoluteCapacities
- AbstractCSQueue#updateConfigurableResourceRequirement
- CSQueueUtils#loadCapacitiesByLabelsFromConf
- AutoCreatedLeafQueue
[jira] [Created] (YARN-10922) Investigation: Verify if legacy AQC works as documented
Szilard Nemeth created YARN-10922:
-------------------------------------
Summary: Investigation: Verify if legacy AQC works as documented
Key: YARN-10922
URL: https://issues.apache.org/jira/browse/YARN-10922
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

Quoting from the Capacity Scheduler documentation:
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
Section: "Dynamic Auto-Creation and Management of Leaf Queues"

The task is to verify if legacy AQC works like this:
{quote}
The parent queue which has been enabled for auto leaf queue creation supports the configuration of template parameters for automatic configuration of the auto-created leaf queues. The auto-created queues support all of the leaf queue configuration parameters except for Queue ACL and Absolute Resource configurations. Queue ACLs are currently inherited from the parent queue, i.e. they are not configurable on the leaf queue template.
{quote}
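For context, a legacy AQC parent queue with the template parameters mentioned in the quote is configured roughly as follows. This is a sketch based on the same documentation page; the queue path root.parent and the numeric values are illustrative:

{code:java}
# Enable legacy auto leaf queue creation under root.parent
yarn.scheduler.capacity.root.parent.auto-create-child-queue.enabled = true

# Template parameters applied to every auto-created leaf queue
yarn.scheduler.capacity.root.parent.leaf-queue-template.capacity = 10
yarn.scheduler.capacity.root.parent.leaf-queue-template.maximum-capacity = 100
{code}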
[jira] [Created] (YARN-10923) Investigate if creating separate classes for Dynamic Leaf / Dynamic Parent queues makes sense
Szilard Nemeth created YARN-10923:
-------------------------------------
Summary: Investigate if creating separate classes for Dynamic Leaf / Dynamic Parent queues makes sense
Key: YARN-10923
URL: https://issues.apache.org/jira/browse/YARN-10923
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

First, create 2 new classes: DynamicLeaf / DynamicParent. Then, gradually move AQC functionality from ManagedParentQueue / AutoCreatedLeafQueue. Revisit if AbstractManagedParentQueue makes sense at all.

ManagedParent / Parent: Is there an actual need for the two classes?
- Currently the two different parents can cause confusion and chaos
- Can be a "back to the drawing board" task

The ultimate goal is to have a common class for AQC-enabled parents and investigate if a separate class for AutoCreatedLeafQueue is required.
[jira] [Created] (YARN-10924) Clean up CapacityScheduler#initScheduler
Szilard Nemeth created YARN-10924:
-------------------------------------
Summary: Clean up CapacityScheduler#initScheduler
Key: YARN-10924
URL: https://issues.apache.org/jira/browse/YARN-10924
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

The task is to define methods that initialize related fields together, and call these methods from initScheduler.
[jira] [Created] (YARN-10925) Simplify AbstractCSQueue#setupQueueConfigs
Szilard Nemeth created YARN-10925:
-------------------------------------
Summary: Simplify AbstractCSQueue#setupQueueConfigs
Key: YARN-10925
URL: https://issues.apache.org/jira/browse/YARN-10925
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth
[jira] [Created] (YARN-10926) Test validation after YARN-10504 and YARN-10506: Check if modified test expectations are correct or not
Szilard Nemeth created YARN-10926:
-------------------------------------
Summary: Test validation after YARN-10504 and YARN-10506: Check if modified test expectations are correct or not
Key: YARN-10926
URL: https://issues.apache.org/jira/browse/YARN-10926
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

YARN-10504 and YARN-10506 modified some test expectations. The task is to verify if those expectations are correct.
[jira] [Created] (YARN-10927) Explain assertion literals in testcases of CapacityScheduler and related test classes
Szilard Nemeth created YARN-10927:
-------------------------------------
Summary: Explain assertion literals in testcases of CapacityScheduler and related test classes
Key: YARN-10927
URL: https://issues.apache.org/jira/browse/YARN-10927
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

In the existing tests the assertion literals could be explained for easier understanding. As there are too many test classes, we can tackle this more easily in a feature-by-feature fashion.
[jira] [Created] (YARN-10929) Refrain from creating new Configuration object in AbstractManagedParentQueue#initializeLeafQueueConfigs
Szilard Nemeth created YARN-10929:
-------------------------------------
Summary: Refrain from creating new Configuration object in AbstractManagedParentQueue#initializeLeafQueueConfigs
Key: YARN-10929
URL: https://issues.apache.org/jira/browse/YARN-10929
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Szilard Nemeth

AbstractManagedParentQueue#initializeLeafQueueConfigs creates a new CapacitySchedulerConfiguration with templated configs only. We should stop doing this. Also, there is a sorting of config keys in this method, but in the end the configs are added to the Configuration object, which is an enhanced Map, so the sorted order is not preserved anyway.
[jira] [Resolved] (YARN-10901) Permission checking error on an existing directory in LogAggregationFileController#verifyAndCreateRemoteLogDir
[ https://issues.apache.org/jira/browse/YARN-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10901.
-----------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

> Permission checking error on an existing directory in LogAggregationFileController#verifyAndCreateRemoteLogDir
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10901
>                 URL: https://issues.apache.org/jira/browse/YARN-10901
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.2.2, 3.3.1
>            Reporter: Tamas Domok
>            Assignee: Tamas Domok
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> *LogAggregationFileController.verifyAndCreateRemoteLogDir* tries to check
> whether the remote file system has set/modify permissions on the
> _yarn.nodemanager.remote-app-log-dir_:
>
> {code:java}
> //Check if FS has capability to set/modify permissions
> try {
>   remoteFS.setPermission(qualified, new FsPermission(TLDIR_PERMISSIONS));
> } catch (UnsupportedOperationException use) {
>   LOG.info("Unable to set permissions for configured filesystem since"
>       + " it does not support this", remoteFS.getScheme());
>   fsSupportsChmod = false;
> } catch (IOException e) {
>   LOG.warn("Failed to check if FileSystem suppports permissions on "
>       + "remoteLogDir [" + remoteRootLogDir + "]", e);
> }
> {code}
>
> But it will fail if the _yarn.nodemanager.remote-app-log-dir_'s owner is not
> the same as the NodeManager's user.
>
> Example error:
> {code:java}
> 2021-08-27 11:33:21,649 WARN org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileController: Failed to check if FileSystem suppports permissions on remoteLogDir [/tmp/logs]
> org.apache.hadoop.security.AccessControlException: Permission denied. user=yarn is not the owner of inode=/tmp/logs
> 	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkOwner(FSPermissionChecker.java:464)
> 	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:407)
> 	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermissionWithContext(FSPermissionChecker.java:417)
> 	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:297)
> 	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1950)
> 	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1931)
> 	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1876)
> 	at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setPermission(FSDirAttrOp.java:64)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1976)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:858)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:548)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.la
> {code}
[jira] [Resolved] (YARN-10852) Optimise CSConfiguration getAllUserWeightsForQueue
[ https://issues.apache.org/jira/browse/YARN-10852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10852.
-----------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

> Optimise CSConfiguration getAllUserWeightsForQueue
> --------------------------------------------------
>
>                 Key: YARN-10852
>                 URL: https://issues.apache.org/jira/browse/YARN-10852
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>            Reporter: Andras Gyori
>            Assignee: Andras Gyori
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> CapacitySchedulerConfiguration#getAllUsersWeightsForQueue is called in an
> O(n^2) fashion in AbstractCSQueue#setupQueueConfigs. This could be optimised
> by incorporating the ConfigurationProperties introduced in YARN-10838.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
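The O(n^2) pattern arises because the per-queue lookup rescans every configuration key for each queue. A minimal sketch of the single-pass indexing idea behind ConfigurationProperties from YARN-10838; the key layout `<queue-path>.user-settings.<user>.weight` follows Capacity Scheduler naming, but the class and method below are illustrative stand-ins, not the actual Hadoop API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: index user weights by queue path in ONE pass over the
// configuration, so each queue's lookup is a map get instead of a full rescan.
public class UserWeightIndex {
    static final String MARKER = ".user-settings.";
    static final String SUFFIX = ".weight";

    // One O(n) pass over all properties, instead of an O(n) scan per queue.
    static Map<String, Map<String, Float>> index(Map<String, String> props) {
        Map<String, Map<String, Float>> byQueue = new HashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            String key = e.getKey();
            int m = key.indexOf(MARKER);
            if (m < 0 || !key.endsWith(SUFFIX)) {
                continue; // not a user-weight property
            }
            String queuePath = key.substring(0, m);
            String user = key.substring(m + MARKER.length(),
                    key.length() - SUFFIX.length());
            byQueue.computeIfAbsent(queuePath, q -> new HashMap<>())
                   .put(user, Float.parseFloat(e.getValue()));
        }
        return byQueue;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("root.a.user-settings.alice.weight", "2.0");
        props.put("root.b.user-settings.bob.weight", "0.5");
        System.out.println(index(props).get("root.a")); // {alice=2.0}
    }
}
```

With such an index built once in setupQueueConfigs, each queue's user weights come from a single map lookup.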
[jira] [Resolved] (YARN-10872) Replace getPropsWithPrefix calls in AutoCreatedQueueTemplate
[ https://issues.apache.org/jira/browse/YARN-10872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10872.
-----------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

> Replace getPropsWithPrefix calls in AutoCreatedQueueTemplate
> ------------------------------------------------------------
>
>                 Key: YARN-10872
>                 URL: https://issues.apache.org/jira/browse/YARN-10872
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>            Reporter: Andras Gyori
>            Assignee: Benjamin Teke
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> With the introduction of YARN-10838, it is now possible to optimise
> AutoCreatedQueueTemplate and replace calls of getPropsWithPrefix.
[jira] [Resolved] (YARN-10908) Investigate: Why AbstractCSQueue#authorizer is constructed for each queue
[ https://issues.apache.org/jira/browse/YARN-10908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10908.
-----------------------------------
      Resolution: Invalid

> Investigate: Why AbstractCSQueue#authorizer is constructed for each queue
> -------------------------------------------------------------------------
>
>                 Key: YARN-10908
>                 URL: https://issues.apache.org/jira/browse/YARN-10908
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Minor
>
> AbstractCSQueue#hasAccess checks whether a certain user with an ACL has
> permission to submit an app to the queue.
> The permission check itself is performed by calling
> ConfiguredYarnAuthorizer#checkPermission.
> Interestingly, all queue objects hold a reference to a
> YarnAuthorizationProvider instance.
> What looks weird is how the authorizer is initialized:
> https://github.com/apache/hadoop/blob/ac0a4e7f589e7280268013c56339b3b257d332a0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java#L428
> It just calls YarnAuthorizationProvider.getInstance with the Configuration
> object as an argument, so all queue objects hold an instance constructed
> with the same configuration. The getInstance method does not read any
> queue-specific configuration value from the object, so this is a waste of
> memory.
[jira] [Created] (YARN-10942) CLONE - Investigate: Remove unnecessary fields from AbstractCSQueue or group fields by feature if possible
Szilard Nemeth created YARN-10942:
-------------------------------------

             Summary: CLONE - Investigate: Remove unnecessary fields from AbstractCSQueue or group fields by feature if possible
                 Key: YARN-10942
                 URL: https://issues.apache.org/jira/browse/YARN-10942
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth
            Assignee: Szilard Nemeth
[jira] [Resolved] (YARN-10912) AbstractCSQueue#updateConfigurableResourceRequirement: Separate validation logic from initialization logic
[ https://issues.apache.org/jira/browse/YARN-10912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10912.
-----------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

> AbstractCSQueue#updateConfigurableResourceRequirement: Separate validation
> logic from initialization logic
> --------------------------------------------------------------------------
>
>                 Key: YARN-10912
>                 URL: https://issues.apache.org/jira/browse/YARN-10912
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Szilard Nemeth
>            Assignee: Tamas Domok
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> AbstractCSQueue#updateConfigurableResourceRequirement contains initialization
> + validation logic. The task is to factor out the validation logic from this
> method into a separate method.
[jira] [Resolved] (YARN-10870) Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM Scheduler page
[ https://issues.apache.org/jira/browse/YARN-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10870.
-----------------------------------
      Resolution: Fixed

> Missing user filtering check -> yarn.webapp.filter-entity-list-by-user for RM
> Scheduler page
> -----------------------------------------------------------------------------
>
>                 Key: YARN-10870
>                 URL: https://issues.apache.org/jira/browse/YARN-10870
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Siddharth Ahuja
>            Assignee: Gergely Pollák
>            Priority: Major
>             Fix For: 3.4.0, 3.3.2, 3.2.4
>
>         Attachments: YARN-10870.001.patch, YARN-10870.002.patch,
> YARN-10870.branch-3.1.002.patch, YARN-10870.branch-3.2.002.patch,
> YARN-10870.branch-3.3.002.patch
>
> Non-permissible users are (incorrectly) able to view applications submitted
> by another user on the RM's Scheduler UI (not the Applications UI), where
> _non-permissible users_ are non-application-owners who are present neither
> in the application ACL (mapreduce.job.acl-view-job) nor in the Queue ACL as
> a Queue admin of the queue this job was submitted to (see [1], where both
> the filter setting introduced by YARN-8319 and the ACL checks are performed).
> The issue can be reproduced easily by setting
> {{yarn.webapp.filter-entity-list-by-user}} to true in yarn-site.xml.
> The above disallows non-permissible users from viewing another user's
> applications on the Applications page, but not on the Scheduler page.
> The filter setting seems to be checked only in the getApps() call, but not
> while rendering the apps information on the Scheduler page. This seems to be
> a "missed" feature from YARN-8319.
> The following prerequisites are needed to reproduce the issue:
> * Kerberized cluster
> * SPNEGO enabled for HDFS & YARN
> * Add test users systest and user1 on all nodes
> * Add Kerberos princs for the above users
> * Create HDFS user dirs for the above users and chown them appropriately
> * Run a sample MR Sleep job and test
>
> Steps to reproduce the issue:
> * kinit as "systest" and run a sample MR sleep job from one of the nodes in
> the cluster:
> {code}
> yarn jar sleep -m 1 -mt 360
> {code}
> * kinit as "user1" from a Mac, for example (this assumes you have already
> copied /etc/krb5.conf from the cluster to your Mac's /private/etc folder for
> SPNEGO auth).
> * Open the Applications page. user1 cannot view the job being run by
> systest. This is correct.
> * Open the Scheduler page. user1 *CAN* view the job being run by systest.
> This is *INCORRECT*.
> [1] https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java#L676
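The filter discussed above is toggled by a single property. A sketch of the yarn-site.xml form, as the reproduction steps describe it (the property name comes from YARN-8319 and this report):

```xml
<!-- yarn-site.xml: restrict application listings (and, once this bug is
     fixed, the Scheduler page) to apps the requesting user may view -->
<property>
  <name>yarn.webapp.filter-entity-list-by-user</name>
  <value>true</value>
</property>
```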
[jira] [Created] (YARN-10943) AbstractCSQueue: Create separate class for encapsulating Min / Max Resource
Szilard Nemeth created YARN-10943:
-------------------------------------

             Summary: AbstractCSQueue: Create separate class for encapsulating Min / Max Resource
                 Key: YARN-10943
                 URL: https://issues.apache.org/jira/browse/YARN-10943
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth
            Assignee: Szilard Nemeth

There are certain methods where the min and max Resources are used in tandem. Some examples of these kinds of methods:
- getMinimumAbsoluteResource / getMaximumAbsoluteResource
- updateConfigurableResourceLimits: it invokes setConfiguredMinResource / setConfiguredMaxResource on QueueResourceQuotas. That object could define a single method that receives the MinMaxResource alone.
- Validator methods also receive min/max resources as separate parameters, which could be tied together.
- updateEffectiveResources: it performs operations with effective min/max resources.

Alternatively, two classes could be created:
- one for EffectiveMinMaxResource
- and another for AbsoluteMinMaxResource
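A minimal sketch of the proposed value class. The name MinMaxResource comes from the issue text; the generic parameter stands in for Hadoop's Resource type, so this is illustrative, not the actual API:

```java
// Illustrative value class tying a queue's min and max resource together,
// so method signatures take one MinMaxResource instead of two Resources.
public final class MinMaxResource<R> {
    private final R min;
    private final R max;

    public MinMaxResource(R min, R max) {
        this.min = min;
        this.max = max;
    }

    public R getMin() { return min; }
    public R getMax() { return max; }

    public static void main(String[] args) {
        // With this, setConfiguredMinResource/setConfiguredMaxResource on
        // QueueResourceQuotas could collapse into one setter taking the pair.
        MinMaxResource<String> limits =
                new MinMaxResource<>("memory=1024,vcores=1", "memory=4096,vcores=4");
        System.out.println(limits.getMin() + " .. " + limits.getMax());
    }
}
```

The EffectiveMinMaxResource / AbsoluteMinMaxResource alternative mentioned above would simply be two named instantiations of this shape.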
[jira] [Created] (YARN-10944) AbstractCSQueue: Eliminate code duplication in overloaded versions of setMaxCapacity
Szilard Nemeth created YARN-10944:
-------------------------------------

             Summary: AbstractCSQueue: Eliminate code duplication in overloaded versions of setMaxCapacity
                 Key: YARN-10944
                 URL: https://issues.apache.org/jira/browse/YARN-10944
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth

Methods are:
- AbstractCSQueue#setMaxCapacity(float)
- AbstractCSQueue#setMaxCapacity(java.lang.String, float)
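The usual way to remove this kind of duplication is to make the label-less overload delegate to the labeled one with a default label. A hedged sketch; the field, the validation, and the NO_LABEL constant are illustrative, not the actual AbstractCSQueue code:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative dedup of two overloads: all logic lives in the labeled
// variant; the other overload just delegates with a default label.
public class MaxCapacitySketch {
    static final String NO_LABEL = "";
    private final Map<String, Float> maxCapacityByLabel = new HashMap<>();

    void setMaxCapacity(float maxCapacity) {
        setMaxCapacity(NO_LABEL, maxCapacity); // delegate, do not duplicate
    }

    void setMaxCapacity(String label, float maxCapacity) {
        if (maxCapacity < 0f || maxCapacity > 1f) {
            throw new IllegalArgumentException("illegal capacity: " + maxCapacity);
        }
        maxCapacityByLabel.put(label, maxCapacity);
    }

    float getMaxCapacity(String label) {
        return maxCapacityByLabel.getOrDefault(label, 0f);
    }

    public static void main(String[] args) {
        MaxCapacitySketch q = new MaxCapacitySketch();
        q.setMaxCapacity(0.5f);
        System.out.println(q.getMaxCapacity(NO_LABEL)); // 0.5
    }
}
```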
[jira] [Created] (YARN-10945) Add javadoc to all methods of AbstractCSQueue
Szilard Nemeth created YARN-10945:
-------------------------------------

             Summary: Add javadoc to all methods of AbstractCSQueue
                 Key: YARN-10945
                 URL: https://issues.apache.org/jira/browse/YARN-10945
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth
[jira] [Created] (YARN-10946) AbstractCSQueue: Create separate class for constructing Queue API objects
Szilard Nemeth created YARN-10946:
-------------------------------------

             Summary: AbstractCSQueue: Create separate class for constructing Queue API objects
                 Key: YARN-10946
                 URL: https://issues.apache.org/jira/browse/YARN-10946
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth

Relevant methods are:
- org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueConfigurations
- org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueInfo
- org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue#getQueueStatistics
[jira] [Created] (YARN-10948) Rename SchedulerQueue#activeQueue to activateQueue
Szilard Nemeth created YARN-10948:
-------------------------------------

             Summary: Rename SchedulerQueue#activeQueue to activateQueue
                 Key: YARN-10948
                 URL: https://issues.apache.org/jira/browse/YARN-10948
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth
[jira] [Created] (YARN-10947) Simplify AbstractCSQueue#initializeQueueState
Szilard Nemeth created YARN-10947:
-------------------------------------

             Summary: Simplify AbstractCSQueue#initializeQueueState
                 Key: YARN-10947
                 URL: https://issues.apache.org/jira/browse/YARN-10947
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth
[jira] [Created] (YARN-10949) Simplify AbstractCSQueue#updateMaxAppRelatedField and find a more meaningful name for this method
Szilard Nemeth created YARN-10949:
-------------------------------------

             Summary: Simplify AbstractCSQueue#updateMaxAppRelatedField and find a more meaningful name for this method
                 Key: YARN-10949
                 URL: https://issues.apache.org/jira/browse/YARN-10949
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth
[jira] [Created] (YARN-10950) Code cleanup in QueueCapacities
Szilard Nemeth created YARN-10950:
-------------------------------------

             Summary: Code cleanup in QueueCapacities
                 Key: YARN-10950
                 URL: https://issues.apache.org/jira/browse/YARN-10950
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth

- Make fields final: capacitiesMap, readLock, writeLock
- Remove explicit type arguments, e.g. new HashMap();
- Remove abbreviations and avoid string concatenation in QueueCapacities.Capacities#toString
- Remove unnecessary comments, e.g. "/* Used Capacity Getter and Setter */" & "/* Absolute Used Capacity Getter and Setter */"
- And probably many more...
[jira] [Created] (YARN-10951) CapacityScheduler: Move all fields and initializer code that belongs to async scheduling to a new class
Szilard Nemeth created YARN-10951:
-------------------------------------

             Summary: CapacityScheduler: Move all fields and initializer code that belongs to async scheduling to a new class
                 Key: YARN-10951
                 URL: https://issues.apache.org/jira/browse/YARN-10951
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth

There are certain if-statements that control whether to initialize some async-scheduling related fields, based on the value of the field 'scheduleAsynchronously'. We could move these fields to a separate class for clarity.
[jira] [Created] (YARN-10952) Move CapacityScheduler#updatePlacementRules elsewhere
Szilard Nemeth created YARN-10952:
-------------------------------------

             Summary: Move CapacityScheduler#updatePlacementRules elsewhere
                 Key: YARN-10952
                 URL: https://issues.apache.org/jira/browse/YARN-10952
             Project: Hadoop YARN
          Issue Type: Sub-task
         Environment: This method does not belong strongly to this class, as it is technically just a parser for MappingRules based on the provided Configuration object. The method could be static and should also receive rmContext.getQueuePlacementManager() along with the Configuration. The updateRules method of PlacementManager is already public.
            Reporter: Szilard Nemeth
[jira] [Created] (YARN-10953) Make CapacityScheduler#getOrCreateQueueFromPlacementContext more easy to comprehend
Szilard Nemeth created YARN-10953:
-------------------------------------

             Summary: Make CapacityScheduler#getOrCreateQueueFromPlacementContext more easy to comprehend
                 Key: YARN-10953
                 URL: https://issues.apache.org/jira/browse/YARN-10953
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth

1. Most of the method body is wrapped in an if-statement that checks whether the queue is null. We could negate this and return immediately if the queue != null, so the large if-statement is not needed.
2. Similarly, inside that large if-body there is a check for fallbackContext.hasParentQueue(); if it is true, yet another large if-body follows. We should also negate this condition and return immediately if it is false.
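The two negations described above turn the nested if-bodies into guard clauses. A toy sketch of the control-flow change only, using stand-in String types; the real method works on CSQueue and an ApplicationPlacementContext:

```java
// Illustrative guard-clause version of the control flow described above:
// each early return replaces one level of if-nesting.
public class GuardClauseSketch {
    static String getOrCreateQueue(String existingQueue, String fallbackParent) {
        if (existingQueue != null) {
            return existingQueue;   // was: the outer "if (queue == null)" body
        }
        if (fallbackParent == null) {
            return null;            // was: the inner "if (hasParentQueue())" body
        }
        // main auto-creation path, now at the top nesting level
        return fallbackParent + ".auto-created";
    }

    public static void main(String[] args) {
        System.out.println(getOrCreateQueue(null, "root.users"));
    }
}
```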
[jira] [Created] (YARN-10954) Remove commented code block from CSQueueUtils#loadCapacitiesByLabelsFromConf
Szilard Nemeth created YARN-10954:
-------------------------------------

             Summary: Remove commented code block from CSQueueUtils#loadCapacitiesByLabelsFromConf
                 Key: YARN-10954
                 URL: https://issues.apache.org/jira/browse/YARN-10954
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth
[jira] [Created] (YARN-10959) CLONE - AbstractCSQueue: Group preemption methods and fields into a separate class
Szilard Nemeth created YARN-10959:
-------------------------------------

             Summary: CLONE - AbstractCSQueue: Group preemption methods and fields into a separate class
                 Key: YARN-10959
                 URL: https://issues.apache.org/jira/browse/YARN-10959
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth
            Assignee: Szilard Nemeth

Relevant methods: isQueueHierarchyPreemptionDisabled, isIntraQueueHierarchyPreemptionDisabled, getTotalKillableResource, getKillableContainers
[jira] [Resolved] (YARN-10937) Fix log message arguments in LogAggregationFileController
[ https://issues.apache.org/jira/browse/YARN-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10937.
-----------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

> Fix log message arguments in LogAggregationFileController
> ---------------------------------------------------------
>
>                 Key: YARN-10937
>                 URL: https://issues.apache.org/jira/browse/YARN-10937
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.3.1
>            Reporter: Tamas Domok
>            Assignee: Tibor Kovács
>            Priority: Trivial
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> More arguments provided (1) than placeholders specified (0) in the following
> log message:
> {code:java}
> LOG.info("Unable to set permissions for configured filesystem since"
>     + " it does not support this", remoteFS.getScheme());{code}
> This is logged in two places; both of them are affected.
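SLF4J-style loggers substitute arguments only where a `{}` placeholder appears in the message, so the extra argument above is silently dropped. A self-contained demo using a tiny stand-in for the substitution (the real fix is simply to add a `{}` to the message string):

```java
public class LogPlaceholderDemo {
    // Minimal stand-in for SLF4J's "{}" substitution: without a placeholder
    // in the message, the argument never appears in the output.
    static String format(String msg, Object arg) {
        int i = msg.indexOf("{}");
        return i < 0 ? msg : msg.substring(0, i) + arg + msg.substring(i + 2);
    }

    public static void main(String[] args) {
        // Broken call from the issue: one argument, zero placeholders.
        String broken = format("Unable to set permissions for configured filesystem since"
                + " it does not support this", "hdfs");
        // Fixed call: placeholder added, so the scheme shows up in the log.
        String fixed = format("Unable to set permissions for configured filesystem {}"
                + " since it does not support this", "hdfs");
        System.out.println(broken.contains("hdfs")); // false: argument dropped
        System.out.println(fixed.contains("hdfs"));  // true: argument logged
    }
}
```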
[jira] [Created] (YARN-10983) Follow-up changes for YARN-10904
Szilard Nemeth created YARN-10983:
-------------------------------------

             Summary: Follow-up changes for YARN-10904
                 Key: YARN-10983
                 URL: https://issues.apache.org/jira/browse/YARN-10983
             Project: Hadoop YARN
          Issue Type: Sub-task
          Components: capacityscheduler
            Reporter: Szilard Nemeth
            Assignee: Szilard Nemeth

Links to Github comments from [~gandras]:
- https://github.com/apache/hadoop/pull/3551#discussion_r730728783
- https://github.com/apache/hadoop/pull/3551#discussion_r730729218
- https://github.com/apache/hadoop/pull/3551#discussion_r730729717
- https://github.com/apache/hadoop/pull/3551#discussion_r730736115
- https://github.com/apache/hadoop/pull/3551#discussion_r730741596

The required changes are the following:
- QueueNodeLabelsSettings: Incorporate QueuePath
- QueueAppLifetimeAndLimitSettings: Simplify parentQueue null check
- QueueAllocationSettings: Remove comment starting with: "/* YARN-10869: When using AutoCreatedLeafQueues, the passed configuration" - only if YARN-10929 got merged.
[jira] [Created] (YARN-10984) Add tests to CapacitySchedulerConfiguration
Szilard Nemeth created YARN-10984:
-------------------------------------

             Summary: Add tests to CapacitySchedulerConfiguration
                 Key: YARN-10984
                 URL: https://issues.apache.org/jira/browse/YARN-10984
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth
[jira] [Created] (YARN-10985) CLONE - Add tests to CapacitySchedulerConfiguration
Szilard Nemeth created YARN-10985:
-------------------------------------

             Summary: CLONE - Add tests to CapacitySchedulerConfiguration
                 Key: YARN-10985
                 URL: https://issues.apache.org/jira/browse/YARN-10985
             Project: Hadoop YARN
          Issue Type: Sub-task
            Reporter: Szilard Nemeth
[jira] [Resolved] (YARN-10930) Introduce universal configured capacity vector
[ https://issues.apache.org/jira/browse/YARN-10930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10930.
-----------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

> Introduce universal configured capacity vector
> ----------------------------------------------
>
>                 Key: YARN-10930
>                 URL: https://issues.apache.org/jira/browse/YARN-10930
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>            Reporter: Andras Gyori
>            Assignee: Andras Gyori
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>         Attachments: capacity_scheduler_queue_capacity.html
>
>          Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> The proposal is to introduce a capacity resource vector that is universally
> parsed for every queue. CapacityResourceVector is a way to unite the current
> capacity modes (weight, percentage, absolute), while maintaining flexibility
> and extensibility.
> CapacityResourceVector is a good fit for the existing capacity configs, for
> example:
> * percentage mode: root.example.capacity 50 is syntactic sugar for
> [memory=50%, vcores=50%, ]
> * absolute mode: root.example.capacity [memory=1024, vcores=2] is a natural
> fit for the vector; there is no need for additional settings
> CapacityResourceVector will be used in a future refactor to unify the
> resource calculation and lift the limitations imposed on the queue hierarchy
> capacity settings (e.g. one cannot use both absolute resources and
> percentages in the same hierarchy, etc.)
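A toy parser for the bracketed vector syntax quoted above, e.g. `[memory=50%, vcores=2]`. This is illustrative only; the real CapacityResourceVector parsing lives in the Capacity Scheduler and performs far more validation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative parser: "[memory=50%, vcores=2]" -> {memory=50%, vcores=2}.
// A bare number like "50" is the syntactic sugar for an all-resource
// percentage and is deliberately not expanded here.
public class CapacityVectorSketch {
    static Map<String, String> parse(String spec) {
        String body = spec.trim();
        if (body.startsWith("[") && body.endsWith("]")) {
            body = body.substring(1, body.length() - 1); // strip brackets
        }
        Map<String, String> vector = new LinkedHashMap<>();
        for (String entry : body.split(",")) {
            String[] kv = entry.trim().split("=", 2);
            if (kv.length == 2 && !kv[0].trim().isEmpty()) {
                vector.put(kv[0].trim(), kv[1].trim());
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        System.out.println(parse("[memory=50%, vcores=2]")); // {memory=50%, vcores=2}
    }
}
```

Parsing every queue's capacity into such a vector first is what lets percentage, weight, and absolute values coexist in one representation.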
[jira] [Resolved] (YARN-10758) Mixed mode: Allow relative and absolute mode in the same queue hierarchy
[ https://issues.apache.org/jira/browse/YARN-10758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-10758.
-----------------------------------
      Resolution: Duplicate

> Mixed mode: Allow relative and absolute mode in the same queue hierarchy
> ------------------------------------------------------------------------
>
>                 Key: YARN-10758
>                 URL: https://issues.apache.org/jira/browse/YARN-10758
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>            Reporter: Andras Gyori
>            Assignee: Andras Gyori
>            Priority: Major
>
> Fair Scheduler supports mixed mode for shares (the FS equivalent of
> capacity). An example of such a configuration:
> {noformat}
> root.a.capacity [memory-mb=7268, vcores=8]{noformat}
> {noformat}
> root.a.a1.capacity 50{noformat}
> {noformat}
> root.a.a2.capacity 50{noformat}
> The above scenario is not supported in CS today: although CS already permits
> using weight mode and relative/percentage mode in the same hierarchy,
> absolute mode and relative mode are mutually exclusive.
> This improvement is a natural extension of CS to lift this limitation.
[jira] [Resolved] (YARN-9936) Support vector of capacity percentages in Capacity Scheduler configuration
[ https://issues.apache.org/jira/browse/YARN-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-9936.
----------------------------------
      Resolution: Invalid

> Support vector of capacity percentages in Capacity Scheduler configuration
> --------------------------------------------------------------------------
>
>                 Key: YARN-9936
>                 URL: https://issues.apache.org/jira/browse/YARN-9936
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>            Reporter: Zoltan Siegl
>            Assignee: Andras Gyori
>            Priority: Major
>         Attachments: Capacity Scheduler support of “vector of resources
> percentage”.pdf
>
> Currently, the Capacity Scheduler queue configuration supports two ways to
> set queue capacity:
> * As a percentage of all available resources, given as a float (e.g. 25.0),
> meaning 25% of the resources of its parent queue for all resource types
> equally (e.g. 25% of all memory, 25% of all CPU cores, and 25% of all
> available GPUs in the cluster). The percentages of all queues have to add up
> to 100%.
> * As an absolute amount of resources (e.g.
> memory=4GB,vcores=20,yarn.io/gpu=4). The amount of all resources in the
> queues has to be less than or equal to all resources in the cluster.
> {color:#de350b}Actually, the above is not fully supported; we only support
> memory and vcores in absolute mode now, and we should extend this in {color}
> YARN-10503.
> Apart from these two existing ways, there is a demand to set the capacity
> percentage of each available resource type separately (e.g.
> {{memory=20%,vcores=40%,yarn.io/gpu=100%}}).
> At the same time, a similar concept should be included for queues'
> maximum-capacity as well.