[jira] [Commented] (YARN-10022) Create RM Rest API to validate a CapacityScheduler Configuration

2020-01-16 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016697#comment-17016697
 ] 

Prabhu Joseph commented on YARN-10022:
--

Thanks [~kmarton] for the patch. I have a few comments on the patch:

1. CapacityScheduler#reinitialize replaces the call to validateConf() with the 
lines below, which is not required.
{code:java}
CapacitySchedulerConfigValidator.validateMemoryAllocation(this.conf);
CapacitySchedulerConfigValidator.validateVCores(this.conf);
{code}
 

2. In CapacityScheduler#reinitialize, the distinguishRuleSet returned by 
validatePlacementRules can be used instead of building a new set.
{code:java}
+CapacitySchedulerConfigValidator
+.validatePlacementRules(placementRuleStrs);
+Set distinguishRuleSet = new HashSet<>(placementRuleStrs);
{code}
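A sketch of the suggestion (assuming validatePlacementRules returns the distinguished rule set as a {{Set<String>}}):
{code:java}
Set<String> distinguishRuleSet = CapacitySchedulerConfigValidator
    .validatePlacementRules(placementRuleStrs);
{code}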
 

3. In CapacitySchedulerQueueManager, there is a typo in the changed comment below ("queueR" should be "queue"):
{code:java}
-// When failing over, if using configuration store, don't validate queue
+// When failing over, if using configuration store, don't validate queueR
{code}
 

4. In RMWSConsts, the method name is validateAndGetSchedulerConfiguration, so the Javadoc below should reference that method.
{code:java}
+  /** Path for {@code RMWebServiceProtocol#validateCapacitySchedulerConfig}. */
{code}
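i.e. something along the lines of:
{code:java}
  /** Path for {@code RMWebServiceProtocol#validateAndGetSchedulerConfiguration}. */
{code}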
 

5. In RMWebServices, it has to be initForWritableEndpoints. Only the admin user 
is allowed to read the scheduler conf, in order to avoid leaking sensitive info 
such as ACLs. Reference: RMWebServices#getSchedulerConfiguration()
{code:java}
+initForReadableEndpoints();
{code}
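A sketch of the admin-only check, mirroring what getSchedulerConfiguration() does (helper names taken from that method):
{code:java}
UserGroupInformation callerUGI = getCallerUserGroupInformation(hsr, true);
initForWritableEndpoints(callerUGI, true);
{code}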
 

6. The lines of code below are not straightforward.
{code:java}
+Configuration config = new Configuration(false);
+rm.getRMContext().getRMAdminService().getConfiguration(config,
+YarnConfiguration.CS_CONFIGURATION_FILE);
+MutableCSConfigurationProvider provider
+= new MutableCSConfigurationProvider(null);
+
+CapacitySchedulerConfiguration capacitySchedulerConfig =
+new CapacitySchedulerConfiguration(config, false);
+Configuration newConfig = provider.applyChanges(capacitySchedulerConfig,
+mutationInfo);
{code}
They can be replaced with something similar to the code in 
RMWebServices#getSchedulerConfiguration(), like below:
{code:java}
  MutableConfigurationProvider mutableConfigurationProvider =
      ((MutableConfScheduler) scheduler).getMutableConfProvider();
  Configuration schedulerConf = mutableConfigurationProvider
      .getConfiguration();
  Configuration newConfig = mutableConfigurationProvider
      .applyChanges(schedulerConf, mutationInfo);
{code}
With the above, the change in AdminService.java is not required.

 

7. In the error message, "CS" can be expanded to "CapacityScheduler":
{code:java}
+  String errorMsg = "CS configuration validation failed: "
+  + e.toString();
{code}
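i.e. something like:
{code:java}
  String errorMsg = "CapacityScheduler configuration validation failed: "
      + e.toString();
{code}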
 

8. The error message is not added to the error response. Change
{code:java}
+  return Response.status(Status.BAD_REQUEST)
+  .build();
{code}
to
{code:java}
return Response.status(Status.BAD_REQUEST).entity(errorMsg)
.build();
{code}
 

9. The error message below is wrong (this is the validation endpoint, not a configuration change):
{code:java}
+  String errorMsg = "Configuration change only supported by " +
+  "MutableConfScheduler.";
{code}
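A possible wording (just a suggestion, since this endpoint validates the configuration rather than changing it):
{code:java}
  String errorMsg = "Configuration validation only supported by " +
      "MutableConfScheduler.";
{code}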

> Create RM Rest API to validate a CapacityScheduler Configuration
> 
>
> Key: YARN-10022
> URL: https://issues.apache.org/jira/browse/YARN-10022
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Kinga Marton
>Assignee: Kinga Marton
>Priority: Major
> Attachments: YARN-10022.WIP.patch, YARN-10022.WIP2.patch
>
>
> RMWebService should expose a new API which gets a CapacityScheduler 
> Configuration as input, validates it, and returns success / failure.
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9742) [JDK11] TestTimelineWebServicesWithSSL.testPutEntities fails

2020-01-16 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016696#comment-17016696
 ] 

Akira Ajisaka commented on YARN-9742:
-

Tested on trunk with the latest AdoptOpenJDK 11.0.6 on macOS and it passed 
for me. Closing this.
If this issue occurs with the latest Java 11 version, please reopen it.

> [JDK11] TestTimelineWebServicesWithSSL.testPutEntities fails
> 
>
> Key: YARN-9742
> URL: https://issues.apache.org/jira/browse/YARN-9742
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineservice
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Priority: Major
>
> Tested on openjdk-11.0.2 on a Mac.
> Stack trace:
> {noformat}
> [ERROR] Tests run: 3, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 8.206 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServicesWithSSL
> [ERROR] 
> testPutEntities(org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServicesWithSSL)
>   Time elapsed: 0.366 s  <<< ERROR!
> com.sun.jersey.api.client.ClientHandlerException: java.io.IOException: HTTPS 
> hostname wrong:  should be <0.0.0.0>
>   at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineJerseyRetryFilter$1.run(TimelineConnector.java:392)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineClientConnectionRetry.retryOn(TimelineConnector.java:335)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineJerseyRetryFilter.handle(TimelineConnector.java:405)
>   at com.sun.jersey.api.client.Client.handle(Client.java:652)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:570)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPostingObject(TimelineWriter.java:152)
>   at 
> org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServicesWithSSL$TestTimelineClient$1.doPostingObject(TestTimelineWebServicesWithSSL.java:139)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:115)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:112)
>   at java.base/java.security.AccessController.doPrivileged(Native Method)
>   at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPosting(TimelineWriter.java:112)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putEntities(TimelineWriter.java:92)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:178)
>   at 
> org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServicesWithSSL.testPutEntities(TestTimelineWebServicesWithSSL.java:110)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>

[jira] [Resolved] (YARN-9742) [JDK11] TestTimelineWebServicesWithSSL.testPutEntities fails

2020-01-16 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka resolved YARN-9742.
-
Resolution: Cannot Reproduce

> [JDK11] TestTimelineWebServicesWithSSL.testPutEntities fails
> 
>
> Key: YARN-9742
> URL: https://issues.apache.org/jira/browse/YARN-9742
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineservice
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Priority: Major
>
> Tested on openjdk-11.0.2 on a Mac.
> Stack trace:
> {noformat}
> [ERROR] Tests run: 3, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 8.206 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServicesWithSSL
> [ERROR] 
> testPutEntities(org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServicesWithSSL)
>   Time elapsed: 0.366 s  <<< ERROR!
> com.sun.jersey.api.client.ClientHandlerException: java.io.IOException: HTTPS 
> hostname wrong:  should be <0.0.0.0>
>   at 
> com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineJerseyRetryFilter$1.run(TimelineConnector.java:392)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineClientConnectionRetry.retryOn(TimelineConnector.java:335)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineJerseyRetryFilter.handle(TimelineConnector.java:405)
>   at com.sun.jersey.api.client.Client.handle(Client.java:652)
>   at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
>   at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
>   at 
> com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:570)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPostingObject(TimelineWriter.java:152)
>   at 
> org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServicesWithSSL$TestTimelineClient$1.doPostingObject(TestTimelineWebServicesWithSSL.java:139)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:115)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:112)
>   at java.base/java.security.AccessController.doPrivileged(Native Method)
>   at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPosting(TimelineWriter.java:112)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putEntities(TimelineWriter.java:92)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:178)
>   at 
> org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServicesWithSSL.testPutEntities(TestTimelineWebServicesWithSSL.java:110)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.s

[jira] [Created] (YARN-10088) Too many threads created by container caused NM shutdown

2020-01-16 Thread yehuanhuan (Jira)
yehuanhuan created YARN-10088:
-

 Summary: Too many threads created by container caused NM shutdown
 Key: YARN-10088
 URL: https://issues.apache.org/jira/browse/YARN-10088
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.7.2
Reporter: yehuanhuan


Because ContainersMonitorImpl only monitors physical memory and virtual memory, 
when the number of threads created by a container exceeds the per-user thread 
limit on the system, the NM will exit.
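
A minimal, hypothetical sketch (not part of any patch) of how a monitor could read a container process's thread count from /proc/<pid>/status on Linux and compare it against a configured limit:
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ProcThreadCount {
  /** Returns the value of the "Threads:" line in /proc/<pid>/status, or -1. */
  public static int threadCount(String pid) throws IOException {
    for (String line : Files.readAllLines(Paths.get("/proc", pid, "status"))) {
      if (line.startsWith("Threads:")) {
        return Integer.parseInt(line.substring("Threads:".length()).trim());
      }
    }
    return -1;
  }

  public static void main(String[] args) throws IOException {
    String pid = args[0];
    int limit = args.length > 1 ? Integer.parseInt(args[1]) : 10000;
    int threads = threadCount(pid);
    if (threads > limit) {
      System.err.println("Process " + pid + " has " + threads
          + " threads, exceeding the limit of " + limit);
    }
  }
}
{code}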



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10088) Too many threads created by container caused NM shutdown

2020-01-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016811#comment-17016811
 ] 

Hadoop QA commented on YARN-10088:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  6m 
58s{color} | {color:red} Docker failed to build yetus/hadoop:06eafeedf12. 
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-10088 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12991113/YARN-10088-branch-2.7.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25400/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Too many threads created by container caused NM shutdown
> 
>
> Key: YARN-10088
> URL: https://issues.apache.org/jira/browse/YARN-10088
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.2
>Reporter: yehuanhuan
>Priority: Major
> Attachments: YARN-10088-branch-2.7.patch
>
>
> Because ContainersMonitorImpl only monitors physical memory and virtual 
> memory, when the number of threads created by a container exceeds the per-user 
> thread limit on the system, the NM will exit.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10085) FS-CS converter: remove mixed ordering policy check

2020-01-16 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016845#comment-17016845
 ] 

Peter Bacsko commented on YARN-10085:
-

[~leftnoteasy] a question to you: I set YARN-9892 as a dependency for this JIRA.

However, that ticket might require some more discussions, implementation takes 
time, etc. Currently the converter does not allow converting an FS config which 
has mixed fifo/fair/drf policies.

But you said that the conversion should succeed regardless, we should only 
print warnings.

Question: what should we do? E.g., we have an FS config with the default "fair" 
policy, with some queues set to "drf". How should the tool behave in this 
scenario?

> FS-CS converter: remove mixed ordering policy check
> ---
>
> Key: YARN-10085
> URL: https://issues.apache.org/jira/browse/YARN-10085
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
>
> When YARN-9892 gets committed, this part will become unnecessary:
> {noformat}
> // Validate ordering policy
> if (queueConverter.isDrfPolicyUsedOnQueueLevel()) {
>   if (queueConverter.isFifoOrFairSharePolicyUsed()) {
> throw new ConversionException(
> "DRF ordering policy cannot be used together with fifo/fair");
>   } else {
> capacitySchedulerConfig.set(
> CapacitySchedulerConfiguration.RESOURCE_CALCULATOR_CLASS,
> DominantResourceCalculator.class.getCanonicalName());
>   }
> }
> {noformat}
> We will be able to freely mix fifo/fair/drf, so let's get rid of this strict 
> check and also rewrite {{FSQueueConverter.emitOrderingPolicy()}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10085) FS-CS converter: remove mixed ordering policy check

2020-01-16 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016845#comment-17016845
 ] 

Peter Bacsko edited comment on YARN-10085 at 1/16/20 12:11 PM:
---

[~leftnoteasy] a question to you: I set YARN-9892 as a dependency for this JIRA.

However, that ticket might require some more discussions, implementation takes 
time, etc. Currently the converter does not allow converting an FS config which 
has mixed fifo/fair/drf policies.

But you said that the conversion must succeed regardless, we just print 
warnings if we encounter something that's not supported.

Question: what should we do? E.g., we have an FS config with the default "fair" 
policy, with some queues set to "drf". How should the tool behave in this 
scenario?


was (Author: pbacsko):
[~leftnoteasy] a question to you: I set YARN-9892 as a dependency for this JIRA.

However, that ticket might require some more discussions, implementation takes 
time, etc. Currently the converter does not allow converting an FS config which 
has mixed fifo/fair/drf policies.

But you said that the conversion should succeed regardless, we should only 
print warnings.

Question: what should we do? E.g., we have an FS config with the default "fair" 
policy, with some queues set to "drf". How should the tool behave in this 
scenario?

> FS-CS converter: remove mixed ordering policy check
> ---
>
> Key: YARN-10085
> URL: https://issues.apache.org/jira/browse/YARN-10085
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
>
> When YARN-9892 gets committed, this part will become unnecessary:
> {noformat}
> // Validate ordering policy
> if (queueConverter.isDrfPolicyUsedOnQueueLevel()) {
>   if (queueConverter.isFifoOrFairSharePolicyUsed()) {
> throw new ConversionException(
> "DRF ordering policy cannot be used together with fifo/fair");
>   } else {
> capacitySchedulerConfig.set(
> CapacitySchedulerConfiguration.RESOURCE_CALCULATOR_CLASS,
> DominantResourceCalculator.class.getCanonicalName());
>   }
> }
> {noformat}
> We will be able to freely mix fifo/fair/drf, so let's get rid of this strict 
> check and also rewrite {{FSQueueConverter.emitOrderingPolicy()}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10070) NPE if no rule is defined and application-tag-based-placement is enabled

2020-01-16 Thread Kinga Marton (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016855#comment-17016855
 ] 

Kinga Marton edited comment on YARN-10070 at 1/16/20 12:22 PM:
---

Thank you [~prabhujoseph] and [~adam.antal] for the review!


was (Author: kmarton):
hank you [~prabhujoseph] and [~adam.antal] for the review!

> NPE if no rule is defined and application-tag-based-placement is enabled
> 
>
> Key: YARN-10070
> URL: https://issues.apache.org/jira/browse/YARN-10070
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kinga Marton
>Assignee: Kinga Marton
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-10070.001.patch, YARN-10070.002.patch, 
> YARN-10070.003.patch
>
>
> If there is no rule defined for a user, an NPE is thrown by the following line.
> {code:java}
> String queue = placementManager
>  .placeApplication(context, usernameUsedForPlacement).getQueue();{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10070) NPE if no rule is defined and application-tag-based-placement is enabled

2020-01-16 Thread Kinga Marton (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016855#comment-17016855
 ] 

Kinga Marton commented on YARN-10070:
-

hank you [~prabhujoseph] and [~adam.antal] for the review!

> NPE if no rule is defined and application-tag-based-placement is enabled
> 
>
> Key: YARN-10070
> URL: https://issues.apache.org/jira/browse/YARN-10070
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kinga Marton
>Assignee: Kinga Marton
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-10070.001.patch, YARN-10070.002.patch, 
> YARN-10070.003.patch
>
>
> If there is no rule defined for a user, an NPE is thrown by the following line.
> {code:java}
> String queue = placementManager
>  .placeApplication(context, usernameUsedForPlacement).getQueue();{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10018) container-executor: possible -1 return value of fork() is not always checked

2020-01-16 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016925#comment-17016925
 ] 

Peter Bacsko commented on YARN-10018:
-

I rebased the patch, uploaded v2.

> container-executor: possible -1 return value of fork() is not always checked
> 
>
> Key: YARN-10018
> URL: https://issues.apache.org/jira/browse/YARN-10018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10018-001.patch, YARN-10018-001.patch, 
> YARN-10018-002.patch
>
>
> There are some places in the container-executor native code where the 
> {{fork()}} call is not handled properly. This operation can fail with -1, but 
> sometimes the if branch needed to validate that it succeeded is missing.
> Also, at one location, the return value is defined as an {{int}}, not 
> {{pid_t}}. It's better to handle this transparently and change it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10018) container-executor: possible -1 return value of fork() is not always checked

2020-01-16 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-10018:

Attachment: YARN-10018-002.patch

> container-executor: possible -1 return value of fork() is not always checked
> 
>
> Key: YARN-10018
> URL: https://issues.apache.org/jira/browse/YARN-10018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10018-001.patch, YARN-10018-001.patch, 
> YARN-10018-002.patch
>
>
> There are some places in the container-executor native code where the 
> {{fork()}} call is not handled properly. This operation can fail with -1, but 
> sometimes the if branch needed to validate that it succeeded is missing.
> Also, at one location, the return value is defined as an {{int}}, not 
> {{pid_t}}. It's better to handle this transparently and change it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10083) Provide utility to ask whether an application is in final status

2020-01-16 Thread Adam Antal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated YARN-10083:
--
Attachment: YARN-10083.branch-3.2.001.patch

> Provide utility to ask whether an application is in final status
> 
>
> Key: YARN-10083
> URL: https://issues.apache.org/jira/browse/YARN-10083
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Minor
> Attachments: YARN-10083.001.patch, YARN-10083.002.patch, 
> YARN-10083.002.patch, YARN-10083.branch-3.2.001.patch
>
>
> This code part is severely duplicated across the Hadoop repo:
> {code:java}
>   public static boolean isApplicationFinalState(YarnApplicationState 
> appState) {
> return appState == YarnApplicationState.FINISHED
> || appState == YarnApplicationState.FAILED
> || appState == YarnApplicationState.KILLED;
>   }
> {code}
> This functionality is used heavily by the log aggregation as well, so we may 
> do some sanitizing here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10083) Provide utility to ask whether an application is in final status

2020-01-16 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017028#comment-17017028
 ] 

Adam Antal commented on YARN-10083:
---

This "unable to create native thread" is probably due to resource issue, which 
is not related to this patch.

Uploaded patch for branch-3.2 to this issue as well. Conflicts are due to 
missing YARN-7477 commit, and since Dynamometer is not in branch-3.2.

> Provide utility to ask whether an application is in final status
> 
>
> Key: YARN-10083
> URL: https://issues.apache.org/jira/browse/YARN-10083
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Minor
> Attachments: YARN-10083.001.patch, YARN-10083.002.patch, 
> YARN-10083.002.patch, YARN-10083.branch-3.2.001.patch
>
>
> This code part is severely duplicated across the Hadoop repo:
> {code:java}
>   public static boolean isApplicationFinalState(YarnApplicationState 
> appState) {
> return appState == YarnApplicationState.FINISHED
> || appState == YarnApplicationState.FAILED
> || appState == YarnApplicationState.KILLED;
>   }
> {code}
> This functionality is used heavily by the log aggregation as well, so we may 
> do some sanitizing here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10083) Provide utility to ask whether an application is in final status

2020-01-16 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017044#comment-17017044
 ] 

Szilard Nemeth commented on YARN-10083:
---

Hi [~adam.antal],

This is a great finding.
One thing I found: 
org.apache.hadoop.yarn.logaggregation.LogToolUtils#getResponeFromNMWebService 
has a typo in its name.
I wanted to quickly fix it and let it go, but found other occurrences of 
"getResponeFromNMWebService" in the codebase. Could you please check those as 
well? 
Thanks.

> Provide utility to ask whether an application is in final status
> 
>
> Key: YARN-10083
> URL: https://issues.apache.org/jira/browse/YARN-10083
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Minor
> Attachments: YARN-10083.001.patch, YARN-10083.002.patch, 
> YARN-10083.002.patch, YARN-10083.branch-3.2.001.patch
>
>
> This code part is severely duplicated across the Hadoop repo:
> {code:java}
>   public static boolean isApplicationFinalState(YarnApplicationState 
> appState) {
> return appState == YarnApplicationState.FINISHED
> || appState == YarnApplicationState.FAILED
> || appState == YarnApplicationState.KILLED;
>   }
> {code}
> This functionality is used heavily by the log aggregation as well, so we may 
> do some sanitizing here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10018) container-executor: possible -1 return value of fork() is not always checked

2020-01-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017048#comment-17017048
 ] 

Hadoop QA commented on YARN-10018:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 23m 
44s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
33m 46s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 36s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 
25s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 96m 35s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | YARN-10018 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12991128/YARN-10018-002.patch |
| Optional Tests |  dupname  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 92527419cbd3 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a0ff42d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/25401/testReport/ |
| Max. process+thread count | 344 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25401/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> container-executor: possible -1 return value of fork() is not always checked
> 
>
> Key: YARN-10018
> URL: https://issues.apache.org/jira/browse/YARN-10018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10018-001.patch, YARN-10018-001.patch, 
> YARN-10018-002.patch
>
>
> There are some places in the container-executor native, where the {{fork()}} 
> call is not handled 

[jira] [Commented] (YARN-9292) Implement logic to keep docker image consistent in application that uses :latest tag

2020-01-16 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017312#comment-17017312
 ] 

Eric Badger commented on YARN-9292:
---

bq. Very good question, and the answer is somewhat complicated. For AM to run 
in the docker container, AM must have identical Hadoop client bits (Java, 
Hadoop, etc), and credential mapping (nscd/sssd). Many of those pieces can not 
be moved cleanly into Docker container in the first implementation of YARN 
native service (LLAP/Slider alike projects) because resistance of building 
agreeable docker image as part of Hadoop project. AM remains as outside of 
docker container for simplicity.

So I read your last comment and I think that everything pretty much makes sense 
if we can fix the issue of the AM not running in a Docker container. That way 
we can use YARN-9184 to pull the image and get the most up-to-date SHA for the 
entire job to run with. And if an admin wants to do the image management 
themselves, then they don't enable YARN-9184 and are responsible for having the 
images they want on the cluster. At that point, any errors would be theirs to 
fix through their own automation.

I do have some questions on why we can't move the AM into a docker container 
though. What is it that is special about the AM that we need to run it directly 
on the host? What does it depend on the host for? We should be able to use the 
distributed cache to localize any libraries/jars that it needs. And as far as 
nscd/sssd, those can be bind-mounted into the container via configs. If they 
don't have nscd/sssd then they can bind-mount /etc/passwd. Since they would've 
been using the host anyway, this is no different. 
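
For illustration only (not part of this JIRA), a sketch of how such a read-only bind-mount is typically requested for a Docker container on YARN; the env var and container-executor.cfg key shown here are assumed from the standard Docker-on-YARN support:
{noformat}
# Job environment: run the container with the Docker runtime and
# bind-mount /etc/passwd read-only
YARN_CONTAINER_RUNTIME_TYPE=docker
YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/etc/passwd:/etc/passwd:ro

# container-executor.cfg: the source path must be whitelisted
[docker]
  docker.allowed.ro-mounts=/etc/passwd
{noformat}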

As far as the docker image itself, why does Hadoop need to provide an image? 
Everything needed can be provided via the distributed cache or bind-mounts, 
right? I don't see why we need a specialized image that is tied to Hadoop. You 
just need an image with Java and Bash.

> Implement logic to keep docker image consistent in application that uses 
> :latest tag
> 
>
> Key: YARN-9292
> URL: https://issues.apache.org/jira/browse/YARN-9292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-9292.001.patch, YARN-9292.002.patch, 
> YARN-9292.003.patch, YARN-9292.004.patch, YARN-9292.005.patch, 
> YARN-9292.006.patch, YARN-9292.007.patch, YARN-9292.008.patch
>
>
> Docker images with the latest tag can run in a YARN cluster without any 
> validation in the node managers. If an image with the latest tag changes while 
> containers are being launched, it might produce inconsistent results between 
> nodes. This surfaced toward the end of development for YARN-9184, which keeps 
> the docker image consistent within a job. One of the ideas to keep the :latest 
> tag consistent for a job is to use the docker image command to figure out the 
> image id and propagate that image id to the rest of the container requests. 
> There are some challenges to overcome:
>  # The latest tag does not exist on the node where the first container starts. 
> The first container will need to download the latest image and find the image 
> ID. This can introduce lag time before the other containers start.
>  # If the image id is used to start the other containers, container-executor 
> may have problems checking whether the image comes from a trusted source. Both 
> the image name and ID must be supplied through the .cmd file to 
> container-executor. However, an attacker can supply an incorrect image id and 
> defeat the container-executor security checks.
> If we can overcome those challenges, it may be possible to keep the docker 
> image consistent within one application.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10083) Provide utility to ask whether an application is in final status

2020-01-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017323#comment-17017323
 ] 

Hadoop QA commented on YARN-10083:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
40s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-3.2 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
45s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
24s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
20s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
24s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
17s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
20m 17s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
43s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
42s{color} | {color:green} branch-3.2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
28s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
20s{color} | {color:red} hadoop-yarn-server-common in the patch failed. {color} 
|
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
15s{color} | {color:red} hadoop-yarn-server-applicationhistoryservice in the 
patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
21s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
14s{color} | {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
14s{color} | {color:red} hadoop-yarn-server-timeline-pluginstorage in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
13s{color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  4m 
29s{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  4m 29s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
15s{color} | {color:green} root: The patch generated 0 new + 137 unchanged - 3 
fixed = 137 total (was 140) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
28s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
22s{color} | {color:red} hadoop-yarn-server-common in the patch failed. {color} 
|
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
18s{color} | {color:red} hadoop-yarn-server-applicationhistoryservice in the 
patch failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
23s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
18s{color} | {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
16s{color} | {color:red} hadoop-yarn-server-timeline-pluginstorage in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {co

[jira] [Commented] (YARN-10043) FairOrderingPolicy Improvements

2020-01-16 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017328#comment-17017328
 ] 

Wangda Tan commented on YARN-10043:
---

Thanks [~maniraj...@gmail.com] for posting your thoughts on this.

In my opinion, the importance of the mentioned behaviors is 3 > 4 > 1; 3 is 
already supported.

#4 is important, we should add it.

#1 to me only impacts performance, not correctness (an app without demand won't 
be allocated anything), but comparing one more field could also impact 
performance. So I would say it is minor.

And:

To me #5 is not a necessary behavior: why would an app starting with "a" be more 
important than an app starting with "z"? Since we have already compared 3/4, I 
feel it is not worth adding.

#2 is not necessary to me, for two reasons: a. it is only related to queues; 
b. for queues, CS already compares relative usage. I don't think adding one more 
resource comparison is worth it here.

I think YARN-10049 is also the same.

+ [~pbacsko], since Peter is asking similar questions.

> FairOrderingPolicy Improvements
> ---
>
> Key: YARN-10043
> URL: https://issues.apache.org/jira/browse/YARN-10043
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>
> FairOrderingPolicy can be improved by using some of the approaches (only 
> relevant) implemented in FairSharePolicy of FS. This improvement has 
> significance in FS to CS migration context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10085) FS-CS converter: remove mixed ordering policy check

2020-01-16 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017331#comment-17017331
 ] 

Wangda Tan commented on YARN-10085:
---

[~pbacsko], I also posted a comment on YARN-10043. To me it is sufficient to 
convert (drf/fair) from FS to fair (CS); if drf is set anywhere in FS, we should 
set the global DominantResourceCalculator in CS, and we can print a warning for 
that. To be honest it is such a minor behavior that we may not even need the 
warning.
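
In other words, roughly something like this in the converter (a sketch of the behavior described above, reusing the names from the snippet quoted below; LOG is assumed to be the converter's logger):
{code:java}
if (queueConverter.isDrfPolicyUsedOnQueueLevel()) {
  // Instead of rejecting mixed fifo/fair/drf, switch the resource calculator
  // globally and warn.
  LOG.warn("drf ordering policy found on queue level; setting the global"
      + " DominantResourceCalculator in the generated CS configuration");
  capacitySchedulerConfig.set(
      CapacitySchedulerConfiguration.RESOURCE_CALCULATOR_CLASS,
      DominantResourceCalculator.class.getCanonicalName());
}
{code}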

> FS-CS converter: remove mixed ordering policy check
> ---
>
> Key: YARN-10085
> URL: https://issues.apache.org/jira/browse/YARN-10085
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
>
> When YARN-9892 gets committed, this part will become unnecessary:
> {noformat}
> // Validate ordering policy
> if (queueConverter.isDrfPolicyUsedOnQueueLevel()) {
>   if (queueConverter.isFifoOrFairSharePolicyUsed()) {
> throw new ConversionException(
> "DRF ordering policy cannot be used together with fifo/fair");
>   } else {
> capacitySchedulerConfig.set(
> CapacitySchedulerConfiguration.RESOURCE_CALCULATOR_CLASS,
> DominantResourceCalculator.class.getCanonicalName());
>   }
> }
> {noformat}
> We will be able to freely mix fifo/fair/drf, so let's get rid of this strict 
> check and also rewrite {{FSQueueConverter.emitOrderingPolicy()}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10081) Exception message from ClientRMProxy#getRMAddress is misleading

2020-01-16 Thread Adam Antal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal reassigned YARN-10081:
-

Assignee: Ravuri Sushma sree

> Exception message from ClientRMProxy#getRMAddress is misleading
> ---
>
> Key: YARN-10081
> URL: https://issues.apache.org/jira/browse/YARN-10081
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Ravuri Sushma sree
>Priority: Trivial
> Attachments: YARN-10081.001.patch
>
>
> In {{ClientRMProxy#getRMAddress}} in the else branch we have the following 
> piece of code.
> {code:java}
> } else {
>   String message = "Unsupported protocol found when creating the proxy " +
>   "connection to ResourceManager: " +
>   ((protocol != null) ? protocol.getClass().getName() : "null");
>   LOG.error(message);
>   throw new IllegalStateException(message);
> }
> {code}
> This is wrong, because the protocol variable is of type "Class", so 
> {{protocol.getClass().getName()}} will always be {{java.lang.Class}}. It should 
> be {{protocol.getName()}}. 
> An example of the error message if {{RMProxy}} is misused, and this exception 
> is thrown:
> {noformat}
> java.lang.IllegalStateException: Unsupported protocol found when creating the 
> proxy connection to ResourceManager: java.lang.Class
>   at 
> org.apache.hadoop.yarn.client.ClientRMProxy.getRMAddress(ClientRMProxy.java:109)
>   at 
> org.apache.hadoop.yarn.client.RMProxy.newProxyInstance(RMProxy.java:133)
> ...
> {noformat}
> where obviously not a {{Object.class}} was provided to this function as 
> protocol parameter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10081) Exception message from ClientRMProxy#getRMAddress is misleading

2020-01-16 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017355#comment-17017355
 ] 

Adam Antal commented on YARN-10081:
---

Assigned this to you. Perfect, thanks for the patch! +1 (non-binding).
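
For reference, a sketch of the fix described in the issue below (simply calling {{getName()}} on the protocol Class object itself):
{code:java}
} else {
  String message = "Unsupported protocol found when creating the proxy " +
      "connection to ResourceManager: " +
      ((protocol != null) ? protocol.getName() : "null");
  LOG.error(message);
  throw new IllegalStateException(message);
}
{code}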

> Exception message from ClientRMProxy#getRMAddress is misleading
> ---
>
> Key: YARN-10081
> URL: https://issues.apache.org/jira/browse/YARN-10081
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Ravuri Sushma sree
>Priority: Trivial
> Attachments: YARN-10081.001.patch
>
>
> In {{ClientRMProxy#getRMAddress}} in the else branch we have the following 
> piece of code.
> {code:java}
> } else {
>   String message = "Unsupported protocol found when creating the proxy " +
>   "connection to ResourceManager: " +
>   ((protocol != null) ? protocol.getClass().getName() : "null");
>   LOG.error(message);
>   throw new IllegalStateException(message);
> }
> {code}
> This is wrong, because the protocol variable is of type "Class", so 
> {{protocol.getClass().getName()}} will always be {{java.lang.Class}}. It should 
> be {{protocol.getName()}}. 
> An example of the error message if {{RMProxy}} is misused, and this exception 
> is thrown:
> {noformat}
> java.lang.IllegalStateException: Unsupported protocol found when creating the 
> proxy connection to ResourceManager: java.lang.Class
>   at 
> org.apache.hadoop.yarn.client.ClientRMProxy.getRMAddress(ClientRMProxy.java:109)
>   at 
> org.apache.hadoop.yarn.client.RMProxy.newProxyInstance(RMProxy.java:133)
> ...
> {noformat}
> where obviously not a {{Object.class}} was provided to this function as 
> protocol parameter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9970) Refactor TestUserGroupMappingPlacementRule#verifyQueueMapping

2020-01-16 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-9970:
-
Hadoop Flags: Reviewed

> Refactor TestUserGroupMappingPlacementRule#verifyQueueMapping
> -
>
> Key: YARN-9970
> URL: https://issues.apache.org/jira/browse/YARN-9970
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
> Attachments: YARN-9970-branch-3.2.010.patch, 
> YARN-9970-branch-3.2.011.patch, YARN-9970.001.patch, YARN-9970.002.patch, 
> YARN-9970.003.patch, YARN-9970.004.patch, YARN-9970.005.patch, 
> YARN-9970.006.patch, YARN-9970.007.patch, YARN-9970.008.patch, 
> YARN-9970.009.patch
>
>
> Scope of this Jira is to refactor 
> TestUserGroupMappingPlacementRule#verifyQueueMapping and QueueMapping class 
> as discussed in 
> https://issues.apache.org/jira/browse/YARN-9865?focusedCommentId=16971482&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16971482



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9970) Refactor TestUserGroupMappingPlacementRule#verifyQueueMapping

2020-01-16 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017356#comment-17017356
 ] 

Szilard Nemeth commented on YARN-9970:
--

Thanks [~maniraj...@gmail.com],
Committed branch-3.2 patch.

> Refactor TestUserGroupMappingPlacementRule#verifyQueueMapping
> -
>
> Key: YARN-9970
> URL: https://issues.apache.org/jira/browse/YARN-9970
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9970-branch-3.2.010.patch, 
> YARN-9970-branch-3.2.011.patch, YARN-9970.001.patch, YARN-9970.002.patch, 
> YARN-9970.003.patch, YARN-9970.004.patch, YARN-9970.005.patch, 
> YARN-9970.006.patch, YARN-9970.007.patch, YARN-9970.008.patch, 
> YARN-9970.009.patch
>
>
> Scope of this Jira is to refactor 
> TestUserGroupMappingPlacementRule#verifyQueueMapping and QueueMapping class 
> as discussed in 
> https://issues.apache.org/jira/browse/YARN-9865?focusedCommentId=16971482&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16971482



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10085) FS-CS converter: remove mixed ordering policy check

2020-01-16 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017361#comment-17017361
 ] 

Peter Bacsko commented on YARN-10085:
-

Thanks [~leftnoteasy] - will modify the code accordingly.

> FS-CS converter: remove mixed ordering policy check
> ---
>
> Key: YARN-10085
> URL: https://issues.apache.org/jira/browse/YARN-10085
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
>
> When YARN-9892 gets committed, this part will become unnecessary:
> {noformat}
> // Validate ordering policy
> if (queueConverter.isDrfPolicyUsedOnQueueLevel()) {
>   if (queueConverter.isFifoOrFairSharePolicyUsed()) {
> throw new ConversionException(
> "DRF ordering policy cannot be used together with fifo/fair");
>   } else {
> capacitySchedulerConfig.set(
> CapacitySchedulerConfiguration.RESOURCE_CALCULATOR_CLASS,
> DominantResourceCalculator.class.getCanonicalName());
>   }
> }
> {noformat}
> We will be able to freely mix fifo/fair/drf, so let's get rid of this strict 
> check and also rewrite {{FSQueueConverter.emitOrderingPolicy()}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9292) Implement logic to keep docker image consistent in application that uses :latest tag

2020-01-16 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017364#comment-17017364
 ] 

Eric Yang commented on YARN-9292:
-

[~ebadger] {quote}I do have some questions on why we can't move the AM into a 
docker container though. What is it that is special about the AM that we need 
to run it directly on the host? What does it depend on the host for? We should 
be able to use the distributed cache to localize any libraries/jars that it 
needs. And as far as nscd/sssd, those can be bind-mounted into the container 
via configs. If they don't have nscd/sssd then they can bind-mount /etc/passwd. 
Since they would've been using the host anyway, this is no different.{quote}

YARN native service was a code merge from Apache Slider, and it was developed 
to run directly in a YARN container, like MapReduce tasks.  If the AM Docker 
image is a mirror image of the host system, the AM can run in a Docker 
container.  The AM code still depends on all of the Hadoop client libraries, 
Hadoop configuration, and Hadoop environment variables.

{quote}As far as the docker image itself, why does Hadoop need to provide an 
image? Everything needed can be provided via the distributed cache or 
bind-mounts, right? I don't see why we need a specialized image that is tied to 
Hadoop. You just need an image with Java and Bash.{quote}

From a 10,000-foot point of view, yes, the AM only requires Java and Bash.  If 
Hadoop provides the image, our users can deploy it without worrying about how 
to create a Docker image that mirrors the host structure.  Without Hadoop 
supplying an image and an agreed-upon image format, it is up to the system 
admin's interpretation of where the Hadoop client configuration and binaries 
are located.  He or she can run the job with ENTRYPOINT mode disabled and 
bind-mount the Hadoop configuration and binaries.  As I recall, this is the 
less secure way to run the container, because the container has to bind-mount 
a writable Hadoop log directory so the launcher script can write its output.  
This is a hassle with no container benefit, and it still exposes the 
host-level environment and binaries to the container.  There are maybe five 
people on planet Earth who know how to wire this together, and even they are 
unlikely to suggest this approach.
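
On the image-ID idea outlined in the issue description below (resolve :latest to an immutable image ID once, then propagate that ID to the remaining container requests), here is a minimal illustrative sketch. It is not the container-executor or Docker runtime code; it only shows the shape of the lookup, assuming the Docker CLI is available on the node:
{code:java}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/**
 * Illustration only: resolve a mutable tag (e.g. "myimage:latest") to the
 * immutable image ID reported by the local Docker daemon, so the same ID
 * could be reused for subsequent container requests.
 */
public final class DockerImageIdResolver {

  public static String resolveImageId(String imageWithTag)
      throws IOException, InterruptedException {
    // "docker image inspect --format {{.Id}}" prints the sha256 image ID
    // for an image that is present locally.
    ProcessBuilder pb = new ProcessBuilder(
        "docker", "image", "inspect", "--format", "{{.Id}}", imageWithTag);
    pb.redirectErrorStream(true);
    Process process = pb.start();
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(process.getInputStream(),
            StandardCharsets.UTF_8))) {
      String id = reader.readLine();
      if (process.waitFor() != 0 || id == null || id.isEmpty()) {
        throw new IOException("Could not resolve image ID for "
            + imageWithTag);
      }
      return id.trim();
    }
  }
}
{code}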

> Implement logic to keep docker image consistent in application that uses 
> :latest tag
> 
>
> Key: YARN-9292
> URL: https://issues.apache.org/jira/browse/YARN-9292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-9292.001.patch, YARN-9292.002.patch, 
> YARN-9292.003.patch, YARN-9292.004.patch, YARN-9292.005.patch, 
> YARN-9292.006.patch, YARN-9292.007.patch, YARN-9292.008.patch
>
>
> A Docker image with the latest tag can run in a YARN cluster without any 
> validation in the node managers. If an image with the latest tag changes 
> during container launch, it might produce inconsistent results between nodes. 
> This surfaced toward the end of development for YARN-9184, whose goal is to 
> keep the Docker image consistent within a job. One idea to keep the :latest 
> tag consistent for a job is to use the docker image command to figure out the 
> image ID and propagate that ID to the rest of the container requests. There 
> are some challenges to overcome:
>  # The latest tag does not exist on the node where the first container 
> starts. The first container will need to download the latest image and find 
> the image ID. This can introduce lag time before other containers start.
>  # If the image ID is used to start the other containers, container-executor 
> may have problems checking whether the image comes from a trusted source. 
> Both the image name and ID must be supplied through the .cmd file to 
> container-executor. However, an attacker could supply an incorrect image ID 
> and defeat the container-executor security checks.
> If we can overcome those challenges, it may be possible to keep the Docker 
> image consistent within one application.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10018) container-executor: possible -1 return value of fork() is not always checked

2020-01-16 Thread Miklos Szegedi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017541#comment-17017541
 ] 

Miklos Szegedi commented on YARN-10018:
---

[~pbacsko], thank you for the updated patch. I think we could use 
ERROR_FORKING_PROCESS everywhere to make debugging easier.

> container-executor: possible -1 return value of fork() is not always checked
> 
>
> Key: YARN-10018
> URL: https://issues.apache.org/jira/browse/YARN-10018
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10018-001.patch, YARN-10018-001.patch, 
> YARN-10018-002.patch
>
>
> There are some places in the container-executor native code where the 
> {{fork()}} call is not handled properly. This operation can fail with -1, but 
> sometimes the if branch needed to validate that it succeeded is missing.
> Also, at one location, the return value is declared as an {{int}} rather than 
> {{pid_t}}. It's better to handle this transparently and change it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5356) NodeManager should communicate physical resource capability to ResourceManager

2020-01-16 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017663#comment-17017663
 ] 

Brahma Reddy Battula commented on YARN-5356:


Thanks to all, this is nice to have. There is an issue during rolling upgrade: 
PhysicalResource will always be null.

i) Upgrade RM from 2.7 to 3.0.

ii) Upgrade NM from 2.7 to 3.0.

When the NM re-registers, RMContext already has this nodeId and the http port 
is also the same, so the node is not added again; hence "PhysicalResource" 
stays null in the upgraded cluster until the RM restarts.

Will raise a Jira for the same.

 

> NodeManager should communicate physical resource capability to ResourceManager
> --
>
> Key: YARN-5356
> URL: https://issues.apache.org/jira/browse/YARN-5356
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Nathan Roberts
>Assignee: Íñigo Goiri
>Priority: Major
>  Labels: oct16-medium
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-5356.000.patch, YARN-5356.001.patch, 
> YARN-5356.002.patch, YARN-5356.002.patch, YARN-5356.003.patch, 
> YARN-5356.004.patch, YARN-5356.005.patch, YARN-5356.006.patch, 
> YARN-5356.007.patch, YARN-5356.008.patch, YARN-5356.009.patch, 
> YARN-5356.010.patch, YARN-5356.011.patch
>
>
> Currently ResourceUtilization contains absolute quantities of resource used 
> (e.g. 4096MB memory used). It would be good if the NM also communicated the 
> actual physical resource capabilities of the node so that the RM can use this 
> data to schedule more effectively (overcommit, etc)
> Currently the only available information is the Resource the node registered 
> with (or later updated using updateNodeResource). However, these aren't 
> really sufficient to get a good view of how utilized a resource is. For 
> example, if a node reports 400% CPU utilization, does that mean it's 
> completely full, or barely utilized? Today there is no reliable way to figure 
> this out.
> [~elgoiri] - Lots of good work is happening in YARN-2965 so curious if you 
> have thoughts/opinions on this?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5356) NodeManager should communicate physical resource capability to ResourceManager

2020-01-16 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017663#comment-17017663
 ] 

Brahma Reddy Battula edited comment on YARN-5356 at 1/17/20 2:35 AM:
-

Thanks to all, this is nice to have. There is an issue during rolling upgrade: 
PhysicalResource will always be null.

i) Upgrade RM from 2.7 to 3.0.

ii) Upgrade NM from 2.7 to 3.0.

When the NM re-registers, RMContext already has this nodeId and the http port 
is also the same, so the node is not added again; hence "PhysicalResource" 
stays null in the upgraded cluster until the RM restarts.


{code:java}
RMNode rmNode = new RMNodeImpl(nodeId, rmContext, host, cmPort, httpPort,
    resolve(host), capability, nodeManagerVersion, physicalResource);
{code}


*org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService#registerNodeManager*
{code:java}
RMNode oldNode = this.rmContext.getRMNodes().putIfAbsent(nodeId, rmNode);
if (oldNode == null) {
  RMNodeStartedEvent startEvent = new RMNodeStartedEvent(nodeId,
      request.getNMContainerStatuses(),
      request.getRunningApplications());
  if (request.getLogAggregationReportsForApps() != null
      && !request.getLogAggregationReportsForApps().isEmpty()) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Found the number of previous cached log aggregation "
          + "status from nodemanager:" + nodeId + " is :"
          + request.getLogAggregationReportsForApps().size());
    }
    startEvent.setLogAggregationReportsForApps(request
        .getLogAggregationReportsForApps());
  }
  this.rmContext.getDispatcher().getEventHandler().handle(startEvent);
} else {
  LOG.info("Reconnect from the node at: " + host);
  this.nmLivelinessMonitor.unregister(nodeId);

  if (CollectionUtils.isEmpty(request.getRunningApplications())
      && rmNode.getState() != NodeState.DECOMMISSIONING
      && rmNode.getHttpPort() != oldNode.getHttpPort()) {
    // Reconnected node differs, so replace old node and start new node
    switch (rmNode.getState()) {
    case RUNNING:
      ClusterMetrics.getMetrics().decrNumActiveNodes();
      break;
    case UNHEALTHY:
      ClusterMetrics.getMetrics().decrNumUnhealthyNMs();
      break;
    default:
      LOG.debug("Unexpected Rmnode state");
    }
    this.rmContext.getDispatcher().getEventHandler()
        .handle(new NodeRemovedSchedulerEvent(rmNode));

    this.rmContext.getRMNodes().put(nodeId, rmNode);
    this.rmContext.getDispatcher().getEventHandler()
        .handle(new RMNodeStartedEvent(nodeId, null, null));
  } else {
    // Reset heartbeat ID since node just restarted.
    oldNode.resetLastNodeHeartBeatResponse();

    this.rmContext.getDispatcher().getEventHandler()
        .handle(new RMNodeReconnectEvent(nodeId, rmNode,
            request.getRunningApplications(),
            request.getNMContainerStatuses()));
  }
}
{code}
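
A possible direction for a fix, sketched below: when the RM keeps the existing node entry on reconnect, copy the newly reported physical resource onto it instead of discarding it. This is an illustration only; {{TrackedNode}} and its setter are stand-ins, not the actual {{RMNodeImpl}} API.
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;

/**
 * Sketch only: when a NodeManager re-registers and the RM keeps the existing
 * node entry, refresh that entry's physical resource from the newly reported
 * value instead of leaving it null.
 */
public final class ReconnectPhysicalResourceSketch {

  /** Minimal stand-in for the per-node state kept in RMContext. */
  public static final class TrackedNode {
    private Resource physicalResource;

    public Resource getPhysicalResource() {
      return physicalResource;
    }

    public void setPhysicalResource(Resource physicalResource) {
      this.physicalResource = physicalResource;
    }
  }

  /** Hypothetical reconnect handling: keep oldNode, but take the new value. */
  public static void onReconnect(TrackedNode oldNode, TrackedNode newNode) {
    if (newNode.getPhysicalResource() != null) {
      oldNode.setPhysicalResource(newNode.getPhysicalResource());
    }
  }
}
{code}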
 

Will raise a Jira for the same.

 


was (Author: brahmareddy):
Thanks to all, this is nice to have. There is an issue during rolling upgrade: 
PhysicalResource will always be null.

i) Upgrade RM from 2.7 to 3.0.

ii) Upgrade NM from 2.7 to 3.0.

When the NM re-registers, RMContext already has this nodeId and the http port 
is also the same, so the node is not added again; hence "PhysicalResource" 
stays null in the upgraded cluster until the RM restarts.

Will raise a Jira for the same.

 

> NodeManager should communicate physical resource capability to ResourceManager
> --
>
> Key: YARN-5356
> URL: https://issues.apache.org/jira/browse/YARN-5356
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Nathan Roberts
>Assignee: Íñigo Goiri
>Priority: Major
>  Labels: oct16-medium
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-5356.000.patch, YARN-5356.001.patch, 
> YARN-5356.002.patch, YARN-5356.002.patch, YARN-5356.003.patch, 
> YARN-5356.004.patch, YARN-5356.005.patch, YARN-5356.006.patch, 
> YARN-5356.007.patch, YARN-5356.008.patch, YARN-5356.009.patch, 
> YARN-5356.010.patch, YARN-5356.011.patch
>
>
> Currently ResourceUtilization contains absolute quantities of resource used 
> (e.g. 4096MB memory used). It would be good if the NM also communicated the 
> actual physical resource capabilities of the node so that the RM can use this 
> data to schedule more effectively (overcommit, etc)
> Currently the only available information is the Resource the node registered 
> with (or later updated using updateNodeResource). However, these aren't 
> really sufficient to get a good view of how utilized a resource is. For 
> example, if a node reports 400% CPU utilization, does that mean it's 
> completely full, or barely utilized? Today there is no reliable way to figure 
> this out.
> [~elgoiri] - Lots of good work is happening in YARN-2965 so curious if you 
> have thoughts/opinions on this?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (YARN-10089) [RollingUpgrade] PhysicalResource is always null (RMNode should be updated on NM registration)

2020-01-16 Thread Brahma Reddy Battula (Jira)
Brahma Reddy Battula created YARN-10089:
---

 Summary: [RollingUpgrade] PhysicalResource is always null (RMNode 
should be updated on NM registration)
 Key: YARN-10089
 URL: https://issues.apache.org/jira/browse/YARN-10089
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Brahma Reddy Battula


PhysicalResource will always be null in the following scenario:

i) Upgrade RM from 2.7 to 3.0.

ii) Upgrade NM from 2.7 to 3.0.

When the NM re-registers, RMContext already has this nodeId and the http port 
is also the same, so the node is not added again; hence "PhysicalResource" 
stays null in the upgraded cluster until the RM restarts.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10089) [RollingUpgrade] PhysicalResource is always null (RMNode should be updated on NM registration)

2020-01-16 Thread Brahma Reddy Battula (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-10089:

Priority: Blocker  (was: Major)

> [RollingUpgrade] PhysicalResource is always null (RMNode should be updated on 
> NM registration)
> -
>
> Key: YARN-10089
> URL: https://issues.apache.org/jira/browse/YARN-10089
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Priority: Blocker
>
> PhysicalResource will always be null in the following scenario:
> i) Upgrade RM from 2.7 to 3.0.
> ii) Upgrade NM from 2.7 to 3.0.
> When the NM re-registers, RMContext already has this nodeId and the http port 
> is also the same, so the node is not added again; hence "PhysicalResource" 
> stays null in the upgraded cluster until the RM restarts.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9872) DecommissioningNodesWatcher#update blocks the heartbeat processing

2020-01-16 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-9872:

Attachment: YARN-9872.002.patch

> DecommissioningNodesWatcher#update blocks the heartbeat processing
> --
>
> Key: YARN-9872
> URL: https://issues.apache.org/jira/browse/YARN-9872
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-9872.001.patch, YARN-9872.002.patch
>
>
> ResourceTrackerService handlers are getting blocked due to the 
> synchronisation at DecommissioningNodesWatcher#update.
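
One common way to keep heartbeat handlers from serializing on a coarse lock is to track the decommissioning nodes in a concurrent map and keep expensive work off the update path. A minimal sketch of that pattern (illustrative only, not the YARN-9872 patch; {{NodeStatus}} is a stand-in type):
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Sketch only: per-node state is kept in a ConcurrentHashMap so that
 * heartbeat threads can record updates without contending on a single
 * object-level lock.
 */
public final class NonBlockingDecommissioningWatcher {

  /** Stand-in for the per-node bookkeeping the real watcher maintains. */
  public static final class NodeStatus {
    final long lastUpdateMillis;
    final int runningContainers;

    NodeStatus(long lastUpdateMillis, int runningContainers) {
      this.lastUpdateMillis = lastUpdateMillis;
      this.runningContainers = runningContainers;
    }
  }

  private final ConcurrentMap<String, NodeStatus> decommissioningNodes =
      new ConcurrentHashMap<>();

  /** Called from the heartbeat path; no synchronized keyword needed. */
  public void update(String nodeId, int runningContainers) {
    decommissioningNodes.put(nodeId,
        new NodeStatus(System.currentTimeMillis(), runningContainers));
  }

  /** Called from a background poller, off the heartbeat hot path. */
  public boolean isReadyToBeDecommissioned(String nodeId) {
    NodeStatus status = decommissioningNodes.get(nodeId);
    return status != null && status.runningContainers == 0;
  }
}
{code}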



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org