[jira] [Created] (YARN-11209) yarn CapacityScheduler Unknown bug: Resource scheduling fails after RM runs for 3 days and resources cannot be fully used.

2022-07-12 Thread ruiliang (Jira)
ruiliang created YARN-11209:
---

 Summary: yarn CapacityScheduler  Unknown bug: Resource scheduling 
fails after RM runs for 3 days and resources cannot be fully used.
 Key: YARN-11209
 URL: https://issues.apache.org/jira/browse/YARN-11209
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Affects Versions: 3.1.1
Reporter: ruiliang
 Attachments: image-2022-07-12-14-51-15-129.png, 
rm_20220711_1646_have_bug.jstack, rm_20220712_1424_run well.jstac

* !image-2022-07-12-14-51-15-129.png!
 * jstack taken while resource scheduling could not reach 100%:
   rm_20220711_1646_have_bug.jstack
 * jstack taken after restarting the RM, while it was running well (the RM runs 
   well for about 3 days after a restart):
   rm_20220712_1424_run well.jstack

At present, I do not know what the problem is. Could you please tell me which 
configuration needs to be adjusted? There was no configuration change before or 
after the restart, but the problem is very noticeable, so I would appreciate 
guidance.

The difference in CPU usage before and after the restart is obvious; the other 
hardware indicators show no clear abnormality.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11209) yarn CapacityScheduler Unknown bug: Resource scheduling fails after RM runs for 3 days and resources cannot be fully used.

2022-07-12 Thread ruiliang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ruiliang updated YARN-11209:

Description: 
* !image-2022-07-12-14-51-15-129.png|width=1667,height=670!
 * jstack taken while resource scheduling could not reach 100%:
   rm_20220711_1646_have_bug.jstack
 * jstack taken after restarting the RM, while it was running well (the RM runs 
   well for about 3 days after a restart):
   rm_20220712_1424_run well.jstack

At present, I do not know what the problem is. Could you please tell me which 
configuration needs to be adjusted? There was no configuration change before or 
after the restart, but the problem is very noticeable, so I would appreciate 
guidance.

The difference in CPU usage before and after the restart is obvious; the other 
hardware indicators show no clear abnormality.

  was:
* !image-2022-07-12-14-51-15-129.png!
 * jstack taken while resource scheduling could not reach 100%:
   rm_20220711_1646_have_bug.jstack
 * jstack taken after restarting the RM, while it was running well (the RM runs 
   well for about 3 days after a restart):
   rm_20220712_1424_run well.jstack

At present, I do not know what the problem is. Could you please tell me which 
configuration needs to be adjusted? There was no configuration change before or 
after the restart, but the problem is very noticeable, so I would appreciate 
guidance.

The difference in CPU usage before and after the restart is obvious; the other 
hardware indicators show no clear abnormality.


> yarn CapacityScheduler  Unknown bug: Resource scheduling fails after RM runs 
> for 3 days and resources cannot be fully used.
> ---
>
> Key: YARN-11209
> URL: https://issues.apache.org/jira/browse/YARN-11209
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.1.1
>Reporter: ruiliang
>Priority: Major
>  Labels: capacity-scheduler, yarn
> Attachments: image-2022-07-12-14-51-15-129.png, 
> rm_20220711_1646_have_bug.jstack, rm_20220712_1424_run well.jstac
>
>
> * !image-2022-07-12-14-51-15-129.png|width=1667,height=670!
>  * jstack taken while resource scheduling could not reach 100%:
>    rm_20220711_1646_have_bug.jstack
>  * jstack taken after restarting the RM, while it was running well (the RM 
>    runs well for about 3 days after a restart):
>    rm_20220712_1424_run well.jstack
> At present, I do not know what the problem is. Could you please tell me which 
> configuration needs to be adjusted? There was no configuration change before 
> or after the restart, but the problem is very noticeable, so I would 
> appreciate guidance.
> The difference in CPU usage before and after the restart is obvious; the other 
> hardware indicators show no clear abnormality.






[jira] [Updated] (YARN-10901) Permission checking error on an existing directory in LogAggregationFileController#verifyAndCreateRemoteLogDir

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-10901:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> Permission checking error on an existing directory in 
> LogAggregationFileController#verifyAndCreateRemoteLogDir
> --
>
> Key: YARN-10901
> URL: https://issues.apache.org/jira/browse/YARN-10901
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.2.2, 3.3.1
>Reporter: Tamas Domok
>Assignee: Tamas Domok
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> *LogAggregationFileController.verifyAndCreateRemoteLogDir* tries to check 
> whether the remote file system has set/modify permissions on the 
> _yarn.nodemanager.remote-app-log-dir:_
>  
> {code:java}
>   // Check if FS has capability to set/modify permissions
>   try {
>     remoteFS.setPermission(qualified, new FsPermission(TLDIR_PERMISSIONS));
>   } catch (UnsupportedOperationException use) {
>     LOG.info("Unable to set permissions for configured filesystem since"
>         + " it does not support this", remoteFS.getScheme());
>     fsSupportsChmod = false;
>   } catch (IOException e) {
>     LOG.warn("Failed to check if FileSystem suppports permissions on "
>         + "remoteLogDir [" + remoteRootLogDir + "]", e);
>   }
> {code}
> But it will fail if the _yarn.nodemanager.remote-app-log-dir_'s owner is not 
> the same as the NodeManager's user.
>  
> Example error
> {code:java}
> 2021-08-27 11:33:21,649 WARN org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileController: Failed to check if FileSystem suppports permissions on remoteLogDir [/tmp/logs]
> 2021-08-27 11:33:21,649 WARN org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileController: Failed to check if FileSystem suppports permissions on remoteLogDir [/tmp/logs]
> org.apache.hadoop.security.AccessControlException: Permission denied. user=yarn is not the owner of inode=/tmp/logs
>  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkOwner(FSPermissionChecker.java:464)
>  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:407)
>  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermissionWithContext(FSPermissionChecker.java:417)
>  at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:297)
>  at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1950)
>  at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1931)
>  at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1876)
>  at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setPermission(FSDirAttrOp.java:64)
>  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1976)
>  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:858)
>  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:548)
>  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
>  at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
>  at org.apache.hadoop.hdfs.DFSClient.setPermission(DFSClient.java:1921

[jira] [Updated] (YARN-10814) YARN shouldn't start with empty hadoop.http.authentication.signature.secret.file

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-10814:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> YARN shouldn't start with empty 
> hadoop.http.authentication.signature.secret.file
> 
>
> Key: YARN-10814
> URL: https://issues.apache.org/jira/browse/YARN-10814
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Benjamin Teke
>Assignee: Tamas Domok
>Priority: Major
>  Labels: patch-available, pull-request-available
> Fix For: 3.4.0, 3.3.1, 3.3.2, 3.2.4
>
> Attachments: YARN-10814-branch-3.3.patch
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> YARN currently throws the following warning upon accessing any REST endpoint 
> when the configured http secret file exists but is empty:
> {code:java}
> 2021-03-03 20:46:16,616 WARN org.eclipse.jetty.server.HttpChannel: /jmx
> java.lang.IllegalArgumentException: Empty key
> at java.base/javax.crypto.spec.SecretKeySpec.<init>(SecretKeySpec.java:95)
> at org.apache.hadoop.security.authentication.util.Signer.computeSignature(Signer.java:93)
> at org.apache.hadoop.security.authentication.util.Signer.sign(Signer.java:59)
> at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:587)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1638)
> at org.apache.hadoop.security.http.CrossOriginFilter.doFilter(CrossOriginFilter.java:98)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1638)
> at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1681)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1638)
> at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1638)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:567)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1377)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:507)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1292)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at org.eclipse.jetty.server.Server.handle(Server.java:501)
> at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
> at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:556)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
> at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
> at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
> at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:540)
> at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:395)
> at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:161)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
> at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
> at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:

[jira] [Updated] (YARN-10814) YARN shouldn't start with empty hadoop.http.authentication.signature.secret.file

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-10814:

Fix Version/s: (was: 3.3.1)

> YARN shouldn't start with empty 
> hadoop.http.authentication.signature.secret.file
> 
>
> Key: YARN-10814
> URL: https://issues.apache.org/jira/browse/YARN-10814
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Benjamin Teke
>Assignee: Tamas Domok
>Priority: Major
>  Labels: patch-available, pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: YARN-10814-branch-3.3.patch
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> YARN currently throws the following warning upon accessing any REST endpoint 
> when the configured http secret file exists but is empty:
> {code:java}
> 2021-03-03 20:46:16,616 WARN org.eclipse.jetty.server.HttpChannel: /jmx
> java.lang.IllegalArgumentException: Empty key
> at java.base/javax.crypto.spec.SecretKeySpec.<init>(SecretKeySpec.java:95)
> at org.apache.hadoop.security.authentication.util.Signer.computeSignature(Signer.java:93)
> at org.apache.hadoop.security.authentication.util.Signer.sign(Signer.java:59)
> at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:587)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1638)
> at org.apache.hadoop.security.http.CrossOriginFilter.doFilter(CrossOriginFilter.java:98)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1638)
> at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1681)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1638)
> at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1638)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:567)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1377)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:507)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1292)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at org.eclipse.jetty.server.Server.handle(Server.java:501)
> at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
> at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:556)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
> at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
> at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
> at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:540)
> at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:395)
> at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:161)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
> at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
> at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
> at org.eclipse

[jira] [Updated] (YARN-10660) YARN Web UI have problem when show node partitions resource

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-10660:

Target Version/s: 3.3.9, 3.2.5  (was: 3.2.4, 3.3.9)

> YARN Web UI have problem when show node partitions resource
> ---
>
> Key: YARN-10660
> URL: https://issues.apache.org/jira/browse/YARN-10660
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 3.1.0, 3.1.1, 3.2.1, 3.2.2
>Reporter: tuyu
>Priority: Minor
> Attachments: 2021-03-01 19-56-02 的屏幕截图.png, YARN-10660.patch
>
>
> When the YARN node label feature is enabled, the YARN UI shows queue resources 
> per partition, but there is a problem when clicking the expand button: the URL 
> grows very long, like this:
> {code:java}
> 127.0.0.1:20701/cluster/scheduler?openQueues=Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20
> {code}
> The root cause is
> {code:java}
>the original URL parameter is:
>   Partition:  
>the HTML-encoded form is:
>   Partition:  
>   SchedulerPageUtil has some JavaScript code in
>  storeExpandedQueue:
> tmpCurrentParam = tmpCurrentParam.split('&');",
>the  Partition:   
>  value is split and len > 1. The faulty logic is here: when the expand button 
> is clicked to close, the function should clear the params, but the split array 
> does not match the original URL 
> {code}
> When the expand button is clicked to close, lt;DEFAULT_PARTITION>  vCores:96>  is appended again; after multiple clicks, the URL 
> becomes too long.
>   
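
The splitting behaviour described in the root cause can be reproduced in isolation. The following is a minimal sketch in Java, not the JavaScript that SchedulerPageUtil emits; the class name, both methods, and the sample label (including the memory value) are hypothetical and used only for illustration. It shows that an HTML-encoded partition label contains '&' characters, so splitting the parameter string on '&' yields several fragments instead of one.

```java
import java.util.Arrays;
import java.util.List;

public class PartitionSplitDemo {
    // Minimal HTML escaping for the three characters relevant here (illustrative only).
    public static String htmlEncode(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Splits a parameter string on '&', as the description says storeExpandedQueue does.
    public static List<String> splitParams(String params) {
        return Arrays.asList(params.split("&"));
    }

    public static void main(String[] args) {
        // Hypothetical label; the memory value is a placeholder, not taken from the issue.
        String label = "Partition: <DEFAULT_PARTITION> <memory:1024, vCores:96>";
        String encoded = htmlEncode(label);
        System.out.println("raw fragments:     " + splitParams(label).size());
        System.out.println("encoded fragments: " + splitParams(encoded).size());
        System.out.println(splitParams(encoded));
    }
}
```

Because the fragments no longer match the original parameter, a cleanup step that compares against the unsplit value would not find it, which is consistent with the fragment being appended again on each click.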






[jira] [Commented] (YARN-10660) YARN Web UI have problem when show node partitions resource

2022-07-12 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565896#comment-17565896
 ] 

Masatake Iwasaki commented on YARN-10660:
-

Updated the target version to 3.2.5 in preparation for the 3.2.4 release.

> YARN Web UI have problem when show node partitions resource
> ---
>
> Key: YARN-10660
> URL: https://issues.apache.org/jira/browse/YARN-10660
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 3.1.0, 3.1.1, 3.2.1, 3.2.2
>Reporter: tuyu
>Priority: Minor
> Attachments: 2021-03-01 19-56-02 的屏幕截图.png, YARN-10660.patch
>
>
> When the YARN node label feature is enabled, the YARN UI shows queue resources 
> per partition, but there is a problem when clicking the expand button: the URL 
> grows very long, like this:
> {code:java}
> 127.0.0.1:20701/cluster/scheduler?openQueues=Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20#Partition:%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20
> {code}
> The root cause is
> {code:java}
>the original URL parameter is:
>   Partition:  
>the HTML-encoded form is:
>   Partition:  
>   SchedulerPageUtil has some JavaScript code in
>  storeExpandedQueue:
> tmpCurrentParam = tmpCurrentParam.split('&');",
>the  Partition:   
>  value is split and len > 1. The faulty logic is here: when the expand button 
> is clicked to close, the function should clear the params, but the split array 
> does not match the original URL 
> {code}
> When the expand button is clicked to close, lt;DEFAULT_PARTITION>  vCores:96>  is appended again; after multiple clicks, the URL 
> becomes too long.
>   






[jira] [Updated] (YARN-9007) CS preemption monitor should only select GUARANTEED containers as candidates for queue and reserved container preemption

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-9007:
---
Target Version/s: 3.2.5  (was: 3.2.4)

> CS preemption monitor should only select GUARANTEED containers as candidates 
> for queue and reserved container preemption
> 
>
> Key: YARN-9007
> URL: https://issues.apache.org/jira/browse/YARN-9007
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9007.001.patch
>
>
> Currently the CS preemption monitor doesn't consider the execution type of 
> containers, so OPPORTUNISTIC containers may be selected and killed without 
> effect.
> In some scenarios with OPPORTUNISTIC containers, not only can preemption fail 
> to balance resources properly, but some apps with OPPORTUNISTIC containers may 
> also be affected and unable to work.
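
The selection rule named in the issue title can be sketched as a simple filter. This is a minimal illustration in Java, not the attached patch and not the preemption monitor's real code; the `Container` class, the local `ExecutionType` enum, and `selectCandidates` are hypothetical stand-ins that only mirror the two execution types mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

public class PreemptionCandidateSketch {
    // Local stand-in for the two execution types named in the issue.
    public enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

    // Hypothetical, simplified container record.
    public static final class Container {
        public final String id;
        public final ExecutionType type;

        public Container(String id, ExecutionType type) {
            this.id = id;
            this.type = type;
        }
    }

    // Keeps only GUARANTEED containers as preemption candidates, preserving order.
    public static List<Container> selectCandidates(List<Container> running) {
        List<Container> candidates = new ArrayList<>();
        for (Container c : running) {
            if (c.type == ExecutionType.GUARANTEED) {
                candidates.add(c);
            }
        }
        return candidates;
    }
}
```

With a filter of this kind in front of candidate selection, OPPORTUNISTIC containers would never be picked for queue or reserved-container preemption.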






[jira] [Commented] (YARN-8657) User limit calculation should be read-lock-protected within LeafQueue

2022-07-12 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565898#comment-17565898
 ] 

Masatake Iwasaki commented on YARN-8657:


Updated the target version to 3.2.5 in preparation for the 3.2.4 release.

> User limit calculation should be read-lock-protected within LeafQueue
> -
>
> Key: YARN-8657
> URL: https://issues.apache.org/jira/browse/YARN-8657
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8657.001.patch, YARN-8657.002.patch
>
>
> When async scheduling is enabled, the user limit calculation could be wrong: 
> it is possible that the scheduler has calculated a user_limit, but inside 
> {{canAssignToUser}} it has become stale. 
> We need to protect the user limit calculation.
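
The idea of protecting the calculation with a read lock can be sketched as follows. This is a simplified illustration in Java under stated assumptions, not the LeafQueue implementation or the attached patches: the class, its fields, and the "capacity divided by active users" formula are hypothetical. The point it shows is that the limit is computed and checked while one read lock is held, so a concurrent writer cannot change the inputs between the two steps.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class UserLimitSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private long queueCapacity; // hypothetical: resource units available to the queue
    private int activeUsers;    // hypothetical: number of active users in the queue

    public UserLimitSketch(long queueCapacity, int activeUsers) {
        this.queueCapacity = queueCapacity;
        this.activeUsers = activeUsers;
    }

    // Writers change both inputs together under the write lock.
    public void update(long newCapacity, int newActiveUsers) {
        lock.writeLock().lock();
        try {
            queueCapacity = newCapacity;
            activeUsers = newActiveUsers;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Computes the limit and checks it under a single read lock,
    // so the limit cannot go stale between calculation and use.
    public boolean canAssignToUser(long userUsage, long request) {
        lock.readLock().lock();
        try {
            long userLimit = queueCapacity / Math.max(1, activeUsers);
            return userUsage + request <= userLimit;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

A read lock lets many scheduling threads run the check concurrently while still excluding updates, which is why it fits an asynchronous scheduler better than a plain mutex.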






[jira] [Updated] (YARN-8657) User limit calculation should be read-lock-protected within LeafQueue

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-8657:
---
Target Version/s: 3.2.5  (was: 3.2.4)

> User limit calculation should be read-lock-protected within LeafQueue
> -
>
> Key: YARN-8657
> URL: https://issues.apache.org/jira/browse/YARN-8657
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8657.001.patch, YARN-8657.002.patch
>
>
> When async scheduling is enabled, the user limit calculation could be wrong: 
> it is possible that the scheduler has calculated a user_limit, but inside 
> {{canAssignToUser}} it has become stale. 
> We need to protect the user limit calculation.






[jira] [Commented] (YARN-9007) CS preemption monitor should only select GUARANTEED containers as candidates for queue and reserved container preemption

2022-07-12 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565897#comment-17565897
 ] 

Masatake Iwasaki commented on YARN-9007:


Updated the target version to 3.2.5 in preparation for the 3.2.4 release.

> CS preemption monitor should only select GUARANTEED containers as candidates 
> for queue and reserved container preemption
> 
>
> Key: YARN-9007
> URL: https://issues.apache.org/jira/browse/YARN-9007
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.1
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9007.001.patch
>
>
> Currently the CS preemption monitor doesn't consider the execution type of 
> containers, so OPPORTUNISTIC containers may be selected and killed without 
> effect.
> In some scenarios with OPPORTUNISTIC containers, not only can preemption fail 
> to balance resources properly, but some apps with OPPORTUNISTIC containers may 
> also be affected and unable to work.






[jira] [Assigned] (YARN-10087) ATS possible NPE on REST API when data is missing

2022-07-12 Thread Samrat Deb (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samrat Deb reassigned YARN-10087:
-

Assignee: Samrat Deb  (was: Tanu Ajmera)

> ATS possible NPE on REST API when data is missing
> -
>
> Key: YARN-10087
> URL: https://issues.apache.org/jira/browse/YARN-10087
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Reporter: Wilfred Spiegelenburg
>Assignee: Samrat Deb
>Priority: Major
>  Labels: newbie
> Attachments: ats_stack.txt
>
>
> If the data stored by the ATS is not complete, REST calls to the ATS can 
> return an NPE instead of results.
> {{{"exception":"NullPointerException","javaClassName":"java.lang.NullPointerException"}}}
> The issue shows up when the ATS was down for a short period and in that time 
> new applications were started. This causes certain parts of the application 
> data to be missing in the ATS store. In most cases this is not a problem and 
> data will be returned but when you start filtering data the filtering fails 
> throwing the NPE.
>  In this case the request was for: 
> {{http://:8188/ws/v1/applicationhistory/apps?user=hive'}}
> If certain pieces of data are missing the ATS should not even consider 
> returning that data, filtered or not. We should not display partial or 
> incomplete data.
>  In case of the missing user information ACL checks cannot be correctly 
> performed and we could see more issues.
> A similar issue was fixed in YARN-7118 where the queue details were missing. 
> It just _skips_ the app to prevent the NPE but that is not the correct thing 
> when the user is missing
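
The failure mode described above, filtering on a field that may be missing, can be sketched as follows. This is a minimal illustration in Java and not the ATS code: `AppReport`, `filterByUser`, and the field names are hypothetical. The sketch assumes that a record without a user is treated as incomplete and is never returned; whether such a record should be skipped, rejected, or reported is the open question in this issue and is not settled here.

```java
import java.util.ArrayList;
import java.util.List;

public class AppFilterSketch {
    // Hypothetical stand-in for a stored application record;
    // user is null when the store holds incomplete data.
    public static final class AppReport {
        public final String id;
        public final String user;

        public AppReport(String id, String user) {
            this.id = id;
            this.user = user;
        }
    }

    // Returns the ids of complete records owned by the given user.
    // Calling app.user.equals(user) without the null check would throw
    // a NullPointerException for an incomplete record.
    public static List<String> filterByUser(List<AppReport> apps, String user) {
        List<String> ids = new ArrayList<>();
        for (AppReport app : apps) {
            if (app.user == null) {
                continue; // assumption: incomplete records are not returned
            }
            if (app.user.equals(user)) {
                ids.add(app.id);
            }
        }
        return ids;
    }
}
```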






[jira] [Commented] (YARN-10087) ATS possible NPE on REST API when data is missing

2022-07-12 Thread Samrat Deb (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565959#comment-17565959
 ] 

Samrat Deb commented on YARN-10087:
---

picking this up!

> ATS possible NPE on REST API when data is missing
> -
>
> Key: YARN-10087
> URL: https://issues.apache.org/jira/browse/YARN-10087
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Reporter: Wilfred Spiegelenburg
>Assignee: Tanu Ajmera
>Priority: Major
>  Labels: newbie
> Attachments: ats_stack.txt
>
>
> If the data stored by the ATS is not complete, REST calls to the ATS can 
> return an NPE instead of results.
> {{{"exception":"NullPointerException","javaClassName":"java.lang.NullPointerException"}}}
> The issue shows up when the ATS was down for a short period and in that time 
> new applications were started. This causes certain parts of the application 
> data to be missing in the ATS store. In most cases this is not a problem and 
> data will be returned but when you start filtering data the filtering fails 
> throwing the NPE.
>  In this case the request was for: 
> {{http://:8188/ws/v1/applicationhistory/apps?user=hive'}}
> If certain pieces of data are missing the ATS should not even consider 
> returning that data, filtered or not. We should not display partial or 
> incomplete data.
>  In case of the missing user information ACL checks cannot be correctly 
> performed and we could see more issues.
> A similar issue was fixed in YARN-7118 where the queue details were missing. 
> It just _skips_ the app to prevent the NPE but that is not the correct thing 
> when the user is missing


