[jira] [Resolved] (YARN-6990) AmIpFilter:findRedirectUrl use HAServiceProtocol to getServiceStatus

2017-08-10 Thread yunjiong zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yunjiong zhao resolved YARN-6990.
-
Resolution: Duplicate
  Assignee: yunjiong zhao


[jira] [Commented] (YARN-6990) AmIpFilter:findRedirectUrl use HAServiceProtocol to getServiceStatus

2017-08-10 Thread yunjiong zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122538#comment-16122538
 ] 

yunjiong zhao commented on YARN-6990:
-

I just found that YARN-6625 already fixed this issue.
We use 2.7.


[jira] [Updated] (YARN-6990) AmIpFilter:findRedirectUrl use HAServiceProtocol to getServiceStatus

2017-08-10 Thread yunjiong zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yunjiong zhao updated YARN-6990:

Affects Version/s: 2.7.0


[jira] [Created] (YARN-6990) AmIpFilter:findRedirectUrl use HAServiceProtocol to getServiceStatus

2017-08-10 Thread yunjiong zhao (JIRA)
yunjiong zhao created YARN-6990:
---

 Summary: AmIpFilter:findRedirectUrl use HAServiceProtocol to 
getServiceStatus
 Key: YARN-6990
 URL: https://issues.apache.org/jira/browse/YARN-6990
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: yunjiong zhao


Because our ResourceManager hosts have multiple IP addresses, accessing a proxy 
URL such as https://*:50030/proxy/application_1502349494018_10877/ fails, 
because AmIpFilter:findRedirectUrl uses HAServiceProtocol to find out which RM 
is active.
{code}
2017-08-10 10:51:42,344 WARN [971256592@qtp-666312528-0] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server :
org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[KERBEROS]
    at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:172)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:563)
    at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:378)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:732)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:728)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:727)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:378)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1492)
    at org.apache.hadoop.ipc.Client.call(Client.java:1402)
    at org.apache.hadoop.ipc.Client.call(Client.java:1363)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy84.getServiceStatus(Unknown Source)
    at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.getServiceStatus(HAServiceProtocolClientSideTranslatorPB.java:122)
    at org.apache.hadoop.yarn.util.RMHAUtils.getHAState(RMHAUtils.java:68)
    at org.apache.hadoop.yarn.util.RMHAUtils.findActiveRMHAId(RMHAUtils.java:44)
    at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.findRedirectUrl(AmIpFilter.java:174)
    at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:138)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1243)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
{code}

This can only happen when the RM has multiple IPs. The related code is in the 
doFilter function of AmIpFilter.java:
{code}
if (!getProxyAddresses().contains(httpReq.getRemoteAddr())) {
  String redirectUrl = findRedirectUrl();
  String target = redirectUrl + httpReq.getRequestURI();
  ProxyUtils.sendRedirect(httpReq,  httpResp,  target);
  return;
}
{code}
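
To illustrate how a host with several IPs can end up on this redirect path, here is a minimal, self-contained sketch of the address check. It assumes (this is not stated in the report) that the proxy address set only contains the addresses the proxy host name resolves to, so a request arriving from another IP of the same host is not recognised. The class name, method name, and IP addresses are hypothetical.

```java
import java.util.Collections;
import java.util.Set;

/** Stand-in for the address check quoted above; not the actual AmIpFilter code. */
final class ProxyAddressCheck {

  /** Returns true when the request's remote address is a known proxy address. */
  static boolean isFromProxy(Set<String> proxyAddresses, String remoteAddr) {
    return proxyAddresses.contains(remoteAddr);
  }

  public static void main(String[] args) {
    // Hypothetical setup: the proxy host name resolves to a single address,
    // although the machine also owns 10.0.1.1.
    Set<String> proxyAddresses = Collections.singleton("10.0.0.1");

    // Request sent from the resolved address: treated as coming from the proxy.
    System.out.println(isFromProxy(proxyAddresses, "10.0.0.1"));

    // Request sent from the machine's other IP: not recognised, so a filter
    // written like the snippet above would call findRedirectUrl().
    System.out.println(isFromProxy(proxyAddresses, "10.0.1.1"));
  }
}
```

In a filter shaped like the quoted snippet, the second case is the one that would reach findRedirectUrl() and therefore the HAServiceProtocol call seen in the stack trace.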



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6339) Improve performance for createAndGetApplicationReport

2017-03-27 Thread yunjiong zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944399#comment-15944399
 ] 

yunjiong zhao commented on YARN-6339:
-

Thanks [~wangda] & [~xgong] for your time.

> Improve performance for createAndGetApplicationReport
> -
>
> Key: YARN-6339
> URL: https://issues.apache.org/jira/browse/YARN-6339
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: yunjiong zhao
>Assignee: yunjiong zhao
> Fix For: 2.8.1, 3.0.0-alpha3
>
> Attachments: YARN-6339.001.patch, YARN-6339.002.patch, 
> YARN-6339.003.patch
>
>
> There are two performance issues when calling createAndGetApplicationReport:
> One is inside ProtoUtils.convertFromProtoFormat: replace is too slow for 
> clusters that have more than 3000 nodes. Using substring is much better: 
> https://issues.apache.org/jira/browse/YARN-6285?focusedCommentId=15923241=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15923241
> The other is inside getLogAggregationReportsForApp: if an application's 
> LogAggregationStatus is TIME_OUT, every call creates a new HashMap, which 
> produces a lot of garbage.
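
The first point, replace versus substring, can be sketched as follows. This is only an illustration and not the code from ProtoUtils: the prefix value "LOG_" and the class and method names are assumptions made for the example.

```java
/** Illustrative only; the prefix value is an assumption, not taken from ProtoUtils. */
final class PrefixStrip {

  // Hypothetical prefix carried by the protobuf enum names.
  static final String PREFIX = "LOG_";

  /** Searches the whole string for the prefix and removes every occurrence. */
  static String viaReplace(String protoName) {
    return protoName.replace(PREFIX, "");
  }

  /** Drops a known-length prefix from the front without searching the rest. */
  static String viaSubstring(String protoName) {
    return protoName.startsWith(PREFIX)
        ? protoName.substring(PREFIX.length())
        : protoName;
  }

  public static void main(String[] args) {
    System.out.println(viaReplace("LOG_TIME_OUT"));
    System.out.println(viaSubstring("LOG_TIME_OUT"));
  }
}
```

Both methods give the same result for a name with a single leading prefix. They differ when the prefix text occurs again later in the name, because replace removes every occurrence while substring only drops the leading one.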



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications

2017-03-22 Thread yunjiong zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15937278#comment-15937278
 ] 

yunjiong zhao commented on YARN-6285:
-

Yes, we'll test both of them.
Thanks.

> Add option to set max limit on ResourceManager for 
> ApplicationClientProtocol.getApplications
> 
>
> Key: YARN-6285
> URL: https://issues.apache.org/jira/browse/YARN-6285
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: yunjiong zhao
>Assignee: yunjiong zhao
> Attachments: YARN-6285.001.patch, YARN-6285.002.patch, 
> YARN-6285.003.patch
>
>
> When users call ApplicationClientProtocol.getApplications, it can return a 
> large amount of data and generate a lot of garbage on the ResourceManager, 
> which causes long GC pauses.
> For example, on one of our RMs, calling the REST API 
> "http://<rm http address:port>/ws/v1/cluster/apps" can return 150MB of data 
> for 944 applications.
> getApplications has a limit parameter, but some users might not set it, in 
> which case the limit is Long.MAX_VALUE.
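
A server-side cap of the kind this issue proposes could look roughly like the sketch below. It is not the attached patch: the class and method names are hypothetical, and treating a non-positive maximum as "no cap" is an assumption made for the example.

```java
/** Illustrative sketch of capping a client-supplied limit; not the YARN-6285 patch. */
final class GetApplicationsLimit {

  /**
   * Returns the limit the server would actually apply.
   *
   * @param requestedLimit limit sent by the client (Long.MAX_VALUE when unset)
   * @param serverMaxLimit maximum configured on the ResourceManager
   */
  static long effectiveLimit(long requestedLimit, long serverMaxLimit) {
    if (serverMaxLimit <= 0) {
      // Assumption for this sketch: a non-positive maximum disables the cap.
      return requestedLimit;
    }
    return Math.min(requestedLimit, serverMaxLimit);
  }

  public static void main(String[] args) {
    // A client that did not set a limit is cut down to the server maximum.
    System.out.println(effectiveLimit(Long.MAX_VALUE, 400));
    // A client asking for fewer entries than the maximum keeps its own limit.
    System.out.println(effectiveLimit(100, 400));
  }
}
```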






[jira] [Updated] (YARN-6339) Improve performance for createAndGetApplicationReport

2017-03-21 Thread yunjiong zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yunjiong zhao updated YARN-6339:

Attachment: YARN-6339.003.patch

[~wangda], good suggestion.
The updated patch makes logAggregationStatusForAppReport volatile. No changes 
are needed in createAndGetApplicationReport() any more, since it is now safe to 
update logAggregationStatusForAppReport inside 
getLogAggregationStatusForAppReport().
Thanks for taking the time to review the patch.
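
The idea of updating a volatile field while holding only the read lock can be sketched like this. It is a simplified stand-in rather than the RMAppImpl code: the class, the two-value enum, and the recomputation counter are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Simplified stand-in for caching a final status in a volatile field. */
final class AppReportStatusCache {

  /** Reduced stand-in for LogAggregationStatus. */
  enum Status { RUNNING, TIME_OUT }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // volatile: a value written by one reader thread is visible to the others,
  // even though no write lock is taken.
  private volatile Status cachedStatus;

  // Counts how often the status had to be recomputed (demo only).
  final AtomicInteger recomputations = new AtomicInteger();

  Status getStatus(int logTimeOutCount) {
    lock.readLock().lock();
    try {
      Status cached = cachedStatus;
      if (cached != null) {
        return cached; // final status already known, skip the work
      }
      recomputations.incrementAndGet();
      if (logTimeOutCount > 0) {
        // Concurrent readers could only store this same value, so the
        // unsynchronised write is harmless in this sketch.
        cachedStatus = Status.TIME_OUT;
        return Status.TIME_OUT;
      }
      return Status.RUNNING;
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Once TIME_OUT has been stored, later calls return the cached value without recomputing it.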




[jira] [Commented] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications

2017-03-20 Thread yunjiong zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933790#comment-15933790
 ] 

yunjiong zhao commented on YARN-6285:
-

YARN-6339 has not been applied to our cluster yet.

When I created YARN-6285, what I wanted was a simple patch that would let us 
get GC under control ASAP.
With YARN-6339, I believe we can set 
yarn.resourcemanager.max-limit-get-applications to a larger value, or may not 
need to set a limit at all. I will let you know after YARN-6339 has passed 
review and been applied in our cluster.




[jira] [Commented] (YARN-6339) Improve performance for createAndGetApplicationReport

2017-03-20 Thread yunjiong zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933782#comment-15933782
 ] 

yunjiong zhao commented on YARN-6339:
-

{quote}Why changes of createAndGetApplicationReport required? {quote}
The purpose is to avoid unnecessary calls to getLogAggregationStatus() inside 
getLogAggregationReportsForApp() once an application's LogAggregationStatus has 
changed to TIME_OUT.
I think we should add LogAggregationStatus.TIME_OUT to 
isLogAggregationFinished(), alongside LogAggregationStatus.SUCCEEDED and 
LogAggregationStatus.FAILED.

If we ignore future risks, we could even update 
logAggregationStatusForAppReport inside getLogAggregationStatusForAppReport() 
while holding only the readLock. However, createAndGetApplicationReport() calls 
getLogAggregationStatusForAppReport() while holding the readLock, so to avoid 
confusion I think updating logAggregationStatusForAppReport inside 
createAndGetApplicationReport() with the writeLock held is the right thing to 
do.
{code}
} else if (logTimeOutCount > 0) {
+ logAggregationStatusForAppReport = LogAggregationStatus.TIME_OUT; 
  return LogAggregationStatus.TIME_OUT;
}
{code}
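
The suggestion to treat TIME_OUT as a finished state could be sketched as below. The enum is a local stand-in for YARN's LogAggregationStatus, and the method signature is hypothetical; the real isLogAggregationFinished() may be shaped differently.

```java
import java.util.EnumSet;
import java.util.Set;

/** Illustrative check for "finished" log aggregation states. */
final class LogAggregationStates {

  /** Local stand-in for YARN's LogAggregationStatus enum. */
  enum Status {
    DISABLED, NOT_START, RUNNING, RUNNING_WITH_FAILURE, SUCCEEDED, FAILED, TIME_OUT
  }

  // TIME_OUT is listed next to SUCCEEDED and FAILED, as proposed in the comment.
  private static final Set<Status> FINISHED =
      EnumSet.of(Status.SUCCEEDED, Status.FAILED, Status.TIME_OUT);

  /** Returns true when no further status changes are expected. */
  static boolean isLogAggregationFinished(Status status) {
    return FINISHED.contains(status);
  }

  public static void main(String[] args) {
    System.out.println(isLogAggregationFinished(Status.TIME_OUT));
    System.out.println(isLogAggregationFinished(Status.RUNNING));
  }
}
```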







[jira] [Commented] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications

2017-03-20 Thread yunjiong zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933508#comment-15933508
 ] 

yunjiong zhao commented on YARN-6285:
-

[~wangda], I would appreciate it if you have time to double-check 
LogAggregationReportPBImpl.getLogAggregationStatus() and take a look at 
YARN-6339.





[jira] [Updated] (YARN-6339) Improve performance for createAndGetApplicationReport

2017-03-16 Thread yunjiong zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yunjiong zhao updated YARN-6339:

Attachment: YARN-6339.002.patch

Updated the patch with further improvements.
Changed RMAppImpl.logAggregationStatus from HashMap to ConcurrentHashMap so 
that logAggregationStatus can be updated safely even while holding only a 
read lock.
It then returns Collections.unmodifiableMap to avoid creating too many HashMap 
instances.
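
A rough sketch of that combination is shown below. It is not the patch itself: the class name, the String key and value types, and the choice to build the read-only view once (rather than on each call) are assumptions made for the example.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative holder that hands out a read-only view instead of map copies. */
final class LogAggregationReports {

  // Safe for concurrent updates without an exclusive lock.
  private final Map<String, String> statusByNode = new ConcurrentHashMap<>();

  // A view, not a copy: later updates to statusByNode show through it.
  private final Map<String, String> readOnlyView =
      Collections.unmodifiableMap(statusByNode);

  void update(String nodeId, String status) {
    statusByNode.put(nodeId, status);
  }

  /** Returns the same read-only view on every call; no new HashMap is built. */
  Map<String, String> getReports() {
    return readOnlyView;
  }
}
```

Callers can read the current state through the returned map, while any attempt to modify it throws UnsupportedOperationException.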






[jira] [Commented] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications

2017-03-14 Thread yunjiong zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925292#comment-15925292
 ] 

yunjiong zhao commented on YARN-6285:
-

On one of our clusters, GetApplicationsAvgTime looks very bad:
{quote}
"GetApplicationsNumOps" : 243,
"GetApplicationsAvgTime" : 3868.0,
{quote}

On the cluster where we applied this patch and set 
yarn.resourcemanager.max-limit-get-applications to 400:
{quote}
"GetApplicationsNumOps" : 3370,
"GetApplicationsAvgTime" : 549.0,
{quote}




[jira] [Commented] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications

2017-03-14 Thread yunjiong zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925255#comment-15925255
 ] 

yunjiong zhao commented on YARN-6285:
-

I created another issue, https://issues.apache.org/jira/browse/YARN-6339, to 
improve performance.






[jira] [Updated] (YARN-6339) Improve performance for createAndGetApplicationReport

2017-03-14 Thread yunjiong zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yunjiong zhao updated YARN-6339:

Attachment: YARN-6339.001.patch

This patch has 3 improvements:
1. Use substring instead of replace.
2. Update logAggregationStatusForAppReport to reduce the time spent in 
getLogAggregationStatusForAppReport.
3. Inside getLogAggregationReportsForApp, move some condition checks out of 
the for loop, so that for some applications the loop doesn't run at all.


> Improve performance for createAndGetApplicationReport
> -
>
> Key: YARN-6339
> URL: https://issues.apache.org/jira/browse/YARN-6339
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: yunjiong zhao
>Assignee: yunjiong zhao
> Attachments: YARN-6339.001.patch
>
>
> There are two performance issues when calling createAndGetApplicationReport:
> One is inside ProtoUtils.convertFromProtoFormat: replace is too slow for 
> clusters that have more than 3000 nodes. Using substring is much better: 
> https://issues.apache.org/jira/browse/YARN-6285?focusedCommentId=15923241&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15923241
> The other is inside getLogAggregationReportsForApp: if some application's 
> LogAggregationStatus is TIME_OUT, every call creates a HashMap, which 
> produces a lot of garbage.






[jira] [Created] (YARN-6339) Improve performance for createAndGetApplicationReport

2017-03-14 Thread yunjiong zhao (JIRA)
yunjiong zhao created YARN-6339:
---

 Summary: Improve performance for createAndGetApplicationReport
 Key: YARN-6339
 URL: https://issues.apache.org/jira/browse/YARN-6339
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: yunjiong zhao
Assignee: yunjiong zhao


There are two performance issues when calling createAndGetApplicationReport:
One is inside ProtoUtils.convertFromProtoFormat: replace is too slow for 
clusters that have more than 3000 nodes. Using substring is much better: 
https://issues.apache.org/jira/browse/YARN-6285?focusedCommentId=15923241&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15923241

The other is inside getLogAggregationReportsForApp: if some application's 
LogAggregationStatus is TIME_OUT, every call creates a HashMap, which 
produces a lot of garbage.







[jira] [Commented] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications

2017-03-13 Thread yunjiong zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923364#comment-15923364
 ] 

yunjiong zhao commented on YARN-6285:
-

{quote}
 convertFromProtoFormat is called once for every app
{quote}
This is not true.
There are multiple places that call convertFromProtoFormat. For example,
inside RMAppImpl.getLogAggregationStatusForAppReport():
{code}
for (Entry<NodeId, LogAggregationReport> report : reports.entrySet()) {
  switch (report.getValue().getLogAggregationStatus()) { // calls convertFromProtoFormat
{code}
Inside RMAppImpl.getLogAggregationReportsForApp():
{code}
for (Entry<NodeId, LogAggregationReport> output : outputs.entrySet()) {
  if (!output.getValue().getLogAggregationStatus()
{code}
Our cluster has more than 3000 nodes and sometimes more than 500 running 
applications, so from the two places above getApplications may call 
convertFromProtoFormat 3,000,000 times.

I'm not saying it will completely solve the problem,
but it can definitely improve the situation.
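
The 3,000,000 figure can be reproduced as a rough upper bound. The sketch below assumes every running application holds one log aggregation report per node and that each of the two loops converts every report once; the class and method names are only for illustration.

```java
final class ConversionCallEstimate {
    /** Upper-bound estimate: one conversion per node report, per application, per call site. */
    static long estimate(long nodes, long runningApps, long callSites) {
        return nodes * runningApps * callSites;
    }

    public static void main(String[] args) {
        // 3000 nodes, 500 running applications, two loops over the per-node reports.
        System.out.println(estimate(3000, 500, 2)); // prints 3000000
    }
}
```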




[jira] [Commented] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications

2017-03-13 Thread yunjiong zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923241#comment-15923241
 ] 

yunjiong zhao commented on YARN-6285:
-

I used the code below to test replace and substring: replace took 28107ms and 
substring took 563ms.
I believe the above change can definitely improve the performance.
{code}
private static void testReplace() {
  long s = System.currentTimeMillis();
  for (int i = 0; i < 1; i++) {
    "LOG_disable".replace("LOG_", "");
  }
  System.out.println(System.currentTimeMillis() - s);
}

private static void testSubstring() {
  long s = System.currentTimeMillis();
  for (int i = 0; i < 1; i++) {
    "LOG_disable".substring(4);
  }
  System.out.println(System.currentTimeMillis() - s);
}
{code}
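
For anyone who wants to repeat the comparison, here is a self-contained sketch of the same idea. The class name and the ITERATIONS value are assumptions chosen for illustration, not the values used for the numbers above, and a loop like this is only a rough measurement rather than a proper benchmark.

```java
final class PrefixStripTiming {
    private static final String PREFIX = "LOG_";
    // Hypothetical iteration count; pick one large enough to dominate timer noise.
    private static final int ITERATIONS = 5_000_000;

    static String stripWithReplace(String name) {
        // Replaces every occurrence of PREFIX anywhere in the string.
        return name.replace(PREFIX, "");
    }

    static String stripWithSubstring(String name) {
        // Drops a fixed-length prefix; assumes the name starts with PREFIX.
        return name.substring(PREFIX.length());
    }

    public static void main(String[] args) {
        long sink = 0; // consume results so the calls cannot be optimized away

        long start = System.nanoTime();
        for (int i = 0; i < ITERATIONS; i++) {
            sink += stripWithReplace("LOG_disable").length();
        }
        long replaceMs = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        for (int i = 0; i < ITERATIONS; i++) {
            sink += stripWithSubstring("LOG_disable").length();
        }
        long substringMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("replace:   " + replaceMs + " ms");
        System.out.println("substring: " + substringMs + " ms");
        System.out.println("checksum:  " + sink);
    }
}
```

Note that the two calls are only equivalent when the prefix appears exactly once, at the start of the name: replace removes every occurrence, while substring just drops the first four characters.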




[jira] [Commented] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications

2017-03-13 Thread yunjiong zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923208#comment-15923208
 ] 

yunjiong zhao commented on YARN-6285:
-

Sorry for the late response.
The 2.25 seconds in getApplications doesn't include ResourceRequest.
Most of the time was spent in getLogAggregationReportsForApp, as the stack trace 
shows.

I believe the code change below should improve the performance (will test later):
{code}
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ProtoUtils.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ProtoUtils.java
index ab283e7..926c757 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ProtoUtils.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ProtoUtils.java
@@ -296,6 +296,8 @@ public static ReservationRequestInterpreter convertFromProtoFormat(
    * Log Aggregation Status
    */
   private static final String LOG_AGGREGATION_STATUS_PREFIX = "LOG_";
+  private static final int LOG_AGGREGATION_STATUS_PREFIX_LEN =
+      LOG_AGGREGATION_STATUS_PREFIX.length();
   public static LogAggregationStatusProto convertToProtoFormat(
       LogAggregationStatus e) {
     return LogAggregationStatusProto.valueOf(LOG_AGGREGATION_STATUS_PREFIX
@@ -304,8 +306,8 @@ public static LogAggregationStatusProto convertToProtoFormat(
 
   public static LogAggregationStatus convertFromProtoFormat(
       LogAggregationStatusProto e) {
-    return LogAggregationStatus.valueOf(e.name().replace(
-        LOG_AGGREGATION_STATUS_PREFIX, ""));
+    return LogAggregationStatus.valueOf(e.name().substring(
+        LOG_AGGREGATION_STATUS_PREFIX_LEN));
   }
 
   /*
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
index 9f00b2e..1db66a5 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
@@ -1700,17 +1700,16 @@ public ResourceRequest getAMResourceRequest() {
       Map<NodeId, LogAggregationReport> outputs =
           new HashMap<NodeId, LogAggregationReport>();
       outputs.putAll(logAggregationStatus);
-      if (!isLogAggregationFinished()) {
+      if (!isLogAggregationFinished() && isAppInFinalState(this) &&
+          System.currentTimeMillis() > this.logAggregationStartTime
+          + this.logAggregationStatusTimeout) {
         for (Entry<NodeId, LogAggregationReport> output : outputs.entrySet()) {
           if (!output.getValue().getLogAggregationStatus()
             .equals(LogAggregationStatus.TIME_OUT)
               && !output.getValue().getLogAggregationStatus()
                 .equals(LogAggregationStatus.SUCCEEDED)
               && !output.getValue().getLogAggregationStatus()
-                .equals(LogAggregationStatus.FAILED)
-              && isAppInFinalState(this)
-              && System.currentTimeMillis() > this.logAggregationStartTime
-              + this.logAggregationStatusTimeout) {
+                .equals(LogAggregationStatus.FAILED)) {
             output.getValue().setLogAggregationStatus(
               LogAggregationStatus.TIME_OUT);
           }
{code}
Should I open a new issue for those changes?
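
To make the idea behind the RMAppImpl change in the diff easier to see in isolation, here is a simplified, self-contained sketch. It is not the RMAppImpl code: the Status enum, the String node keys and the markTimedOut signature are stand-ins invented for illustration. The point is only that the "app finished" and "timeout elapsed" checks do not depend on the entry, so they can be evaluated once before the loop.

```java
import java.util.HashMap;
import java.util.Map;

final class TimeoutSweep {
    enum Status { RUNNING, SUCCEEDED, FAILED, TIME_OUT }

    /**
     * Marks unfinished entries as TIME_OUT, but only enters the loop when the
     * loop-invariant conditions hold. Returns the number of entries changed.
     */
    static int markTimedOut(Map<String, Status> reports, boolean appFinished,
                            long nowMs, long startMs, long timeoutMs) {
        // These checks are the same for every entry, so evaluate them once.
        if (!appFinished || nowMs <= startMs + timeoutMs) {
            return 0;
        }
        int changed = 0;
        for (Map.Entry<String, Status> e : reports.entrySet()) {
            Status s = e.getValue();
            if (s != Status.TIME_OUT && s != Status.SUCCEEDED && s != Status.FAILED) {
                e.setValue(Status.TIME_OUT);
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, Status> reports = new HashMap<>();
        reports.put("node1", Status.RUNNING);
        reports.put("node2", Status.SUCCEEDED);
        // App still running: the loop is skipped entirely.
        System.out.println(markTimedOut(reports, false, 10_000, 0, 5_000));
        // App finished and the timeout elapsed: only node1 is rewritten.
        System.out.println(markTimedOut(reports, true, 10_000, 0, 5_000));
    }
}
```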

{quote}
1) Add parameter to indicate if we should include 
ResourceRequest/getLogAggregationReportsForApp in the response, default is true 
to make it compatible. (Can be done if above experimental shows it really 
helps).
{quote}
This will help if users use those parameters.



[jira] [Commented] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications

2017-03-06 Thread yunjiong zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897949#comment-15897949
 ] 

yunjiong zhao commented on YARN-6285:
-

[~sunilg], totally agree.
This patch is a short-term measure that gives cluster admins a way to prevent 
the RM from spending too much time on GC.
After we deployed this patch and set the limit to 50, our cluster's GC has 
been doing well over the last two days.
The 10 worst cases are:
{quote}
0.4960477
0.4992665
0.5180593
0.5804366
0.5876860
0.5885162
0.5900650
0.6041406
0.6474685
0.8865442
{quote}
And the total time spent on GC is around 1.5%, which is much better than before.




[jira] [Commented] (YARN-6280) Add a query parameter in ResourceManager Cluster Applications REST API to control whether or not returns ResourceRequest

2017-03-04 Thread yunjiong zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895840#comment-15895840
 ] 

yunjiong zhao commented on YARN-6280:
-

How about changing the default behavior to hide ResourceRequest?

> Add a query parameter in ResourceManager Cluster Applications REST API to 
> control whether or not returns ResourceRequest
> 
>
> Key: YARN-6280
> URL: https://issues.apache.org/jira/browse/YARN-6280
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, restapi
>Affects Versions: 2.7.3
>Reporter: Lantao Jin
> Attachments: YARN-6280.001.patch, YARN-6280.002.patch
>
>
> Beginning with v2.7, the ResourceManager Cluster Applications REST API returns 
> the ResourceRequest list, which is a very large structure in AppInfo.
> As a test, we used the URI below to query only 2 results:
> http://<rm http address:port>/ws/v1/cluster/apps?states=running,accepted&limit=2
> The results are very different:
> ||Hadoop version||Total Characters||Total Words||Total Lines||Size||
> |2.4.1|1192|42|42|1.2 KB|
> |2.7.1|1222179|48740|48735|1.21 MB|
> Most REST API clients are unaware of this change after upgrading, and their 
> old queries may make the ResourceManager slower and spend more time in GC. Even 
> clients that are aware of it have no way to reduce the impact on the 
> ResourceManager other than lowering their query frequency.
> The patch adds a query parameter "showResourceRequests" so that clients 
> who don't need this information can avoid the overhead. For interface 
> compatibility, the default value is true when the parameter is not set, 
> so the behaviour is the same as now.
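
For illustration, a client that does not need ResourceRequest could build its query along these lines. This is only a sketch: "showResourceRequests" is the parameter name proposed in the patch description above and may differ in what is finally committed, "rm.example.com:8088" is a placeholder address, and the class and method names are made up.

```java
import java.net.URI;

final class AppsQuery {
    /** Builds the Cluster Applications query URI; rmAddress is "host:port". */
    static URI build(String rmAddress, String states, int limit, boolean showResourceRequests) {
        String query = "states=" + states
            + "&limit=" + limit
            + "&showResourceRequests=" + showResourceRequests;
        return URI.create("http://" + rmAddress + "/ws/v1/cluster/apps?" + query);
    }

    public static void main(String[] args) {
        // Placeholder RM address; asks for 2 running/accepted apps without ResourceRequest.
        System.out.println(build("rm.example.com:8088", "running,accepted", 2, false));
    }
}
```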






[jira] [Commented] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications

2017-03-04 Thread yunjiong zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895837#comment-15895837
 ] 

yunjiong zhao commented on YARN-6285:
-

I checked the result returned by the REST API; most of its size was ResourceRequest.
Changing the default behavior in https://issues.apache.org/jira/browse/YARN-6280 
to not show ResourceRequest will help.
However, it's not enough. I have 2 reasons:
1. Slowness in getApplications: the stack trace files below show it spent at least 
2.25 seconds in getApplications.
{code}
grep -A20 " #7876 daemon " 829
"363440407@qtp-1966670937-117" #7876 daemon prio=5 os_prio=0 tid=0x7f12093a2800 nid=0x1c46 runnable [0x7f05344b8000]
   java.lang.Thread.State: RUNNABLE
    at java.util.regex.Matcher.search(Matcher.java:1248)
    at java.util.regex.Matcher.find(Matcher.java:637)
    at java.util.regex.Matcher.replaceAll(Matcher.java:951)
    at java.lang.String.replace(String.java:2240)
    at org.apache.hadoop.yarn.api.records.impl.pb.ProtoUtils.convertFromProtoFormat(ProtoUtils.java:270)
    at org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.LogAggregationReportPBImpl.convertFromProtoFormat(LogAggregationReportPBImpl.java:158)
    at org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.LogAggregationReportPBImpl.getLogAggregationStatus(LogAggregationReportPBImpl.java:142)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getLogAggregationStatusForAppReport(RMAppImpl.java:1559)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:631)
    at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:814)
    at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:681)
    at org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:89)
    at org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:86)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
    at org.apache.hadoop.yarn.server.webapp.AppsBlock.fetchData(AppsBlock.java:84)
    at org.apache.hadoop.yarn.server.webapp.AppsBlock.render(AppsBlock.java:101)
    at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
{code}

{code}
grep -A20 " #7876 daemon " 838
"363440407@qtp-1966670937-117" #7876 daemon prio=5 os_prio=0 tid=0x7f12093a2800 nid=0x1c46 runnable [0x7f05344b8000]
   java.lang.Thread.State: RUNNABLE
    at java.util.HashMap.hash(HashMap.java:338)
    at java.util.HashMap.putMapEntries(HashMap.java:514)
    at java.util.HashMap.putAll(HashMap.java:784)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getLogAggregationReportsForApp(RMAppImpl.java:1466)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getLogAggregationStatusForAppReport(RMAppImpl.java:1549)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:631)
    at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:814)
    at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:681)
    at org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:89)
    at org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:86)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
    at org.apache.hadoop.yarn.server.webapp.AppsBlock.fetchData(AppsBlock.java:84)
    at org.apache.hadoop.yarn.server.webapp.AppsBlock.render(AppsBlock.java:101)
    at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
    at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
    at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
    at org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
{code}

{code}
grep -A20 " #7876 daemon " 839
"363440407@qtp-1966670937-117" #7876 daemon prio=5 os_prio=0 tid=0x7f12093a2800 nid=0x1c46 runnable [0x7f05344b8000]
   java.lang.Thread.State: RUNNABLE
    at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
    at java.util.LinkedHashSet.<init>(LinkedHashSet.java:169)
    at org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1569)
- locked <0x7f06bb34c0b8> (a 

[jira] [Updated] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications

2017-03-03 Thread yunjiong zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yunjiong zhao updated YARN-6285:

Attachment: YARN-6285.003.patch

Updated the patch according to [~benoyantony]'s comments.
Set the default value to Long.MAX_VALUE, so by default it changes nothing.
Thanks [~benoyantony] for your time.




[jira] [Comment Edited] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications

2017-03-03 Thread yunjiong zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895190#comment-15895190
 ] 

yunjiong zhao edited comment on YARN-6285 at 3/3/17 10:54 PM:
--

Fixed checkstyle.
The failed unit test in TestRMRestart is not related.


was (Author: zhaoyunjiong):
Fix checkstyle.
Failure unit test is not related.




[jira] [Updated] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications

2017-03-03 Thread yunjiong zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yunjiong zhao updated YARN-6285:

Attachment: YARN-6285.002.patch

Fixed checkstyle.
The failed unit test is not related.




[jira] [Comment Edited] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications

2017-03-03 Thread yunjiong zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15894960#comment-15894960
 ] 

yunjiong zhao edited comment on YARN-6285 at 3/3/17 9:22 PM:
-

This patch allows setting a max limit on the RM for 
ApplicationClientProtocol.getApplications.
It also logs which user called getApplications with a limit bigger than the 
max limit, so the cluster admin can see it, as below:
{quote}
INFO  [main] resourcemanager.ClientRMService 
(ClientRMService.java:getApplications(878)) - User yunjzhao called 
getApplications with limit=9223372036854775807
{quote}


was (Author: zhaoyunjiong):
This patch allowed set a max limit on RM for 
ApplicationClientProtocol.getApplications.
Also in the log, it will tell cluster admin which user called the 
getApplications with bigger limit than the max limit like below
{quote}
INFO  [main] resourcemanager.ClientRMService 
(ClientRMService.java:getApplications(878)) - User yunjzhao called 
getApplications with limit=9223372036854775807
{quote}




[jira] [Updated] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications

2017-03-03 Thread yunjiong zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yunjiong zhao updated YARN-6285:

Attachment: YARN-6285.001.patch

This patch allows setting a max limit on the RM for 
ApplicationClientProtocol.getApplications.
It also logs which user called getApplications with a limit bigger than the 
max limit, so the cluster admin can see it, as below:
{quote}
INFO  [main] resourcemanager.ClientRMService 
(ClientRMService.java:getApplications(878)) - User yunjzhao called 
getApplications with limit=9223372036854775807
{quote}
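
Roughly, the behaviour described above can be sketched as follows. This is not the patch itself: it assumes the RM clamps the requested limit to the configured maximum rather than rejecting the call, and the class name, method name, the value 400 and the user name are placeholders for illustration.

```java
final class LimitClamp {
    /** Returns the limit the server would honor: the smaller of requested and configured max. */
    static long effectiveLimit(long requested, long maxLimit) {
        return Math.min(requested, maxLimit);
    }

    public static void main(String[] args) {
        long maxLimit = 400;             // hypothetical configured max limit
        long requested = Long.MAX_VALUE; // what the RM sees when the caller sets no limit
        if (requested > maxLimit) {
            // Mirrors the shape of the log line quoted above; "exampleUser" is a placeholder.
            System.out.println("User exampleUser called getApplications with limit=" + requested);
        }
        System.out.println("effective limit: " + effectiveLimit(requested, maxLimit));
    }
}
```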




[jira] [Created] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications

2017-03-03 Thread yunjiong zhao (JIRA)
yunjiong zhao created YARN-6285:
---

 Summary: Add option to set max limit on ResourceManager for 
ApplicationClientProtocol.getApplications
 Key: YARN-6285
 URL: https://issues.apache.org/jira/browse/YARN-6285
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: yunjiong zhao
Assignee: yunjiong zhao


When users call ApplicationClientProtocol.getApplications, it can return 
a lot of data and generate a lot of garbage on the ResourceManager, which causes 
long GC pauses.
For example, on one of our RMs, calling the REST API 
"http://<rm http address:port>/ws/v1/cluster/apps" can return 150MB of data 
covering 944 applications.
getApplications has a limit parameter, but some users might not set it, and then 
the limit will be Long.MAX_VALUE.






[jira] [Commented] (YARN-6254) Provide a mechanism to whitelist the RM REST API clients

2017-02-28 Thread yunjiong zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889706#comment-15889706
 ] 

yunjiong zhao commented on YARN-6254:
-

Reducing yarn.resourcemanager.max-completed-applications from its default value 
of 10000 to a small value like 500 should solve the problem.

> Provide a mechanism to whitelist the RM REST API clients
> 
>
> Key: YARN-6254
> URL: https://issues.apache.org/jira/browse/YARN-6254
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Aroop Maliakkal
>
> Currently RM REST APIs are open to everyone. Can we provide a whitelist 
> feature so that we can control what frequency and what hosts can hit the RM 
> REST APIs ?
> Thanks,
> /Aroop


