[jira] [Comment Edited] (YARN-7672) hadoop-sls can not simulate huge scale of YARN

stefanlee (JIRA) Fri, 12 Jan 2018 01:19:59 -0800

    [ 
https://issues.apache.org/jira/browse/YARN-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323728#comment-16323728
 ]


stefanlee edited comment on YARN-7672 at 1/12/18 9:18 AM:
----------------------------------------------------------

[~zsl2007] Thanks for this JIRA. I have merged this patch into my Hadoop version, 
but I ran into a problem during testing.

{code:java}
1. RM1 is active, RM2 is standby.
2. I run SLSRunnerForRealRM and my jobs run in my cluster with the correct 
user name and queue name.
Then:
1. RM1 is standby, RM2 is active.
2. I run SLSRunnerForRealRM and my jobs fail over to RM2, then they 
run in my cluster as the user who ran SLSRunnerForRealRM. 
That is, they all run in one queue.
{code}
I reviewed the Hadoop source and found that this problem occurs in 
*ConfiguredRMFailoverProxyProvider.getProxyInternal->RMProxy.getProxy*:

{code:java}
  static <T> T getProxy(final Configuration conf,
      final Class<T> protocol, final InetSocketAddress rmAddress)
      throws IOException {
    return UserGroupInformation.getCurrentUser().doAs(
      new PrivilegedAction<T>() {
        @Override
        public T run() {
          return (T) YarnRPC.create(conf).getProxy(protocol, rmAddress, conf);
        }
      });
  }
{code}

Here it calls *getCurrentUser()*, which is why the jobs end up running as the 
user who launched SLSRunnerForRealRM, so we need a solution for this.
With only one RM, however, it runs well. :D
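
To make the identity problem concrete, here is a minimal plain-Java model of it. 
It is only a sketch and does not use any Hadoop classes: UserContext, Proxy, 
getProxyAsCurrentUser and getProxyAsUser are hypothetical stand-ins, and the 
idea that the proxy is rebuilt outside the simulated user's context after 
failover is my assumption about the cause, not something I have verified. 
In real code, UserGroupInformation.createRemoteUser(user).doAs(...) might play 
the role of getProxyAsUser, but I have not tested that against the patch.

```java
import java.util.function.Supplier;

/** Plain-Java model of the problem; every name here is a hypothetical stand-in. */
class ProxyIdentitySketch {

    /** Stand-in for the "current user" tracking that UserGroupInformation does. */
    static final class UserContext {
        // Default identity: the OS user who launched the simulator (assumed name).
        private static final ThreadLocal<String> CURRENT =
            ThreadLocal.withInitial(() -> "sls-launcher");

        static String currentUser() {
            return CURRENT.get();
        }

        /** Runs the action as the given user, then restores the previous user. */
        static <T> T doAs(String user, Supplier<T> action) {
            String previous = CURRENT.get();
            CURRENT.set(user);
            try {
                return action.get();
            } finally {
                CURRENT.set(previous);
            }
        }
    }

    /** Stand-in for an RM proxy; it remembers the identity it was created with. */
    static final class Proxy {
        final String rmId;
        final String user;

        Proxy(String rmId, String user) {
            this.rmId = rmId;
            this.user = user;
        }
    }

    /** Mirrors the quoted getProxy: the identity is whoever is "current". */
    static Proxy getProxyAsCurrentUser(String rmId) {
        return new Proxy(rmId, UserContext.currentUser());
    }

    /** One possible fix: the caller names the simulated user explicitly. */
    static Proxy getProxyAsUser(String rmId, String simulatedUser) {
        return UserContext.doAs(simulatedUser, () -> getProxyAsCurrentUser(rmId));
    }

    public static void main(String[] args) {
        // Assumption: the first proxy is created inside the simulated user's context.
        Proxy first = UserContext.doAs("alice", () -> getProxyAsCurrentUser("rm1"));
        // Assumption: after failover the proxy is re-created outside that context.
        Proxy afterFailover = getProxyAsCurrentUser("rm2");
        // Passing the user explicitly keeps the identity across the failover.
        Proxy fixed = getProxyAsUser("rm2", "alice");
        System.out.println(first.user + " " + afterFailover.user + " " + fixed.user);
    }
}
```

In this model the proxy created after failover carries the launcher's identity, 
while the one created with an explicit user keeps the simulated user.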


> hadoop-sls can not simulate huge scale of YARN
> ----------------------------------------------
>
>                 Key: YARN-7672
>                 URL: https://issues.apache.org/jira/browse/YARN-7672
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: zhangshilong
>            Assignee: zhangshilong
>         Attachments: YARN-7672.patch
>
>
> Our YARN cluster has scaled to nearly ten thousand nodes, and we need to run 
> a scheduler pressure test.
> Using SLS, we start 2000+ threads to simulate NMs and AMs, but the CPU load 
> becomes very high (100+). I think that will affect the performance 
> evaluation of the scheduler, 
> so I want to separate the scheduler from the simulator.
> I start a real RM; SLS then registers nodes with the RM and submits apps to 
> it using RM RPC.


