[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338237#comment-17338237
 ] 

Andras Gyori commented on YARN-9927:
------------------------------------

Thank you [~zhuqi] for the patch. I think this approach is well thought out, 
because it reuses logic already established elsewhere (services running as 
separate threads). On this part I have one request:
 * Please move MultiThreadDispatcher to its own file, as ResourceManager already 
contains a large amount of code.

As for RMNode event handling, I have one proposal. Keeping code as simple as 
possible is usually good advice, but event handling is a crucial part of YARN, 
and it might be worthwhile to provide fine-tuning options. Multi-threaded 
RMNode event handling is a good way to improve performance, but I see value in 
providing a more generic event handling mechanism. A proof-of-concept 
implementation of my proposal:
 # Create a MultiThreadEventHandler wrapper
{code:java}
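  public static class MultiThreadEventHandler implements EventHandler<Event<?>> {
    private final ThreadPoolExecutor multiHandlerThreadPool;
    private final EventHandler<Event<?>> handler;

    public MultiThreadEventHandler(EventHandler<Event<?>> handler,
                                   int maximumPoolSize) {
      this.handler = handler;
      ThreadFactory threadFactory = new ThreadFactoryBuilder()
          .setNameFormat("multiHandlerThread #%d")
          .build();
      // With an unbounded queue the pool never grows beyond its core size,
      // so the core size is set to maximumPoolSize. This also avoids an
      // IllegalArgumentException when maximumPoolSize is below a fixed core.
      multiHandlerThreadPool = new ThreadPoolExecutor(
          maximumPoolSize, maximumPoolSize, 10, TimeUnit.SECONDS,
          new LinkedBlockingQueue<>(), threadFactory);
      // Let idle threads exit after the keep-alive time.
      multiHandlerThreadPool.allowCoreThreadTimeOut(true);
    }

    @Override
    public void handle(Event<?> event) {
      // execute() instead of submit(): an exception thrown by the handler is
      // not captured in a Future that nobody reads.
      multiHandlerThreadPool.execute(() -> handler.handle(event));
    }
  }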
{code}

 # Provide configuration values that enable MultiThreadEventHandler for a 
specific EventType. MultiThreadDispatcher#register would then look like this:
{code:java}
    @Override
    public void register(Class<? extends Enum> eventType,
        EventHandler handler) {
      if (eventTypeDispatcherMap.get(eventType) == null) {
        AsyncDispatcher asyncDispatcher =
            createDispatcher(eventType);
        eventTypeDispatcherMap.put(eventType,
            asyncDispatcher);
        addIfService(asyncDispatcher);
      }
      EventHandler registeredHandler = handler;
      boolean isMultiThreadEventHandler = getConfig().getBoolean(
          "yarn.scheduler.event." + eventType.getCanonicalName()
              + ".multi-thread-handler.enabled", false);
      if (isMultiThreadEventHandler) {
        int poolSize = getConfig().getInt(
            "yarn.scheduler.event." + eventType.getCanonicalName()
                + ".multi-thread-handler.max-pool-size", 5);
        registeredHandler = new MultiThreadEventHandler(handler, poolSize);
      }

      eventTypeDispatcherMap.get(eventType)
          .register(eventType, registeredHandler);
    }
{code}
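
One property of java.util.concurrent.ThreadPoolExecutor matters when sizing a pool for such a wrapper: threads beyond the core size are only started when the work queue refuses a task, and an unbounded LinkedBlockingQueue never refuses one. A pool with a core size of 5 and an unbounded queue therefore stays at 5 threads regardless of the configured maximum. The standalone sketch below illustrates this; PoolSizeDemo and largestPoolSize are hypothetical names used only for the demonstration and are not part of the patch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of YARN or of the patch.
public class PoolSizeDemo {

  /** Queues `tasks` blocking tasks and returns the largest pool size reached. */
  public static int largestPoolSize(int core, int max, int tasks) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        core, max, 10, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    CountDownLatch release = new CountDownLatch(1);
    for (int i = 0; i < tasks; i++) {
      pool.execute(() -> {
        try {
          release.await(); // keep the worker busy so later tasks pile up
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    int largest = pool.getLargestPoolSize();
    release.countDown(); // let every task finish
    pool.shutdown();
    try {
      pool.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return largest;
  }

  public static void main(String[] args) {
    // Core 5, max 20, unbounded queue: the pool stays at the core size.
    System.out.println(largestPoolSize(5, 20, 100));  // prints 5
    // Core equal to max: the pool actually reaches 20 threads.
    System.out.println(largestPoolSize(20, 20, 100)); // prints 20
  }
}
```

If the configured max-pool-size is meant to take effect, either the core size has to follow it or the queue has to be bounded, in which case a rejection policy is needed as well.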

As emphasised before, this is a performance-critical section of YARN, 
so a stress test, run via SLS or manually, is needed to make sure that the RM 
is not crippled by these changes and that the performance gain justifies the 
added complexity and the extra hardware resource usage. 
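
Before a full SLS run, a rough local check along the following lines might help compare single-threaded and multi-threaded handling and confirm that no events are dropped under load. It is only a toy harness under stated assumptions: HandlerLoadCheck, run and the simulated handler are hypothetical, plain integers stand in for events, and the timings it prints say nothing about real RM behaviour.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical harness, not part of YARN or SLS.
public class HandlerLoadCheck {

  /** Pushes `events` integers through `threads` workers; returns how many were handled. */
  public static long run(int threads, int events, Consumer<Integer> handler) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    AtomicLong handled = new AtomicLong();
    long start = System.nanoTime();
    for (int i = 0; i < events; i++) {
      final int event = i;
      pool.execute(() -> {
        handler.accept(event);
        handled.incrementAndGet();
      });
    }
    pool.shutdown(); // already queued events are still processed
    try {
      if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
        throw new IllegalStateException("events not drained within one minute");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    long millis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    System.out.println(threads + " thread(s): " + handled.get()
        + " events in " + millis + " ms");
    return handled.get();
  }

  public static void main(String[] args) {
    // Simulated handler cost; a stand-in for real event processing.
    Consumer<Integer> simulatedHandler = e -> {
      long sum = 0;
      for (int k = 0; k < 10_000; k++) {
        sum += k ^ e;
      }
      if (sum == -1) {
        System.out.println(); // never true; keeps the loop from being trivially dead
      }
    };
    run(1, 200_000, simulatedHandler);
    run(4, 200_000, simulatedHandler);
  }
}
```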

> RM multi-thread event processing mechanism
> ------------------------------------------
>
>                 Key: YARN-9927
>                 URL: https://issues.apache.org/jira/browse/YARN-9927
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>    Affects Versions: 3.0.0, 2.9.2
>            Reporter: hcarrot
>            Assignee: Qi Zhu
>            Priority: Major
>         Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch, YARN-9927.002.patch, YARN-9927.003.patch, 
> YARN-9927.004.patch, YARN-9927.005.patch
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent accounts for 90% of the time consumed by the RM event scheduler
> 3) Meanwhile, RM event processing is single-threaded, which leaves the RM 
> event scheduler with little headroom and thus limits RM performance.
> So we propose an RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

