Hi,

I ran into the same problem as the one described here:
http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201303.mbox/%3c482c5f6f-6feb-4552-99f5-07c8b54ac...@apache.org%3E

Any idea what causes it?
It happens almost every 3 or 4 weeks in my cluster (about 150 nodes).
I checked the log: no warnings, no errors, no exceptions, but the
ResourceManager hung; it did not crash.

I found this code, but I have no idea why this happens. Why does the event
queue keep getting bigger and bigger?

Thanks.

   private final class EventProcessor implements Runnable {
      @Override
      public void run() {

        SchedulerEvent event;

        while (!stopped && !Thread.currentThread().isInterrupted()) {
          try {
            event = eventQueue.take();
          } catch (InterruptedException e) {
            LOG.error("Returning, interrupted : " + e);
            return; // TODO: Kill RM.
          }

          try {
            scheduler.handle(event);
          } catch (Throwable t) {
            // An error occurred, but we are shutting down anyway.
            // If it was an InterruptedException, the very act of 
            // shutdown could have caused it and is probably harmless.
            if (stopped) {
              LOG.warn("Exception during shutdown: ", t);
              break;
            }
            LOG.fatal("Error in handling event type " + event.getType()
                + " to the scheduler", t);
            if (shouldExitOnError
                && !ShutdownHookManager.get().isShutdownInProgress()) {
              LOG.info("Exiting, bbye..");
              System.exit(-1);
            }
          }
        }
      }
    }

    @Override
    protected void serviceStop() throws Exception {
      this.stopped = true;
      this.eventProcessor.interrupt();
      try {
        this.eventProcessor.join();
      } catch (InterruptedException e) {
        throw new YarnRuntimeException(e);
      }
      super.serviceStop();
    }

    @Override
    public void handle(SchedulerEvent event) {
      try {
        int qSize = eventQueue.size();
        if (qSize !=0 && qSize %1000 == 0) {
          LOG.info("Size of scheduler event-queue is " + qSize);
        }
        int remCapacity = eventQueue.remainingCapacity();
        if (remCapacity < 1000) {
          LOG.info("Very low remaining capacity on scheduler event queue: "
              + remCapacity);
        }
        this.eventQueue.put(event);
      } catch (InterruptedException e) {
        throw new YarnRuntimeException(e);
      }
    }
  }
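
To illustrate one way a queue like this can grow without any warning or
error in the log, here is a minimal, self-contained sketch. It is not Hadoop
code; the class and method names are made up for illustration. It assumes
the consumer thread is stuck inside the handler (the analogue of
scheduler.handle(event) never returning) while producers keep calling put().
Whether that is what actually happens on this cluster is not confirmed by
anything in this mail; a thread dump of the ResourceManager (for example
with jstack) would show where the EventProcessor thread is waiting.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration only; none of these names exist in Hadoop.
final class StuckConsumerSketch {

    // Puts `produced` events while the consumer is blocked inside its
    // handler and returns the queue size observed at that point.
    static int backlogWhileHandlerBlocked(int produced) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        CountDownLatch handlerEntered = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    Integer event = queue.take();
                    if (event < 0) {
                        return; // poison value: stop the consumer
                    }
                    handlerEntered.countDown();
                    // Stand-in for a handler that never returns: it waits
                    // silently, so nothing is logged and nothing is thrown.
                    release.await();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            queue.put(0);            // first event; the consumer takes it
            handlerEntered.await();  // and is now stuck in the "handler"
            for (int i = 1; i <= produced; i++) {
                queue.put(i);        // producers are unaffected
            }
            int backlog = queue.size();
            release.countDown();     // unblock so the sketch can terminate
            queue.put(-1);
            consumer.join();
            return backlog;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("backlog = " + backlogWhileHandlerBlocked(3000));
    }
}
```

In this sketch every event put after the handler blocks stays in the queue,
so the size grows by exactly the number of events produced. The same
pattern would apply to any single-consumer dispatcher, not only the class
quoted above.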

logs:

grep 'Size of event-queue' yarn-hadoop-resourcemanager-gdc-hm01-formal.i.nease.net.log
2015-05-29 00:54:46,985 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 1000
2015-05-29 00:55:28,850 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 2000
2015-05-29 00:56:10,204 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 3000
2015-05-29 00:56:51,995 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 4000
2015-05-29 00:57:33,981 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 5000
2015-05-29 00:58:15,324 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 6000
2015-05-29 00:58:57,111 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 7000
2015-05-29 00:59:38,593 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 8000
2015-05-29 01:00:20,215 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 9000
2015-05-29 01:01:00,559 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 10000
2015-05-29 01:01:39,614 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 11000
2015-05-29 01:02:21,364 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 12000
2015-05-29 01:03:03,233 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 13000
2015-05-29 01:03:44,701 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 14000
2015-05-29 01:04:26,494 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 15000
2015-05-29 01:05:08,180 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 16000
2015-05-29 01:05:50,331 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Size of event-queue is 17000
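
The timestamps above are evenly spaced, so the net growth rate of the queue
can be estimated from the first and last lines. The small sketch below does
that arithmetic; the class and method names are made up for illustration,
and the only inputs are the two log lines shown above. Note that these
lines are logged by AsyncDispatcher, so they describe that dispatcher's
queue.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; not part of Hadoop.
final class QueueGrowthRate {

    // Timestamp layout used in the ResourceManager log lines above.
    private static final DateTimeFormatter LOG_TIME =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Net queue growth (events added minus events drained) per second
    // between two "Size of event-queue is N" log lines.
    static double eventsPerSecond(String firstTs, int firstSize,
                                  String lastTs, int lastSize) {
        LocalDateTime start = LocalDateTime.parse(firstTs, LOG_TIME);
        LocalDateTime end = LocalDateTime.parse(lastTs, LOG_TIME);
        long millis = Duration.between(start, end).toMillis();
        return (lastSize - firstSize) * 1000.0 / millis;
    }

    public static void main(String[] args) {
        double rate = eventsPerSecond(
            "2015-05-29 00:54:46,985", 1000,
            "2015-05-29 01:05:50,331", 17000);
        System.out.println("net growth (events/s): " + rate);
    }
}
```

For the lines above this comes out at roughly 24 events per second of net
growth, about 1000 events every 41 to 42 seconds, with no sign of the queue
draining in between.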

