[ https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928977#comment-15928977 ]
Joep Rottinghuis commented on YARN-6342:
----------------------------------------

Another possible approach is to code or configure one overall timeout within which all clients must shut down. We can then give each client a fraction of it and keep track of how much is left. I'm doing something similar in the spooling code; see YARN-4061 and HBASE-17018 (though that isn't complete yet and I need to pick up that work again).

With respect to data loss on shutdown, note that the loss will be limited to TIMELINE_SERVICE_WRITER_FLUSH_INTERVAL_SECONDS, which defaults to 1 minute. On average the loss would be half that (30 seconds), and in the worst case it would be the full minute.

The timeout during shutdown and the timeout at which we detect that HBase isn't accepting writes (and we end up spooling to file) should be carefully tuned so that we do not lose any data under normal operating circumstances, even when HBase is down. Perhaps we should have a config for this, or at least have one be a multiple of the other. I'll keep this in mind in the spooling work.

> Issues in async API of TimelineClient
> -------------------------------------
>
>                 Key: YARN-6342
>                 URL: https://issues.apache.org/jira/browse/YARN-6342
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Jian He
>
> Found these with [~rohithsharma] while browsing the code
> - In stop: it calls shutdownNow, which doesn't wait for pending tasks; should
> it use shutdown instead?
> {code}
>   public void stop() {
>     LOG.info("Stopping TimelineClient.");
>     executor.shutdownNow();
>     try {
>       executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS);
>     } catch (InterruptedException e) {
> {code}
> - In TimelineClientImpl#createRunnable:
> If any exception happens when publishing one entity
> (publishWithoutBlockingOnQueue), the thread exits. I think it should make a
> best-effort attempt to continue publishing the timeline entities; one failure
> should not cause all follow-up entities to go unpublished.
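
For illustration only, the overall-timeout idea from the comment (one shutdown budget, each client gets a fraction, and we track how much is left) could look roughly like the sketch below. This is not the YARN or spooling code: the class name ShutdownBudget, the method stopAll, and the choice to split the remaining time evenly among the clients still waiting are all assumptions made for the example. It uses shutdown() rather than shutdownNow() so that pending tasks get a chance to drain, and only falls back to shutdownNow() for a client that overruns its share.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Hadoop: stops several executors under one deadline.
class ShutdownBudget {

    /**
     * Stops all clients within totalTimeoutMs and returns how many of them
     * drained their pending tasks before their share of the budget ran out.
     */
    static int stopAll(List<ExecutorService> clients, long totalTimeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(totalTimeoutMs);

        // Stop accepting new work everywhere first, so all clients drain in parallel.
        for (ExecutorService c : clients) {
            c.shutdown();
        }

        int clean = 0;
        int remainingClients = clients.size();
        boolean interrupted = false;

        for (ExecutorService c : clients) {
            // Time left in the overall budget, split over the clients not yet awaited.
            long leftNanos = Math.max(0L, deadline - System.nanoTime());
            long shareNanos = leftNanos / remainingClients;
            remainingClients--;

            boolean done = false;
            if (!interrupted) {
                try {
                    done = c.awaitTermination(shareNanos, TimeUnit.NANOSECONDS);
                } catch (InterruptedException e) {
                    interrupted = true; // stop waiting; force the rest down below
                }
            }
            if (done) {
                clean++;
            } else {
                c.shutdownNow(); // this client overran its share; cancel what is left
            }
        }

        if (interrupted) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        }
        return clean;
    }
}
```

Because the share is recomputed from the time actually remaining, a client that finishes early hands its unused time to the clients after it, while a slow client can never consume more than its fraction of the total.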