[
https://issues.apache.org/jira/browse/YARN-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332344#comment-14332344
]
Hadoop QA commented on YARN-3242:
---------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12700107/YARN-3242.002.patch
against trunk revision fe7a302.
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new
or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with
eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 5 new
Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in
hadoop-common-project/hadoop-common
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
Test results:
https://builds.apache.org/job/PreCommit-YARN-Build/6693//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-YARN-Build/6693//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6693//console
This message is automatically generated.
> Old ZK client session watcher event causes ZKRMStateStore out of sync with
> current ZK client session due to ZooKeeper asynchronously closing client
> session.
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-3242
> URL: https://issues.apache.org/jira/browse/YARN-3242
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: zhihai xu
> Assignee: zhihai xu
> Priority: Critical
> Attachments: YARN-3242.000.patch, YARN-3242.001.patch,
> YARN-3242.002.patch
>
>
> A watcher event from an old ZK client session can corrupt the state of a new
> ZK client session, because ZooKeeper closes client sessions asynchronously.
> A watcher event from the old ZK client session can still be delivered to
> ZKRMStateStore after the old ZK client session is closed.
> This causes a serious problem: ZKRMStateStore goes out of sync with the
> ZooKeeper session.
> There is only one ZKRMStateStore, but there can be multiple ZK client sessions.
> Currently ZKRMStateStore#processWatchEvent doesn't check whether a watcher
> event comes from the current session, so a watcher event from an old ZK client
> session that was just closed will still be processed.
> For example, if a Disconnected event is received from the old session after
> the new session is connected, zkClient will be set to null:
> {code}
>     case Disconnected:
>       LOG.info("ZKRMStateStore Session disconnected");
>       oldZkClient = zkClient;
>       zkClient = null;
>       break;
> {code}
> ZKRMStateStore then won't receive a SyncConnected event from the new session,
> because the new session is already in the SyncConnected state and won't send
> another SyncConnected event until it is disconnected and connected again.
> From then on, all ZKRMStateStore operations fail with the IOException
> "Wait for ZKClient creation timed out" until the RM shuts down.
> The following code from ZooKeeper (ClientCnxn#EventThread) shows that even
> after receiving eventOfDeath, EventThread still processes all events until
> the waitingEvents queue is empty.
> {code}
>   while (true) {
>     Object event = waitingEvents.take();
>     if (event == eventOfDeath) {
>       wasKilled = true;
>     } else {
>       processEvent(event);
>     }
>     if (wasKilled)
>       synchronized (waitingEvents) {
>         if (waitingEvents.isEmpty()) {
>           isRunning = false;
>           break;
>         }
>       }
>   }
>
>   private void processEvent(Object event) {
>     try {
>       if (event instanceof WatcherSetEventPair) {
>         // each watcher will process the event
>         WatcherSetEventPair pair = (WatcherSetEventPair) event;
>         for (Watcher watcher : pair.watchers) {
>           try {
>             watcher.process(pair.event);
>           } catch (Throwable t) {
>             LOG.error("Error while calling watcher ", t);
>           }
>         }
>       } else {
> {code}
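The scenario described above (a late Disconnected event from a closed session clearing the client of the new session) can be replayed in plain Java. This is a minimal sketch, not the attached patch: `Session`, `Store`, `processGuarded`, `processUnguarded`, and `survivesLateDisconnect` are hypothetical stand-ins for illustration and are not part of the ZooKeeper client or ZKRMStateStore. It assumes the store can learn which session an event came from and compare it with the session it currently considers active.

```java
/**
 * Sketch only: Session and Store are hypothetical stand-ins,
 * not the ZooKeeper client or the ZKRMStateStore API.
 */
public class StaleSessionGuardSketch {

    enum State { SYNC_CONNECTED, DISCONNECTED }

    /** Hypothetical stand-in for one ZK client session. */
    static final class Session {
        final String name;
        Session(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for the single store shared by all sessions. */
    static final class Store {
        private Session activeSession; // session the store currently trusts
        private Session usableClient;  // null means "waiting for a client"

        synchronized void startSession(Session s) {
            activeSession = s;
            usableClient = s;
        }

        /** Unguarded handler: applies every event, whatever its source. */
        synchronized void processUnguarded(Session source, State state) {
            apply(source, state);
        }

        /** Guarded handler: drops events that do not come from the active session. */
        synchronized boolean processGuarded(Session source, State state) {
            if (source != activeSession) {
                return false; // stale event from a closed session, ignore it
            }
            apply(source, state);
            return true;
        }

        private void apply(Session source, State state) {
            switch (state) {
                case SYNC_CONNECTED:
                    usableClient = source;
                    break;
                case DISCONNECTED:
                    usableClient = null;
                    break;
            }
        }

        synchronized boolean hasUsableClient() {
            return usableClient != null;
        }
    }

    /** Replays the report's scenario: a late Disconnected event from the old session. */
    public static boolean survivesLateDisconnect(boolean guarded) {
        Store store = new Store();
        Session oldSession = new Session("old");
        Session newSession = new Session("new");
        store.startSession(oldSession);
        store.startSession(newSession); // old session closed, new one connected
        if (guarded) {
            store.processGuarded(oldSession, State.DISCONNECTED);
        } else {
            store.processUnguarded(oldSession, State.DISCONNECTED);
        }
        return store.hasUsableClient();
    }

    public static void main(String[] args) {
        System.out.println("unguarded keeps client: " + survivesLateDisconnect(false));
        System.out.println("guarded keeps client: " + survivesLateDisconnect(true));
    }
}
```

In this sketch the unguarded handler loses its client after the stale event, while the guarded handler keeps the new session's client because the event source does not match the active session. Whether an identity comparison like this is the right check for the real store is an assumption here, not something the report states.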
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)