[jira] [Created] (HDFS-13165) [SPS]: Collects successfully moved block details via IBR

Rakesh R (JIRA) Sun, 18 Feb 2018 00:05:37 -0800

Rakesh R created HDFS-13165:
-------------------------------

             Summary: [SPS]: Collects successfully moved block details via IBR
                 Key: HDFS-13165
                 URL: https://issues.apache.org/jira/browse/HDFS-13165
             Project: Hadoop HDFS
          Issue Type: Sub-task
            Reporter: Rakesh R

This task to make use of the existing IBR to get moved block details and remove
unwanted future tracking logic exists in BlockStorageMovementTracker code, this
is no more needed as the file level tracking maintained at NN itself.

Following comments taken from HDFS-10285,
[here|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16347472&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16347472]

Comment-3)
{quote}BPServiceActor
Is it actually sending back the moved blocks? Aren’t IBRs sufficient?{quote}

Comment-21)
{quote}
BlockStorageMovementTracker
Many data structures are riddled with non-threadsafe race conditions and risk
of CMEs.

Ex. The moverTaskFutures map. Adding new blocks and/or adding to a block's list
of futures is synchronized. However the run loop does an unsynchronized block
get, unsynchronized future remove, unsynchronized isEmpty, possibly another
unsynchronized get, only then does it do a synchronized remove of the block.
The whole chunk of code should be synchronized.

Is the problematic moverTaskFutures even needed? It's aggregating futures
per-block for seemingly no reason. Why track all the futures at all instead of
just relying on the completion service? As best I can tell:

It's only used to determine if a future from the completion service should be
ignored during shutdown. Shutdown sets the running boolean to false and clears
the entire datastructure so why not use the running boolean like a check just a
little further down?
As synchronization to sleep up to 2 seconds before performing a blocking
moverCompletionService.take, but only when it thinks there are no active
futures. I'll ignore the missed notify race that the bounded wait masks, but
the real question is why not just do the blocking take?
Why all the complexity? Am I missing something?

BlocksMovementsStatusHandler
Suffers same type of thread safety issues as StoragePolicySatisfyWorker. Ex.
blockIdVsMovementStatus is inconsistent synchronized. Does synchronize to
return an unmodifiable list which sadly does nothing to protect the caller from
CME.

handle is iterating over a non-thread safe list.
{quote}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org

[jira] [Created] (HDFS-13165) [SPS]: Collects successfully moved block details via IBR

Reply via email to