HDFS across data centers: HighTide
----------------------------------

                 Key: HDFS-1432
                 URL: https://issues.apache.org/jira/browse/HDFS-1432
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: dhruba borthakur
            Assignee: dhruba borthakur


There are many instances where the same piece of data resides on multiple HDFS 
clusters in different data centers, primarily because the physical capacity of 
a single data center is insufficient to host the entire data set. In that case, 
the administrator(s) typically partition the data across two (or more) HDFS 
clusters in two different data centers and then duplicate some subset of that 
data into both HDFS clusters.

In such a situation, there are six physical copies of the duplicated data: 
three replicas in one data center and another three in the other. It would be 
nice if we could keep fewer than 3 replicas in each data center and have the 
ability to fix a lost replica in the local data center by copying the data 
from the copy held by the remote data center.
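For example, keeping 2 replicas per data center instead of 3 would reduce the 
duplicated data from six physical copies to four, while a lost local replica 
could still be re-created from the remote cluster. Below is a minimal sketch of 
that idea using the public Hadoop FileSystem API; the cluster URIs, the path, 
and the replication factor of 2 are hypothetical, and a real HighTide 
implementation would presumably do this repair inside HDFS rather than with a 
client-side file copy.

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class HighTideSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Local and remote HDFS clusters in two data centers (hypothetical URIs).
        FileSystem localFs  = FileSystem.get(URI.create("hdfs://dc1-nn:8020"), conf);
        FileSystem remoteFs = FileSystem.get(URI.create("hdfs://dc2-nn:8020"), conf);

        Path file = new Path("/shared/dataset/part-00000");

        // Keep fewer than 3 replicas of the duplicated data in this data center.
        localFs.setReplication(file, (short) 2);

        // If the file is no longer available locally (e.g. too many replicas
        // were lost), restore it from the copy held by the remote data center.
        if (!localFs.exists(file)) {
          FileUtil.copy(remoteFs, file, localFs, file, /* deleteSource */ false, conf);
        }
      }
    }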

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
