Hi,

I have a project that needs to store a large number of image and video files.
File sizes vary from 10 MB to 10 GB; the initial count will be about 0.1
billion (100 million) files and could grow beyond 1 billion. What would be the
practical recommendations for storing and viewing these files?



#1 One cluster: store the actual file in HDFS and keep its HDFS URL in HBase?
(block_size 128 MB, replication factor 3)


#2 One cluster: store small files directly in HBase, and use the #1 approach
for large files? (block_size 128 MB, replication factor 3)


#3 Multiple Hadoop/HBase clusters, each with different block_size settings?


     e.g. cluster 1 (small): block_size 128 MB, replication factor 3; store
files directly in HBase if they are smaller than 128 MB

            cluster 2 (large): bigger block_size, say 4 GB, replication
factor 3; store the HDFS URL in HBase and the actual file in HDFS
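The size-based routing in #2/#3 could be sketched roughly like this (a minimal
sketch only; the 128 MB threshold matches the block_size proposed above, and
the store labels are placeholders, not real client calls):

```python
# Assumption: files below the HDFS block size are kept inline in HBase cells,
# everything else is written to HDFS with only its URL recorded in HBase.
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, matching the proposed block_size

def choose_store(file_size_bytes: int) -> str:
    """Route a file by size: 'hbase' = bytes stored inline in an HBase cell,
    'hdfs' = file written to HDFS, hdfs:// URL stored in HBase."""
    if file_size_bytes < BLOCK_SIZE:
        return "hbase"
    return "hdfs"

# a 10 MB image would go inline into HBase, a 10 GB video into HDFS
print(choose_store(10 * 1024 * 1024), choose_store(10 * 1024 ** 3))
```

One caveat with this approach: HBase is generally comfortable with cells up to
a few MB, so even "small" multi-MB files inline in HBase deserve testing.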



#4 Use HDFS Federation to handle the large number of files (namenode metadata
scaling)?


Regarding fault tolerance, we need to consider four types of failures: drive
(disk), host, rack, and datacenter failures.
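For back-of-envelope capacity planning with replication factor 3 (note: the
average file size below is an assumption for illustration, not a number from
this thread):

```python
# Rough storage sizing for ~1 billion files at replication factor 3.
# ASSUMPTION: average file size of 100 MB (actual sizes range 10 MB - 10 GB).
files = 1_000_000_000
avg_size_mb = 100
replication = 3

raw_pb = files * avg_size_mb / 1024 ** 3   # MB -> PB (1 PB = 1024**3 MB)
total_pb = raw_pb * replication            # physical capacity incl. replicas

print(round(raw_pb, 1), round(total_pb, 1))  # ~93.1 PB raw, ~279.4 PB total
```

Standard rack-aware placement with replication 3 covers drive, host, and rack
failures within one datacenter; surviving a datacenter failure would need
cross-datacenter replication on top of that.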


Regards
