First of all, that might not be the right way to choose the underlying storage. Choose between HDFS and HBase based on your access pattern: plain HDFS if the data is going to be used for batch processing, HBase if you need random access to it. HBase is another layer on top of HDFS, so queries that go through HBase will generally be less efficient than the same queries run directly against files in HDFS. If you can get away with plain HDFS, I would say that is the best and simplest approach.
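To make the batch vs. random access distinction concrete, here is a toy sketch in plain Python. It is not Hive, HDFS, or HBase code; the record layout and the names `batch_total` and `random_get` are made up purely for illustration. It only shows the two access patterns: reading everything once versus fetching a single record by key.

```python
from bisect import bisect_left

# Toy stand-in for a data set: (row_key, value) records, sorted by key.
# The layout and names here are hypothetical, chosen only for illustration.
records = [(f"user{i:04d}", i % 7) for i in range(1000)]

# A sorted key index, built once, so that single-key lookups need no scan.
keys = [key for key, _ in records]


def batch_total(rows):
    """Batch-style access: read every record once, in order (a full scan)."""
    return sum(value for _, value in rows)


def random_get(key):
    """Random access: find one record by key using binary search on the index."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[i][1]
    return None


if __name__ == "__main__":
    print(batch_total(records))    # aggregate over the whole data set
    print(random_get("user0010"))  # fetch a single record by its key
```

If your workload looks like `batch_total` (scan everything, aggregate), files in HDFS queried through Hive are a natural fit. If it looks like `random_get` (many small lookups by key), that is the case HBase is meant for.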
On Wed, Jul 17, 2013 at 12:40 PM, Hamza Asad <hamza.asa...@gmail.com> wrote:

> Please let me knw which approach is better. Either i save my data directly
> to HDFS and run hive (shark) queries over it OR store my data in HBASE, and
> then query it.. as i want to ensure efficient data retrieval and data
> remains safe and can easily recover if hadoop crashes.
>
> --
> *Muhammad Hamza Asad*

--
Swarnim