On 25 October 2012 23:17, Daniel Käfer <d.kae...@hs-furtwangen.de> wrote:
> On Thursday, 25.10.2012, at 22:10 +0100, Steve Loughran wrote:
> > Regarding storing DB data, HBase-on-HDFS is where people keep it; Pig
> > and Hive can work with that as well as rawer data kept in HDFS
> > directly.
>
> But is that the best idea? HBase is great for random reads and small
> range scans, but Hive (SQL) performance on it is 4-5x slower than on
> plain HDFS. [0]
>
> I guess keeping the first (raw) data in HDFS and the final data in
> HBase is a good idea. But how should the data be stored between
> individual MapReduce jobs?

Depends on the amount of data and expected use. If it's transient food
for the next MR jobs: HDFS.
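A minimal sketch of the "transient data between MR jobs goes to HDFS" pattern. It is written in Python against the local filesystem so it runs anywhere: a temporary directory stands in for what would be an HDFS path in a real cluster, where job 1 writes it as its output directory and job 2 reads it as its input. The stage functions, the part-file naming and the word-count/top-N jobs are all hypothetical illustrations, not Hadoop APIs.

```python
import os
import shutil
import tempfile
from collections import Counter

# Hypothetical two-stage pipeline. The local scratch directory is a stand-in
# for a transient HDFS path (an assumption for illustration only).

def job1_word_count(lines, out_dir):
    """Stage 1: count words and write a tab-separated part file, MR-style."""
    counts = Counter(word for line in lines for word in line.split())
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "part-00000"), "w") as f:
        for word, n in sorted(counts.items()):
            f.write(f"{word}\t{n}\n")

def job2_top_n(in_dir, n):
    """Stage 2: read every part file produced by stage 1, keep the top n words."""
    counts = Counter()
    for name in sorted(os.listdir(in_dir)):
        if name.startswith("part-"):
            with open(os.path.join(in_dir, name)) as f:
                for row in f:
                    word, count = row.rstrip("\n").split("\t")
                    counts[word] += int(count)
    return counts.most_common(n)

def run_pipeline(lines, n):
    """Chain the stages; the intermediate data only lives while the pipeline runs."""
    scratch = tempfile.mkdtemp(prefix="mr-intermediate-")
    try:
        intermediate = os.path.join(scratch, "job1-output")
        job1_word_count(lines, intermediate)
        return job2_top_n(intermediate, n)
    finally:
        # Transient: delete the intermediate output once the next job has consumed it.
        shutil.rmtree(scratch)
```

For example, run_pipeline(["a b a", "b a c"], 2) returns [("a", 3), ("b", 2)]. The intermediate files are plain sequential data that the next job scans in full and then throws away, which is the case where plain HDFS fits; data that later needs random reads is where HBase comes in.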