On 25 October 2012 23:17, Daniel Käfer <d.kae...@hs-furtwangen.de> wrote:

> On Thursday, 25.10.2012, 22:10 +0100, Steve Loughran wrote:
> > Regarding storing DB data, HBase-on-HDFS is where people keep it; Pig
> > and Hive can work with that as well as rawer data kept in HDFS
> > directly
>
> But is that the best idea? HBase is great for random reads and small
> range scans, but Hive (SQL) queries against HBase are 4-5x slower than
> against plain HDFS. [0]
>
> I guess keeping the first (raw) data in HDFS and the final data in HBase
> is a good idea. But how should the data be stored between individual
> MapReduce jobs?
>

Depends on the amount of data and the expected use. If it's transient input
for the next MR jobs, keep it in HDFS.
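
As a rough sketch of that pattern, and not Hadoop API code: the first stage
writes part files into a transient directory, the second stage reads them, and
the directory is deleted once the chain finishes. A local temporary directory
stands in for an HDFS path here, and the names count_words, top_words and
run_pipeline are made up for illustration.

```python
import shutil
import tempfile
from collections import Counter
from pathlib import Path


def count_words(input_lines, out_dir):
    """Stage 1: write word counts to a part file in out_dir (stand-in for an HDFS directory)."""
    counts = Counter(word for line in input_lines for word in line.split())
    out_dir.mkdir(parents=True, exist_ok=True)
    part = out_dir / "part-00000"
    with part.open("w", encoding="utf-8") as f:
        for word, n in sorted(counts.items()):
            f.write(f"{word}\t{n}\n")
    return part


def top_words(in_dir, k):
    """Stage 2: read every part file in in_dir and return the k most frequent words."""
    rows = []
    for part in sorted(in_dir.glob("part-*")):
        for line in part.read_text(encoding="utf-8").splitlines():
            word, n = line.split("\t")
            rows.append((word, int(n)))
    # Highest count first; ties broken alphabetically.
    rows.sort(key=lambda r: (-r[1], r[0]))
    return rows[:k]


def run_pipeline(input_lines, k=2):
    """Chain both stages through a transient directory, then delete it."""
    tmp = Path(tempfile.mkdtemp(prefix="mr-intermediate-"))
    try:
        count_words(input_lines, tmp / "stage1")
        return top_words(tmp / "stage1", k)
    finally:
        # The intermediate data is only input for the next stage, so drop it.
        shutil.rmtree(tmp)
```

In a real job chain the same shape would apply with the first job's output
directory on HDFS used as the second job's input directory.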