Thanks Serega,

Our use case details:
We have a location table that will be stored in HBase, with locationID as
the row key / join key.
We intend to join this table with a transactional weblog file in HDFS
(expected size is around 2 TB).
The join query will be issued from Pig.
Can we expect a performance improvement compared with a plain MapReduce
approach?
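
To make the shape of the join concrete, here is a minimal sketch in plain
Python (not Pig, and not how Pig or HBase would actually execute it). It
assumes the location table is small enough to hold in memory as a dict and
that the weblog is tab-separated with locationID as the first field. The
function name, field layout, and sample rows are all hypothetical.

```python
def join_on_location(locations, weblog_lines):
    """Inner hash join of weblog records against an in-memory location table.

    locations    -- dict mapping locationID -> location name (hypothetical layout)
    weblog_lines -- iterable of "locationID<TAB>url" strings (hypothetical layout)
    """
    for line in weblog_lines:
        location_id, url = line.rstrip("\n").split("\t")
        name = locations.get(location_id)
        if name is not None:  # inner join: skip weblog rows with no matching location
            yield location_id, name, url


# Hypothetical sample data, only to show the expected record shapes.
locations = {"L1": "location-A", "L2": "location-B"}
weblog = ["L1\t/home\n", "L3\t/cart\n", "L2\t/search\n"]

result = list(join_on_location(locations, weblog))
```

The large side (the weblog) is streamed once and the small side is looked up
by key, which is the access pattern we have in mind for locationID.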

Regards,
Krishna

On Sat, Sep 27, 2014 at 9:13 PM, Serega Sheypak <serega.shey...@gmail.com>
wrote:

> It depends on the dataset sizes and the HBase workload. The best way is to
> do the join in Pig, store the result, and then use the HBase bulk load tool.
> That's a general recommendation; I have no idea about your task details.
>
> 2014-09-27 7:32 GMT+04:00 Krishna Kalyan <krishnakaly...@gmail.com>:
>
> > Hi,
> > We have a use case that involves ETL on data coming from several different
> > sources using Pig.
> > We plan to store the final output table in HBase.
> > What will be the performance impact if we do a join with an external CSV
> > table using Pig?
> >
> > Regards,
> > Krishna
> >
>
