Store the locations to HDFS, store the weblogs to HDFS, join them in Pig, and then use the HBase bulk load tool to load the join result into HBase.
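A minimal Pig sketch of that flow, assuming tab-separated inputs; the paths, schemas, and field names below are made up for illustration:

    -- Load both datasets from HDFS (hypothetical paths and schemas).
    locations = LOAD '/data/locations' USING PigStorage('\t')
                AS (location_id:chararray, city:chararray);
    weblogs   = LOAD '/data/weblogs' USING PigStorage('\t')
                AS (location_id:chararray, url:chararray, ts:long);

    -- If the location table fits in memory, a replicated (map-side) join
    -- avoids shuffling the large weblog dataset; the small relation goes last.
    joined = JOIN weblogs BY location_id, locations BY location_id USING 'replicated';

    -- Build a unique row key: location_id alone would collide across weblog rows.
    result = FOREACH joined GENERATE
                 CONCAT(weblogs::location_id, CONCAT('_', (chararray)weblogs::ts)) AS rowkey,
                 weblogs::url  AS url,
                 locations::city AS city;

    -- Store the join result on HDFS as TSV; the bulk load tool picks it up from here.
    STORE result INTO '/user/krishna/weblog_joined' USING PigStorage('\t');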
What's the reason to keep the location dataset in HBase and the weblogs in HDFS? You can expect a data-load performance improvement: for me it takes a few minutes to bulk load 500,000,000 records into a 10-node HBase cluster with a pre-split table.

2014-09-28 16:04 GMT+04:00 Krishna Kalyan <[email protected]>:

> Thanks Serega,
>
> Our use case details:
> We have a location table which will be stored in HBase with locationID as
> the row key / join key.
> We intend to join this table with a transactional weblog file in HDFS
> (expected size around 2 TB).
> The join query will be issued from Pig.
> Can we expect a performance improvement compared with a MapReduce
> approach?
>
> Regards,
> Krishna
>
> On Sat, Sep 27, 2014 at 9:13 PM, Serega Sheypak <[email protected]>
> wrote:
>
>> It depends on the dataset sizes and the HBase workload. The best way is
>> to do the join in Pig, store the result, and then use the HBase bulk
>> load tool.
>> That's a general recommendation; I have no idea about your task details.
>>
>> 2014-09-27 7:32 GMT+04:00 Krishna Kalyan <[email protected]>:
>>
>> > Hi,
>> > We have a use case that involves ETL on data coming from several
>> > different sources using Pig.
>> > We plan to store the final output table in HBase.
>> > What will be the performance impact if we do a join with an external
>> > CSV table using Pig?
>> >
>> > Regards,
>> > Krishna
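The bulk-load numbers above assume a pre-split table. A sketch of that last step, reusing the same made-up names (table 'weblog_enriched', column family 'd', arbitrary split points):

    # Pre-split the target table so the bulk-loaded regions spread across all servers.
    hbase shell <<'EOF'
    create 'weblog_enriched', 'd', SPLITS => ['1', '2', '3', '4', '5', '6', '7', '8', '9']
    EOF

    # Turn the joined TSV into HFiles (the first column becomes the row key) ...
    hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
      -Dimporttsv.columns=HBASE_ROW_KEY,d:url,d:city \
      -Dimporttsv.bulk.output=/tmp/weblog_hfiles \
      weblog_enriched /user/krishna/weblog_joined

    # ... then hand the finished HFiles to the region servers in one shot.
    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
      /tmp/weblog_hfiles weblog_enriched

Because the bulk load writes HFiles directly and bypasses the normal write path (WAL and memstore), a pre-split cluster can absorb hundreds of millions of rows in minutes, which matches the figure quoted above.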
