Thanks, Serega.

Here are our use case details: we have a location table that will be stored in HBase with locationID as the row key (and join key). We intend to join this table with a transactional web log file in HDFS (expected size around 2 TB). The join query will be issued from Pig. Can we expect a performance improvement compared with the plain MapReduce approach?
Regards,
Krishna

On Sat, Sep 27, 2014 at 9:13 PM, Serega Sheypak <serega.shey...@gmail.com> wrote:
> It depends on the dataset sizes and the HBase workload. The best way is to do
> the join in Pig, store the result, and then use the HBase bulk load tool.
> It's a general recommendation; I have no idea about your task details.
>
> 2014-09-27 7:32 GMT+04:00 Krishna Kalyan <krishnakaly...@gmail.com>:
>
> > Hi,
> > We have a use case that involves ETL on data coming from several different
> > sources using Pig.
> > We plan to store the final output table in HBase.
> > What will be the performance impact if we do a join with an external CSV
> > table using Pig?
> >
> > Regards,
> > Krishna
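As an illustration of why the dataset sizes matter for a small location table joined against a ~2 TB web log, here is a minimal sketch (not taken from this thread) of a map-side hash join, which is the idea behind Pig's `replicated` join. The small table is loaded into memory keyed by locationID, and the large log is streamed through once, so the large side never needs to be shuffled or sorted. It assumes the location table fits in memory; the CSV column layout, function names, and sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List


def load_locations(rows: Iterable[List[str]]) -> Dict[str, List[str]]:
    """Index the small table in memory, keyed by locationID.

    Hypothetical layout: column 0 is locationID, the rest are attributes.
    """
    return {row[0]: row[1:] for row in rows}


def hash_join(log_rows: Iterable[List[str]],
              locations: Dict[str, List[str]],
              key_col: int) -> Iterator[List[str]]:
    """Stream the large log once and probe the in-memory index.

    Log rows whose locationID is not in the index are dropped
    (inner-join semantics).
    """
    for row in log_rows:
        match = locations.get(row[key_col])
        if match is not None:
            yield row + match


# Hypothetical sample data: locations are (locationID, city);
# log rows are (date, locationID, path).
locations_csv = "L1,Paris\nL2,Berlin\n"
weblog_csv = "2014-09-27,L1,/index\n2014-09-27,L3,/about\n2014-09-28,L2,/cart\n"

locations = load_locations(csv.reader(io.StringIO(locations_csv)))
joined = list(hash_join(csv.reader(io.StringIO(weblog_csv)), locations, key_col=1))
print(joined)
```

The trade-off is memory: if the location table does not fit in each task's heap, a regular (shuffle-based) join is needed instead.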