Pushing this logic into the StoreFunc would force an MR boundary before the store (unless the StoreFunc passed, I suppose), which can make things overly complex.
I think for the purposes of bulk-loading into HBase, a better approach might be to use the native map-reduce functionality and feed results you want to store into a map-reduce job created as per http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/mapreduce/package-summary.html (the bulk loading section).

D

On Mon, Jan 24, 2011 at 11:51 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> Better yet, it would've seem to be logical if partitioning and advise on
> partition #s is somehow tailored to a storefunc . It would stand to reason
> that for as long as we are not storing to hdfs, store func is in the best
> position to determine optimal save parameters such as order, partitioning
> and parallelism.
>
> On Mon, Jan 24, 2011 at 11:47 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>
> > Hi,
> >
> > so it seems to be more efficient if storing to hbase partitions by regions
> > and orders by hbase keys.
> >
> > I see that pig 0.8 (pig-282) added custom partitioner in a group but i am
> > not sure if order is enforced there.
> >
> > Is there a way to run single MR that orders and partitions data as per
> > above and uses an explicitly specifed store func in reducers?
> >
> > Thank you.
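
For what it's worth, the "partition by regions, order by HBase keys" idea from the quoted question can be sketched roughly as below. This is plain Python, not Pig or HBase API code: the function names and the region start keys are made up for illustration, and the only thing assumed about HBase is that rows are ordered by byte-wise lexicographic comparison of their keys and that each region covers a contiguous key range beginning at its start key.

```python
from bisect import bisect_right


def region_index(row_key, region_start_keys):
    """Return the index of the region whose key range contains row_key.

    region_start_keys must be sorted ascending, and the first region is
    assumed to start at the empty key b"".
    """
    # The owning region is the last one whose start key is <= row_key.
    return bisect_right(region_start_keys, row_key) - 1


def partition_and_sort(rows, region_start_keys):
    """Group (row_key, value) pairs by region, then sort each group by key."""
    buckets = [[] for _ in region_start_keys]
    for row_key, value in rows:
        buckets[region_index(row_key, region_start_keys)].append((row_key, value))
    for bucket in buckets:
        bucket.sort(key=lambda kv: kv[0])
    return buckets


# Hypothetical start keys for three regions: [b"", b"g"), [b"g", b"p"), [b"p", end)
start_keys = [b"", b"g", b"p"]
rows = [(b"zebra", 1), (b"apple", 2), (b"kiwi", 3), (b"grape", 4), (b"banana", 5)]
buckets = partition_and_sort(rows, start_keys)
```

In a real job the "buckets" would be reducers, one per region, and the sort would come from the shuffle rather than an in-memory sort; the sketch only shows the key-to-region mapping and the resulting per-region ordering.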