Pushing this logic into the StoreFunc would force an MR boundary before the
store (unless the StoreFunc passed, I suppose), which can make things overly
complex.

I think for the purposes of bulk-loading into HBase, a better approach might
be to use the native map-reduce functionality and feed results you want to
store into a map-reduce job created as per
http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/mapreduce/package-summary.html
(the bulk loading section).
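
To make the "partition by region, order by row key" idea concrete, here is a
minimal Python sketch of the logic such a job would need. It is only an
illustration under stated assumptions, not HBase or Pig code: the function
names (region_index, partition_and_sort) and the idea of passing region start
keys in as a plain sorted list are hypothetical, and a real job would read the
region boundaries from the target table.

```python
from bisect import bisect_right
from collections import defaultdict


def region_index(row_key, region_start_keys):
    """Return the index of the region whose key range contains row_key.

    Assumes region_start_keys is sorted and that the first region
    starts at the empty key b"", so every row key falls in some region.
    """
    # bisect_right counts the start keys <= row_key; the last of those
    # is the start key of the region that owns row_key.
    return bisect_right(region_start_keys, row_key) - 1


def partition_and_sort(rows, region_start_keys):
    """Group (row_key, value) pairs by region, sorted by row key within each."""
    partitions = defaultdict(list)
    for row_key, value in rows:
        partitions[region_index(row_key, region_start_keys)].append((row_key, value))
    # One "reducer" per region, each receiving its rows in row-key order.
    return {
        idx: sorted(part, key=lambda kv: kv[0])
        for idx, part in sorted(partitions.items())
    }
```

Each entry of the returned dict stands in for what one reducer would receive:
all rows for a single region, already in row-key order.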

D

On Mon, Jan 24, 2011 at 11:51 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> Better yet, it would seem logical for partitioning, and advice on the
> number of partitions, to be tailored to the store func. It stands to reason
> that as long as we are not storing to HDFS, the store func is in the best
> position to determine optimal save parameters such as order, partitioning
> and parallelism.
>
> On Mon, Jan 24, 2011 at 11:47 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>
> > Hi,
> >
> > so it seems to be more efficient if storing to hbase partitions by regions
> > and orders by hbase keys.
> >
> > I see that Pig 0.8 (PIG-282) added a custom partitioner in a group, but I
> > am not sure if order is enforced there.
> >
> > Is there a way to run a single MR job that orders and partitions data as
> > per above and uses an explicitly specified store func in reducers?
> >
> > Thank you.
> >
>
