Interesting (and mildly terrifying) point, Ryan.

Is there a valid pattern for storing a sum in HBase and then using MapReduce
to calculate updates to that sum from incremental data?

It seems a cycle like the following would avoid double-increment problems
(rough sketch after the list), but would suffer from a monster race condition.

1. MapReduce the incoming updates into aggregates (written to HDFS).
2. MapReduce those aggregates together with the existing values in HBase
into new target values for HBase (but again written to HDFS).
3. MapReduce the new values into HBase.
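
To make that concrete, here's a rough, untested sketch of what I'm
picturing, collapsing steps 2 and 3 into one job to keep it short.  The
table name ("counts"), the family/qualifier ("f"/"sum"), and the
tab-separated "rowkey<TAB>delta" input files are all invented for
illustration:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class MergeSums {
  private static final byte[] FAM = Bytes.toBytes("f");
  private static final byte[] QUAL = Bytes.toBytes("sum");

  // Step 1: parse "rowkey<TAB>delta" lines into (rowkey, delta) pairs.
  public static class DeltaMapper
      extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t");
      ctx.write(new Text(parts[0]),
          new LongWritable(Long.parseLong(parts[1])));
    }
  }

  // Steps 2 and 3, collapsed: sum the deltas for a row, read the current
  // value out of HBase, and write back the new absolute value as a Put.
  public static class MergeReducer
      extends Reducer<Text, LongWritable, Text, Put> {
    private HTable table;

    @Override
    protected void setup(Context ctx) throws IOException {
      table = new HTable(
          HBaseConfiguration.create(ctx.getConfiguration()), "counts");
    }

    @Override
    protected void reduce(Text row, Iterable<LongWritable> deltas,
        Context ctx) throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable d : deltas) {
        sum += d.get();
      }
      byte[] key = Bytes.toBytes(row.toString());
      // The race window opens here: another writer could update the row
      // between this Get and the Put below landing in HBase.
      Result r = table.get(new Get(key).addColumn(FAM, QUAL));
      long current = r.isEmpty() ? 0 : Bytes.toLong(r.getValue(FAM, QUAL));
      Put put = new Put(key);
      put.add(FAM, QUAL, Bytes.toBytes(current + sum));
      ctx.write(row, put);
    }

    @Override
    protected void cleanup(Context ctx) throws IOException {
      table.close();
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.set(TableOutputFormat.OUTPUT_TABLE, "counts");
    Job job = new Job(conf, "merge-sums");
    job.setJarByClass(MergeSums.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    job.setMapperClass(DeltaMapper.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(LongWritable.class);
    job.setReducerClass(MergeReducer.class);
    job.setOutputFormatClass(TableOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Since each Put carries an absolute value rather than a delta, re-running a
failed job just rewrites the same answers, which dodges the
non-idempotent-Increment problem.  But nothing stops another writer from
touching a row between the reducer's Get and the Put, which is the race I'm
worried about.  Keeping steps 2 and 3 separate, as in the cycle above, at
least means a half-failed merge never touches HBase.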

Please tell me there's a better way.

Thanks,

Leif

On Fri, Jun 17, 2011 at 3:33 PM, Ryan Rawson <ryano...@gmail.com> wrote:

> Watch out - increment is not idempotent, so you will have to somehow
> ensure that each map runs exactly once, never more and never less.
> Job failures will ruin the data as well.
>
> -ryan
>
> On Fri, Jun 17, 2011 at 1:57 PM, Stack <st...@duboce.net> wrote:
> > Go for it!
> > St.Ack
> >
> > On Fri, Jun 17, 2011 at 1:43 PM, Leif Wickland <leifwickl...@gmail.com>
> > wrote:
> >> I tried to use TableMapper and TableOutputFormat from
> >> org.apache.hadoop.hbase.mapreduce to write a map-reduce job which
> >> incremented some columns.  I noticed that TableOutputFormat.write()
> >> doesn't support Increment, only Put and Delete.
> >>
> >> Is there a reason that TableOutputFormat shouldn't support increment?
> >>
> >> I think adding support for increment would only require adding a copy
> >> constructor to Increment and a few lines to TableOutputFormat:  I'd be
> >> willing to give writing the patch a try if there's no objection.
> >>
> >> Leif Wickland
> >>
> >
>
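
P.S. To make the patch idea from my quoted message concrete: the new branch
in TableOutputFormat's record writer might look roughly like the sketch
below.  It's untested; it assumes write() dispatches on instanceof Put /
Delete the way it appears to today, and it presumes Increment grows a copy
constructor analogous to Put's, which doesn't exist yet.

// Inside TableOutputFormat.TableRecordWriter (sketch; also needs an
// import for org.apache.hadoop.hbase.client.Increment):
@Override
public void write(KEY key, Writable value) throws IOException {
  if (value instanceof Put) {
    this.table.put(new Put((Put) value));
  } else if (value instanceof Delete) {
    this.table.delete(new Delete((Delete) value));
  } else if (value instanceof Increment) {
    // New branch: needs the proposed Increment(Increment) copy constructor.
    this.table.increment(new Increment((Increment) value));
  } else {
    throw new IOException("Expected a Put, Delete, or Increment");
  }
}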
