My patch to add support for Increment to TableOutputFormat follows. (I ran
the svn diff in trunk/src/main/java/org/apache/hadoop/hbase.)
One point I was unsure about was whether I should duplicate the TimeRange in
the Increment's copy constructor. TimeRange is immutable except for its
I think you could store deltas and roll them up later. You would have
to store them under a qualifier that's unique for each job so that
failures and speculative execution (if enabled) only overwrite
a value instead of incrementing it. At read time you would need to sum
those columns together.
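
A minimal sketch of the delta idea above, using a plain in-memory map as a stand-in for one HBase row. It does not use the HBase client API; the class name DeltaRollup, the "delta:<jobId>" qualifier scheme, and the job ids are all assumptions made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for a single row: qualifier -> long value.
// This models the idea only; it is not HBase client code.
class DeltaRollup {
    private final Map<String, Long> row = new TreeMap<>();

    // Each job writes its delta under its own qualifier (hypothetical
    // naming scheme "delta:<jobId>"). A retried or speculative attempt
    // overwrites the same cell with the same value, so nothing is
    // counted twice.
    void putDelta(String jobId, long delta) {
        row.put("delta:" + jobId, delta);
    }

    // At read time, sum every delta column to get the current total.
    long readTotal() {
        long total = 0;
        for (Map.Entry<String, Long> e : row.entrySet()) {
            if (e.getKey().startsWith("delta:")) {
                total += e.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        DeltaRollup row = new DeltaRollup();
        row.putDelta("job-001", 5);
        row.putDelta("job-002", 7);
        row.putDelta("job-002", 7); // re-run of job-002: overwrite only
        System.out.println(row.readTotal()); // prints 12
    }
}
```

The trade-off is that every read has to scan and add all the delta columns until something rolls them up into a single value.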
I tried to use TableMapper and TableOutputFormat
from org.apache.hadoop.hbase.mapreduce to write a map-reduce which
incremented some columns. I noticed that TableOutputFormat.write() doesn't
support Increment, only Put and Delete.
Is there a reason that TableOutputFormat shouldn't support
+1
On Jun 17, 2011 4:43 PM, Leif Wickland leifwickl...@gmail.com wrote:
I tried to use TableMapper and TableOutputFormat
from org.apache.hadoop.hbase.mapreduce to write a map-reduce which
incremented some columns. I noticed that TableOutputFormat.write() doesn't
support Increment, only Put
Go for it!
St.Ack
On Fri, Jun 17, 2011 at 1:43 PM, Leif Wickland leifwickl...@gmail.com wrote:
I tried to use TableMapper and TableOutputFormat
from org.apache.hadoop.hbase.mapreduce to write a map-reduce which
incremented some columns. I noticed that TableOutputFormat.write() doesn't
Watch out - increment is not idempotent, so you will have to somehow
ensure that a map runs exactly once, never more or less.
Job failures will also ruin the data.
-ryan
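
To make the warning above concrete, here is a small sketch of why re-running a task matters for an increment but not for an overwrite. It uses an in-memory map rather than HBase itself; the class RetryDemo and its method names are hypothetical and only loosely mirror the Increment and Put operations discussed in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory counters standing in for a table; not HBase client code.
class RetryDemo {
    private final Map<String, Long> counters = new HashMap<>();

    // Increment-style write: adds to whatever is already stored,
    // so applying it twice counts the amount twice.
    long increment(String key, long amount) {
        return counters.merge(key, amount, Long::sum);
    }

    // Put-style write: stores an absolute value, so applying it
    // twice leaves the same result as applying it once.
    void put(String key, long value) {
        counters.put(key, value);
    }

    long get(String key) {
        return counters.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        RetryDemo table = new RetryDemo();

        table.increment("hits", 3);
        table.increment("hits", 3); // same map task re-executed
        System.out.println(table.get("hits")); // prints 6, not 3

        table.put("total", 3);
        table.put("total", 3); // same map task re-executed
        System.out.println(table.get("total")); // prints 3
    }
}
```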
On Fri, Jun 17, 2011 at 1:57 PM, Stack st...@duboce.net wrote:
Go for it!
St.Ack
On Fri, Jun 17, 2011
Interesting (and mildly terrifying) point, Ryan.
Is there a valid pattern for storing a sum in HBase then using mapreduce to
calculate an update to that sum based on incremental data updates?
It seems a cycle like the following would avoid double increment problems,
but would suffer from a