Re: Is there a reason mapreduce.TableOutputFormat doesn't support Increment?

2011-06-21 Thread Leif Wickland
My patch to add support for Increment to TableOutputFormat follows. (I did the svn diff in trunk/src/main/java/org/apache/hadoop/hbase) One point I was unsure about was whether I should duplicate the TimeRange in the Increment's copy constructor. TimeRange is immutable except for its

Re: Is there a reason mapreduce.TableOutputFormat doesn't support Increment?

2011-06-20 Thread Jean-Daniel Cryans
I think you could store deltas and roll them up later. You would have to store them under a qualifier that's unique for each job so that failures and speculative execution (if enabled) only overwrite a value instead of incrementing it again. At read time you would need to sum those columns.
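
The delta-per-job idea above can be sketched roughly as follows. This is only an in-memory model, not HBase client code: a HashMap stands in for one row's columns, and the class name DeltaRollup, the "delta:" qualifier prefix, and the job ids are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of one row; a real job would write Puts to HBase.
public class DeltaRollup {
    // qualifier -> stored value for a single row
    private final Map<String, Long> columns = new HashMap<>();

    // Store this job's delta under a qualifier unique to the job.
    // A retried or speculative attempt overwrites the same cell.
    public void putDelta(String jobId, long delta) {
        columns.put("delta:" + jobId, delta);
    }

    // At read time, sum every delta column to get the current total.
    public long readTotal() {
        long total = 0;
        for (long value : columns.values()) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        DeltaRollup row = new DeltaRollup();
        row.putDelta("job_001", 5L);
        row.putDelta("job_002", 7L);
        row.putDelta("job_002", 7L); // retried attempt writes the same cell again
        System.out.println(row.readTotal()); // prints 12
    }
}
```

Because each job only ever writes its own qualifier, re-running a job replaces its delta rather than adding to it, which is what makes the write safe to repeat.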

Is there a reason mapreduce.TableOutputFormat doesn't support Increment?

2011-06-17 Thread Leif Wickland
I tried to use TableMapper and TableOutputFormat from org.apache.hadoop.hbase.mapreduce to write a map-reduce which incremented some columns. I noticed that TableOutputFormat.write() doesn't support Increment, only Put and Delete. Is there a reason that TableOutputFormat shouldn't support

Re: Is there a reason mapreduce.TableOutputFormat doesn't support Increment?

2011-06-17 Thread Joey Echeverria
+1 On Jun 17, 2011 4:43 PM, Leif Wickland leifwickl...@gmail.com wrote: I tried to use TableMapper and TableOutputFormat from org.apache.hadoop.hbase.mapreduce to write a map-reduce which incremented some columns. I noticed that TableOutputFormat.write() doesn't support Increment, only Put

Re: Is there a reason mapreduce.TableOutputFormat doesn't support Increment?

2011-06-17 Thread Stack
Go for it! St.Ack On Fri, Jun 17, 2011 at 1:43 PM, Leif Wickland leifwickl...@gmail.com wrote: I tried to use TableMapper and TableOutputFormat from org.apache.hadoop.hbase.mapreduce to write a map-reduce which incremented some columns. I noticed that TableOutputFormat.write() doesn't

Re: Is there a reason mapreduce.TableOutputFormat doesn't support Increment?

2011-06-17 Thread Ryan Rawson
Watch out - increment is not idempotent, so you will have to somehow ensure that a map runs exactly 1x and never more or less than that. Also job failures will ruin the data as well. -ryan On Fri, Jun 17, 2011 at 1:57 PM, Stack st...@duboce.net wrote: Go for it! St.Ack On Fri, Jun 17, 2011
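
To make the idempotency concern concrete, here is a small sketch comparing what happens when the same write is replayed. It is a plain Java model with a HashMap standing in for a row, not the HBase API; the class RetryEffects, its method names, and the "count" qualifier are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: replay the same write several times and compare results.
public class RetryEffects {
    // Put-style write: sets the cell to the given value on every attempt.
    static long afterPuts(int attempts, long value) {
        Map<String, Long> row = new HashMap<>();
        for (int i = 0; i < attempts; i++) {
            row.put("count", value);
        }
        return row.get("count");
    }

    // Increment-style write: adds the delta to the cell on every attempt.
    static long afterIncrements(int attempts, long delta) {
        Map<String, Long> row = new HashMap<>();
        for (int i = 0; i < attempts; i++) {
            row.merge("count", delta, Long::sum);
        }
        return row.get("count");
    }

    public static void main(String[] args) {
        // A task that ran twice (for example a retry after a failure).
        System.out.println(afterPuts(2, 5L));       // prints 5
        System.out.println(afterIncrements(2, 5L)); // prints 10
    }
}
```

The put leaves the same value no matter how many times the task runs, while the increment counts the task's contribution once per attempt.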

Re: Is there a reason mapreduce.TableOutputFormat doesn't support Increment?

2011-06-17 Thread Leif Wickland
Interesting (and mildly terrifying) point, Ryan. Is there a valid pattern for storing a sum in HBase then using mapreduce to calculate an update to that sum based on incremental data updates? It seems a cycle like the following would avoid double increment problems, but would suffer from a