There is nothing wrong with writing the output from a reducer to HBase. The question you have to ask yourself is why you are using a reducer in the first place. ;-)
Look, you have a database. Why do you need a reducer? It's a simple question... Right? ;-)

Look, I apologize for being cryptic. This is one of those philosophical design questions where you, the developer/architect, have to figure out the answer for yourself. Maybe I should submit this as an HBaseCon topic for a presentation? Sort of like how to do an efficient table join in HBase...

HTH

Sent from a remote device. Please excuse any typos...

Mike Segel

On Feb 28, 2012, at 11:16 PM, Jacques <whs...@gmail.com> wrote:

> I see nothing wrong with writing the output of the reducer into HBase. You
> just need to make sure duplicated operations wouldn't cause problems. If
> using TableOutputFormat, don't use random seeded keys. If working straight
> against HTable, don't use Increment. We do this for some situations and
> either don't care about overwrites or use checkAndPut with a skip option in
> the application code.
>
> On Feb 28, 2012 9:40 AM, "Ben Snively" <bsniv...@gmail.com> wrote:
>
>> Is there an assertion that you would never need to run a reducer when
>> writing to the DB?
>>
>> It seems that there are cases when you would not need one, but the general
>> statement doesn't apply to all use cases.
>>
>> If you were trying to process data where a map task (or a set of map
>> tasks) may output the same key more than once, you could have a case where
>> you need to reduce the data for that key prior to inserting the result
>> into HBase.
>>
>> Am I missing something? To me, that would be the deciding factor: are the
>> key/values output by the map task the exact values that need to be
>> inserted into HBase, or do multiple values need to be aggregated together
>> with the result put into the HBase entry?
>>
>> Thanks,
>> Ben
>>
>> On Tue, Feb 28, 2012 at 11:20 AM, Michael Segel
>> <michael_se...@hotmail.com> wrote:
>>
>>> The better question is why would you need a reducer?
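Jacques' point about duplicated operations can be illustrated without a cluster. The sketch below is a simulation only: a `HashMap` stands in for an HBase table, and the class and method names are hypothetical, not part of the HBase API. It shows why a replayed Put with a deterministic row key is harmless (last write wins), while a replayed Increment double-counts — which is exactly why retried or speculatively re-executed tasks make increments and random seeded keys risky.

```java
import java.util.HashMap;
import java.util.Map;

// Simulation only: a HashMap stands in for an HBase table.
// IdempotencySketch is a hypothetical name, not an HBase class.
public class IdempotencySketch {

    // A Put with a deterministic row key: replaying the same batch just
    // overwrites the same cell with the same value, so a re-executed
    // task leaves the table in the same state.
    public static Map<String, Long> replayPuts(Map<String, Long> table,
                                               Map<String, Long> batch,
                                               int attempts) {
        for (int i = 0; i < attempts; i++) {
            table.putAll(batch); // overwrite: idempotent
        }
        return table;
    }

    // An Increment: replaying the batch adds the delta again each time,
    // so a retried task double-counts. This is the failure mode the
    // thread warns about when writing straight against HTable.
    public static Map<String, Long> replayIncrements(Map<String, Long> table,
                                                     Map<String, Long> batch,
                                                     int attempts) {
        for (int i = 0; i < attempts; i++) {
            for (Map.Entry<String, Long> e : batch.entrySet()) {
                table.merge(e.getKey(), e.getValue(), Long::sum); // not idempotent
            }
        }
        return table;
    }
}
```

With three replay attempts, the Put path still holds the original value while the Increment path holds three times it — the difference `checkAndPut` (or simply tolerating overwrites) is meant to paper over.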
>>>
>>> That's a bit cryptic, I understand, but you have to ask yourself when
>>> you need to use a reducer when you are writing to a database... ;-)
>>>
>>> Sent from my iPhone
>>>
>>> On Feb 28, 2012, at 10:14 AM, "T Vinod Gupta" <tvi...@readypulse.com>
>>> wrote:
>>>
>>>> Mike,
>>>> I didn't understand - why would I not need a reducer in an HBase m/r
>>>> job? There can be cases, right?
>>>> My use case is very similar to Sujee's blog on frequency counting:
>>>> http://sujee.net/tech/articles/hadoop/hbase-map-reduce-freq-counter/
>>>> So in the reducer, I can do all the aggregations. Is there a better
>>>> way? I can think of another way - to use increments in the map job
>>>> itself. I have to figure out if that's possible though.
>>>>
>>>> thanks
>>>>
>>>> On Tue, Feb 28, 2012 at 7:44 AM, Michel Segel
>>>> <michael_se...@hotmail.com> wrote:
>>>>
>>>>> Yes, you can do it.
>>>>> But why do you have a reducer when running a m/r job against HBase?
>>>>>
>>>>> The trick in writing multiple rows... You do it independently of the
>>>>> output from the map() method.
>>>>>
>>>>> Sent from a remote device. Please excuse any typos...
>>>>>
>>>>> Mike Segel
>>>>>
>>>>> On Feb 28, 2012, at 8:34 AM, T Vinod Gupta <tvi...@readypulse.com>
>>>>> wrote:
>>>>>
>>>>>> While doing map reduce on HBase tables, is it possible to do
>>>>>> multiple Puts in the reducer? What I want is a way to be able to
>>>>>> write multiple rows. If it's not possible, then what are the other
>>>>>> alternatives? I mean like creating a wider table in that case.
>>>>>>
>>>>>> thanks
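The frequency-counting case Vinod mentions is the one where a reducer genuinely earns its keep: the shuffle groups all partial counts for a word, and the reducer sums them before building a Put (one `context.write()` per Put, so emitting several rows from one `reduce()` call is fine). A minimal sketch of just that aggregation step, in plain Java with no Hadoop types — `FreqCountSketch` and its methods are hypothetical names, and in a real job the values would arrive as an `Iterable` of writables:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Pure-Java sketch of reduce-side aggregation for frequency counting.
// Hypothetical class; in a real job this logic lives in a TableReducer
// whose reduce() sums the values and then writes a Put per row key.
public class FreqCountSketch {

    // What one reduce() call does: sum all partial counts for one key.
    public static long sum(List<Long> counts) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        return total;
    }

    // What the shuffle + reduce phase does overall: collapse
    // (word, partialCount) pairs into one total per word. Each entry of
    // the result would become one Put against the counts table.
    public static Map<String, Long> reduceAll(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> totals = new HashMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Long::sum);
        }
        return totals;
    }
}
```

Because each word's total is written under a deterministic row key, a re-run reducer task overwrites the same cells with the same totals — the idempotent pattern the thread recommends over map-side increments.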