On Thu, May 10, 2012 at 11:59 AM, Michael Segel <michael_se...@hotmail.com> wrote:
> Sigh.
>
> Dave,
> I really think you need to think more about the problem.
>
> Think about what a reduce does and then think about what happens in side of
> HBase.
>
> Then think about which runs faster... a job with two mappers writing the
> intermediate and final results in HBase,
> or a M/R job that writes its output to HBase.
>
> If you really truly think about the problem, you will start to understand why
> I say you really don't want to use a reducer when you're working w HBase.
>

We have a bit of doc saying that you usually want to forgo the reduce phase:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#sink
Do we need to add to it?

That said, you can't make a hard-and-fast rule that the reduce is to be avoided absolutely. There will be cases where it makes sense (an MR sort orthogonal to HBase's, a fat aggregating reduce, etc.).

St.Ack

P.S. Hey Michael. Go easy on the 'sighs'. The participants in this thread have a clue. I can testify to that. Also, I know you don't mean it, but on occasion, both in this thread and in others I've seen you on, your tone can come across as condescending (and there is nothing like condescension for raising hackles). We all have our styles, but you might want to review with this in mind before you hit send the next time. Just a suggestion.