Re: Severely hit by "curse of last reducer"

2011-11-16 Thread Ayon Sinha
It is always just one reducer that gets stuck. My table2 is small, but using a Mapjoin makes my mappers run out of memory. My max reducers is 32 (also the max reduce capacity). I tried setting num reducers to a higher number (even 6000, which is approximately the number of date & name combinations I have), only to have lots of reduce

Re: Severely hit by "curse of last reducer"

2011-11-16 Thread Mark Grover
Hi Ayon, Is it one particular reduce task that is slow or the entire reduce phase? How many reduce tasks did you have, anyways? Looking into what the reducer key was might only make sense if a particular reduce task was slow. If your table2 is small enough to fit in memory, you might want to tr

Severely hit by "curse of last reducer"

2011-11-16 Thread Ayon Sinha
Hi, Where do I find the log of what reducer key is causing the last reducer to go on for hours? The reducer logs don't say much about the key it's processing. Is there a way to enable a debug mode where it would log the key it's processing? My query looks like: select partner_name, dates, sum(co

December 2011 SF Hadoop User Group

2011-11-16 Thread Aaron Kimball
After a month's hiatus for Hadoop World, we're back! The December Hadoop meetup will be held Wednesday, December 14, from 6pm to 8pm. This meetup will be hosted by Splunk at their office on Brannan St. As usual, we will use the discussion-based "unconference" format. At the beginning of the meetup

Profiling Hive / Metrics

2011-11-16 Thread john smith
Hey devs, My Hive reducers are running for too long. I want to profile Hive and collect metrics so as to find where most of the time is spent in execution. Can anyone tell me where to start? Are any profilers attached to Hive by default? Any help is appreciated. Thanks, jS