Basic question on how reducer works

2012-07-07 Thread Grandl Robert
Hi, I have some questions related to basic functionality in Hadoop. 1. When a Mapper processes the intermediate output data, how does it know how many partitions to create (how many reducers there will be) and how much data goes into each partition for each reducer? 2. When a JobTracker assigns a task to a r

Re: Basic question on how reducer works

2012-07-07 Thread Harsh J
Hi Robert, Inline. (Answer is specific to Hadoop 1.x since you asked for that alone, but certain things may vary for Hadoop 2.x). On Sun, Jul 8, 2012 at 7:07 AM, Grandl Robert wrote: > Hi, > > I have some questions related to basic functionality in Hadoop. > > 1. When a Mapper process the interm

Re: Basic question on how reducer works

2012-07-08 Thread Harsh J
called and which not. Even more in ReduceTask.java. > > Do you have any ideas ? > > Thanks a lot for your answer, > Robert > > > From: Harsh J > To: mapreduce-user@hadoop.apache.org; Grandl Robert > Sent: Sunday, July 8, 2012 1:34 AM >

Re: Basic question on how reducer works

2012-07-08 Thread Grandl Robert
I see. I was looking into tasktracker log :). Thanks a lot, Robert From: Harsh J To: Grandl Robert ; mapreduce-user Sent: Sunday, July 8, 2012 9:16 PM Subject: Re: Basic question on how reducer works The changes should appear in your Task's userlogs

Re: Basic question on how reducer works

2012-07-08 Thread Pavan Kulkarni
e.org> > *Sent:* Sunday, July 8, 2012 9:16 PM > > *Subject:* Re: Basic question on how reducer works > > The changes should appear in your Task's userlogs (not the TaskTracker > logs). Have you deployed your changed code properly (i.e. do you > generate a new tarball, or per

Re: Basic question on how reducer works

2012-07-08 Thread Harsh J
>> >> I see. I was looking into tasktracker log :). >> >> Thanks a lot, >> Robert >> >> >> From: Harsh J >> To: Grandl Robert ; mapreduce-user >> >> Sent: Sunday, July 8, 2012 9:16 PM >> >> Subject: R

Re: Basic question on how reducer works

2012-07-08 Thread Pavan Kulkarni
>> > >> ________________ > >> From: Harsh J > >> To: Grandl Robert ; mapreduce-user > >> > >> Sent: Sunday, July 8, 2012 9:16 PM > >> > >> Subject: Re: Basic question on how reducer works > >> > >> The chan

Re: Basic question on how reducer works

2012-07-09 Thread Arun C Murthy
Robert, On Jul 7, 2012, at 6:37 PM, Grandl Robert wrote: > Hi, > > I have some questions related to basic functionality in Hadoop. > > 1. When a Mapper process the intermediate output data, how it knows how many > partitions to do(how many reducers will be) and how much data to go in each >

Re: Basic question on how reducer works

2012-07-09 Thread Manoj Babu
Hi, It would be very helpful if you could give more details on the questions below. 1. How does the partitioner know which reducer needs to be called? 2. When we use more than one reducer, the output gets separated. In what scenarios do we actually need multiple reducers? Cheers! Manoj. O

Re: Basic question on how reducer works

2012-07-09 Thread Harsh J
Manoj, Think of it this way, and you shouldn't be confused: A reducer == a partition. For (1) - Partitioners do not 'call' a reduce, just write the data with a proper partition ID. The reducer whose ID is the same as the partition ID picks it up for itself later. We have already explained this earlier. F
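The "a reducer == a partition" idea above can be sketched in plain Java. This is only an illustration, not code from the thread or from Hadoop's source: the class `PartitionSketch` and the methods `partitionFor` and `bucketKeys` are hypothetical names, and the hash-modulo rule is an assumption modeled on the behaviour of Hadoop's default hash partitioner (a job may configure a different one).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; none of these names come from Hadoop itself.
final class PartitionSketch {

    // Map a key to a partition ID in [0, numReducers).
    // Masking with Integer.MAX_VALUE clears the sign bit, so a negative
    // hashCode() can never yield a negative partition ID.
    static int partitionFor(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Tag each key with its partition ID by placing it in that partition's
    // bucket. Nothing "calls" a reducer here: reducer i would later fetch
    // bucket i on its own.
    static List<List<String>> bucketKeys(List<String> keys, int numReducers) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partitionFor(key, numReducers)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("apple", "banana", "cherry", "apple");
        List<List<String>> buckets = bucketKeys(keys, 3);
        for (int p = 0; p < buckets.size(); p++) {
            System.out.println("partition " + p + " -> " + buckets.get(p));
        }
    }
}
```

Because the partition ID depends only on the key and the number of reducers, every occurrence of the same key lands in the same partition, whichever map task emitted it.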

Re: Basic question on how reducer works

2012-07-09 Thread Manoj Babu
Hi Harsh, Thanks for clarifying. I was earlier under the impression that the Partitioner picks the reducer. My cluster setup provides options for multiple reducers, so I want to know when and in which scenarios we have to go for multiple reducers? Cheers! Manoj. On Mon, Jul 9, 2012 at 11:27 PM, Harsh J wr

Re: Basic question on how reducer works

2012-07-09 Thread Karthik Kambatla
Hi Manoj, As Harsh said, we would almost always need multiple reducers. As each reduce is potentially executed on a different core (same machine or a different one), in most cases, we would want at least as many reduces as the number of cores for maximum parallelism/performance. Karthik On Mon,

Re: Basic question on how reducer works

2012-07-09 Thread Grandl Robert
e a bit on how the data is written to which partition ? Thanks, Robert From: Arun C Murthy To: mapreduce-user@hadoop.apache.org Sent: Monday, July 9, 2012 9:24 AM Subject: Re: Basic question on how reducer works Robert, On Jul 7, 2012, at 6:37 PM, Grandl Ro

Re: Basic question on how reducer works

2012-07-09 Thread Arun C Murthy
' and the actual 'key' in the map-output of as the 'secondary key'. hth, Arun > Thanks, > Robert > > From: Arun C Murthy > To: mapreduce-user@hadoop.apache.org > Sent: Monday, July 9, 2012 9:24 AM > Subject: Re: Basic question on how reducer works
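Arun's description of the partition ID as the 'primary key' and the map-output key as the 'secondary key' amounts to a two-level sort of the buffered records before a spill. The sketch below is a loose illustration under stated assumptions: `SpillSortSketch`, `Record`, and `sortForSpill` are hypothetical names, keys are plain strings compared in natural order, and records live in an ordinary list. It is not how Hadoop's in-memory buffer is actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of "partition ID first, then key" ordering.
final class SpillSortSketch {

    // One buffered map-output record, already tagged with its partition ID.
    static final class Record {
        final int partition;
        final String key;
        final String value;

        Record(int partition, String key, String value) {
            this.partition = partition;
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return "(" + partition + ", " + key + ", " + value + ")";
        }
    }

    // Primary sort key: partition ID. Secondary sort key: the record's key.
    static final Comparator<Record> SPILL_ORDER =
            Comparator.comparingInt((Record r) -> r.partition)
                      .thenComparing((Record r) -> r.key);

    // Return a sorted copy, leaving the caller's buffer untouched.
    static List<Record> sortForSpill(List<Record> buffered) {
        List<Record> sorted = new ArrayList<>(buffered);
        sorted.sort(SPILL_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        List<Record> buffered = Arrays.asList(
                new Record(1, "b", "v1"),
                new Record(0, "z", "v2"),
                new Record(1, "a", "v3"),
                new Record(0, "c", "v4"));
        for (Record r : sortForSpill(buffered)) {
            System.out.println(r);
        }
    }
}
```

After such a sort, all records for one partition are contiguous and ordered by key within it, which is what lets each reducer read its own slice of the spilled data.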

Re: Basic question on how reducer works

2012-07-09 Thread Grandl Robert
' and the actual 'key' in the map-output of as the 'secondary key'. hth, Arun Thanks, >Robert > > > > > From: Arun C Murthy >To: mapreduce-user@hadoop.apache.org >Sent: Monday, July 9, 2012 9:24 AM >Sub

Re: Basic question on how reducer works

2012-07-09 Thread Karthik Kambatla
computed to find the corresponding partition ? > > Robert > > -- > *From:* Arun C Murthy > *To:* mapreduce-user@hadoop.apache.org > *Sent:* Monday, July 9, 2012 4:33 PM > > *Subject:* Re: Basic question on how reducer works > > > On

Re: Basic question on how reducer works

2012-07-10 Thread Subir S
alue> is added into a partition a hash on the partition ID will be >> computed to find the corresponding partition ? >> >> Robert >> >> ------ >> *From:* Arun C Murthy >> *To:* mapreduce-user@hadoop.apache.org >> *Sent:* Mo

Re: Basic question on how reducer works

2012-07-13 Thread Subir S
e >>> computed to find the corresponding partition ? >>> >>> Robert >>> >>> -- >>> *From:* Arun C Murthy >>> *To:* mapreduce-user@hadoop.apache.org >>> *Sent:* Monday, July 9, 2012 4:33 PM >>

Re: Basic question on how reducer works

2012-07-13 Thread Harsh J
uffer and >>> spill >>> after buffer is full. Could you please elaborate a bit on how the data is >>> written to which partition ? >>> >>> >>> Essentially you can think of the partition-id as the 'primary key' and >>> the >>

Re: Basic question on how reducer works

2012-07-14 Thread Subir S
>> >>> value> is added into a partition a hash on the partition ID will be >>>> computed to find the corresponding partition ? >>>> >>>> Robert >>>> >>>> -- >>>> *From:* Arun C Murth

Re: Basic question on how reducer works

2012-07-14 Thread Harsh J
Subir, On Sat, Jul 14, 2012 at 5:30 PM, Subir S wrote: > Harsh, Thanks I think this is what I was looking for. I have 3 related > questions. > > 1.) Will this work in 0.20.2-cdh3u3 Yes, will work. (Btw, best to ask CDH-specific questions on the cdh-u...@cloudera.org lists) > 2.) What is the har

Re: Basic question on how reducer works

2012-07-16 Thread Subir S
Just for the reference of others who might see this thread: the JIRA corresponding to the reduce input limit parameter is MAPREDUCE-2324. On 7/14/12, Harsh J wrote: > Subir, > > On Sat, Jul 14, 2012 at 5:30 PM, Subir S wrote: >> Harsh, Thanks I think this is what I was looking for. I have 3 related >> que