How the number of mapper tasks is calculated

2011-07-25 Thread Anfernee Xu
I have a general question about how the number of mapper tasks is calculated. As far as I know, the number is primarily based on the number of splits: say I have 5 splits and 10 TaskTrackers running in the cluster, then I will have 5 mapper tasks running in my MR job, right? But what I found i
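A minimal sketch of the split arithmetic, assuming the new-API FileInputFormat (org.apache.hadoop.mapreduce.lib.input) and splittable, non-empty files. The split size is max(minSize, min(maxSize, blockSize)), and each file is cut into pieces of that size, with a 1.1 "slop" factor so the last piece may run up to 10% over. The job gets one map task per split; the number of TaskTrackers and their map slots only limits how many of those tasks run at the same time. The old org.apache.hadoop.mapred.FileInputFormat uses a goal size derived from the requested number of map tasks in place of maxSize, so counts can differ there. SplitMath and its methods are illustrative names, not Hadoop's.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative model of FileInputFormat-style split counting (not Hadoop code).
final class SplitMath {
    // Lets the last split be up to 10% larger than splitSize instead of leaving a tiny tail split.
    static final double SPLIT_SLOP = 1.1;

    private SplitMath() {}

    /** splitSize = max(minSize, min(maxSize, blockSize)). */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Splits (and therefore map tasks) for one splittable, non-empty file. */
    static int splitsForFile(long length, long splitSize) {
        int splits = 0;
        long bytesRemaining = length;
        while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
            splits++;
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            splits++; // the remainder becomes the final split
        }
        return splits;
    }

    /** Splits are computed file by file, so the job total is a sum over input files. */
    static int splitsForJob(List<Long> fileLengths, long splitSize) {
        int total = 0;
        for (long length : fileLengths) {
            total += splitsForFile(length, splitSize);
        }
        return total;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long splitSize = computeSplitSize(64 * mb, 1L, Long.MAX_VALUE); // 64 MB blocks
        System.out.println(splitsForFile(200 * mb, splitSize)); // 4
        System.out.println(splitsForFile(70 * mb, splitSize));  // 1: 70/64 is within the slop
        // Five small files give five map tasks, however many TaskTrackers exist.
        System.out.println(splitsForJob(Arrays.asList(10 * mb, 10 * mb, 10 * mb, 10 * mb, 10 * mb), splitSize)); // 5
    }
}
```

Raising minSize above the block size is one way the model produces fewer, larger splits: computeSplitSize(64 MB, 128 MB, Long.MAX_VALUE) yields 128 MB.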

Re: long-running reduce task was killed because it failed to report status for 602 seconds

2011-01-27 Thread Anfernee Xu
a little testing suite to actually see how much time is required by your algorithm. I discovered that mine was taking 18 minutes! Actually, I guess that your problem lies in your comment "massive work". Ivan. 2011/1/27 Anfernee Xu: Thi
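Ivan's suggestion of timing the algorithm on its own can be sketched as follows. It assumes the default task timeout of 600 s (mapred.task.timeout = 600000 ms in Hadoop 0.20/1.x); use whatever your cluster actually configures. The loop in main is a hypothetical placeholder for the real reduce() body, and ReduceTiming is an illustrative name.

```java
import java.util.concurrent.TimeUnit;

// Illustrative harness: time the expensive part of reduce() outside Hadoop.
final class ReduceTiming {
    // Assumed default of mapred.task.timeout (600000 ms); check your configuration.
    static final long DEFAULT_TASK_TIMEOUT_MS = 600_000L;

    private ReduceTiming() {}

    /** Wall-clock time of one run of the given work, in milliseconds. */
    static long timeMillis(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** True if a stretch of work this long, with no progress reported, reaches the timeout. */
    static boolean reachesTimeout(long elapsedMs, long timeoutMs) {
        return elapsedMs >= timeoutMs;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the CPU-heavy body of reduce().
        long elapsed = timeMillis(() -> {
            double x = 0;
            for (int i = 1; i < 50_000_000; i++) {
                x += Math.sqrt(i);
            }
            if (x < 0) {
                System.out.println(x); // uses x so the loop is not trivially dead code
            }
        });
        System.out.println("elapsed ms: " + elapsed
                + ", reaches timeout: " + reachesTimeout(elapsed, DEFAULT_TASK_TIMEOUT_MS));
    }
}
```

By this check, the 18 minutes Ivan measured is far past a 600 s limit if nothing reports progress in between.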

long-running reduce task was killed because it failed to report status for 602 seconds

2011-01-27 Thread Anfernee Xu
This question has been asked before, but I tried the suggested solutions, such as calling Context.setStatus() or progress(), and neither of them helped. Please advise. My reduce task does some CPU-intensive work; below is my code snippet:
@Override protected void reduce(Text inpput, Iterable
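mapred-default.xml describes mapred.task.timeout (default 600000 ms) as the time after which a task is terminated if it neither reads input, writes output, nor updates its status string, which is consistent with the 602 seconds in the subject. One way to apply that, offered as an assumption rather than the resolution of this thread, is to report progress from inside the long loop instead of once before or after it. In the sketch below the reporter is a plain Runnable standing in for something like context::progress; heavyWork and its placeholder arithmetic are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative pattern: report progress periodically during long CPU-bound work.
final class ProgressInLoop {
    private ProgressInLoop() {}

    /** Hypothetical stand-in for the expensive work inside reduce(). */
    static long heavyWork(long iterations, long reportEvery, Runnable reporter) {
        if (reportEvery <= 0) {
            throw new IllegalArgumentException("reportEvery must be positive");
        }
        long acc = 0;
        for (long i = 0; i < iterations; i++) {
            acc += (i * i) % 7; // placeholder CPU work
            if ((i + 1) % reportEvery == 0) {
                reporter.run(); // in a reducer, e.g. context.progress(), signalling the task is alive
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        AtomicInteger reports = new AtomicInteger();
        heavyWork(1_000_000L, 10_000L, reports::incrementAndGet);
        System.out.println(reports.get()); // 100
    }
}
```

If the slow step is a single opaque call that cannot be instrumented, a background thread that calls progress() on a timer is a common alternative, at the cost of also keeping a genuinely hung task alive.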

Re: how to implement post-mapper processing

2010-08-25 Thread Anfernee Xu
Yes, it works if the node has only a single split; if it has multiple splits, that's still a problem, since not all data has been processed. On Wed, Aug 25, 2010 at 11:08 PM, David Rosenstrauch wrote: On 08/25/2010 10:36 AM, Anfernee Xu wrote: Thanks all for your help.
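The "only works with a single split" observation follows from the new-API Mapper lifecycle: each map task handles one split and runs setup(), then map() for every record, then cleanup(). The simplified model below (not Hadoop code; ModelMapper and CleanupPerSplitDemo are illustrative names) creates one mapper instance per split, as Hadoop does per map task, and records how many records each cleanup() call has seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of the Mapper lifecycle: setup, map per record, cleanup.
abstract class ModelMapper {
    protected void setup() {}
    protected abstract void map(String record);
    protected void cleanup() {}

    /** Mirrors the shape of Mapper.run() for one split. */
    final void run(List<String> split) {
        setup();
        for (String record : split) {
            map(record);
        }
        cleanup();
    }
}

final class CleanupPerSplitDemo {
    private CleanupPerSplitDemo() {}

    /** Runs one mapper per split on the same "node"; returns the record count seen at each cleanup(). */
    static List<Integer> recordsSeenAtCleanup(List<List<String>> splitsOnOneNode) {
        List<Integer> seen = new ArrayList<>();
        for (List<String> split : splitsOnOneNode) {
            ModelMapper mapper = new ModelMapper() {
                private int count;
                @Override protected void map(String record) { count++; }
                @Override protected void cleanup() { seen.add(count); } // sees only this split
            };
            mapper.run(split);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("r1", "r2", "r3"),
                Arrays.asList("r4", "r5"));
        System.out.println(recordsSeenAtCleanup(splits)); // [3, 2]
    }
}
```

cleanup() fires once per split, and no single call has seen all five records, which is why it cannot serve as a job-wide "after everything" hook.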

Re: how to implement post-mapper processing

2010-08-25 Thread Anfernee Xu
Anfernee. On Wed, Aug 25, 2010 at 10:18 PM, David Rosenstrauch wrote: On 08/25/2010 09:07 AM, Anfernee Xu wrote: I'm new to Hadoop and I want to use it for my data processing. My understanding is that each split will be processed by a mapper task, so for

Re: how to implement post-mapper processing

2010-08-25 Thread Anfernee Xu
Hi Ken, thanks for your reply, but I do not think ChainMapper is helpful for me: the next mapper in the chain processes each record (key/value pair) as soon as the current mapper produces it, whereas I want the next mapper to run only after all records in all splits have been processed. Anfernee. On
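The per-record behaviour described here can be illustrated with a simplified model (not the ChainMapper API; ChainOrderDemo and the two stages are hypothetical): every record passes through all stages of the chain before the next record is read, so stage 2 sees the first record before stage 1 has even seen the second.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Simplified model of chained mappers: each record flows through every stage in turn.
final class ChainOrderDemo {
    private ChainOrderDemo() {}

    /** Applies the stages record by record and logs the order in which stages run. */
    static List<String> trace(List<String> records, List<UnaryOperator<String>> stages) {
        List<String> log = new ArrayList<>();
        for (String record : records) {
            String value = record;
            for (int s = 0; s < stages.size(); s++) {
                value = stages.get(s).apply(value);
                log.add("stage" + (s + 1) + ":" + value);
            }
        }
        return log;
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> stages =
                Arrays.<UnaryOperator<String>>asList(String::toUpperCase, v -> v + "!");
        System.out.println(trace(Arrays.asList("a", "b"), stages));
        // [stage1:A, stage2:A!, stage1:B, stage2:B!]
    }
}
```

Interleaving like this is what makes chaining cheap (no intermediate output), and it is also why a chain cannot express "wait for every split first".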

how to implement post-mapper processing

2010-08-25 Thread Anfernee Xu
I'm new to Hadoop and I want to use it for my data processing. My understanding is that each split will be processed by a mapper task, so for my application I have a mapper in which I populate a backend data store with data from the splits. After all splits are consumed, I want to run a piece of code to po
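The requirement is a barrier: load every split, wait until all loads are done, then post-process once. A simplified model in plain Java (not Hadoop code) runs one "map task" per split on a thread pool and only continues after invokeAll has returned. In a real job, one place for such a step is the driver, after Job.waitForCompletion(true) returns, since that call blocks until the whole job finishes; a second, dependent job is another option. The counter standing in for the backend store and postProcess are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model: parallel per-split loading, then a single post-processing step.
final class PostMapBarrier {
    private PostMapBarrier() {}

    /** Loads every split in parallel, then runs the post-processing step once. */
    static int loadThenPostProcess(List<List<String>> splits, int workers) {
        AtomicInteger loaded = new AtomicInteger(); // stands in for the hypothetical backend store
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (List<String> split : splits) {
                tasks.add(() -> {
                    loaded.addAndGet(split.size()); // "map task": write this split's records
                    return null;
                });
            }
            // invokeAll returns only after every task has completed (normally or not).
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get(); // surfaces a failed task as an exception
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for map tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a map task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        // Barrier passed: every split has been consumed, so this runs exactly once.
        return postProcess(loaded.get());
    }

    /** Hypothetical post-processing step; here it just reports the record count. */
    static int postProcess(int totalRecords) {
        return totalRecords;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("c"), Arrays.asList("d", "e", "f"));
        System.out.println(loadThenPostProcess(splits, 2)); // 6
    }
}
```

The design choice is to keep the "after everything" logic outside the per-split code entirely; anything that runs inside a map task, including cleanup(), only ever sees its own split.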