Re: Restricting number of records from map output

2011-01-14 Thread Hari Sreekumar
Ideally, mappers should be independent of other mappers. Still, you can use counters and start skipping records when counter > some value to achieve similar behavior. It will not be very reliable if you want exact results, though. On Thu, Jan 13, 2011 at 12:43 AM, Anthony Urso
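
The skip-after-a-threshold idea above can be sketched in plain Java, with no Hadoop classes. The class name `LimitedEmitSketch`, the `limit` parameter, and the `map`/`output` methods are hypothetical stand-ins for a Mapper and its context. Only the per-task case is shown: each task counts its own records, so a job with M map tasks can still emit up to M * limit records, which is one reason the result is not exact.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Hadoop Mapper; no Hadoop classes are used.
class LimitedEmitSketch {
    private final long limit;                               // assumed per-task cap
    private long emitted = 0;                               // local count, plays the role of the counter
    private final List<String> output = new ArrayList<>();  // stands in for context.write()

    LimitedEmitSketch(long limit) {
        this.limit = limit;
    }

    // Called once per input record, mirroring Mapper.map().
    void map(String record) {
        if (emitted >= limit) {
            return; // threshold reached: skip the rest of this task's input
        }
        output.add(record);
        emitted++;
    }

    List<String> output() {
        return output;
    }
}
```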

Re: can Hadoop writes to more than one table the same reduce?

2011-01-14 Thread Hari Sreekumar
I have never tried it, but just for my info... which OutputFormat do you use to write into one table? You could try using a ConnectionPool, initiate connections to all 3 tables in the setup() method and write to the tables in the reduce phase. Have you tried it? Hope that helps, Hari On Thu, Jan 13,

Re: Restricting number of records from map output

2011-01-14 Thread Alex Kozlov
Hi Rakesh, What do you mean by the top N? The first ones, or do you need to sort them in memory? You can always output records in the cleanup() method at the end of the mapper run. On Fri, Jan 14, 2011 at 7:05 AM, Hari Sreekumar hsreeku...@clickable.com wrote: Ideally, mappers should be
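
A minimal sketch of the "buffer in memory, emit in cleanup()" suggestion above, in plain Java without Hadoop classes. The class `TopNSketch` and its `map`/`cleanup` methods are hypothetical and only mirror the Mapper lifecycle; records are assumed to be `long` values ranked by size. A bounded min-heap keeps at most N values per task, so memory stays proportional to N rather than to the input.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a Mapper that keeps only its N largest values.
class TopNSketch {
    private final int n;
    // Min-heap: the smallest retained value sits at the head and is evicted first.
    private final PriorityQueue<Long> heap = new PriorityQueue<>();

    TopNSketch(int n) {
        this.n = n;
    }

    // Called once per input record, mirroring Mapper.map(); nothing is emitted here.
    void map(long value) {
        if (n <= 0) {
            return;
        }
        if (heap.size() < n) {
            heap.add(value);
        } else if (value > heap.peek()) {
            heap.poll();      // drop the current smallest
            heap.add(value);  // keep the larger newcomer
        }
    }

    // Called once at the end of the task, mirroring Mapper.cleanup():
    // returns the retained values, largest first.
    List<Long> cleanup() {
        List<Long> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }
}
```

Each map task would produce its own top N this way, so with more than one task the outputs still need a final merge to get a global top N.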

Re: Restricting number of records from map output

2011-01-14 Thread Niels Basjes
Hi, I have a sort job consisting of only the Mapper (no Reducer) task. I want my results to contain only the top n records. Is there any way of restricting the number of records that are emitted by the Mappers? Basically I am looking to see if there is an equivalent of achieving the

Re: can Hadoop writes to more than one table the same reduce?

2011-01-14 Thread Jander g
Hi Hari, Thank you very much. I use DBOutputFormat to write to MySQL and can write into one table successfully. Well, I will try the ConnectionPool. Thanks again, Jander On Fri, Jan 14, 2011 at 11:32 PM, Hari Sreekumar hsreeku...@clickable.com wrote: I have never tried it, but just for my info...

Re: Import data from mysql

2011-01-14 Thread Brian McSweeney
Hi Mark, what a very interesting email! And it sounds like you are writing a very interesting and timely book. I'm glad you enjoyed the thread. I did too :-) I would love to help you all I can with your book and would be fascinated to read the chapter you're writing that is related to my

Re: new mapreduce API and NLineInputFormat

2011-01-14 Thread Attila Csordas
Hi, what other jars should be added to the build path from 0.21.0 besides hadoop-common-0.21.0.jar in order to make the 0.21.0 NLineInputFormat work in 0.20.2, as suggested below? More generally, can somebody provide me with working example code? Thanks, Attila On Wed, Nov 10, 2010 at 5:06 AM, Harsh J

Re: new mapreduce API and NLineInputFormat

2011-01-14 Thread Edward Capriolo
On Fri, Jan 14, 2011 at 5:05 PM, Attila Csordas attilacsor...@gmail.com wrote: Hi, what other jars should be added to the build path from 0.21.0 besides hadoop-common-0.21.0.jar in order to make 0.21.0 NLineInputFormat work in 0.20.2 as suggested below? Generally can somebody provide me a

Question about Hadoop Default FCFS Job Scheduler

2011-01-14 Thread He Chen
Hey all, Why does the FCFS scheduler only let a node choose one task at a time from a job? To increase data locality, it seems reasonable to let a node choose all of its local tasks (if it can) from a job at once. Any reply will be appreciated. Thanks, Chen