Re: About Hadoop optimizations

2009-05-07 Thread Tom White
On Thu, May 7, 2009 at 6:05 AM, Foss User wrote:
> Thanks for your response again. I could not understand a few things in
> your reply. So, I want to clarify them. Please find my questions
> inline.
>
> On Thu, May 7, 2009 at 2:28 AM, Todd Lipcon wrote:
>> On Wed, May 6, 2009 at 1:46 PM, Foss Use

Re: About Hadoop optimizations

2009-05-06 Thread Foss User
Thanks for your response again. I could not understand a few things in your reply. So, I want to clarify them. Please find my questions inline.
On Thu, May 7, 2009 at 2:28 AM, Todd Lipcon wrote:
> On Wed, May 6, 2009 at 1:46 PM, Foss User wrote:
>> 2. Is the meta data for file blocks on data nod

Re: About Hadoop optimizations

2009-05-06 Thread Todd Lipcon
On Wed, May 6, 2009 at 1:46 PM, Foss User wrote:
> Thanks for your response. I got a few more questions regarding
> optimizations.
>
> 1. Does hadoop clients locally cache the data it last requested?
>
I don't know the DFS read path very well, but I don't believe there is any built-in cache here

Re: About Hadoop optimizations

2009-05-06 Thread Foss User
Thanks for your response. I have a few more questions regarding optimizations.
1. Do Hadoop clients locally cache the data they last requested?
2. Is the metadata for file blocks on data nodes kept in the underlying OS's file system on the namenode, or is it kept in the RAM of the namenode?
3. If no mapp

Re: About Hadoop optimizations

2009-05-06 Thread Todd Lipcon
On Wed, May 6, 2009 at 12:22 PM, Foss User wrote:
> 1. Do the reducers of a job start only after all mappers have finished?
>
The reducer tasks start so they can begin copying map output, but your actual reduce function does not. This is because it doesn't know that the data for any given key h
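
The reply above is cut off in this archive view, but the general reason a reduce function has to wait is that a map task that has not finished yet could still emit values for a key the reducer has already seen. Below is a minimal sketch of that idea in plain Python, not Hadoop code: `word_count_map`, `shuffle`, `run_reduce`, and the sample `splits` are all hypothetical names made up for illustration. It compares reducing over only some map outputs with reducing over all of them.

```python
from collections import defaultdict


def word_count_map(line):
    """Hypothetical map function: emit (word, 1) for every word in a line."""
    return [(word, 1) for word in line.split()]


def shuffle(map_outputs):
    """Group values by key across the outputs of the given map tasks."""
    grouped = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            grouped[key].append(value)
    return grouped


def run_reduce(grouped):
    """Hypothetical reduce function: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Three input splits, one per (simulated) map task.
splits = ["a b a", "b c", "a c c"]
map_outputs = [word_count_map(split) for split in splits]

# Copying map output can begin early, but reducing over only the first
# two map outputs misses values the third map task emits later.
partial = run_reduce(shuffle(map_outputs[:2]))

# Reducing only after every map task has finished sees all values per key.
complete = run_reduce(shuffle(map_outputs))

print("partial: ", partial)
print("complete:", complete)
```

In this toy run the counts for `a` and `c` differ between the two results, which is why the real framework holds back the reduce call until all map output has been fetched.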

About Hadoop optimizations

2009-05-06 Thread Foss User
1. Do the reducers of a job start only after all mappers have finished?
2. Say there are 10 slave nodes. Let us say one of the nodes is very slow as compared to other nodes. So, while the mappers in the other 9 have finished in 2 minutes, the one on the slow one might take 20 minutes. Is Hadoop i