Re: issue with map running time

2012-07-09 Thread Manoj Babu
Thanks Karthik. But how can we overcome that? Do we need to use a different file format? Also, I am using the code below to merge all files into a single file. Is it a proper way to do it? FileStatus[] inputFiles = local.listStatus(inputDir); FSDataOutputStream out = hdfs.create(hdfsFile);
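The merge pattern asked about above (list the input directory, open one output stream, append each file in turn) can be sketched with the JDK alone. This is only a stand-in under stated assumptions: it uses java.nio.file in place of the Hadoop FileSystem calls quoted in the message, and the class and method names (MergeFilesSketch, mergeDirectory) are hypothetical, not taken from the thread.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: JDK-only analogue of "list inputs, create one output, append each file".
final class MergeFilesSketch {

    // Appends every regular file in inputDir to mergedFile and returns the bytes copied.
    // Assumption: mergedFile lives outside inputDir, so it is never treated as an input.
    static long mergeDirectory(Path inputDir, Path mergedFile) throws IOException {
        List<Path> inputs = new ArrayList<>();
        try (DirectoryStream<Path> listing = Files.newDirectoryStream(inputDir)) {
            for (Path candidate : listing) {
                if (Files.isRegularFile(candidate)) {
                    inputs.add(candidate);
                }
            }
        }
        // Directory listings have no guaranteed order, so sort for a repeatable result.
        Collections.sort(inputs);

        long bytesCopied = 0;
        try (OutputStream out = Files.newOutputStream(mergedFile)) {
            for (Path input : inputs) {
                // Files.copy streams the whole file into the already-open output.
                bytesCopied += Files.copy(input, out);
            }
        }
        return bytesCopied;
    }
}
```

A Hadoop version would keep the same loop shape and swap in the listStatus and create calls from the message; whether merging is the right remedy depends on the job.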

Re: How to change name node storage directory?

2012-07-09 Thread Manoj Babu
Hi Harsh, What permissions do we need to set on the dfs.name.dir folder? Will the remaining internal folder structure be created automatically, or do we need to create it manually? Also, how do we clean a data node? Thanks in advance! Cheers! Manoj. On Tue, Jul 10, 2012 at 11:58 AM, Harsh J wrote:

Re: How to change name node storage directory?

2012-07-09 Thread Harsh J
Manoj, If you change your dfs.name.dir (which is the right property for 0.20.x/1.x) or dfs.namenode.name.dir (which is the right property for 0.23/2.x) completely to a different directory, you will need to move the contents of the original, older name-directory to the new one to preserve data, or

Re: Basic question on how reducer works

2012-07-09 Thread Karthik Kambatla
The partitioner is configurable. The default partitioner, from what I remember, computes the partition as the key's hash code modulo the number of reducers/partitions. For random input it is balanced, but some cases can have a very skewed key distribution. Also, as you have pointed out, the number of values pe
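The hash-modulo rule described above can be sketched in plain Java. The arithmetic mirrors what Hadoop's default HashPartitioner does (clear the sign bit, then take the remainder); the class name HashPartitionSketch and the helper recordsPerPartition are hypothetical and exist only to make the skew visible.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; no Hadoop classes are used.
final class HashPartitionSketch {

    // Clears the sign bit so a negative hashCode still yields a valid partition index.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many records (not distinct keys) land in each partition.
    static int[] recordsPerPartition(List<String> keys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (String key : keys) {
            counts[partitionFor(key, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // One hot key repeated many times: every copy goes to the same partition.
        List<String> skewed = Arrays.asList("the", "the", "the", "the", "a", "b");
        System.out.println(Arrays.toString(recordsPerPartition(skewed, 2)));
    }
}
```

Because every occurrence of a key hashes to the same partition, a single frequent key makes its partition heavy no matter how evenly the distinct keys are spread.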

Re: Basic question on how reducer works

2012-07-09 Thread Grandl Robert
Thanks Arun. So, just for my clarification: the map will create partitions according to the number of reducers, such that each reducer gets almost the same number of keys in its partition. However, each key can have a different number of values, so the "weight" of each partition will depend on that. Also w

Re: Basic question on how reducer works

2012-07-09 Thread Arun C Murthy
On Jul 9, 2012, at 12:55 PM, Grandl Robert wrote: > Thanks a lot guys for answers. > > Still I am not able to find exactly the code for the following things: > > 1. reducer to read from a Map output only its partition. I looked into > ReduceTask#getMapOutput which do the actual read in > Red

Re: Basic question on how reducer works

2012-07-09 Thread Grandl Robert
Thanks a lot, guys, for the answers. Still, I am not able to find the exact code for the following things: 1. How a reducer reads only its own partition from a map output. I looked into ReduceTask#getMapOutput, which does the actual read in ReduceTask#shuffleInMemory, but I don't see where it specifies which p

Re: issue with map running time

2012-07-09 Thread Karthik Kambatla
Hi Manoj, It seems like a different issue. Let me understand your case better. Is your input 656 files of 11 MB each? In that case, MapReduce does create 656 map tasks. In general, an input split is the data read from a single file, but limited to the block size (64 MB in your case). As the files
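The arithmetic behind the 656 map tasks can be sketched as follows. This is a simplified model, assuming one split per block-sized chunk of each non-empty file and a split never spanning two files; the real FileInputFormat logic has further details that are ignored here, and the names SplitCountSketch, splitsForFile and totalSplits are hypothetical.

```java
// Hypothetical, simplified model of how many input splits (and so map tasks) a job gets.
final class SplitCountSketch {

    static final long MB = 1024L * 1024L;

    // Number of block-sized chunks needed to cover one file (ceiling division).
    // Assumption: empty files are out of scope and contribute no split in this model.
    static long splitsForFile(long fileBytes, long splitBytes) {
        if (fileBytes <= 0) {
            return 0;
        }
        return (fileBytes + splitBytes - 1) / splitBytes;
    }

    // A split never spans files, so the total is the sum over files.
    static long totalSplits(long[] fileSizes, long splitBytes) {
        long total = 0;
        for (long size : fileSizes) {
            total += splitsForFile(size, splitBytes);
        }
        return total;
    }

    public static void main(String[] args) {
        long[] smallFiles = new long[656];
        java.util.Arrays.fill(smallFiles, 11 * MB);
        System.out.println("656 files of 11 MB: " + totalSplits(smallFiles, 64 * MB));
        System.out.println("one file of the same total size: "
                + totalSplits(new long[] {656 * 11 * MB}, 64 * MB));
    }
}
```

Under this model each 11 MB file fits in a single split, so the file count, not the data volume, sets the number of map tasks.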

Re: Basic question on how reducer works

2012-07-09 Thread Karthik Kambatla
Hi Manoj, As Harsh said, we would almost always need multiple reducers. As each reduce is potentially executed on a different core (same machine or a different one), in most cases, we would want at least as many reduces as the number of cores for maximum parallelism/performance. Karthik On Mon,

Re: Basic question on how reducer works

2012-07-09 Thread Manoj Babu
Hi Harsh, Thanks for clarifying. I thought earlier that the Partitioner picks the reducer. My cluster setup provides options for multiple reducers, so I want to know when and in which scenarios we have to go for multiple reducers. Cheers! Manoj. On Mon, Jul 9, 2012 at 11:27 PM, Harsh J wr

Re: Basic question on how reducer works

2012-07-09 Thread Harsh J
Manoj, Think of it this way, and you shouldn't be confused: A reducer == a partition. For (1) - Partitioners do not 'call' a reduce, they just write the data with a proper partition ID. The reducer whose ID matches the partition ID picks it up for itself later. We have already explained this earlier. F
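The "a reducer == a partition" idea can be sketched as a toy model: the map side only tags each record with a partition ID, and the reducer with the matching ID later fetches that bucket. Everything here is a hypothetical illustration in plain Java (PartitionFetchSketch, bucketMapOutput, fetchForReducer), not the actual shuffle code, and it assumes the hash-modulo rule for choosing the ID.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model: nothing here calls into Hadoop.
final class PartitionFetchSketch {

    // Map side: group one map task's output keys by partition ID. No reducer is "called".
    static Map<Integer, List<String>> bucketMapOutput(List<String> keys, int numReducers) {
        Map<Integer, List<String>> buckets = new HashMap<>();
        for (String key : keys) {
            int partition = (key.hashCode() & Integer.MAX_VALUE) % numReducers;
            buckets.computeIfAbsent(partition, id -> new ArrayList<>()).add(key);
        }
        return buckets;
    }

    // Reduce side: reducer N asks only for bucket N and ignores every other bucket.
    static List<String> fetchForReducer(Map<Integer, List<String>> buckets, int reducerId) {
        return buckets.getOrDefault(reducerId, new ArrayList<>());
    }
}
```

With two reducers, reducer 0 would call fetchForReducer(buckets, 0) against every map task's buckets and never see records tagged for reducer 1.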

Re: issue with map running time

2012-07-09 Thread Manoj Babu
Hi Bobby, I have faced a similar issue. In the job, the block size is 64 MB, the number of maps created is 656, the number of files uploaded to HDFS is 656, and each file is 11 MB. I assume that if small files exist, they will not be grouped together. Could you kindly clarify? Cheers! Manoj. On

Re: Basic question on how reducer works

2012-07-09 Thread Manoj Babu
Hi, It would be helpful if you could give more details on the doubts below. 1. How does the partitioner know which reducer needs to be called? 2. When we use more than one reducer, the output gets separated. In what scenario do we actually have to go for multiple reducers? Cheers! Manoj. O

Re: Basic question on how reducer works

2012-07-09 Thread Arun C Murthy
Robert, On Jul 7, 2012, at 6:37 PM, Grandl Robert wrote: > Hi, > > I have some questions related to basic functionality in Hadoop. > > 1. When a Mapper process the intermediate output data, how it knows how many > partitions to do(how many reducers will be) and how much data to go in each >