Thanks Karthik. But how can we overcome that? Do we need to use a different
file format?
Also, I am using the code below to merge all the files into a single file. Is
this the proper way to do it?
FileStatus[] inputFiles = local.listStatus(inputDir);
FSDataOutputStream out = hdfs.create(hdfsFile);
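For reference, here is a minimal, runnable sketch of the merge loop that snippet appears to be building toward. Plain java.nio stands in for Hadoop's FileSystem API here (local.listStatus / hdfs.create in the snippet above); the class and method names are illustrative, not Hadoop's:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class MergeFiles {
    // Concatenate every input file's bytes into a single output file,
    // analogous to looping over the listStatus() results and copying
    // each input stream into one FSDataOutputStream.
    public static void merge(Path[] inputs, Path output) throws IOException {
        try (OutputStream out = Files.newOutputStream(output)) {
            for (Path in : inputs) {
                Files.copy(in, out); // append this file's contents
            }
        }
    }
}
```

In real Hadoop code the same shape applies: open hdfs.create(hdfsFile) once, loop over the FileStatus[] array copying each input stream into it (IOUtils.copyBytes is commonly used for the copy), and close the output stream at the end.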
Hi Harsh,
What permissions do we need to provide for the dfs.name.dir folder? And will
the remaining internal folder structure be created automatically, or do we
need to create it manually?
Also, how do we clean a data node?
Thanks in Advance!
Cheers!
Manoj.
On Tue, Jul 10, 2012 at 11:58 AM, Harsh J wrote:
Manoj,
If you change your dfs.name.dir (which is the right property for
0.20.x/1.x) or dfs.namenode.name.dir (which is the right property for
0.23/2.x) completely to a different directory, you will need to move
the contents of the original, older name-directory to the new one to
preserve data, or
The partitioner is configurable. The default partitioner, from what I
remember, computes the partition as the hashcode modulo number of
reducers/partitions. For random input, it is balanced, but some cases can
have very skewed key distribution. Also, as you have pointed out, the
number of values per
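To make the skew point concrete, here is a small plain-Java sketch, assuming the usual default-partitioner formula (hashCode masked to non-negative, modulo the reducer count), counting how many records land in each partition when one key dominates:

```java
import java.util.Arrays;

public class PartitionSkewDemo {
    // Default hash-partitioning logic (assumption, per the text above):
    //   partition = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReducers = 4;
        // Skewed input: one hot key dominates the record stream.
        String[] records = {"hot", "hot", "hot", "hot", "hot", "hot",
                            "rare1", "rare2", "rare3"};
        int[] counts = new int[numReducers];
        for (String key : records) {
            counts[partition(key, numReducers)]++;
        }
        // All six "hot" records hash to the same partition, so one
        // reducer does most of the work even though keys are hashed.
        System.out.println(Arrays.toString(counts));
    }
}
```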
Thanks Arun.
So just for my clarification: the map will create partitions according to the
number of reducers, such that each reducer gets almost the same number of keys
in its partition. However, each key can have a different number of values, so
the "weight" of each partition will depend on that. Also w
On Jul 9, 2012, at 12:55 PM, Grandl Robert wrote:
Thanks a lot, guys, for the answers.
Still I am not able to find exactly the code for the following things:
1. The reducer reading only its own partition from a map output. I looked into
ReduceTask#getMapOutput, which does the actual read in ReduceTask#shuffleInMemory,
but I don't see where it specifies which p
Hi Manoj,
It seems like a different issue.
Let me understand your case better. Is your input 656 files of 11 MB each?
In that case, MapReduce does create 656 map tasks. In general, an input
split is the data read from a single file, but limited to the block size
(64 MB in your case). As the files
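A quick back-of-the-envelope check of that split math (assuming one split per file, capped at the block size, as described above):

```java
public class SplitCountDemo {
    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;  // 64 MB HDFS block
        long fileSize  = 11L * 1024 * 1024;  // each input file is 11 MB
        int  numFiles  = 656;

        // A file smaller than one block yields exactly one split;
        // larger files are cut into ceil(fileSize / blockSize) splits.
        long splitsPerFile = (fileSize + blockSize - 1) / blockSize;
        long totalSplits   = splitsPerFile * numFiles;

        // One map task is launched per split.
        System.out.println(totalSplits); // prints 656
    }
}
```

Since 11 MB < 64 MB, every file becomes exactly one split, which is why 656 small files produce 656 map tasks.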
Hi Manoj,
As Harsh said, we would almost always need multiple reducers. As each
reduce is potentially executed on a different core (same machine or a
different one), in most cases, we would want at least as many reduces as
the number of cores for maximum parallelism/performance.
Karthik
On Mon,
Hi Harsh,
Thanks for clarifying. I was earlier under the impression that the Partitioner
picks the reducer.
My cluster setup provides options for multiple reducers, so I want to know
when and in which scenarios we should go for multiple reducers.
Cheers!
Manoj.
On Mon, Jul 9, 2012 at 11:27 PM, Harsh J wr
Manoj,
Think of it this way, and you shouldn't be confused: A reducer == a partition.
For (1): Partitioners do not 'call' a reducer; they just write the data
with the proper partition ID. The reducer whose ID is the same as the
partition ID picks it up for itself later. This we have already explained
earlier.
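The "reducer == partition" idea above can be sketched as a toy in-memory shuffle (plain Java; an illustration of the concept, not Hadoop's actual ReduceTask code, and the hash formula is the assumed default-partitioner logic):

```java
import java.util.ArrayList;
import java.util.List;

public class ToyShuffle {
    // Assumed default-partitioner logic: non-negative hash mod reducers.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int numReducers = 3;
        String[] mapOutput = {"apple", "banana", "cherry", "date", "fig"};

        // Map side: each key is written under its partition ID.
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) partitions.add(new ArrayList<>());
        for (String key : mapOutput) {
            partitions.get(partition(key, numReducers)).add(key);
        }

        // Reduce side: reducer i fetches only partition i, nothing else.
        for (int i = 0; i < numReducers; i++) {
            System.out.println("reducer " + i + " -> " + partitions.get(i));
        }
    }
}
```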
F
Hi Bobby,
I have faced a similar issue. In my job the block size is 64 MB, the number
of maps created is 656, and the number of files uploaded to HDFS is 656,
each file being 11 MB in size. I assume that when the files are small, they
cannot be grouped together.
Could you kindly clarify?
Cheers!
Manoj.
On
Hi,
It would be more helpful if you could give more details on the doubts below.
1. How does the partitioner know which reducer needs to be called?
2. When we use more than one reducer, the output gets separated. For what
scenarios do we actually have to go for multiple reducers?
Cheers!
Manoj.
O
Robert,
On Jul 7, 2012, at 6:37 PM, Grandl Robert wrote:
> Hi,
>
> I have some questions related to basic functionality in Hadoop.
>
> 1. When a Mapper processes the intermediate output data, how does it know how
> many partitions to make (how many reducers there will be) and how much data goes in each
>