partition as block?

2013-04-30 Thread Jay Vyas
Hi guys: I'm wondering - if I'm running mapreduce jobs on a cluster with large block sizes - can I increase performance with either: 1) a custom FileInputFormat, 2) a custom Partitioner, or 3) -DnumReducers? Clearly, (3) will be an issue, since it might overload tasks and the network
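For context, here is a minimal driver sketch showing where each of the three knobs plugs in, assuming the Hadoop 2.x org.apache.hadoop.mapreduce API; PartitionAsBlockDriver, MyInputFormat, and MyPartitioner are illustrative names, not anything from the thread:

    // Driver sketch: the three tuning points from the question.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class PartitionAsBlockDriver {
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "partition-as-block");
        job.setJarByClass(PartitionAsBlockDriver.class);
        job.setInputFormatClass(MyInputFormat.class);  // (1) custom FileInputFormat
        job.setPartitionerClass(MyPartitioner.class);  // (2) custom Partitioner
        job.setNumReduceTasks(32);                     // (3) same effect as -D mapred.reduce.tasks=32
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }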

Re: partition as block?

2013-04-30 Thread Jay Vyas
Well, to be clearer, I'm wondering how hadoop-mapreduce can be optimized on a block-less filesystem... and am thinking about application-tier ways to simulate blocks - i.e., by making the granularity of partitions smaller. Wondering if there is a way to hack an increased number of partitions
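One application-tier way to "simulate blocks" is to cap the input split size, which raises the map-task count without touching the filesystem. A sketch assuming the Hadoop 2.x API; the class name and the 64 MB cap are illustrative:

    // Cap each input split at 64 MB so the framework schedules many map
    // tasks even when the filesystem reports one huge (or no) block.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class SmallSplits {
      public static Job configure(Configuration conf) throws Exception {
        Job job = Job.getInstance(conf, "small-splits");
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);
        // command-line equivalent: -D mapred.max.split.size=67108864
        return job;
      }
    }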

Re: partition as block?

2013-04-30 Thread Jay Vyas
Yes, it is a problem at the first stage. What I'm wondering, though, is whether the intermediate results - which are produced after the map phase - can be optimized. On Tue, Apr 30, 2013 at 3:38 PM, Mohammad Tariq donta...@gmail.com wrote: Hmmm. I was actually thinking about the very first step.
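Two standard levers for shrinking the intermediate (post-map) data are a combiner and map-output compression; a hedged sketch, assuming a hypothetical MyReducer whose reduce function is associative and commutative (a precondition for reusing it as a combiner):

    // Shrink intermediate map output before it crosses the network.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.SnappyCodec;
    import org.apache.hadoop.mapreduce.Job;

    public class IntermediateTuning {
      public static void apply(Job job) {
        job.setCombinerClass(MyReducer.class);        // pre-aggregate on the map side
        Configuration conf = job.getConfiguration();
        conf.setBoolean("mapred.compress.map.output", true);  // compress spilled map output
        conf.setClass("mapred.map.output.compression.codec",
            SnappyCodec.class, CompressionCodec.class);       // Snappy needs native libs
      }
    }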

Re: partition as block?

2013-04-30 Thread Mohammad Tariq
Increasing the size can help us to an extent, but increasing it further might cause problems during copy and shuffle. If the partitions are too big to be held in memory, we'll end up with a *disk-based shuffle*, which is gonna be slower than a *RAM-based shuffle*, thus delaying the entire reduce
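The disk-vs-RAM shuffle trade-off Tariq describes is governed by a few configuration knobs; a sketch with illustrative values, using the Hadoop 1.x property spellings (these were renamed in later releases):

    // Push the shuffle toward RAM: bigger map-side sort buffer, later
    // spill, and a larger share of reducer heap for fetched map output.
    import org.apache.hadoop.conf.Configuration;

    public class ShuffleTuning {
      public static void apply(Configuration conf) {
        conf.setInt("io.sort.mb", 256);                // map-side sort buffer, in MB
        conf.setFloat("io.sort.spill.percent", 0.90f); // buffer fill fraction that triggers a spill
        conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.70f);
        // fraction of reducer heap that buffers fetched map output in memory
      }
    }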

Re: partition as block?

2013-04-30 Thread Jay Vyas
What do you mean by increasing the size? I'm talking more about increasing the number of partitions... which actually decreases individual file size. On Apr 30, 2013, at 4:09 PM, Mohammad Tariq donta...@gmail.com wrote: Increasing the size can help us to an extent, but increasing it further might
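To make the terminology concrete: the partition count equals the reduce-task count, and a Partitioner only decides which partition each key lands in, so more reducers means more, smaller output files. A minimal hash-spreading sketch (the class name is illustrative; this is essentially what Hadoop's built-in HashPartitioner does):

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Each distinct partition index feeds one reduce task and hence
    // one output file; raising numPartitions shrinks each file.
    public class SpreadingPartitioner extends Partitioner<Text, Text> {
      @Override
      public int getPartition(Text key, Text value, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
      }
    }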