I am new to Hadoop and am trying to understand how to size a Hadoop
cluster.
What factors should I consider when deciding the number of datanodes?
What datanode configuration (CPU, memory) is appropriate?
How much memory is required for the namenode?
My client is looking at 1 PB of usable data and will be
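To make the sizing question concrete, here is a rough back-of-the-envelope calculation for the 1 PB figure. The replication factor of 3 is the HDFS default; the 25% headroom for intermediate/temporary data and the 48 TB of raw disk per datanode are assumptions for illustration only, not numbers from this post.

```python
import math

# Rough HDFS capacity sizing sketch.
# Assumptions (only the 1 PB of usable data comes from the post):
#   - replication factor 3 (HDFS default)
#   - ~25% headroom for intermediate/temp data
#   - 48 TB raw disk per datanode (e.g. 12 x 4 TB drives)
USABLE_PB = 1
REPLICATION = 3
HEADROOM = 0.25
RAW_TB_PER_NODE = 48

# Raw capacity needed = usable data x replication x (1 + headroom)
raw_pb_needed = USABLE_PB * REPLICATION * (1 + HEADROOM)

# Convert PB to TB (1 PB = 1024 TB) and divide by per-node capacity
datanodes = math.ceil(raw_pb_needed * 1024 / RAW_TB_PER_NODE)

print(raw_pb_needed, "PB raw,", datanodes, "datanodes")
```

Under these assumptions the cluster needs 3.75 PB of raw disk, or about 80 datanodes; the real numbers depend heavily on the chosen hardware and workload.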
Hi,
Does the streaming jar create one reducer by default? We have the number of
reduce tasks per tasktracker configured to be more than 1, but my job has
about 150 mappers and only 1 reducer:
reducer.py basically just reads each line and prints it.
Why doesn't streaming.jar invoke multiple reducers in this case?
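For reference, the reducer described above is a pass-through (identity) reducer. A minimal sketch of such a script might look like the following; the `main()` wrapper taking explicit streams is an assumption added here for testability, not something the original reducer.py necessarily has:

```python
#!/usr/bin/env python
# Identity reducer for Hadoop Streaming: echo every input line unchanged.
# Streaming reducers read key\tvalue lines on stdin and write lines to stdout.
import sys

def main(stream=sys.stdin, out=sys.stdout):
    for line in stream:
        out.write(line)

if __name__ == "__main__":
    main()
```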