Sizing help

2011-10-21 Thread Steve Ed
I am new to Hadoop and trying to understand how to size a Hadoop cluster. What factors should I consider when deciding the number of datanodes? The datanode configuration (CPU, memory)? The amount of memory required for the namenode? My client is looking at 1 PB of usable data and will be
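
A rough way to frame the datanode count is to work back from usable data to raw disk. The sketch below is only back-of-the-envelope arithmetic, not a sizing recommendation. The replication factor of 3 is the HDFS default (dfs.replication); the 25% headroom for intermediate and temporary data and the 24 TB of disk per node are assumptions picked for illustration, and the function name is hypothetical.

```python
import math


def datanodes_needed(usable_tb, replication, headroom, disk_per_node_tb):
    """Estimate the datanode count from storage requirements alone.

    usable_tb        -- data the client wants to store, before replication
    replication      -- HDFS replication factor (dfs.replication, default 3)
    headroom         -- fraction of raw disk kept free for intermediate
                        and temporary data (an assumed figure)
    disk_per_node_tb -- raw disk per datanode in TB (an assumed figure)
    """
    # Every block is stored `replication` times.
    replicated_tb = usable_tb * replication
    # Only (1 - headroom) of the raw disk is available for HDFS blocks.
    raw_tb = replicated_tb / (1.0 - headroom)
    # Round up: a fractional node is still a whole machine.
    return math.ceil(raw_tb / disk_per_node_tb)


# 1 PB usable, taken here as 1000 TB
nodes = datanodes_needed(usable_tb=1000, replication=3,
                         headroom=0.25, disk_per_node_tb=24)
print(nodes)
```

This covers storage only. CPU, memory per datanode, and namenode heap depend on the workload and the number of files and blocks, so they need separate estimates.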

Streaming jar creates only 1 reducer

2011-10-21 Thread Mapred Learn
Hi, does the streaming jar create only 1 reducer by default? We have the number of reduce tasks per task tracker configured to be more than 1, but my job has about 150 mappers and only 1 reducer. reducer.py basically just reads each line and prints it. Why doesn't streaming.jar invoke multiple reducers in this case?
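
For context, the per-tasktracker setting only caps how many reduce tasks can run concurrently on one node. The number of reducers a job gets comes from the job-level property mapred.reduce.tasks, which defaults to 1 unless the job overrides it. A minimal sketch of passing it to a streaming job follows. The helper name, the jar path, the HDFS paths, and mapper.py are hypothetical placeholders; only reducer.py comes from the message above.

```python
def build_streaming_command(streaming_jar, input_path, output_path,
                            mapper, reducer, num_reducers):
    """Build the argument list for a Hadoop streaming job.

    The generic -D option must come before the streaming-specific
    options (-input, -output, ...), so it is placed right after the jar.
    """
    return [
        "hadoop", "jar", streaming_jar,
        # Job-level reducer count; without this the default of 1 applies.
        "-D", "mapred.reduce.tasks={}".format(num_reducers),
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
        # Ship the scripts to the task nodes.
        "-file", mapper,
        "-file", reducer,
    ]


cmd = build_streaming_command(
    streaming_jar="hadoop-streaming.jar",  # hypothetical path
    input_path="/user/me/input",           # hypothetical HDFS path
    output_path="/user/me/output",         # hypothetical HDFS path
    mapper="mapper.py",                    # hypothetical script name
    reducer="reducer.py",
    num_reducers=10,
)
print(" ".join(cmd))
```

The printed line can be run from a shell, or the list can be handed to subprocess.call(cmd). With more than one reducer the output is split across several part files, one per reducer.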