Hi I am using Hadoop 0.20.203.
I have performed simple vertical scalability experiments of Hadoop with the use of Graph datasets and BFS algorithm. My experiment configuration is 20workers + Master. In each test I divided the Map slots and Reduce Slots equally (M==R), I can process the Job(single BFS step) in a single wave. For the experiments I have used the wall clock time metric. The total size of the dataset is around 311mb. Below are my recorded times (I'll ask the question after presenting them). Per worker : 1CPU (1Slot) 2CPU(2Slots) 3CPU 4CPU 5CPU 6CPU 7CPU // 1 cpu is always left for deamons Time : 429s 434s 417s 412s 429s 430s 470s // presented times are averages from 10 algorithm executions As it can be seen initial trend (1->2 cpu) is actually increasing the execution time. However another trend occurs for the (2->3 cpu), which lasts till 4cpu (time decrease). After that the execution time increase all the way till 7cpus. Can some one explain to me why Hadoop scales like that, I can understand that for the large number of CPUs, tasks compete over BUS, local disk I/O. But I can not explain (do not understand) the change in trends. Thanks in advance. regards blah