Hi

I am using Hadoop 0.20.203.

I have run some simple vertical scalability experiments with Hadoop, using graph datasets and a BFS algorithm. The cluster consists of 20 workers plus a master. In each test I split the map slots and reduce slots equally (M == R), so the job (a single BFS step; a rough sketch of it is further below) completes in a single wave. The metric I recorded is wall clock time. The total size of the dataset is around 311 MB. My recorded times are below (I'll ask the question after presenting them).

CPUs (slots) per worker:   1      2      3      4      5      6      7
Avg time (10 runs):        429s   434s   417s   412s   429s   430s   470s

(One CPU per worker is always left for the daemons; the times are averages over 10 algorithm executions.)

As can be seen, the initial step (1 -> 2 CPUs) actually increases the execution time. The trend then reverses from 2 -> 3 CPUs, and the time keeps decreasing up to 4 CPUs. After that, the execution time increases all the way to 7 CPUs. Can someone explain why Hadoop scales like this? I can understand that with a large number of CPUs the tasks compete for the bus and local disk I/O, but I cannot explain (do not understand) the changes in trend.
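
For reference, the per-iteration job is essentially the standard iterative-BFS MapReduce pattern. The sketch below is a simplified stand-in for my actual code, not the exact implementation: the class names and the record layout (nodeId<TAB>adjList|distance|color) are only illustrative. One submission of this job corresponds to one BFS step, i.e. one wave in the timings above.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Simplified sketch of a single BFS step (not my exact code; the record
// layout is illustrative). Lines look like: nodeId<TAB>adjList|distance|color
public class BfsStep {

  public static class BfsMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t");
      String nodeId = parts[0];
      String[] fields = parts[1].split("\\|", -1);   // adjList | distance | color
      String adjList = fields[0];
      int distance = Integer.parseInt(fields[1]);
      String color = fields[2];

      if (color.equals("GRAY")) {
        // Frontier node: emit each neighbour as newly discovered, one hop further away.
        for (String neighbour : adjList.split(",")) {
          if (neighbour.length() > 0) {
            ctx.write(new Text(neighbour), new Text("|" + (distance + 1) + "|GRAY"));
          }
        }
        color = "BLACK";                             // this node is now fully expanded
      }
      // Re-emit the node itself so its adjacency list survives the shuffle.
      ctx.write(new Text(nodeId), new Text(adjList + "|" + distance + "|" + color));
    }
  }

  public static class BfsReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text nodeId, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      String adjList = "";
      int minDistance = Integer.MAX_VALUE;
      String darkest = "WHITE";

      for (Text value : values) {
        String[] fields = value.toString().split("\\|", -1);
        if (fields[0].length() > 0) {
          adjList = fields[0];                       // only the original record carries the edges
        }
        minDistance = Math.min(minDistance, Integer.parseInt(fields[1]));
        if (rank(fields[2]) > rank(darkest)) {
          darkest = fields[2];                       // keep the most advanced state
        }
      }
      ctx.write(nodeId, new Text(adjList + "|" + minDistance + "|" + darkest));
    }

    private static int rank(String color) {
      return color.equals("BLACK") ? 2 : color.equals("GRAY") ? 1 : 0;
    }
  }

  // Driver: one submission of this job is one BFS step (one wave in my setup).
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "bfs-step");
    job.setJarByClass(BfsStep.class);
    job.setMapperClass(BfsMapper.class);
    job.setReducerClass(BfsReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}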

Thanks in advance.
regards
blah
