I'm running a BFS over the Spanish edition of Wikipedia. I
converted the [dump][1] into a file that Giraph can read.

Using 1 worker, loading a 1 GB file took about 452 seconds in the input
superstep. I executed Giraph with this command:

    /home/hadoop/bin/yarn jar /home/hadoop/giraph/giraph.jar \
        ar.edu.info.unlp.tesina.lectura.grafo.BusquedaDeCaminosNavegacionalesWikiquote \
        -vif ar.edu.info.unlp.tesina.vertice.estructuras.IdTextWithComplexValueInputFormat \
        -vip /user/hduser/input/grafo-wikipedia.txt \
        -vof ar.edu.info.unlp.tesina.vertice.estructuras.IdTextWithComplexValueOutputFormat \
        -op /user/hduser/output/caminosNavegacionales \
        -w 1 -yh 120000 \
        -ca giraph.metrics.enable=true,giraph.useOutOfCoreMessages=true

Container logs:

    16/08/24 21:17:02 INFO master.BspServiceMaster: generateVertexInputSplits: Got 8 input splits for 1 input threads
    16/08/24 21:17:02 INFO master.BspServiceMaster: createVertexInputSplits: Starting to write input split data to zookeeper with 1 threads
    16/08/24 21:17:02 INFO master.BspServiceMaster: createVertexInputSplits: Done writing input split data to zookeeper
    16/08/24 21:17:02 INFO yarn.GiraphYarnTask: [STATUS: task-0] MASTER_ZOOKEEPER_ONLY checkWorkers: Done - Found 1 responses of 1 needed to start superstep -1
    16/08/24 21:17:02 INFO netty.NettyClient: Using Netty without authentication.
    16/08/24 21:17:02 INFO netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
    16/08/24 21:17:02 INFO partition.PartitionUtils: computePartitionCount: Creating 1, default would have been 1 partitions.
    ...
    16/08/24 21:25:40 INFO netty.NettyClient: stop: Halting netty client
    16/08/24 21:25:40 INFO netty.NettyClient: stop: reached wait threshold, 1 connections closed, releasing resources now.
    16/08/24 21:25:43 INFO netty.NettyClient: stop: Netty client halted
    16/08/24 21:25:43 INFO netty.NettyServer: stop: Halting netty server
    16/08/24 21:25:43 INFO netty.NettyServer: stop: Start releasing resources
    16/08/24 21:25:44 INFO bsp.BspService: process: cleanedUpChildrenChanged signaled
    16/08/24 21:25:47 INFO netty.NettyServer: stop: Netty server halted
    16/08/24 21:25:47 INFO bsp.BspService: process: masterElectionChildrenChanged signaled
    16/08/24 21:25:47 INFO master.MasterThread: setup: Took 0.898 seconds.
    16/08/24 21:25:47 INFO master.MasterThread: input superstep: Took 452.531 seconds.
    16/08/24 21:25:47 INFO master.MasterThread: superstep 0: Took 64.376 seconds.
    16/08/24 21:25:47 INFO master.MasterThread: superstep 1: Took 1.591 seconds.
    16/08/24 21:25:47 INFO master.MasterThread: shutdown: Took 6.609 seconds.
    16/08/24 21:25:47 INFO master.MasterThread: total: Took 526.006 seconds.

As you can see, the first log line shows that the input superstep ran
with only **one** thread, and the timing summary shows it took about 452
seconds to finish.

I ran another test with giraph.numInputThreads=8, trying to run the input
superstep with 8 threads:

    /home/hadoop/bin/yarn jar /home/hadoop/giraph/giraph.jar \
        ar.edu.info.unlp.tesina.lectura.grafo.BusquedaDeCaminosNavegacionalesWikiquote \
        -vif ar.edu.info.unlp.tesina.vertice.estructuras.IdTextWithComplexValueInputFormat \
        -vip /user/hduser/input/grafo-wikipedia.txt \
        -vof ar.edu.info.unlp.tesina.vertice.estructuras.IdTextWithComplexValueOutputFormat \
        -op /user/hduser/output/caminosNavegacionales \
        -w 1 -yh 120000 \
        -ca giraph.metrics.enable=true,giraph.useOutOfCoreMessages=true,giraph.numInputThreads=8

The result was:

    16/08/24 21:54:00 INFO master.BspServiceMaster: generateVertexInputSplits: Got 8 input splits for 8 input threads
    16/08/24 21:54:00 INFO master.BspServiceMaster: createVertexInputSplits: Starting to write input split data to zookeeper with 1 threads
    16/08/24 21:54:00 INFO master.BspServiceMaster: createVertexInputSplits: Done writing input split data to zookeeper
    ...
    16/08/24 22:10:07 INFO master.MasterThread: setup: Took 0.093 seconds.
    16/08/24 22:10:07 INFO master.MasterThread: input superstep: Took 891.339 seconds.
    16/08/24 22:10:07 INFO master.MasterThread: superstep 0: Took 66.635 seconds.
    16/08/24 22:10:07 INFO master.MasterThread: superstep 1: Took 1.837 seconds.
    16/08/24 22:10:07 INFO master.MasterThread: shutdown: Took 6.605 seconds.
    16/08/24 22:10:07 INFO master.MasterThread: total: Took 966.512 seconds.


So my question is: how is it possible that the input superstep takes about
452 seconds with a single input thread and 891 seconds with eight? It
should be exactly the opposite, right?
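
To make the comparison concrete, here is a minimal Python sketch that pulls the per-phase timings out of the two `MasterThread` summaries and prints them side by side. The timing lines are copied from the logs above with the timestamp prefix trimmed; the helper names (`phase_times`, `compare`) are just made up for this illustration and are not part of Giraph.

```python
import re

# Timing lines copied from the two MasterThread summaries in this post
# (timestamp prefix trimmed, wrapped lines rejoined).
RUN_1_THREAD = """
master.MasterThread: setup: Took 0.898 seconds.
master.MasterThread: input superstep: Took 452.531 seconds.
master.MasterThread: superstep 0: Took 64.376 seconds.
master.MasterThread: superstep 1: Took 1.591 seconds.
master.MasterThread: shutdown: Took 6.609 seconds.
master.MasterThread: total: Took 526.006 seconds.
"""

RUN_8_THREADS = """
master.MasterThread: setup: Took 0.093 seconds.
master.MasterThread: input superstep: Took 891.339 seconds.
master.MasterThread: superstep 0: Took 66.635 seconds.
master.MasterThread: superstep 1: Took 1.837 seconds.
master.MasterThread: shutdown: Took 6.605 seconds.
master.MasterThread: total: Took 966.512 seconds.
"""

# Matches "<phase>: Took <seconds> seconds" after the MasterThread tag.
TIMING = re.compile(r"MasterThread: (.+?): Took ([\d.]+) seconds")


def phase_times(log_text):
    """Return {phase name: seconds} for every MasterThread timing line."""
    return {phase: float(secs) for phase, secs in TIMING.findall(log_text)}


def compare(before, after):
    """Yield (phase, before_s, after_s, after/before) for phases in both runs."""
    for phase, b in before.items():
        if phase in after:
            yield phase, b, after[phase], after[phase] / b


if __name__ == "__main__":
    one = phase_times(RUN_1_THREAD)
    eight = phase_times(RUN_8_THREADS)
    for phase, b, a, ratio in compare(one, eight):
        print(f"{phase:16s} {b:9.3f}s -> {a:9.3f}s  (x{ratio:.2f})")
```

In absolute terms, the input superstep accounts for nearly all of the difference between the two totals; the compute supersteps and shutdown are almost unchanged.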


  [1]: https://dumps.wikimedia.org/eswiki/20160601/ "dump"
