The cluster used for this was one master and one slave, both r3.8xlarge EC2 instances on AWS.
2016-08-25 19:26 GMT-03:00 José Luis Larroque <larroques...@gmail.com>:

> I'm doing a BFS search through the Wikipedia (Spanish edition) site. I
> converted the [dump][1] into a file that can be read by Giraph.
>
> Using 1 worker, a file of 1 GB took 492 seconds. I executed Giraph with
> this command:
>
>     /home/hadoop/bin/yarn jar /home/hadoop/giraph/giraph.jar \
>       ar.edu.info.unlp.tesina.lectura.grafo.BusquedaDeCaminosNavegacionalesWikiquote \
>       -vif ar.edu.info.unlp.tesina.vertice.estructuras.IdTextWithComplexValueInputFormat \
>       -vip /user/hduser/input/grafo-wikipedia.txt \
>       -vof ar.edu.info.unlp.tesina.vertice.estructuras.IdTextWithComplexValueOutputFormat \
>       -op /user/hduser/output/caminosNavegacionales \
>       -w 1 -yh 120000 \
>       -ca giraph.metrics.enable=true,giraph.useOutOfCoreMessages=true
>
> Container logs:
>
>     16/08/24 21:17:02 INFO master.BspServiceMaster: generateVertexInputSplits: Got 8 input splits for 1 input threads
>     16/08/24 21:17:02 INFO master.BspServiceMaster: createVertexInputSplits: Starting to write input split data to zookeeper with 1 threads
>     16/08/24 21:17:02 INFO master.BspServiceMaster: createVertexInputSplits: Done writing input split data to zookeeper
>     16/08/24 21:17:02 INFO yarn.GiraphYarnTask: [STATUS: task-0] MASTER_ZOOKEEPER_ONLY checkWorkers: Done - Found 1 responses of 1 needed to start superstep -1
>     16/08/24 21:17:02 INFO netty.NettyClient: Using Netty without authentication.
>     16/08/24 21:17:02 INFO netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
>     16/08/24 21:17:02 INFO partition.PartitionUtils: computePartitionCount: Creating 1, default would have been 1 partitions.
>     ...
>     16/08/24 21:25:40 INFO netty.NettyClient: stop: Halting netty client
>     16/08/24 21:25:40 INFO netty.NettyClient: stop: reached wait threshold, 1 connections closed, releasing resources now.
>     16/08/24 21:25:43 INFO netty.NettyClient: stop: Netty client halted
>     16/08/24 21:25:43 INFO netty.NettyServer: stop: Halting netty server
>     16/08/24 21:25:43 INFO netty.NettyServer: stop: Start releasing resources
>     16/08/24 21:25:44 INFO bsp.BspService: process: cleanedUpChildrenChanged signaled
>     16/08/24 21:25:47 INFO netty.NettyServer: stop: Netty server halted
>     16/08/24 21:25:47 INFO bsp.BspService: process: masterElectionChildrenChanged signaled
>     16/08/24 21:25:47 INFO master.MasterThread: setup: Took 0.898 seconds.
>     16/08/24 21:25:47 INFO master.MasterThread: input superstep: Took 452.531 seconds.
>     16/08/24 21:25:47 INFO master.MasterThread: superstep 0: Took 64.376 seconds.
>     16/08/24 21:25:47 INFO master.MasterThread: superstep 1: Took 1.591 seconds.
>     16/08/24 21:25:47 INFO master.MasterThread: shutdown: Took 6.609 seconds.
>     16/08/24 21:25:47 INFO master.MasterThread: total: Took 526.006 seconds.
>
> As you can see, the first line tells us that the input superstep is
> executing with only **one** thread, and the input superstep took 492
> seconds to finish.
>
> I did another test, using giraph.numInputThreads=8, trying to run the
> input superstep with 8 threads:
>
>     /home/hadoop/bin/yarn jar /home/hadoop/giraph/giraph.jar \
>       ar.edu.info.unlp.tesina.lectura.grafo.BusquedaDeCaminosNavegacionalesWikiquote \
>       -vif ar.edu.info.unlp.tesina.vertice.estructuras.IdTextWithComplexValueInputFormat \
>       -vip /user/hduser/input/grafo-wikipedia.txt \
>       -vof ar.edu.info.unlp.tesina.vertice.estructuras.IdTextWithComplexValueOutputFormat \
>       -op /user/hduser/output/caminosNavegacionales \
>       -w 1 -yh 120000 \
>       -ca giraph.metrics.enable=true,giraph.useOutOfCoreMessages=true,giraph.numInputThreads=8
>
> The result was the following:
>
>     16/08/24 21:54:00 INFO master.BspServiceMaster: generateVertexInputSplits: Got 8 input splits for 8 input threads
>     16/08/24 21:54:00 INFO master.BspServiceMaster: createVertexInputSplits: Starting to write input split data to zookeeper with 1 threads
>     16/08/24 21:54:00 INFO master.BspServiceMaster: createVertexInputSplits: Done writing input split data to zookeeper
>     ...
>     16/08/24 22:10:07 INFO master.MasterThread: setup: Took 0.093 seconds.
>     16/08/24 22:10:07 INFO master.MasterThread: input superstep: Took 891.339 seconds.
>     16/08/24 22:10:07 INFO master.MasterThread: superstep 0: Took 66.635 seconds.
>     16/08/24 22:10:07 INFO master.MasterThread: superstep 1: Took 1.837 seconds.
>     16/08/24 22:10:07 INFO master.MasterThread: shutdown: Took 6.605 seconds.
>     16/08/24 22:10:07 INFO master.MasterThread: total: Took 966.512 seconds.
>
> So my question is: how is it possible that Giraph takes 492 seconds
> without input threads and 891 seconds with them? It should be exactly
> the opposite, right?
>
> [1]: https://dumps.wikimedia.org/eswiki/20160601/ "dump"
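For anyone who wants to compare the two runs phase by phase, here is a minimal sketch in Python that pulls the `MasterThread` timings out of the container logs and prints the slowdown per phase. The helper name `phase_timings` is made up for illustration, and the two sample strings are just a few of the timing lines from the logs quoted above, joined onto single lines; it assumes each log entry sits on one line.

```python
import re

# Matches MasterThread timing lines such as
# "... master.MasterThread: input superstep: Took 452.531 seconds."
TIMING = re.compile(
    r"MasterThread: (?P<phase>[\w ]+?): Took (?P<secs>[\d.]+) seconds"
)


def phase_timings(log_text):
    """Return {phase name: seconds} for every MasterThread timing line.

    Hypothetical helper, not part of Giraph.
    """
    return {m["phase"]: float(m["secs"]) for m in TIMING.finditer(log_text)}


# Excerpt of the run without giraph.numInputThreads (1 input thread).
one_thread = """
16/08/24 21:25:47 INFO master.MasterThread: input superstep: Took 452.531 seconds.
16/08/24 21:25:47 INFO master.MasterThread: superstep 0: Took 64.376 seconds.
16/08/24 21:25:47 INFO master.MasterThread: total: Took 526.006 seconds.
"""

# Excerpt of the run with giraph.numInputThreads=8.
eight_threads = """
16/08/24 22:10:07 INFO master.MasterThread: input superstep: Took 891.339 seconds.
16/08/24 22:10:07 INFO master.MasterThread: superstep 0: Took 66.635 seconds.
16/08/24 22:10:07 INFO master.MasterThread: total: Took 966.512 seconds.
"""

a = phase_timings(one_thread)
b = phase_timings(eight_threads)

# Print each phase side by side with the ratio between the two runs.
for phase in a:
    ratio = b[phase] / a[phase]
    print(f"{phase}: {a[phase]:.3f}s -> {b[phase]:.3f}s (x{ratio:.2f})")
```

On these figures the input superstep roughly doubles with 8 input threads while superstep 0 barely changes, so the regression appears to be confined to input loading.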