Re: giraph.numInputThreads execution time for "input superstep" it's the same using 1 or 8 threads, how this can be possible?

José Luis Larroque Fri, 26 Aug 2016 14:43:22 -0700

Ok, so basically in the same worker, there is no benefit of using
giraph.numInputThread > 2 . I remember seen emails that suggest that
giraph.numInputThreads = giraph.numComputeThreads =
giraph.numOutputThreads, and i was planning use at least 4 compute threads,
but apparently i should use different numbers of threads for input
superstep and compute supersteps.


Thanks Hassan for your patience!

Bye!
Jose

2016-08-26 15:58 GMT-03:00 Hassan Eslami <hsn.esl...@gmail.com>:

> In your case, it seems that the HDFS data node is the same as the worker.
> So again, you will be having multiple threads reading multiple locations of
> HDFS (which means multiple locations on a single disk). If this is the
> case, as I mentioned in the last email, it causes IO interference.
>
> giraph.userPartitionCount is not much relevant in input superstep (unless
> you truly can get benefit from parallelism reading multiple input splits at
> the same time, which reduces the lock contention on each partition while
> reading).
>
> On Fri, Aug 26, 2016 at 6:52 AM, José Luis Larroque <
> larroques...@gmail.com> wrote:
>
>> Hi Hassan, thanks for your answer.
>>
>> I don't know if is clear (because i'm using my own algorithm instead of
>> using one of the Giraph examples) but the input file (
>>  /user/hduser/input/grafo-wikipedia.txt) is loaded in HDFS. i was
>> thinking that with 8 input splits, giraph was partitioning the input file
>> for paralell proccesing, but i wasnt' sure.
>>
>> I will try with only two threads in same worker.
>>
>> Maybe using giraph.userPartitionCount will help? Or those partitions are
>> destinated to computation steps only?
>>
>> Bye!
>> Jose
>>
>>
>>
>> 2016-08-26 3:11 GMT-03:00 Hassan Eslami <hsn.esl...@gmail.com>:
>>
>>> It seems that you are reading the data from a single file stored on a
>>> local machine with multiple threads. Having multiple threads accessing the
>>> disk causes IO interference which in turn reduces the IO performance. If
>>> you are reading from a single file on a local machine with 8 threads, the
>>> results you've got is kind of expected. In such case, you are better off
>>> using single thread in reading from the disk. You can also try to do it
>>> with two threads, so that you may be able to get some overlapping benefit
>>> of reading from disk and deserializing the input.
>>>
>>> On Thu, Aug 25, 2016 at 5:36 PM, José Luis Larroque <
>>> larroques...@gmail.com> wrote:
>>>
>>>> he cluster used for this was 1 master and one slave, both of a
>>>> r3.8xlarge EC2 instance on AWS.
>>>>
>>>> 2016-08-25 19:26 GMT-03:00 José Luis Larroque <larroques...@gmail.com>:
>>>>
>>>>> I'm doing BFS search through the Wikipedia (spanish edition) site. I
>>>>> converted the [dump][1] into a file that could be read with Giraph.
>>>>>
>>>>> Using 1 worker, a file of 1 GB took 492 seconds. I executed Giraph
>>>>> with this command:
>>>>>
>>>>>     /home/hadoop/bin/yarn jar /home/hadoop/giraph/giraph.jar
>>>>> ar.edu.info.unlp.tesina.lectura.grafo.BusquedaDeCaminosNavegacionalesWikiquote
>>>>> -vif 
>>>>> ar.edu.info.unlp.tesina.vertice.estructuras.IdTextWithComplexValueInputFormat
>>>>> -vip /user/hduser/input/grafo-wikipedia.txt -vof
>>>>> ar.edu.info.unlp.tesina.vertice.estructuras.IdTextWithComplexValueOutputFormat
>>>>> -op /user/hduser/output/caminosNavegacionales -w 1 -yh 120000 -ca
>>>>> giraph.metrics.enable=true,giraph.useOutOfCoreMessages=true
>>>>>
>>>>> Container logs:
>>>>>
>>>>>     16/08/24 21:17:02 INFO master.BspServiceMaster:
>>>>> generateVertexInputSplits: Got 8 input splits for 1 input threads
>>>>>     16/08/24 21:17:02 INFO master.BspServiceMaster:
>>>>> createVertexInputSplits: Starting to write input split data to zookeeper
>>>>> with 1 threads
>>>>>     16/08/24 21:17:02 INFO master.BspServiceMaster:
>>>>> createVertexInputSplits: Done writing input split data to zookeeper
>>>>>     16/08/24 21:17:02 INFO yarn.GiraphYarnTask: [STATUS: task-0]
>>>>> MASTER_ZOOKEEPER_ONLY checkWorkers: Done - Found 1 responses of 1 needed 
>>>>> to
>>>>> start superstep -1
>>>>>     16/08/24 21:17:02 INFO netty.NettyClient: Using Netty without
>>>>> authentication.
>>>>>     16/08/24 21:17:02 INFO netty.NettyClient: connectAllAddresses:
>>>>> Successfully added 1 connections, (1 total connected) 0 failed, 0 failures
>>>>> total.
>>>>>     16/08/24 21:17:02 INFO partition.PartitionUtils:
>>>>> computePartitionCount: Creating 1, default would have been 1 partitions.
>>>>>     ...
>>>>>     16/08/24 21:25:40 INFO netty.NettyClient: stop: Halting netty
>>>>> client
>>>>>     16/08/24 21:25:40 INFO netty.NettyClient: stop: reached wait
>>>>> threshold, 1 connections closed, releasing resources now.
>>>>>     16/08/24 21:25:43 INFO netty.NettyClient: stop: Netty client halted
>>>>>     16/08/24 21:25:43 INFO netty.NettyServer: stop: Halting netty
>>>>> server
>>>>>     16/08/24 21:25:43 INFO netty.NettyServer: stop: Start releasing
>>>>> resources
>>>>>     16/08/24 21:25:44 INFO bsp.BspService: process:
>>>>> cleanedUpChildrenChanged signaled
>>>>>     16/08/24 21:25:47 INFO netty.NettyServer: stop: Netty server halted
>>>>>     16/08/24 21:25:47 INFO bsp.BspService: process:
>>>>> masterElectionChildrenChanged signaled
>>>>>     16/08/24 21:25:47 INFO master.MasterThread: setup: Took 0.898
>>>>> seconds.
>>>>>     16/08/24 21:25:47 INFO master.MasterThread: input superstep: Took
>>>>> 452.531 seconds.
>>>>>     16/08/24 21:25:47 INFO master.MasterThread: superstep 0: Took
>>>>> 64.376 seconds.
>>>>>     16/08/24 21:25:47 INFO master.MasterThread: superstep 1: Took
>>>>> 1.591 seconds.
>>>>>     16/08/24 21:25:47 INFO master.MasterThread: shutdown: Took 6.609
>>>>> seconds.
>>>>>     16/08/24 21:25:47 INFO master.MasterThread: total: Took 526.006
>>>>> seconds.
>>>>>
>>>>> As you guys can see, the first line tell us that input superstep is
>>>>> executing with only **one** thread. And took 492 second in finish Input
>>>>> Superstep.
>>>>>
>>>>> I did another test, using giraph.numInputThreads=8, tryng to do the
>>>>> input superstep with 8 threads:
>>>>>
>>>>>     /home/hadoop/bin/yarn jar /home/hadoop/giraph/giraph.jar
>>>>> ar.edu.info.unlp.tesina.lectura.grafo.BusquedaDeCaminosNavegacionalesWikiquote
>>>>> -vif 
>>>>> ar.edu.info.unlp.tesina.vertice.estructuras.IdTextWithComplexValueInputFormat
>>>>> -vip /user/hduser/input/grafo-wikipedia.txt -vof
>>>>> ar.edu.info.unlp.tesina.vertice.estructuras.IdTextWithComplexValueOutputFormat
>>>>> -op /user/hduser/output/caminosNavegacionales -w 1 -yh 120000 -ca
>>>>> giraph.metrics.enable=true,giraph.useOutOfCoreMessages=true,
>>>>> giraph.numInputThreads=8
>>>>>
>>>>> The result was the following one:
>>>>>
>>>>>         16/08/24 21:54:00 INFO master.BspServiceMaster:
>>>>> generateVertexInputSplits: Got 8 input splits for 8 input threads
>>>>>     16/08/24 21:54:00 INFO master.BspServiceMaster:
>>>>> createVertexInputSplits: Starting to write input split data to zookeeper
>>>>> with 1 threads
>>>>>     16/08/24 21:54:00 INFO master.BspServiceMaster:
>>>>> createVertexInputSplits: Done writing input split data to zookeeper
>>>>>     ...
>>>>>
>>>>>     16/08/24 22:10:07 INFO master.MasterThread: setup: Took 0.093
>>>>> seconds.
>>>>>     16/08/24 22:10:07 INFO master.MasterThread: input superstep: Took
>>>>> 891.339 seconds.
>>>>>     16/08/24 22:10:07 INFO master.MasterThread: superstep 0: Took
>>>>> 66.635 seconds.
>>>>>     16/08/24 22:10:07 INFO master.MasterThread: superstep 1: Took
>>>>> 1.837 seconds.
>>>>>     16/08/24 22:10:07 INFO master.MasterThread: shutdown: Took 6.605
>>>>> seconds.
>>>>>     16/08/24 22:10:07 INFO master.MasterThread: total: Took 966.512
>>>>> seconds.
>>>>>
>>>>>
>>>>> So, my question is, how can be possible that Giraph is using 492
>>>>> seconds without input threads and 891 seconds with them? Should be exacly
>>>>> the opposite, right?
>>>>>
>>>>>
>>>>>   [1]: https://dumps.wikimedia.org/eswiki/20160601/ "dump"
>>>>>
>>>>
>>>>
>>>
>>
>

Re: giraph.numInputThreads execution time for "input superstep" it's the same using 1 or 8 threads, how this can be possible?

Reply via email to