Hello Giraph community,

"*Parameter tuning of graph processing frameworks*" is the domain of
research for my master's thesis. The objective of the thesis is to find an
automated method for choosing an optimal or near-optimal configuration for
graph processing frameworks. So far, I have reviewed the state of the art
in the optimization literature and surveyed the available graph processing
frameworks. *Giraph* is the first framework that I have explored in detail
and started running jobs on, and I hope it will be the framework to which I
apply the optimization algorithms.

My question concerns the set of parameters to optimize. Since I am not a
Giraph expert, I thought the best approach would be to ask the community. I
have made a list of the Giraph parameters that I think are important and
directly related to the framework's performance. Higher-ranked parameters
(rank 1 being the highest) are those I think are more important. I would
appreciate your feedback on the list: *Is it a good set of parameters to
optimize? Are there parameters in the set that should be fixed for all
kinds of jobs? Do you have any suggestions for changing the ranking, or for
adding or removing parameters?*

I will also add parameters describing the hardware used (number of CPUs,
RAM per CPU, and hard disk speed), but this email focuses on the
parameters of *Giraph*.

Thanks,
Muaz TWATY
*EURA NOVA*


Group   Rank  Parameter name                      Default value
              Details
------  ----  ----------------------------------  ------------------
Hadoop  1     -w                                  required
              Number of workers
Hadoop  2     -yarnheap                           1024 (integer) MB.
              Heap size, in MB, for each Giraph task (YARN only.)
Giraph  3     giraph.useInputSplitLocality        TRUE
              To minimize network usage when reading input splits, each
              worker can prioritize splits that reside on its host. This,
              however, comes at the cost of increased load on ZooKeeper.
              Hence, users with a lot of splits and input threads (or with
              configurations that can't exploit locality) may want to
              disable it.
Giraph  4     giraph.useMessageSizeEncoding       FALSE
              Use message size encoding (typically better for complex
              objects, not meant for primitive wrapped messages)
Giraph  5     giraph.VerticesToUpdateProgress     100000
              Minimum number of vertices to compute before updating worker
              progress
Giraph  6     giraph.maxMutationsPerRequest       100
              Maximum number of mutations per partition before flush
Giraph  7     giraph.maxPartitionsInMemory        0
              Maximum number of partitions to hold in memory for each
              worker. By default it is set to 0 (for adaptive out-of-core
              mechanism)
Giraph  8     giraph.clientReceiveBufferSize      32768
              Client receive buffer size
Giraph  9     giraph.clientSendBufferSize         524288
              Client send buffer size
Giraph  10    giraph.serverReceiveBufferSize      524288
              Server receive buffer size
Giraph  11    giraph.serverSendBufferSize         32768
              Server send buffer size
Giraph  12    giraph.async.message.store.threads  0
              Number of threads to be used in async message store
Giraph  13    giraph.channelsPerServer            1
              Number of channels used per server
Giraph  14    giraph.nettyClientExecutionThreads  8
              Netty client execution threads (execution handler)
Giraph  15    giraph.nettyClientThreads           4
              Netty client threads
Giraph  16    giraph.nettyServerExecutionThreads  8
              Netty server execution threads (execution handler)
Giraph  17    giraph.nettyServerThreads           16
              Netty server threads
Giraph  18    giraph.numComputeThreads            1
              Number of threads for vertex computation
Giraph  19    giraph.checkpointFrequency          0
              How often to checkpoint (i.e. 0 means no checkpoint, 1 means
              every superstep, 2 is every two supersteps, etc.).
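
As a minimal sketch of how such a list could feed an automated tuner, the
Python snippet below encodes a few of the parameters as a search space,
draws one random configuration, and turns it into command-line arguments.
The candidate values, the helper names (sample_configuration, format_value,
to_runner_args) and the choice of random sampling are assumptions made for
illustration only. The snippet also assumes that Giraph options are passed
to GiraphRunner as repeated "-ca name=value" arguments; it does not launch
any job.

```python
import random

# Parameter names come from the list above. The candidate values are
# illustrative assumptions, not recommended settings.
SEARCH_SPACE = {
    "giraph.useInputSplitLocality": [True, False],
    "giraph.useMessageSizeEncoding": [False, True],
    "giraph.numComputeThreads": [1, 2, 4, 8],
    "giraph.nettyClientThreads": [2, 4, 8],
    "giraph.nettyServerThreads": [8, 16, 32],
    "giraph.checkpointFrequency": [0, 1, 2, 5],
}


def sample_configuration(space, rng):
    """Draw one configuration uniformly at random from the search space."""
    return {name: rng.choice(values) for name, values in space.items()}


def format_value(value):
    """Render Python values as strings a Java configuration can parse."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def to_runner_args(workers, yarn_heap_mb, config):
    """Build the argument fragment: -w, -yarnheap, then one -ca per option."""
    args = ["-w", str(workers), "-yarnheap", str(yarn_heap_mb)]
    for name in sorted(config):
        args += ["-ca", f"{name}={format_value(config[name])}"]
    return args


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so a run can be repeated
    config = sample_configuration(SEARCH_SPACE, rng)
    print(" ".join(to_runner_args(4, 1024, config)))
```

The printed fragment would be appended to the usual job submission command.
An optimizer would replace the random draw with its own proposal step and
use the measured job runtime as the objective.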

