Hello Giraph community,

"*Parameter tuning of graph processing frameworks*" is the research topic of my master's thesis. The objective of the thesis is to find an automated method for choosing an optimal or near-optimal configuration for graph processing frameworks. So far, I have reviewed the state of the art in the optimization literature as well as the available graph processing frameworks. *Giraph* is the first framework I have started to explore in detail and run jobs with, and I hope it will be the framework to which I apply the optimization algorithms.
My question concerns the set of parameters that should be chosen for optimization. Since I am not a Giraph expert, I thought the best approach was to ask the community. I made a list of Giraph parameters that I think are important and directly related to framework performance; parameters ranked higher are the ones I consider more important. I would appreciate your feedback on the list:

*Is it a good set of parameters to optimize? Should some of these parameters be fixed for all kinds of jobs? Would you suggest changing the ranking, or adding or removing parameters?*

I will add more parameters related to the hardware used (number of CPUs, RAM per CPU, and hard disk speed), but the point of this email is to focus on the parameters of *Giraph*.

Thanks,
Muaz TWATY
*EURA NOVA*

Ranked parameter list (ranking. [Hadoop/Giraph] parameter name (default value): details):

1. [Hadoop] -w (default: required): Number of workers.
2. [Hadoop] -yarnheap (default: 1024, integer, in MB): Heap size, in MB, for each Giraph task (YARN only).
3. [Giraph] giraph.useInputSplitLocality (default: TRUE): To minimize network usage when reading input splits, each worker can prioritize splits that reside on its host. This, however, comes at the cost of increased load on ZooKeeper. Hence, users with a lot of splits and input threads (or with configurations that can't exploit locality) may want to disable it.
4. [Giraph] giraph.useMessageSizeEncoding (default: FALSE): Use message size encoding (typically better for complex objects, not meant for primitive wrapped messages).
5. [Giraph] giraph.VerticesToUpdateProgress (default: 100000): Minimum number of vertices to compute before updating worker progress.
6. [Giraph] giraph.maxMutationsPerRequest (default: 100): Maximum number of mutations per partition before flush.
7. [Giraph] giraph.maxPartitionsInMemory (default: 0): Maximum number of partitions to hold in memory for each worker. By default it is set to 0 (for the adaptive out-of-core mechanism).
8. [Giraph] giraph.clientReceiveBufferSize (default: 32768): Client receive buffer size.
9. [Giraph] giraph.clientSendBufferSize (default: 524288): Client send buffer size.
10. [Giraph] giraph.serverReceiveBufferSize (default: 524288): Server receive buffer size.
11. [Giraph] giraph.serverSendBufferSize (default: 32768): Server send buffer size.
12. [Giraph] giraph.async.message.store.threads (default: 0): Number of threads to be used in the async message store.
13. [Giraph] giraph.channelsPerServer (default: 1): Number of channels used per server.
14. [Giraph] giraph.nettyClientExecutionThreads (default: 8): Netty client execution threads (execution handler).
15. [Giraph] giraph.nettyClientThreads (default: 4): Netty client threads.
16. [Giraph] giraph.nettyServerExecutionThreads (default: 8): Netty server execution threads (execution handler).
17. [Giraph] giraph.nettyServerThreads (default: 16): Netty server threads.
18. [Giraph] giraph.numComputeThreads (default: 1): Number of threads for vertex computation.
19. [Giraph] giraph.checkpointFrequency (default: 0): How often to checkpoint (0 means no checkpointing, 1 means every superstep, 2 means every two supersteps, etc.).
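To make the list concrete as a search space, here is a minimal sketch of a random-search baseline over a subset of the giraph.* parameters above. It assumes the settings are passed to GiraphRunner as custom arguments (`-ca key=value`), with -w and -yarnheap given as their own options. The candidate values and the `evaluate` callback (which would run a job and return, e.g., its runtime) are hypothetical and only for illustration. Random search is a stand-in here, not a recommendation over other optimizers.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple, Union

Value = Union[int, bool]
Config = Dict[str, Value]


@dataclass(frozen=True)
class Param:
    name: str
    default: Value
    choices: Sequence[Value]  # hypothetical candidate values, for illustration only


# Subset of the ranked list; defaults copied from the list, choices are assumptions.
SEARCH_SPACE: List[Param] = [
    Param("giraph.useInputSplitLocality", True, [True, False]),
    Param("giraph.useMessageSizeEncoding", False, [True, False]),
    Param("giraph.maxPartitionsInMemory", 0, [0, 10, 50, 100]),
    Param("giraph.numComputeThreads", 1, [1, 2, 4, 8]),
    Param("giraph.nettyServerThreads", 16, [4, 8, 16, 32]),
    Param("giraph.channelsPerServer", 1, [1, 2, 4]),
    Param("giraph.checkpointFrequency", 0, [0, 2, 5]),
]


def sample_config(space: Sequence[Param], rng: random.Random) -> Config:
    """Draw one value uniformly at random for every parameter."""
    return {p.name: rng.choice(list(p.choices)) for p in space}


def to_custom_args(config: Config) -> List[str]:
    """Render non-default settings as GiraphRunner '-ca key=value' arguments."""
    defaults = {p.name: p.default for p in SEARCH_SPACE}
    args: List[str] = []
    for name, value in sorted(config.items()):
        if name in defaults and value == defaults[name]:
            continue  # leave defaults implicit
        text = str(value).lower() if isinstance(value, bool) else str(value)
        args += ["-ca", f"{name}={text}"]
    return args


def random_search(
    space: Sequence[Param],
    evaluate: Callable[[Config], float],  # hypothetical: runs a job, returns cost
    budget: int,
    seed: int = 0,
) -> Tuple[Optional[Config], float]:
    """Evaluate `budget` random configurations and keep the cheapest one."""
    rng = random.Random(seed)
    best_cfg: Optional[Config] = None
    best_cost = float("inf")
    for _ in range(budget):
        cfg = sample_config(space, rng)
        cost = evaluate(cfg)
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost
```

Leaving defaults out of the generated arguments keeps each job's command line short and makes it easy to see which settings a run actually changed.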