Hi Alex, answers are inline.
On Thu, Feb 6, 2014 at 11:22 AM, Alexander Frolov <alexndr.fro...@gmail.com> wrote:

> Hi, folks!
>
> I have started small research of Giraph framework and I have not much
> experience with Giraph and Hadoop :-(.
>
> I would like to ask several questions about how things are working in
> Giraph which are not straightforward for me. I am trying to use the sources
> but sometimes it is not too easy ;-)
>
> So here they are:
>
> 1) How Workers are assigned to TaskTrackers?

Each worker is a mapper, and mapper tasks are assigned to TaskTrackers by the JobTracker. Giraph has no control over this, and because Giraph doesn't need data locality the way MapReduce does, it does nothing to influence the placement.

> 2) How vertices are assigned to Workers? Does it depend on distribution of
> input file on DataNodes? Is there available any choice of distribution
> politics or no?

In the default scheme, vertices are assigned through modulo hash partitioning. Given k workers, vertex v is assigned to the worker i for which hash(v) % k = i, so the assignment depends only on the vertex id and the number of workers, not on where the input is stored on the DataNodes.

> 3) How Workers and Map tasks are related to each other? (1:1)? (n:1)?
> (1:n)?

It's 1:1. Each worker is implemented by a mapper task. The master is usually (but does not need to be) implemented by an additional mapper.

> 4) Can Workers migrate from one TaskTracker to the other?

Workers do not migrate. A Giraph computation is not dynamic with respect to the assignment and size of the tasks.

> 5) What is the best way to monitor Giraph app execution (progress, worker
> assignment, load balancing etc.)?

Just like you would for a standard MapReduce job: go to the job page on the JobTracker web interface.

> I think this is all for the moment. Thank you.
>
> Testbed description:
> Hardware: 8 node dual-CPU cluster with IB FDR.
> Giraph: release-1.0.0-RC2-152-g585511f
> Hadoop: hadoop-0.20.203.0, hadoop-rdma-0.9.8
>
> Best,
> Alex

--
Claudio Martella
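
P.S. To make the answer to question 2 concrete, here is a minimal Python sketch of modulo hash partitioning. It is not Giraph's actual code (Giraph is written in Java and its partitioner classes are not reproduced here). The names assign_worker and worker_load, the integer vertex ids, and the choice of 8 workers are all assumptions made up for illustration.

```python
from collections import Counter


def assign_worker(vertex_id: int, num_workers: int) -> int:
    """Return the worker index i such that hash(v) % k == i.

    Hypothetical helper, not a Giraph API. Python's % always yields a
    value in [0, k) for k > 0, so negative hashes are handled too.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    return hash(vertex_id) % num_workers


def worker_load(vertex_ids, num_workers: int) -> list:
    """Count how many vertices land on each worker."""
    counts = Counter(assign_worker(v, num_workers) for v in vertex_ids)
    return [counts.get(i, 0) for i in range(num_workers)]


if __name__ == "__main__":
    k = 8  # assumed worker count, chosen only for this example
    print(assign_worker(10, k))
    print(worker_load(range(1000), k))
```

The assignment is a pure function of the vertex id and k, so it stays the same no matter which DataNode holds the input split the vertex was read from. How evenly the load spreads depends on how the ids hash.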