On Thu, Feb 6, 2014 at 11:56 AM, Alexander Frolov <alexndr.fro...@gmail.com> wrote:
> Hi Claudio,
>
> thank you.
>
> If I understood correctly, a mapper and a mapper task are the same thing.
>

More or less. A mapper is a functional element of the programming model,
while the mapper task is the task that executes the mapper function on the
records.

>
> On Thu, Feb 6, 2014 at 2:28 PM, Claudio Martella <
> claudio.marte...@gmail.com> wrote:
>
>> Hi Alex,
>>
>> answers are inline.
>>
>>
>> On Thu, Feb 6, 2014 at 11:22 AM, Alexander Frolov <
>> alexndr.fro...@gmail.com> wrote:
>>
>>> Hi, folks!
>>>
>>> I have started a small study of the Giraph framework, and I do not have
>>> much experience with Giraph and Hadoop :-(.
>>>
>>> I would like to ask several questions about how things work in Giraph
>>> that are not straightforward for me. I am trying to use the sources,
>>> but sometimes it is not too easy ;-)
>>>
>>> So here they are:
>>>
>>> 1) How are Workers assigned to TaskTrackers?
>>>
>>
>> Each worker is a mapper, and mapper tasks are assigned to TaskTrackers by
>> the JobTracker.
>>
>
> That is, each Worker is created at the beginning of a superstep and then
> dies. In the next superstep all Workers are created again. Is that correct?
>

Nope. The workers are created at the beginning of the computation and
destroyed at the end of the computation. A worker is persistent throughout
the computation.

>
>> There's no control by Giraph there, and because Giraph doesn't need
>> data locality like MapReduce does, basically nothing is done.
>>
>
> This is important for me. So a Giraph Worker (a.k.a. Hadoop mapper) fetches
> the vertex with the corresponding index from HDFS and performs the
> computation. What does it do next with it? As I understood, Giraph is a
> fully in-memory framework, and in the next superstep this vertex should be
> fetched from memory by the same Worker. Where are the vertices stored
> between supersteps? In HDFS or in memory?
>

As I said, the workers are persistent (in-memory) between supersteps, so
they keep everything in memory.
>
>>
>>>
>>> 2) How are vertices assigned to Workers? Does it depend on the
>>> distribution of the input file across DataNodes? Is there any choice of
>>> distribution policy available?
>>>
>>
>> In the default scheme, vertices are assigned through modulo hash
>> partitioning. Given k workers, vertex v is assigned to worker i according
>> to hash(v) % k = i.
>>
>>>
>>> 3) How are Workers and Map tasks related to each other? (1:1)? (n:1)?
>>> (1:n)?
>>>
>>
>> It's 1:1. Each worker is implemented by a mapper task. The master is
>> usually (but does not need to be) implemented by an additional mapper.
>>
>>>
>>> 4) Can Workers migrate from one TaskTracker to another?
>>>
>>
>> Workers do not migrate. A Giraph computation is not dynamic with respect
>> to the assignment and size of the tasks.
>>
>>>
>>> 5) What is the best way to monitor Giraph app execution (progress,
>>> worker assignment, load balancing, etc.)?
>>>
>>
>> Just like you would for a standard MapReduce job. Go to the job page on
>> the JobTracker HTTP page.
>>
>>>
>>> I think this is all for the moment. Thank you.
>>>
>>> Testbed description:
>>> Hardware: 8 node dual-CPU cluster with IB FDR.
>>> Giraph: release-1.0.0-RC2-152-g585511f
>>> Hadoop: hadoop-0.20.203.0, hadoop-rdma-0.9.8
>>>
>>> Best,
>>> Alex
>>>
>>
>> --
>> Claudio Martella
>>

--
Claudio Martella
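
For readers following the answer to question 2, the rule hash(v) % k = i can be sketched in a few lines of Python. This is only an illustration of the idea, not Giraph code: the names stable_hash, assign_worker and partition are hypothetical, and CRC32 is an assumed stand-in for whatever hash Giraph applies to vertex ids.

```python
import zlib
from collections import defaultdict


def stable_hash(vertex_id) -> int:
    """Deterministic stand-in for hash(v). Assumption: NOT Giraph's real hash."""
    return zlib.crc32(str(vertex_id).encode("utf-8"))


def assign_worker(vertex_id, num_workers: int) -> int:
    """Return the worker index i such that hash(v) % k == i."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    return stable_hash(vertex_id) % num_workers


def partition(vertex_ids, num_workers: int) -> dict:
    """Group vertex ids by the worker index they are assigned to."""
    workers = defaultdict(list)
    for v in vertex_ids:
        workers[assign_worker(v, num_workers)].append(v)
    return dict(workers)


if __name__ == "__main__":
    # 8 workers, echoing the 8-node testbed mentioned in the thread.
    for worker, vertices in sorted(partition(range(20), 8).items()):
        print(worker, vertices)
```

The assignment depends only on the vertex id and the number of workers, not on where the input file blocks live, which is consistent with the remark that Giraph does not rely on data locality.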