Re: Basic questions about Giraph internals

Claudio Martella Thu, 06 Feb 2014 03:01:35 -0800

On Thu, Feb 6, 2014 at 11:56 AM, Alexander Frolov
<alexndr.fro...@gmail.com> wrote:


> Hi Claudio,
>
> thank you.
>
> If I understood correctly, a mapper and a mapper task are the same thing.
>

More or less. A mapper is a functional element of the programming model,
while the mapper task is the task that executes the mapper function on the
records.


>
>
> On Thu, Feb 6, 2014 at 2:28 PM, Claudio Martella <
> claudio.marte...@gmail.com> wrote:
>
>> Hi Alex,
>>
>> answers are inline.
>>
>>
>> On Thu, Feb 6, 2014 at 11:22 AM, Alexander Frolov <
>> alexndr.fro...@gmail.com> wrote:
>>
>>> Hi, folks!
>>>
>>> I have started a small study of the Giraph framework, and I do not have
>>> much experience with Giraph or Hadoop :-(.
>>>
>>> I would like to ask several questions about how things work in Giraph
>>> that are not straightforward for me. I am trying to read the sources,
>>> but sometimes that is not too easy ;-)
>>>
>>> So here they are:
>>>
>>> 1) How are Workers assigned to TaskTrackers?
>>>
>>
>> Each worker is a mapper, and mapper tasks are assigned to tasktrackers by
>> the jobtracker.
>>
>
> That is, each Worker is created at the beginning of a superstep and then
> dies. In the next superstep all Workers are created again. Is that correct?
>

Nope. The workers are created at the beginning of the computation, and
destroyed at the end of the computation. A worker is persistent
throughout the computation.


>
>
>> Giraph has no control over this, and because Giraph doesn't need data
>> locality the way MapReduce does, basically nothing is done.
>>
>
> This is important for me. So a Giraph Worker (a.k.a. Hadoop mapper) fetches
> the vertex with the corresponding index from HDFS and performs the
> computation. What does it do with it next? As I understand it, Giraph is a
> fully in-memory framework, and in the next superstep this vertex should be
> fetched from memory by the same Worker. Where are the vertices stored
> between supersteps? In HDFS or in memory?
>

As I said, the workers are persistent (in-memory) between supersteps, so
they keep everything in memory.
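
The point about persistence can be pictured with a toy loop: a worker object is created once, loads its vertices once, and is then reused for every superstep. This is a sketch under stated assumptions, not Giraph code. ToyWorker, its fields, and the per-vertex update are all hypothetical names made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration only; none of these names exist in Giraph.
final class ToyWorker {
    // Vertex values stay in this in-memory map for the whole computation.
    private final Map<Long, Double> vertexValues = new HashMap<>();
    private int loads = 0;

    // Called once, before the first superstep (stands in for reading the input).
    void loadVertices(long[] ids) {
        loads++;
        for (long id : ids) {
            vertexValues.put(id, 0.0);
        }
    }

    // One superstep: update every vertex in place; nothing is reloaded.
    void runSuperstep() {
        vertexValues.replaceAll((id, value) -> value + 1.0);
    }

    double valueAt(long id) {
        return vertexValues.get(id);
    }

    int loadCount() {
        return loads;
    }

    // Creates one worker, loads once, and reuses it for every superstep.
    static ToyWorker runComputation(long[] ids, int supersteps) {
        ToyWorker worker = new ToyWorker();
        worker.loadVertices(ids);
        for (int s = 0; s < supersteps; s++) {
            worker.runSuperstep();
        }
        return worker;
    }

    public static void main(String[] args) {
        ToyWorker worker = runComputation(new long[] {1L, 2L, 3L}, 3);
        System.out.println("value of vertex 1: " + worker.valueAt(1L));
        System.out.println("input loads: " + worker.loadCount());
    }
}
```

The load happens once regardless of how many supersteps run, which is the behaviour described above: the same worker, with the same in-memory state, serves every superstep.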


>
>
>>
>>>
>>> 2) How are vertices assigned to Workers? Does it depend on the
>>> distribution of the input file across DataNodes? Is there any choice of
>>> distribution policy?
>>>
>>
>> In the default scheme, vertices are assigned through modulo hash
>> partitioning. Given k workers, vertex v is assigned to worker i according
>> to hash(v) % k = i.
>>
>
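
The default scheme described above, hash(v) % k = i, can be sketched in a few lines of Java. This is only an illustration, not Giraph's partitioner code: the class and method names are hypothetical, Java's hashCode() stands in for hash(v), and Math.floorMod is an assumption made here so that a negative hash code still maps to a valid worker index.

```java
// Illustrative sketch only; HashPartitionSketch and workerFor are hypothetical
// names and are not part of the Giraph API.
final class HashPartitionSketch {

    // Returns the index i in [0, numWorkers) for a vertex id, i.e. hash(v) % k = i.
    // hashCode() stands in for hash(v); floorMod keeps the result non-negative.
    static int workerFor(Object vertexId, int numWorkers) {
        if (numWorkers <= 0) {
            throw new IllegalArgumentException("numWorkers must be positive");
        }
        return Math.floorMod(vertexId.hashCode(), numWorkers);
    }

    public static void main(String[] args) {
        int numWorkers = 4; // k
        for (long v = 0; v < 8; v++) {
            System.out.println("vertex " + v + " -> worker " + workerFor(v, numWorkers));
        }
    }
}
```

In this sketch the assignment depends only on the vertex id and the number of workers, so the same vertex always lands on the same worker for a fixed k.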
>>
>>>
>>> 3) How are Workers and Map tasks related to each other? (1:1)? (n:1)?
>>> (1:n)?
>>>
>>
>> It's 1:1. Each worker is implemented by a mapper task. The master is
>> usually (but does not need to be) implemented by an additional mapper.
>>
>>
>>>
>>> 4) Can Workers migrate from one TaskTracker to another?
>>>
>>
>> Workers do not migrate. A Giraph computation is not dynamic with respect
>> to the assignment and size of the tasks.
>>
>
>>
>>>
>>> 5) What is the best way to monitor Giraph app execution (progress,
>>> worker assignment, load balancing etc.)?
>>>
>>
>> Just like you would for a standard MapReduce job. Go to the job page on
>> the JobTracker HTTP page.
>>
>>
>>>
>>> I think this is all for the moment. Thank you.
>>>
>>> Testbed description:
>>> Hardware: 8 node dual-CPU cluster with IB FDR.
>>> Giraph: release-1.0.0-RC2-152-g585511f
>>> Hadoop: hadoop-0.20.203.0, hadoop-rdma-0.9.8
>>>
>>> Best,
>>>    Alex
>>>
>>
>>
>>
>> --
>>    Claudio Martella
>>
>>
>
>


-- 
   Claudio Martella
