Re: Multiple node types in Giraph and doing a selective M/R over one of them

Eli Reisman Tue, 29 Jan 2013 17:41:33 -0800

Right. If your use case boils down to a join, we're probably not the ideal
tool.



On Tue, Jan 29, 2013 at 1:26 AM, Claudio Martella <
claudio.marte...@gmail.com> wrote:

> That is correct, but that is not the reason. We use M/R for resource
> allocation, but we do not inherit the limits of the M/R paradigm for
> graphs. The thing is that you probably have a lot of data (I assume it
> because you are using HBase, hence it is difficult for you to fit it all
> into memory), and you do not have an iterative process (hence you do not
> hit the costs of multiple M/R jobs). Plus, the join is a well understood
> problem in M/R.
>
>
> On Tue, Jan 29, 2013 at 9:52 AM, David Koch <ogd...@googlemail.com> wrote:
>
>> Hello Claudio and Eli,
>>
>> Thank you for your answers. As far as Map/Reduce being a better tool for
>> the job - I was under the impression that Giraph relies on the M/R
>> framework. It seems like it when I check the console output of the examples
>> on the project's Wiki.
>>
>> Again, thank you.
>>
>> /David
>>
>>
>> On Mon, Jan 28, 2013 at 8:49 PM, Claudio Martella <
>> claudio.marte...@gmail.com> wrote:
>>
>>> One more general point would be whether giraph is a better tool for your
>>> problem. From my understanding, map reduce is probably the way to go.
>>>
>>>
>>> On Monday, January 28, 2013, Eli Reisman wrote:
>>>
>>>> I agree, something like this is possible using the vertex value. In
>>>> giraph, we now have native support for multigraphs, but before we had that
>>>> support, I described a kind of "cheat" to process multigraphs. You could
>>>> use a variation of that same cheat (its on the site confluence wiki) to do
>>>> what you're talking about I think, even though you're not dealing with a
>>>> multigraph in the problem you described. Essentially, you can get clever
>>>> about what sort of Writable you use for the vertex value type, and/or what
>>>> the values it holds can represent in your dataset.
>>>>
>>>> Alternately, in the off chance that the row-keys do not repeat in the
>>>> tables, then really the "row key" can be a Writable vertex ID as long as
>>>> each is unique .The only repetition would be the fact that other rows with
>>>> their own unique row-keys contain row values that mark out-edges to other
>>>> unique row-keys in the table, but more than once since any row-key could
>>>> have lots of other rows "pointing" an out-edge value towards it. Thinking
>>>> of each row key as unique vertex ID then just turns this into a vanilla
>>>> graph. However, if the row keys are not unique in among all your tables,
>>>> this oversimplifies the problem and you really are stuck wtih the above
>>>> vertex value option.
>>>>
>>>> My point: Giraph has vertex value, ID, out-edge-to-other-vertex ID's,
>>>> and message data types, and as long as the properties required of each for
>>>> a graph are met, and each is a Writable, you can think of the problem
>>>> (often) in one of several ways that Giraph can support.
>>>>
>>>> One last thought: assuming the graph does not mutate during processing,
>>>> you could also write a custom input format that evaluates each row as it
>>>> builds it into a graph vertex data structure, and chooses only row keys
>>>> that are of a certain classification in your use case to make into graph
>>>> data for that job run, simply skipping the other rows as it reads them.
>>>> again, this "solution" depends on the nature of your problem. Just
>>>> something to play with.
>>>>
>>>> Good luck with your use case!
>>>>
>>>> On Mon, Jan 28, 2013 at 7:09 AM, Claudio Martella <
>>>> claudio.marte...@gmail.com> wrote:
>>>>
>>>>> Giraph does not support multipartite graph in a natural way. But you
>>>>> can try to model your different sets through the vertexvalue. You can then
>>>>> propagate it (by composing with the ID?) to the neighbors, and obtain your
>>>>> join.
>>>>>
>>>>>
>>>>> On Mon, Jan 28, 2013 at 2:52 PM, David Koch <ogd...@googlemail.com>wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> In Giraph is it possible to have different node types in a graph and
>>>>>> have a Map/Reduce only iterate over nodes of this type and their direct
>>>>>> successors?
>>>>>>
>>>>>> If it sounds a bit cryptic here is something more about our use-case:
>>>>>> We have different HBase tables which we want to "pseudo-join" in
>>>>>> Map/Reduce computations. The node types I mentioned above correspond to 
>>>>>> the
>>>>>> respective row-key types used in each of those tables, edges are 
>>>>>> generated
>>>>>> by the fact that the KeyValues in each table can contain row-key values
>>>>>> found in one of the other tables.
>>>>>>
>>>>>> The graph would describe these relations. In a Map/Reduce I then want
>>>>>> to be able to iterate over all nodes of a given type while also having
>>>>>> access to a node's successor nodes in the same Mapper instance or better
>>>>>> yet the same map() call. One would then carry out additional Gets to
>>>>>> retrieve the data from the tables thus doing a fairly crude join.
>>>>>>
>>>>>> The Graph is likely to change so it would be nice if it could be
>>>>>> updated incrementally.
>>>>>>
>>>>>> Does all this sound like something that would be possible with Giraph?
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> /David
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>    Claudio Martella
>>>>>    claudio.marte...@gmail.com
>>>>>
>>>>
>>>>
>>>
>>> --
>>>    Claudio Martella
>>>    claudio.marte...@gmail.com
>>>
>>
>>
>
>
> --
>    Claudio Martella
>    claudio.marte...@gmail.com
>

Re: Multiple node types in Giraph and doing a selective M/R over one of them

Reply via email to