Right. If your use case boils down to a join, we're probably not the ideal tool.
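For reference, the usual plain Map/Reduce answer to this kind of problem is a reduce-side (repartition) join. Mappers tag each record with its source table and emit it under the join key, the shuffle groups the records by key, and reducers combine the records from both sides. Below is a minimal in-memory sketch of that data flow in plain Java. It assumes a hypothetical table A whose values hold row keys of a hypothetical table B, and it uses no Hadoop classes. The names ReduceSideJoinSketch, Tagged, map, shuffle and reduce are illustrative only.

```java
import java.util.*;

/**
 * In-memory simulation of a reduce-side (repartition) join.
 * Hypothetical setup: table A rows hold, as their value, a row key of table B.
 * No Hadoop classes are used; map, shuffle and reduce are plain methods.
 */
class ReduceSideJoinSketch {

    /** A record tagged with the table it came from, as a mapper would emit it. */
    record Tagged(String table, String payload) {}

    /** "Map" phase: emit (join key, tagged record) pairs for both tables. */
    static List<Map.Entry<String, Tagged>> map(Map<String, String> tableA,
                                               Map<String, String> tableB) {
        List<Map.Entry<String, Tagged>> out = new ArrayList<>();
        // Table A: the join key is the referenced B row key stored in the value.
        for (Map.Entry<String, String> row : tableA.entrySet()) {
            out.add(Map.entry(row.getValue(), new Tagged("A", row.getKey())));
        }
        // Table B: the join key is the row key itself.
        for (Map.Entry<String, String> row : tableB.entrySet()) {
            out.add(Map.entry(row.getKey(), new Tagged("B", row.getValue())));
        }
        return out;
    }

    /** "Shuffle": group emitted pairs by key, sorted like Hadoop's sort phase. */
    static SortedMap<String, List<Tagged>> shuffle(List<Map.Entry<String, Tagged>> pairs) {
        SortedMap<String, List<Tagged>> groups = new TreeMap<>();
        for (Map.Entry<String, Tagged> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    /** "Reduce": per key, pair every A record with every B record (inner join). */
    static List<String> reduce(SortedMap<String, List<Tagged>> groups) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> g : groups.entrySet()) {
            List<String> fromA = new ArrayList<>();
            List<String> fromB = new ArrayList<>();
            for (Tagged t : g.getValue()) {
                (t.table().equals("A") ? fromA : fromB).add(t.payload());
            }
            for (String a : fromA) {
                for (String b : fromB) {
                    joined.add(a + " -> " + g.getKey() + " = " + b);
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, String> tableA = new TreeMap<>(Map.of("a1", "b1", "a2", "b1", "a3", "b9"));
        Map<String, String> tableB = new TreeMap<>(Map.of("b1", "payload-1", "b2", "payload-2"));
        reduce(shuffle(map(tableA, tableB))).forEach(System.out::println);
    }
}
```

Running main prints one joined line each for a1 and a2. Row a3 drops out because b9 has no row in table B, which is inner-join behaviour. In a real job the framework performs the shuffle, and each key's group is reduced independently of the others.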
On Tue, Jan 29, 2013 at 1:26 AM, Claudio Martella <claudio.marte...@gmail.com> wrote:

> That is correct, but that is not the reason. We use M/R for resource
> allocation, but we do not inherit the limits of the M/R paradigm for
> graphs. The thing is that you probably have a lot of data (I assume so
> because you are using HBase, hence it is difficult for you to fit it all
> into memory), and you do not have an iterative process (hence you do not
> hit the costs of multiple M/R jobs). Plus, the join is a well-understood
> problem in M/R.
>
>
> On Tue, Jan 29, 2013 at 9:52 AM, David Koch <ogd...@googlemail.com> wrote:
>
>> Hello Claudio and Eli,
>>
>> Thank you for your answers. As far as Map/Reduce being a better tool for
>> the job goes, I was under the impression that Giraph relies on the M/R
>> framework. It certainly looks that way when I check the console output of
>> the examples on the project's wiki.
>>
>> Again, thank you.
>>
>> /David
>>
>>
>> On Mon, Jan 28, 2013 at 8:49 PM, Claudio Martella <claudio.marte...@gmail.com> wrote:
>>
>>> One more general point would be whether Giraph is the better tool for your
>>> problem. From my understanding, Map/Reduce is probably the way to go.
>>>
>>>
>>> On Monday, January 28, 2013, Eli Reisman wrote:
>>>
>>>> I agree, something like this is possible using the vertex value. In
>>>> Giraph, we now have native support for multigraphs, but before we had that
>>>> support, I described a kind of "cheat" to process multigraphs. You could
>>>> use a variation of that same cheat (it's on the site's Confluence wiki) to
>>>> do what you're talking about, I think, even though you're not dealing with
>>>> a multigraph in the problem you described. Essentially, you can get clever
>>>> about what sort of Writable you use for the vertex value type, and/or what
>>>> the values it holds can represent in your dataset.
>>>>
>>>> Alternatively, in the off chance that the row keys do not repeat across
>>>> the tables, the "row key" can simply be a Writable vertex ID, as long as
>>>> each one is unique. The only repetition would be in the row values: other
>>>> rows, each with its own unique row key, contain values that mark out-edges
>>>> to other unique row keys in the table, and the same target can appear more
>>>> than once, since any row key could have lots of other rows "pointing" an
>>>> out-edge value towards it. Thinking of each row key as a unique vertex ID
>>>> then just turns this into a vanilla graph. However, if the row keys are
>>>> not unique among all your tables, this oversimplifies the problem and you
>>>> really are stuck with the vertex value option above.
>>>>
>>>> My point: Giraph has vertex value, vertex ID, out-edge (target vertex ID)
>>>> and message data types, and as long as each meets the properties a graph
>>>> requires of it and is a Writable, you can (often) think of the problem in
>>>> one of several ways that Giraph can support.
>>>>
>>>> One last thought: assuming the graph does not mutate during processing,
>>>> you could also write a custom input format that evaluates each row as it
>>>> builds it into a graph vertex data structure, and turns only the row keys
>>>> of a certain classification in your use case into graph data for that job
>>>> run, simply skipping the other rows as it reads them. Again, this
>>>> "solution" depends on the nature of your problem. Just something to play
>>>> with.
>>>>
>>>> Good luck with your use case!
>>>>
>>>> On Mon, Jan 28, 2013 at 7:09 AM, Claudio Martella <claudio.marte...@gmail.com> wrote:
>>>>
>>>>> Giraph does not support multipartite graphs in a natural way. But you
>>>>> can try to model your different sets through the vertex value. You can
>>>>> then propagate it (by composing it with the ID?) to the neighbors, and
>>>>> obtain your join.
>>>>>
>>>>>
>>>>> On Mon, Jan 28, 2013 at 2:52 PM, David Koch <ogd...@googlemail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> In Giraph, is it possible to have different node types in a graph and
>>>>>> have a Map/Reduce iterate only over nodes of one type and their direct
>>>>>> successors?
>>>>>>
>>>>>> If that sounds a bit cryptic, here is something more about our use case:
>>>>>> we have different HBase tables which we want to "pseudo-join" in
>>>>>> Map/Reduce computations. The node types I mentioned above correspond to
>>>>>> the respective row-key types used in each of those tables; edges arise
>>>>>> from the fact that the KeyValues in each table can contain row-key
>>>>>> values found in one of the other tables.
>>>>>>
>>>>>> The graph would describe these relations. In a Map/Reduce I then want
>>>>>> to be able to iterate over all nodes of a given type while also having
>>>>>> access to a node's successor nodes in the same Mapper instance, or better
>>>>>> yet in the same map() call. One would then carry out additional Gets to
>>>>>> retrieve the data from the tables, thus doing a fairly crude join.
>>>>>>
>>>>>> The graph is likely to change, so it would be nice if it could be
>>>>>> updated incrementally.
>>>>>>
>>>>>> Does all this sound like something that would be possible with Giraph?
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> /David
>>>>>
>>>>>
>>>>> --
>>>>> Claudio Martella
>>>>> claudio.marte...@gmail.com
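To make the vertex-value suggestions in this thread more concrete, here is a minimal plain-Java simulation. It is not Giraph code and uses no Giraph or Hadoop classes. Each vertex ID is a row key, and the vertex value carries the source table as a type tag. Rows from unwanted tables are skipped at load time, standing in for the custom input format idea. One simulated superstep then sends a "type:id" message from every vertex of a chosen type to its out-neighbors, so each receiver learns which typed rows reference it. TypedValue only mirrors the write/readFields shape of Hadoop's Writable contract and does not implement that interface. The table names users, products and logs, and all class and method names, are hypothetical.

```java
import java.io.*;
import java.util.*;

/**
 * Plain-Java simulation (hypothetical names, no Giraph/Hadoop classes):
 * tag vertices with their source table, skip unwanted rows at load time,
 * then run one "superstep" that sends (type, sender ID) to out-neighbors.
 */
class TypedVertexSketch {

    /** Vertex value carrying the node type; write/readFields mirror the
     *  shape of Hadoop's Writable contract but do not implement it. */
    static final class TypedValue {
        String type = "";
        String data = "";

        void write(DataOutput out) throws IOException {
            out.writeUTF(type);
            out.writeUTF(data);
        }

        void readFields(DataInput in) throws IOException {
            type = in.readUTF();
            data = in.readUTF();
        }
    }

    /** One input row: row key, source table (the "type"), referenced row keys. */
    record Row(String rowKey, String table, List<String> refs) {}

    /** A simplified vertex: ID, typed value and out-edges (target IDs). */
    record Vertex(String id, TypedValue value, List<String> outEdges) {}

    /** Load step: build vertices only for rows from wanted tables, skipping
     *  the others as they are read (the custom input format idea). */
    static Map<String, Vertex> load(List<Row> rows, Set<String> wantedTables) {
        Map<String, Vertex> graph = new TreeMap<>();
        for (Row r : rows) {
            if (!wantedTables.contains(r.table())) continue; // skip unwanted row
            TypedValue v = new TypedValue();
            v.type = r.table();
            graph.put(r.rowKey(), new Vertex(r.rowKey(), v, r.refs()));
        }
        return graph;
    }

    /** One superstep: vertices of sourceType send "type:id" to each loaded
     *  out-neighbor; returns each receiving vertex ID mapped to its messages. */
    static Map<String, List<String>> propagate(Map<String, Vertex> graph, String sourceType) {
        Map<String, List<String>> inbox = new TreeMap<>();
        for (Vertex v : graph.values()) {
            if (!v.value().type.equals(sourceType)) continue;
            for (String target : v.outEdges()) {
                if (!graph.containsKey(target)) continue; // target was skipped or missing
                inbox.computeIfAbsent(target, k -> new ArrayList<>())
                     .add(v.value().type + ":" + v.id());
            }
        }
        return inbox;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("u1", "users", List.of("p1", "p2")),
            new Row("u2", "users", List.of("p1")),
            new Row("p1", "products", List.of()),
            new Row("p2", "products", List.of()),
            new Row("x1", "logs", List.of("u1")));
        Map<String, Vertex> graph = load(rows, Set.of("users", "products"));
        System.out.println(propagate(graph, "users"));
    }
}
```

Here the logs row x1 never becomes a vertex. After the superstep, p1 has heard from u1 and u2 and p2 from u1, which is the edge-level "join" the thread describes. A real Giraph job would express the same steps through its own vertex, message and input format types, whose exact APIs depend on the Giraph version.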