Hi Mirko,

this is in general the kind of approach I was suggesting, but looked at from a broader perspective. I'd tend to avoid calling other tools such as Hive or Pig often to compute injections, as Giraph is still a batch-processing system and this could really introduce latency and reduce throughput. I feel that if the injection of vertices and edges really required such complexity (such as computing them with M/R), then one could just create a pipeline of jobs. But this is only my superficial analysis/speculation; I can see your point on integration and your proposal is very interesting.
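To make the injector idea a bit more concrete, here is a rough sketch of the control flow I have in mind. It is a plain-Java simulation and deliberately does not use any Giraph classes: `InjectorSketch`, `UpdateSource` and `EdgeUpdate` are hypothetical names, the in-memory map stands in for whatever file a WorkerContext would read from HDFS, and the record format (one edge per update, keyed by superstep) is just an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java simulation of the "injector" pattern; none of these types are Giraph classes.
final class InjectorSketch {

    /** One edge to add. Hypothetical record format; missing endpoints are created. */
    static final class EdgeUpdate {
        final long source;
        final long target;

        EdgeUpdate(long source, long target) {
            this.source = source;
            this.target = target;
        }
    }

    /** Stand-in for the reader side: maps a superstep to the updates available for it. */
    static final class UpdateSource {
        private final Map<Integer, List<EdgeUpdate>> bySuperstep = new HashMap<>();

        void schedule(int superstep, long source, long target) {
            bySuperstep.computeIfAbsent(superstep, k -> new ArrayList<>())
                       .add(new EdgeUpdate(source, target));
        }

        /** Returns and removes the updates for this superstep (empty list if none). */
        List<EdgeUpdate> poll(int superstep) {
            List<EdgeUpdate> updates = bySuperstep.remove(superstep);
            return updates == null ? new ArrayList<EdgeUpdate>() : updates;
        }
    }

    /** Runs the given number of supersteps over an adjacency-set graph, injecting updates as it goes. */
    static Map<Long, Set<Long>> run(Map<Long, Set<Long>> graph, UpdateSource source, int supersteps) {
        for (int step = 0; step < supersteps; step++) {
            // ... the regular per-vertex computation for this superstep would run here ...

            // Injector: apply the updates for this superstep so that they are
            // visible to the computation from the next superstep on.
            for (EdgeUpdate u : source.poll(step)) {
                graph.computeIfAbsent(u.source, k -> new HashSet<>()).add(u.target);
                graph.computeIfAbsent(u.target, k -> new HashSet<>());
            }
        }
        return graph;
    }
}
```

In a real job the `poll` step would be the file read, and the two `computeIfAbsent` calls would become mutation requests issued by the special vertex. Updates scheduled for a superstep that is never reached are simply left unread, which is one reason the format of that file has to stay under your control.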

On Sun, Aug 25, 2013 at 8:55 AM, Mirko Kämpf <mirko.kae...@cloudera.com> wrote:

> Good morning Gentlemen,
>
> as far as I understand your thread you are talking about the same topic I
> have been thinking about and working on for some time.
> I work on a research project focused on the evolution of networks and network
> dynamics in networks of networks.
>
> My understanding of Marco's question is that he needs to change node
> properties or even wants to add nodes to the graph while it is processed,
> right?
>
> With the WorkerContext we could construct a "Connector" to the outside
> world, not just for loading data from HDFS, which requires a preprocessing
> step for the data which has to be loaded also. I think about HBase often.
> All my nodes and edges live in HBase. From there it is quite easy to load
> new data based on a simple "Scan", or, if the WorkerContext triggers a
> Hive or Pig script, one can even automatically reorganize or extract relevant
> new links / nodes which have to be added to the graph.
>
> Such an approach means that after n supersteps of the Giraph layer an
> additional utility-step (triggered via WorkerContext, or any other better
> fitting class from Giraph - not sure yet where to start) is executed.
> Before such a step the state of the graph is persisted to allow fall back
> or resume. The utility-step can be a processing (MR, Mahout) or just a load
> (from HDFS, HBase) operation and it allows a kind of clocked data flow
> directly into a running Giraph application. I think this is a very
> important feature in Complex Systems research, as we have interacting
> layers which change in parallel. In this picture the Giraph steps are the
> steps of layer A, let's say something that is going on on top of a network, and
> the utility-step expresses the changes in the underlying structure
> affecting the network itself but based on the data / properties of the
> second subsystem, e.g. the agents operating on top of the network.
>
> I created a tool which worked like this - but not at scale - and it was
> at a time before Giraph. What do you think, is there a need for such a kind
> of extension in the Giraph world?
>
> Have a nice Sunday.
>
> Best wishes
> Mirko
>
> --
> Mirko Kämpf
>
> *Trainer* @ Cloudera
>
> tel: +49 *176 20 63 51 99*
> skype: *kamir1604*
> mi...@cloudera.com
>
> On Wed, Aug 21, 2013 at 3:30 PM, Claudio Martella <
> claudio.marte...@gmail.com> wrote:
>
>> As I said, the injection of the new vertices/edges would have to be done
>> "manually", hence without any support of the infrastructure. I'd suggest
>> you implement a WorkerContext class that supports the reading of a specific
>> file with a specific format (under your control) from HDFS, and that is
>> accessed by this particular "special" vertex (e.g. based on the vertex ID).
>>
>> Does this make sense?
>>
>> On Wed, Aug 21, 2013 at 2:13 PM, Marco Aurelio Barbosa Fagnani Lotz <
>> m.a.b.l...@stu12.qmul.ac.uk> wrote:
>>
>>> Dear Mr. Martella,
>>>
>>> Once the conditions for updating the vertex database are met, what is
>>> the best way for the Injector Vertex to call an input reader again?
>>>
>>> I am able to access all the HDFS data, but I guess the vertex would need
>>> to have access to the input splits and also the vertex input format that I
>>> designate. Am I correct? Or is there a way that one can just ask Zookeeper
>>> to create new splits and distribute them to the workers from a given path in DFS?
>>>
>>> Best Regards,
>>> Marco Lotz
>>> ------------------------------
>>> *From:* Claudio Martella <claudio.marte...@gmail.com>
>>> *Sent:* 14 August 2013 15:25
>>> *To:* user@giraph.apache.org
>>> *Subject:* Re: Dynamic Graphs
>>>
>>> Hi Marco,
>>>
>>> Giraph currently does not support that.
>>> One way of doing this would be
>>> by having a specific (pseudo-)vertex act as the "injector" of the new
>>> vertices and edges. For example, it would read a file from HDFS and call the
>>> mutable API during the computation, superstep after superstep.
>>>
>>> On Wed, Aug 14, 2013 at 3:02 PM, Marco Aurelio Barbosa Fagnani Lotz <
>>> m.a.b.l...@stu12.qmul.ac.uk> wrote:
>>>
>>>> Hello all,
>>>>
>>>> I would like to know if there is any way to use dynamic graphs with
>>>> Giraph. By dynamic I mean graphs that may change while Giraph is
>>>> computing/deliberating. The changes are in the input file and are not
>>>> caused by the graph computation itself.
>>>>
>>>> Is there any way to analyse it using Giraph? If not, does anyone have any
>>>> idea/suggestion on whether it is possible to modify the framework in order to
>>>> process it?
>>>>
>>>> Best Regards,
>>>> Marco Lotz
>>>>
>>>
>>> --
>>> Claudio Martella
>>> claudio.marte...@gmail.com
>>>
>>
>> --
>> Claudio Martella
>> claudio.marte...@gmail.com
>>
>

--
Claudio Martella
claudio.marte...@gmail.com