Thanks for the clarification, Ankur. Appreciate it.

Regards,
Sunita

On Monday, August 25, 2014, Ankur Dave <ankurd...@gmail.com> wrote:

> At 2014-08-25 11:23:37 -0700, Sunita Arvind <sunitarv...@gmail.com> wrote:
> > Does this "We introduce GraphX, which combines the advantages of both
> > data-parallel and graph-parallel systems by efficiently expressing graph
> > computation within the Spark data-parallel framework. We leverage new ideas
> > in distributed graph representation to efficiently distribute graphs as
> > tabular data-structures. Similarly, we leverage advances in data-flow
> > systems to exploit in-memory computation and fault-tolerance." mean that
> > GraphX makes the typical RDBMS operations possible even when the data is
> > persisted in a GDBMS and not vice versa?
>
> This quote refers to the research idea that while previous graph-parallel
> systems (Pregel, GraphLab, etc.) were built as specialized systems for
> performance, it's actually possible to avoid the trouble of a separate
> system by embedding graph computation efficiently in a general
> data-parallel system. Here "data-parallel" refers generally to any system
> that can support the join optimizations, including Spark and, with some
> work on the optimizer, relational databases as well. So GraphX uses
> data-parallel or relational operators to provide graph computation, not the
> other way around.
>
> > From what I initially thought, it looked like GraphX could be applied to data
> > stored in RDBMSs as Spark could translate the relational data into graphical
> > representation. However, there seems to be no conversion and everything
> > presented in GraphX implementations, AFAIK, works on vertices and edges. So
> > does it mean that GraphX is only relevant when the backend is a GDBMS?
>
> GraphX, the library on top of Spark, can be applied indirectly to
> relational data as you described: you can use Spark to load vertex and edge
> tables from a relational database, then process them with GraphX. This
> isn't discussed in the GraphX documentation because it's a concern of
> Spark. GraphX is only relevant once you have the vertices and edges in RDD
> form.
>
> GraphX, the research concept, can in theory be implemented directly in a
> relational database by augmenting the query optimizer to support the
> optimizations described in the paper and setting up the appropriate indexes
> on the vertex and edge tables.
>
> Ankur
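
To make the "graph computation from relational operators" point in the quoted reply concrete, here is a minimal sketch. It is not GraphX and does not use Spark: it assumes an in-memory SQLite database as a stand-in for the relational backend, and the table names (`vertices`, `edges`, `labels`, `msgs`) and function names (`build_demo_db`, `connected_components`) are hypothetical, chosen only for illustration. It computes connected components by repeatedly joining the edge table with a per-vertex label table and aggregating with `MIN`, which is the join-plus-aggregate pattern the reply describes. None of the optimizations from the paper are attempted.

```python
import sqlite3


def build_demo_db():
    """Create a toy relational database with vertex and edge tables.

    Hypothetical schema for illustration only: vertices(id), edges(src, dst).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vertices (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
    conn.executemany("INSERT INTO vertices VALUES (?)", [(i,) for i in range(1, 6)])
    conn.executemany("INSERT INTO edges VALUES (?, ?)", [(1, 2), (2, 3), (4, 5)])
    return conn


def connected_components(conn):
    """Label each vertex with the smallest vertex id in its component.

    Edges are treated as undirected. Each round is a join (edges x labels)
    followed by a GROUP BY aggregation, repeated until no label changes.
    """
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS labels")
    # Every vertex starts out labelled with its own id.
    cur.execute("CREATE TABLE labels AS SELECT id, id AS label FROM vertices")
    while True:
        # "Send" each endpoint's label across every edge in both directions,
        # then keep the minimum label received per vertex.
        cur.execute("""
            CREATE TEMP TABLE msgs AS
            SELECT target AS id, MIN(label) AS label FROM (
                SELECT e.dst AS target, l.label AS label
                FROM edges e JOIN labels l ON l.id = e.src
                UNION ALL
                SELECT e.src AS target, l.label AS label
                FROM edges e JOIN labels l ON l.id = e.dst
            ) GROUP BY target
        """)
        # Adopt the received label only where it is smaller than the current one.
        cur.execute("""
            UPDATE labels
            SET label = (SELECT m.label FROM msgs m WHERE m.id = labels.id)
            WHERE EXISTS (
                SELECT 1 FROM msgs m
                WHERE m.id = labels.id AND m.label < labels.label
            )
        """)
        changed = cur.rowcount  # number of rows the UPDATE modified
        cur.execute("DROP TABLE msgs")
        if changed == 0:
            break
    return dict(cur.execute("SELECT id, label FROM labels"))


if __name__ == "__main__":
    print(connected_components(build_demo_db()))
```

With the demo data, vertices 1, 2 and 3 end up sharing label 1 and vertices 4 and 5 share label 4. In the Spark setting the reply describes, the same two tables would instead be loaded as RDDs and handed to GraphX; this sketch only shows why vertex and edge tables are enough to express the computation.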