Thanks for the clarification, Ankur.
Appreciate it.

Regards
Sunita

On Monday, August 25, 2014, Ankur Dave <ankurd...@gmail.com> wrote:

> At 2014-08-25 11:23:37 -0700, Sunita Arvind <sunitarv...@gmail.com> wrote:
> > Does this "We introduce GraphX, which combines the advantages of both
> > data-parallel and graph-parallel systems by efficiently expressing graph
> > computation within the Spark data-parallel framework. We leverage new ideas
> > in distributed graph representation to efficiently distribute graphs as
> > tabular data-structures. Similarly, we leverage advances in data-flow
> > systems to exploit in-memory computation and fault-tolerance." mean that
> > GraphX makes the typical RDBMS operations possible even when the data is
> > persisted in a GDBMS and not vice versa?
>
> This quote refers to the research idea that while previous graph-parallel
> systems (Pregel, GraphLab, etc.) were built as specialized systems for
> performance, it's actually possible to avoid the trouble of a separate
> system by embedding graph computation efficiently in a general
> data-parallel system. Here "data-parallel" refers generally to any system
> that can support the join optimizations, including Spark and, with some
> work on the optimizer, relational databases as well. So GraphX uses
> data-parallel or relational operators to provide graph computation, not the
> other way around.
>
> > From what I initially thought, it looked like GraphX could be applied to data
> > stored in RDBMSs as Spark could translate the relational data into a graphical
> > representation. However, there seems to be no conversion and everything
> > presented in GraphX implementations, AFAIK, works on vertices and edges. So
> > does it mean that GraphX is only relevant when the backend is a GDBMS?
>
> GraphX, the library on top of Spark, can be applied indirectly to
> relational data as you described: you can use Spark to load vertex and edge
> tables from a relational database, then process them with GraphX. This
> isn't discussed in the GraphX documentation because it's a concern of
> Spark. GraphX is only relevant once you have the vertices and edges in RDD
> form.
>
> GraphX, the research concept, can in theory be implemented directly in a
> relational database by augmenting the query optimizer to support the
> optimizations described in the paper and setting up the appropriate indexes
> on the vertex and edge tables.
>
> Ankur
>
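
To make the last point in the quoted reply concrete, here is a minimal sketch of one graph-parallel step (each vertex sends its value along its out-edges, and each destination sums what it receives) written purely with relational operators over a vertex table and an edge table. It is not GraphX and includes none of the optimizations from the paper. The table names, column names, the `neighbor_sums` helper, and the use of an in-memory SQLite database are all assumptions made for illustration.

```python
import sqlite3


def neighbor_sums(vertices, edges):
    """Sum, for each destination vertex, the values of its in-neighbors.

    vertices: iterable of (id, value) rows  (hypothetical schema)
    edges:    iterable of (src, dst) rows   (hypothetical schema)
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vertices (id INTEGER PRIMARY KEY, value REAL)")
    conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
    # An index on the join column of the edge table, in the spirit of
    # "setting up the appropriate indexes on the vertex and edge tables".
    conn.execute("CREATE INDEX edges_by_src ON edges (src)")
    conn.executemany("INSERT INTO vertices VALUES (?, ?)", vertices)
    conn.executemany("INSERT INTO edges VALUES (?, ?)", edges)

    # "Send" = join each edge with its source vertex;
    # "receive" = group by destination and aggregate.
    rows = conn.execute(
        """
        SELECT e.dst, SUM(v.value)
        FROM edges AS e
        JOIN vertices AS v ON v.id = e.src
        GROUP BY e.dst
        ORDER BY e.dst
        """
    ).fetchall()
    conn.close()
    return dict(rows)


vertices = [(1, 1.0), (2, 2.0), (3, 4.0)]
edges = [(1, 2), (1, 3), (2, 3), (3, 1)]
print(neighbor_sums(vertices, edges))
```

Vertices with no incoming edges do not appear in the result, because the join produces no rows for them. An iterative algorithm such as PageRank would repeat a step like this, writing the aggregated values back into the vertex table each round.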
