Thanks for all the responses so far! I have started to understand the
system better, but another question came up as I was going along. Is
there a way to inspect the individual partitions of an RDD? For example, if I
had a graph with vertices a, b, c, d and it was split into 2 partitions, could
I check which vertices belong to partition 1 and which to partition 2?

Thank You,
Matthew Bucci

On Fri, Feb 13, 2015 at 10:58 PM, Ankur Dave <ankurd...@gmail.com> wrote:

> At 2015-02-13 12:19:46 -0800, Matthew Bucci <mrbucci...@gmail.com> wrote:
> > 1) How do you actually run programs in GraphX? At the moment I've been
> > doing everything live through the shell, but I'd obviously like to be
> > able to work on it by writing and running scripts.
>
> You can create your own projects that build against Spark and GraphX
> through a Maven dependency [1], then run those applications using the
> bin/spark-submit script included with Spark [2].
>
> These guides assume you already know how to do this using your preferred
> build tool (SBT or Maven). In short, here's how to do it with SBT:
>
> 1. Install SBT locally (`brew install sbt` on OS X).
>
> 2. Inside your project directory, create a build.sbt file listing Spark
> and GraphX as dependencies, as in [3].
>
> 3. Run `sbt package` in a shell.
>
> 4. Pass the JAR in your_project_dir/target/scala-2.10/ to bin/spark-submit.
>
> [1]
> http://spark.apache.org/docs/latest/programming-guide.html#linking-with-spark
> [2] http://spark.apache.org/docs/latest/submitting-applications.html
> [3] https://gist.github.com/ankurdave/1fb7234d8affb3a2e4f4
>
> > 2) Is there a way to check the status of the partitions of a graph? For
> > example, I want to determine for starters whether the number of partitions
> > requested is always created; if I ask for 8 partitions but only have 4
> > cores, what happens?
>
> You can look at `graph.vertices` and `graph.edges`, which are both RDDs,
> so you can call, for example, `graph.vertices.partitions`.
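
A minimal sketch of that idea, assuming a local master and a made-up
four-vertex graph (the object name, the `local[4]` setting, and the toy edges
are all hypothetical). It prints the partition counts and then uses the
standard RDD method `mapPartitionsWithIndex` to tag each vertex ID with the
index of the partition that holds it:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object InspectPartitions {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally with 4 worker threads; use your own master in practice.
    val conf = new SparkConf().setAppName("InspectPartitions").setMaster("local[4]")
    val sc = new SparkContext(conf)

    // Hypothetical toy graph: 4 vertices in a ring, edges requested in 8 partitions.
    val edges = sc.parallelize(
      Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 4L, 1), Edge(4L, 1L, 1)), 8)
    val graph = Graph.fromEdges(edges, defaultValue = 0)

    // How many partitions each underlying RDD actually has.
    println(s"vertex partitions: ${graph.vertices.partitions.length}")
    println(s"edge partitions:   ${graph.edges.partitions.length}")

    // Tag every vertex ID with the index of the partition it lives in.
    // collect() is fine for a toy graph but not for a large one.
    graph.vertices
      .mapPartitionsWithIndex((idx, iter) => iter.map { case (id, _) => (idx, id) })
      .collect()
      .sorted
      .foreach { case (idx, id) => println(s"partition $idx holds vertex $id") }

    sc.stop()
  }
}
```

The same `mapPartitionsWithIndex` call works on `graph.edges`, since it is
also an RDD.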
>
> > 3) Would I be able to partition by vertex instead of edges, even if I
> > had to write it myself? I know partitioning by edges is favored in a
> > majority of the cases, but for the sake of research I'd like to be able
> > to do both.
>
> If you pass `PartitionStrategy.EdgePartition1D` to `graph.partitionBy`,
> this will partition edges by their source vertices, so all edges with the
> same source will be co-partitioned, and the communication pattern will be
> similar to vertex-partitioned (edge-cut) systems like Giraph.
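
As a rough illustration, assuming a local master and a small invented edge
list (the object name and the data are hypothetical), the sketch below
repartitions a graph with `EdgePartition1D` and prints which partition each
edge ends up in, so you can check that edges sharing a source vertex sit
together:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy}

object SourcePartitioned {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode with 2 threads, for experimentation only.
    val conf = new SparkConf().setAppName("SourcePartitioned").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Hypothetical toy edges; vertex 1 is the source of two of them.
    val edges = sc.parallelize(
      Seq(Edge(1L, 2L, "a"), Edge(1L, 3L, "b"), Edge(2L, 3L, "c"), Edge(3L, 1L, "d")))

    // Repartition the edges by source vertex ID.
    val graph = Graph.fromEdges(edges, defaultValue = 0)
      .partitionBy(PartitionStrategy.EdgePartition1D)

    // Report the partition index of every edge.
    graph.edges
      .mapPartitionsWithIndex((idx, iter) => iter.map(e => (idx, e.srcId, e.dstId)))
      .collect()
      .foreach { case (idx, src, dst) => println(s"partition $idx: $src -> $dst") }

    sc.stop()
  }
}
```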
>
> > 4) Is there a better way to time processes outside of using built-in unix
> > timing through the logs or something?
>
> I think the options are Unix timing, log file timestamp parsing, looking
> at the web UI, or writing timing code within your program
> (System.currentTimeMillis and System.nanoTime).
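
For the last option, a small helper along these lines may be enough. This is
only a sketch: `Timing` and `timed` are hypothetical names, not part of Spark.
Keep in mind that Spark transformations are lazy, so the timed block should
end in an action such as `count()`, otherwise only the time to build the
lineage is measured.

```scala
object Timing {
  /** Runs `body` once and returns its result together with the elapsed
    * wall-clock time in milliseconds, measured with System.nanoTime. */
  def timed[A](label: String)(body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body // the by-name argument is evaluated here
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(f"$label took $elapsedMs%.2f ms")
    (result, elapsedMs)
  }

  def main(args: Array[String]): Unit = {
    // Plain Scala workload as a stand-in; with Spark you would time an
    // action instead, e.g. timed("count") { graph.vertices.count() }.
    val (total, _) = timed("sum of 1 to 1000000") { (1L to 1000000L).sum }
    println(s"total = $total")
  }
}
```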
>
> Ankur
>
