Re: Is Spark? or GraphX runs fast? a performance comparison on Page Rank

Harihar Nahak Thu, 27 Nov 2014 13:56:41 -0800

Thanks Ankur, it's really helpful. I have a few queries on optimization
techniques. So far I have used the RandomVertexCut partition strategy.


But which partition strategy should be used if:
1. The number of edges in the edgeList file is very large, e.g. 50,000,000,
and many of them are multiple edges between the same pair of vertices
2. The number of unique vertices is very large, say 10,000,000, in the above
edgeList file
3. The number of unique vertices is small, say fewer than 100,000, in the
above edgeList file
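
To make the trade-off behind these three cases concrete, here is a minimal
sketch in plain Scala (no Spark needed) of the placement rules behind the four
built-in strategies, as described in the PartitionStrategy scaladoc. It is a
toy model, not the GraphX implementation: the hash functions are simplified
stand-ins, and the object name, the helper maxReplication, and the random test
graph are all hypothetical. It reports, for each rule, the largest number of
partitions that any single vertex ends up replicated to.

```scala
// Toy model of edge placement; NOT the GraphX source. Hashes are stand-ins.
object PartitionSketch {
  type VertexId = Long
  type Place = (VertexId, VertexId, Int) => Int

  // Hash the ordered pair: same-direction duplicates of (src, dst) always collide.
  def randomVertexCut(src: VertexId, dst: VertexId, numParts: Int): Int =
    java.lang.Math.floorMod((src, dst).hashCode, numParts)

  // Order the pair first, so (a, b) and (b, a) also land in the same partition.
  def canonicalRandomVertexCut(src: VertexId, dst: VertexId, numParts: Int): Int =
    if (src < dst) randomVertexCut(src, dst, numParts)
    else randomVertexCut(dst, src, numParts)

  // Source only: all out-edges of a vertex share one partition.
  def edgePartition1D(src: VertexId, dst: VertexId, numParts: Int): Int =
    java.lang.Math.floorMod(src, numParts.toLong).toInt

  // Grid placement: the source picks the column, the destination picks the row.
  def edgePartition2D(src: VertexId, dst: VertexId, numParts: Int): Int = {
    val side = math.ceil(math.sqrt(numParts.toDouble)).toInt
    val col = java.lang.Math.floorMod(src, side.toLong).toInt
    val row = java.lang.Math.floorMod(dst, side.toLong).toInt
    (col * side + row) % numParts
  }

  // Largest number of distinct partitions that any single vertex appears in.
  def maxReplication(edges: Seq[(VertexId, VertexId)], numParts: Int, place: Place): Int = {
    val counts = edges
      .flatMap { case (s, d) => val p = place(s, d, numParts); Seq((s, p), (d, p)) }
      .groupBy(_._1)
      .values
      .map(ps => ps.map(_._2).distinct.size)
    if (counts.isEmpty) 0 else counts.max
  }

  def main(args: Array[String]): Unit = {
    val numParts = 50 // same partition count as in the benchmark
    val rng = new scala.util.Random(42)
    // Hypothetical graph: 20,000 edges over 1,000 vertex IDs, so duplicates are common.
    val edges = Seq.fill(20000)((rng.nextInt(1000).toLong, rng.nextInt(1000).toLong))
    println(s"RandomVertexCut:          ${maxReplication(edges, numParts, randomVertexCut)}")
    println(s"CanonicalRandomVertexCut: ${maxReplication(edges, numParts, canonicalRandomVertexCut)}")
    println(s"EdgePartition1D:          ${maxReplication(edges, numParts, edgePartition1D)}")
    println(s"EdgePartition2D:          ${maxReplication(edges, numParts, edgePartition2D)}")
  }
}
```

In the grid rule a vertex can only appear in one column (as a source) and one
row (as a destination), so its replication is bounded by twice the grid side,
which is the property the scaladoc states for EdgePartition2D. In GraphX itself
a strategy is applied with graph.partitionBy(...), and parallel edges can then
be merged with groupEdges, which according to its documentation requires
partitionBy to have been called first.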





On 27 November 2014 at 20:23, ankurdave [via Apache Spark User List] <
ml-node+s1001560n1995...@n3.nabble.com> wrote:

> At 2014-11-24 19:02:08 -0800, Harihar Nahak <[hidden email]> wrote:
>
> > According to the documentation, GraphX runs 10x faster than normal Spark,
> > so I ran the PageRank algorithm in both:
> > [...]
> > Local Mode (Machine: 8 cores; 16 GB memory; 2.80 GHz Intel i7; Executor
> > memory: 4 GB; No. of partitions: 50; No. of iterations: 2) ==>
> >
> > *Spark Page Rank took -> 21.29 mins
> > GraphX Page Rank took -> 42.01 mins *
> >
> > Cluster Mode (Ubuntu 12.04; Spark 1.1/Hadoop 2.4 cluster; 3 workers, 1
> > driver, 8 cores, 30 GB memory) (Executor memory: 4 GB; No. of edge
> > partitions: 50, random vertex cut; No. of iterations: 2) =>
> >
> > *Spark Page Rank took -> 10.54 mins
> > GraphX Page Rank took -> 7.54 mins *
> >
> > Could you please help me determine when to use Spark and when to use
> > GraphX? If GraphX takes the same amount of time as Spark, then it's better
> > to use Spark, because Spark has a variety of operators to deal with any
> > type of RDD.
>
> If you have a problem that's naturally expressible as a graph computation,
> it makes sense to use GraphX in my opinion. In addition to the
> optimizations that GraphX incorporates which you would otherwise have to
> implement manually, GraphX's programming model is likely a better fit. But
> even if you start off by using pure Spark, you'll still have the
> flexibility to use GraphX for other parts of the problem since it's part of
> the same system.
>
> To address the benchmark results you got:
>
> 1. GraphX takes more time than Spark to load the graph, because it has to
> index it, but subsequent iterations should be faster. We benchmarked with
> 20 iterations to show this effect, but you only used 2 iterations, which
> doesn't give much time to amortize the loading cost.
>
> 2. The benchmarks in the GraphX OSDI paper are against a naive
> implementation of PageRank in Spark, while the version you benchmarked
> against has some of the same optimizations as GraphX does. I believe we
> found that the optimized Spark PageRank was only 3x slower than GraphX.
>
> 3. When running those benchmarks, we used an experimental version of Spark
> with in-memory shuffle, which disproportionately benefits GraphX since its
> shuffle files are smaller due to specialized compression.
>
> 4. We haven't optimized GraphX for local mode, so it's not surprising that
> it's slower there.
>
> Ankur



-- 
Regards,
Harihar Nahak
BigData Developer
Wynyard
Email:hna...@wynyardgroup.com | Extn: 8019




