I've read these pages. In the paper "GraphX: Graph Processing in a Distributed Dataflow Framework", the authors report only about 400 seconds for the uk-2007-05 dataset, which is similar in size to my dataset. Is the current GraphX the same version as the GraphX in that paper? And how many partitions did the experiment use for the uk-2007-05 dataset? I tried 16 and 192 partitions, and both got stuck.
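For context, this is roughly how I am invoking it; this is only a sketch, and the HDFS path, partition count, and iteration count here are placeholders, not the paper's configuration:

```scala
// Sketch of running static PageRank in GraphX with an explicit partition count.
// Path, partition count, and iteration count below are hypothetical.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

object PageRankSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pagerank-sketch"))

    // numEdgePartitions controls how many partitions the edge list is split into.
    val graph = GraphLoader
      .edgeListFile(sc, "hdfs:///data/uk-2007-05/edges.txt", numEdgePartitions = 192)
      // 2D edge partitioning tends to balance communication on power-law graphs.
      .partitionBy(PartitionStrategy.EdgePartition2D)

    // Fixed-iteration PageRank; 20 iterations is a common benchmark setting.
    val ranks = graph.staticPageRank(numIter = 20).vertices
    ranks.take(5).foreach(println)

    sc.stop()
  }
}
```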
Original message
From: Ted [email protected]
To: [email protected]
Cc: [email protected]
Sent: Friday, January 16, 2015, 02:23
Subject: Re: Is Spark suitable for large-scale PageRank, such as 200 million nodes, 2 billion edges?

Have you seen http://search-hadoop.com/m/JW1q5pE3P12 ? Please also take a look at the end-to-end performance graph on http://spark.apache.org/graphx/

Cheers

On Thu, Jan 15, 2015 at 9:29 AM, txw [email protected] wrote:

Hi, I am running PageRank on a large dataset, which includes 200 million nodes and 2 billion edges. Is Spark suitable for large-scale PageRank? How many cores and how much memory do I need, and how long will it take?

Thanks
Xuewei Tang
