I've read these pages. In the paper "GraphX: Graph Processing in a Distributed Dataflow Framework", the authors report only about 400 seconds for the uk-2007-05 dataset, which is similar in size to my dataset. Is the current GraphX the same version as the GraphX in that paper? And how many partitions did the experiment use for the uk-2007-05 dataset? I tried 16 and 192 partitions, and both got stuck.
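For context, this is roughly how I am invoking it; this is only a sketch, and the HDFS path, partition count, and iteration count here are placeholders, not the paper's configuration:

```scala
// Sketch of running static PageRank in GraphX with an explicit partition count.
// Path, partition count, and iteration count below are hypothetical.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

object PageRankSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pagerank-sketch"))

    // numEdgePartitions controls how many partitions the edge list is split into.
    val graph = GraphLoader
      .edgeListFile(sc, "hdfs:///data/uk-2007-05/edges.txt", numEdgePartitions = 192)
      // 2D edge partitioning tends to balance communication on power-law graphs.
      .partitionBy(PartitionStrategy.EdgePartition2D)

    // Fixed-iteration PageRank; 20 iterations is a common benchmark setting.
    val ranks = graph.staticPageRank(numIter = 20).vertices
    ranks.take(5).foreach(println)

    sc.stop()
  }
}
```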
Original message
From: Ted [email protected]
To: [email protected]
Cc: [email protected]
Sent: Friday, January 16, 2015, 02:23
Subject: Re: Is Spark suitable for large-scale PageRank, such as 200 million nodes, 2 billion edges?

Have you seen http://search-hadoop.com/m/JW1q5pE3P12 ? Please also take a look at the end-to-end performance graph on http://spark.apache.org/graphx/

Cheers

On Thu, Jan 15, 2015 at 9:29 AM, txw [email protected] wrote:

Hi, I am running PageRank on a large dataset, which includes 200 million nodes and 2 billion edges. Is Spark suitable for large-scale PageRank? How many cores and how much memory do I need, and how long will it take?

Thanks
Xuewei Tang
