Re: Pregel runs slower and slower when each Pregel call has a data dependency on the previous one

2015-06-05 Thread dash
Hi Heather, Please check this issue: https://issues.apache.org/jira/browse/SPARK-4672. I think you can solve this problem by checkpointing your data every few iterations. Hope that helps. Best regards, Baoxu (Dash) Shi, Computer Science and Engineering Department, University of Notre Dame
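
A minimal sketch of that suggestion, assuming a GraphX-style iterative loop; initialGraph, update, maxIterations, the checkpoint directory, and the interval of 10 are placeholders, and sc is the active SparkContext:

    import org.apache.spark.graphx.Graph

    sc.setCheckpointDir("/tmp/checkpoints")            // placeholder directory
    var g: Graph[Double, Double] = initialGraph.cache()
    for (i <- 1 to maxIterations) {
      g = update(g).cache()                            // one iteration of the algorithm
      if (i % 10 == 0) {                               // cut the lineage every few iterations
        g.vertices.checkpoint()
        g.edges.checkpoint()
        g.vertices.count(); g.edges.count()            // force materialization so checkpoint files are written
      }
    }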

Re: New combination-like RDD based on two RDDs

2015-02-04 Thread dash
Problem solved. A simple join will do the work: val prefix = new PairRDDFunctions[Int, Set[Int]](sc.parallelize(List((9, Set(4)), (1, Set(3)), (2, Set(5)), (2, Set(4))))); val suffix = sc.parallelize(List((1, Set(1)), (2, Set(6)), (2, Set(5)), (2, Set(7)))); prefix.join(suffix).collect().foreach(println)
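
For reference, a small hedged variant of the same join: the explicit PairRDDFunctions wrapper is not required once the standard pair-RDD implicits are in scope, and each matching key produces one output pair per combination of values (which is the "combination-like" behaviour the thread asks about):

    import org.apache.spark.SparkContext._   // pair-RDD implicits (needed on older Spark releases)

    val prefix = sc.parallelize(List((9, Set(4)), (1, Set(3)), (2, Set(5)), (2, Set(4))))
    val suffix = sc.parallelize(List((1, Set(1)), (2, Set(6)), (2, Set(5)), (2, Set(7))))
    prefix.join(suffix).collect().foreach(println)
    // e.g. (1,(Set(3),Set(1))) plus the six combinations for key 2; key 9 has no match and is dropped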

New combination-like RDD based on two RDDs

2015-02-04 Thread dash
Hey Spark gurus! Sorry for the confusing title; I don't know exactly how to describe my problem, so if you do, please tell me and I'll change it :-) Say I have two RDDs right now: val rdd1 = sc.parallelize(List((1,(3)), (2,(5)), (3,(6)))) and val rdd2 = sc.parallelize(List((2,(1)), (2,

Most jobs finish very quickly, but some of them take a much longer time.

2014-07-27 Thread Sarthak Dash
Hello everyone, I am trying out Spark for the first time, and after a month of work I am stuck on an issue. I have a very simple program that, given a directed graph with node/edge parameters and a particular node, tries to figure out all the siblings (in the traditional sense) of the given n
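
A hedged sketch of one way to compute siblings with GraphX (not necessarily the original poster's approach): collect each vertex's in-neighbors as parents, then gather the other children of the target's parents. The graph's attribute types and the target id are placeholders.

    import org.apache.spark.graphx._

    def siblings(graph: Graph[String, Int], target: VertexId): Set[VertexId] = {
      // parents of every vertex = incoming neighbors
      val parents = graph.collectNeighborIds(EdgeDirection.In)
      val targetParents = parents
        .filter { case (id, _) => id == target }
        .flatMap { case (_, ps) => ps }
        .collect().toSet
      // children of every vertex = outgoing neighbors
      val children = graph.collectNeighborIds(EdgeDirection.Out)
      children
        .filter { case (id, _) => targetParents.contains(id) }
        .flatMap { case (_, kids) => kids }
        .collect().toSet - target
    }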

Re: Worker can not find custom KryoRegistrator

2014-07-02 Thread Baoxu Shi(Dash)
Don’t know why the settings do not appear in the last mail: .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") .set("spark.kryo.registrator", new HDTMKryoRegistrator().getClass.getName) On Jul 2, 2014, at 1:03 PM, dash wrote
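
For context, a minimal self-contained version of that configuration, assuming HDTMKryoRegistrator is the user's own KryoRegistrator implementation available on the classpath:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("kryo-example")                      // placeholder app name
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", classOf[HDTMKryoRegistrator].getName)
    val sc = new SparkContext(conf)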

Worker can not find custom KryoRegistrator

2014-07-02 Thread dash
Hi, I'm using Spark 1.1.0 standalone with 5 workers and 1 driver, and my Kryo settings are as shown in the reply above. When I submit this job, the driver works fine but the workers throw a ClassNotFoundException saying they cannot find HDTMKryoRegistrator. Any idea about this problem? I googled this but there is only one p
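
One common cause of this kind of ClassNotFoundException is that the jar containing the registrator is on the driver's classpath but never shipped to the executors. A hedged sketch of the usual fix (paths and class names are placeholders; the jar can also be passed with --jars to spark-submit):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setJars(Seq("/path/to/your-app-assembly.jar"))  // placeholder path; ships the jar to executors
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "your.package.HDTMKryoRegistrator")  // placeholder package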

Re: Question about VD and ED

2014-07-01 Thread Baoxu Shi(Dash)
Hi Bin, VD and ED are type parameters (carrying ClassTags); you can treat them as placeholders, like template <T> in C++ (not a 100% exact analogy). You do not need to convert Graph[String, Double] to Graph[VD, ED]. Checking ClassTag’s definition in Scala could help. Best, On Jul 1, 2014, at 4:49 AM, Bin WU wrote: > Hi all, > > I am a ne
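
A small illustration of the point: VD and ED are just the graph's type parameters, so a concrete Graph[String, Double] can be passed wherever a generic Graph[VD, ED] is expected, with no conversion:

    import org.apache.spark.graphx._

    // A generic helper that works for any vertex/edge attribute types.
    def vertexCount[VD, ED](g: Graph[VD, ED]): Long = g.vertices.count()

    // Assuming `g` is a Graph[String, Double] built elsewhere:
    // vertexCount(g)   // VD is inferred as String, ED as Double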

Re: Alternative to checkpointing and materialization for truncating lineage in high iteration jobs

2014-06-28 Thread Baoxu Shi(Dash)
I’m facing the same situation. It would be great if someone could provide a code snippet as an example. On Jun 28, 2014, at 12:36 PM, Nilesh Chakraborty wrote: > Hello, > > In a thread about "java.lang.StackOverflowError when calling count()" [1] I > saw Tathagata Das share an interesting approac
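
One way to truncate lineage without calling checkpoint() is to write the RDD out and read it back, so the reloaded RDD's dependencies start at the files. This is a hedged sketch under that assumption, not necessarily the approach from the linked thread; the path is a placeholder.

    import scala.reflect.ClassTag
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    def truncateLineage[T: ClassTag](sc: SparkContext, rdd: RDD[T], path: String): RDD[T] = {
      rdd.saveAsObjectFile(path)   // materializes the RDD to stable storage
      sc.objectFile[T](path)       // re-read: dependencies now point only at the files
    }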

Re: LiveListenerBus throws exception and weird web UI bug

2014-06-26 Thread Baoxu Shi(Dash)
Hi Pei-Lun, I have the same problem here. The issue is SPARK-2228; someone has also posted a pull request for it, but it only eliminates the exception, not the side effects. I think the problem may be due to the hard-coded private val EVENT_QUEUE_CAPACITY = 1 in core/src/main/scala/

Can not checkpoint Graph object's vertices but could checkpoint edges

2014-06-20 Thread dash
I'm trying to work around the StackOverflowError that occurs when an object has a long dependency chain; someone said I should use checkpoint to cut off the dependencies. I wrote some sample code to test it, but I can only checkpoint edges, not vertices. I think I do materialize vertices and edges after calling c
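
A hedged sketch of how a Graph's member RDDs can be checkpointed and materialized; this is not the thread's confirmed resolution, and the required materialization order may differ between Spark versions:

    import org.apache.spark.graphx.Graph

    def checkpointGraph[VD, ED](g: Graph[VD, ED]): Unit = {
      g.vertices.checkpoint()
      g.edges.checkpoint()
      // Force both RDDs to be computed so the checkpoint files are actually written.
      g.edges.count()
      g.vertices.count()
    }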

Re: Best practices for removing lineage of a RDD or Graph object?

2014-06-18 Thread dash
= false) currentGraph.edges.unpersist(blocking = false) currentGraph = g println(" iter "+i+" finished") } } Baoxu Shi(Dash) Computer Science and Engineering Department University of Notre Dame b...@nd.edu > On Jun 19, 2014, at 1:47 AM, roy20021 [via Apache Spark U
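
A hedged reconstruction of the full loop that the truncated snippet above appears to come from; initialGraph, step, and numIterations are placeholders for the user's own graph, per-iteration update, and iteration count:

    import org.apache.spark.graphx.Graph

    var currentGraph: Graph[Int, Int] = initialGraph.cache()
    for (i <- 1 to numIterations) {
      val g = step(currentGraph).cache()               // compute the next iteration
      g.vertices.count(); g.edges.count()              // materialize before dropping the old graph
      currentGraph.vertices.unpersist(blocking = false)
      currentGraph.edges.unpersist(blocking = false)
      currentGraph = g
      println(" iter " + i + " finished")
    }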

Best practices for removing lineage of a RDD or Graph object?

2014-06-17 Thread dash
If an RDD object has non-empty .dependencies, does that mean it has lineage? How can I remove it? I'm doing iterative computation, and each iteration depends on the result computed in the previous iteration. After several iterations, it throws a StackOverflowError. At first I tried to use cache,
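
A small example of what non-empty .dependencies means and how checkpointing truncates the chain once the RDD is materialized; the checkpoint directory is a placeholder and sc is the active SparkContext:

    val base = sc.parallelize(1 to 10)
    val mapped = base.map(_ * 2)
    println(mapped.dependencies)            // non-empty: the lineage pointing back at `base`

    sc.setCheckpointDir("/tmp/checkpoints") // placeholder path
    mapped.checkpoint()
    mapped.count()                          // materialize so the checkpoint is written
    println(mapped.isCheckpointed)          // true; the original lineage is no longer needed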