Re: NullPointer when access rdd.sparkContext (Spark 1.1.1)

2015-01-21 Thread Luis Ángel Vicente Sánchez
Yes, I have just found that. By replacing rdd.map(_._1).distinct().foreach { case (game, category) => persist(game, category, minTime, maxTime, rdd) } with rdd.map(_._1).distinct().collect().foreach { case (game, category) => persist(game, category, minTime, maxTime, rdd) }, the NullPointerException is gone.
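
A minimal sketch of the working pattern, using the hypothetical names from the thread (persist, minTime, maxTime and the key types are the author's, not standard Spark API): collect() materialises the distinct keys as a local array on the driver, so the foreach that follows is plain Scala running locally rather than a distributed Spark action, and rdd.sparkContext is still valid inside it.

    // Sketch of the fix: collect() returns a local Array of keys on the
    // driver, so this loop executes driver-side, where rdd and its
    // SparkContext are real objects rather than deserialised copies.
    rdd.map(_._1).distinct().collect().foreach { case (game, category) =>
      persist(game, category, minTime, maxTime, rdd) // driver-side call
    }

The trade-off is that every distinct (game, category) key is pulled into driver memory, which is fine as long as the number of distinct keys stays modest.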

Re: NullPointer when access rdd.sparkContext (Spark 1.1.1)

2015-01-21 Thread Sean Owen
It looks like you are trying to use the RDD in a distributed operation, which won't work. The context will be null. On Jan 21, 2015 1:50 PM, "Luis Ángel Vicente Sánchez" <langel.gro...@gmail.com> wrote: > The SparkContext is lost when I call the persist function from the sink > function, just before the function call...
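
For contrast, a sketch of the failing pattern Sean is describing, with the same hypothetical names: foreach on an RDD is a distributed action, so its closure is serialised and executed on the executors, and the rdd captured by that closure arrives there without a usable context.

    // Sketch of the anti-pattern: this closure runs on executors, not
    // on the driver. The captured rdd is deserialised on each executor
    // with its transient SparkContext field left null, so any use of
    // rdd.sparkContext inside persist throws a NullPointerException.
    rdd.map(_._1).distinct().foreach { case (game, category) =>
      persist(game, category, minTime, maxTime, rdd)
    }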

Re: NullPointer when access rdd.sparkContext (Spark 1.1.1)

2015-01-21 Thread Luis Ángel Vicente Sánchez
The SparkContext is lost when I call the persist function from the sink function; just before the function call everything works as intended, so I guess it is the serialisation of the FunctionN class that is causing the problem. I will try to embed the functionality in the sink method to verify that.
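
A self-contained illustration of the suspected mechanism, with made-up stand-in classes (Context and Holder are not Spark types): Java serialisation silently drops @transient fields, and Spark's RDD marks its SparkContext reference transient, so a closure that drags an RDD to an executor gets a copy whose context is null.

    import java.io._

    // Stand-ins: Context plays the role of SparkContext (not
    // serialisable), Holder the role of an RDD whose context field is
    // marked @transient.
    class Context { override def toString = "live context" }
    class Holder(@transient val ctx: Context) extends Serializable

    object TransientDemo extends App {
      val before = new Holder(new Context)
      println(before.ctx) // prints "live context" on the "driver" side

      // Round-trip through Java serialisation, as Spark does when it
      // ships a closure (and anything it captures) to an executor.
      val bos = new ByteArrayOutputStream
      val oos = new ObjectOutputStream(bos)
      oos.writeObject(before)
      oos.close()
      val in    = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray))
      val after = in.readObject().asInstanceOf[Holder]

      println(after.ctx) // prints "null", like rdd.sparkContext on an executor
    }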

NullPointer when access rdd.sparkContext (Spark 1.1.1)

2015-01-21 Thread Luis Ángel Vicente Sánchez
The following functions, def sink(data: DStream[((GameID, Category), ((TimeSeriesKey, Platform), HLL))]): Unit = { data.foreachRDD { rdd => rdd.cache() val (minTime, maxTime): (Long, Long) = rdd.map { case (_, ((TimeSeriesKey(_, time), _), _)) => (time, time)
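
Piecing the truncated previews in this thread together, the sink function was probably close to the following reconstruction. This is a sketch, not the author's verbatim code: the reduce step computing (minTime, maxTime) and the exact persist signature are assumptions, and GameID, Category, TimeSeriesKey, Platform, HLL and persist are the author's own definitions.

    import org.apache.spark.streaming.dstream.DStream

    def sink(data: DStream[((GameID, Category), ((TimeSeriesKey, Platform), HLL))]): Unit = {
      data.foreachRDD { rdd =>
        rdd.cache()
        // Assumed reduction: fold every timestamp into a (min, max) pair.
        val (minTime, maxTime): (Long, Long) =
          rdd.map { case (_, ((TimeSeriesKey(_, time), _), _)) => (time, time) }
             .reduce { case ((lo, hi), (lo2, hi2)) =>
               (math.min(lo, lo2), math.max(hi, hi2))
             }
        // The failing step: foreach is a distributed action, so persist
        // executed on the executors, where rdd.sparkContext is null.
        rdd.map(_._1).distinct().foreach { case (game, category) =>
          persist(game, category, minTime, maxTime, rdd)
        }
      }
    }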