Yes, I have just found that. By replacing,
rdd.map(_._1).distinct().foreach {
  case (game, category) => persist(game, category, minTime, maxTime, rdd)
}
with,
rdd.map(_._1).distinct().collect().foreach {
  case (game, category) => persist(game, category, minTime, maxTime, rdd)
}
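For illustration, a minimal self-contained sketch of that corrected pattern: collect() pulls the distinct keys back to the driver, so the foreach is a plain local loop and persist can keep running jobs on the cached RDD. The type aliases and the persist body below are assumptions made for the sketch, not the thread's actual code.

import org.apache.spark.rdd.RDD

object CollectThenPersistSketch {
  // Placeholder aliases standing in for the thread's GameID and Category types.
  type GameID = String
  type Category = String

  // Hypothetical persist: it runs further jobs on the cached RDD, so it must be
  // called on the driver, where a live SparkContext is available.
  def persist(game: GameID, category: Category, minTime: Long, maxTime: Long,
              rdd: RDD[((GameID, Category), Long)]): Unit = {
    val count = rdd.filter { case (key, _) => key == (game, category) }.count()
    println(s"[$minTime, $maxTime] ($game, $category): $count records")
  }

  def sink(rdd: RDD[((GameID, Category), Long)]): Unit = {
    rdd.cache()
    val minTime = rdd.map(_._2).min()
    val maxTime = rdd.map(_._2).max()
    // collect() materialises the distinct keys on the driver, so this foreach is
    // a plain local loop rather than a distributed Spark action.
    rdd.map(_._1).distinct().collect().foreach {
      case (game, category) => persist(game, category, minTime, maxTime, rdd)
    }
  }
}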
It looks like you are trying to use the RDD in a distributed operation,
which won't work. The context will be null.
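A small runnable sketch of that point (local mode, made-up data): the body of rdd.foreach is shipped to executors, and an RDD or SparkContext referenced inside it is not usable there (a null, transient context in early Spark releases; later releases raise an explicit SparkException instead).

import org.apache.spark.{SparkConf, SparkContext}

object DriverOnlySketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("driver-only-sketch").setMaster("local[2]"))
    val rdd = sc.parallelize(1 to 10)

    // Fine: count() is called on the driver, where sc and the RDD are live.
    println(rdd.count())

    // The body of this foreach runs on executors; an RDD or SparkContext
    // referenced inside it cannot launch new jobs there.
    rdd.foreach { i =>
      // rdd.filter(_ == i).count()  // would fail: no live context on executors
      println(i)
    }

    sc.stop()
  }
}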
On Jan 21, 2015 1:50 PM, "Luis Ángel Vicente Sánchez" <
langel.gro...@gmail.com> wrote:
The SparkContext is lost when I call the persist function from the sink
function; just before the function call... everything works as intended, so
I guess it's the FunctionN class serialisation that is causing the problem.
I will try to embed the functionality in the sink method to verify that.
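For what that hypothesis means in practice, a hedged sketch (class and method names invented here): calling a method of the enclosing class from inside a Spark closure makes the generated FunctionN capture `this`, dragging the whole instance, SparkContext field included, into serialisation, whereas embedding the logic as a local function keeps the closure self-contained.

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

class SinkExample(val sc: SparkContext) {

  private def persist(game: String, category: String): Unit =
    println(s"persisting $game / $category")

  def sinkCapturingThis(rdd: RDD[(String, String)]): Unit =
    // `persist` here really means `this.persist`, so the FunctionN captures the
    // whole SinkExample instance (not serialisable, and holding the SparkContext),
    // and the job fails with "Task not serializable".
    rdd.foreach { case (g, c) => persist(g, c) }

  def sinkWithLocalLogic(rdd: RDD[(String, String)]): Unit = {
    // Embedding the logic as a local function keeps `this` out of the closure;
    // only the small anonymous function is shipped to the executors.
    val persistLocally = (g: String, c: String) => println(s"persisting $g / $c")
    rdd.foreach { case (g, c) => persistLocally(g, c) }
  }
}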
The following functions,
def sink(data: DStream[((GameID, Category), ((TimeSeriesKey, Platform), HLL))]): Unit = {
  data.foreachRDD { rdd =>
    rdd.cache()
    val (minTime, maxTime): (Long, Long) =
      rdd.map {
        case (_, ((TimeSeriesKey(_, time), _), _)) => (time, time)