Hello,
I have a weird error coming up that I am really at a loss to explain.
I have defined a custom RDD (reading from a 3rd-party store; a really
simple one). With the same code and the same test running locally (0.8.0),
it occasionally receives a compute() call with a HadoopSplit split (instead
of its native split). Naturally, it fails on the cast.
I don't see any fundamental difference between my code and HadoopRDD or
JdbcRDD.
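To make the failure mode concrete: the real HblRDD code isn't shown here, so this is only an illustrative sketch with stub types standing in for Spark's Partition classes. It shows the cast that every custom RDD's compute() performs on its split argument, and how that cast blows up when a foreign split type is handed in:

```scala
// Stub stand-ins for Spark's split/Partition types (illustrative only;
// the real interface is org.apache.spark.Partition).
trait Partition { def index: Int }
case class HblPartition(index: Int) extends Partition     // the RDD's native split
case class ForeignPartition(index: Int) extends Partition // e.g. a HadoopSplit

// The cast a custom RDD's compute() typically performs on its split:
def toHblPartition(split: Partition): HblPartition =
  split.asInstanceOf[HblPartition] // ClassCastException for foreign splits

// Casting the native split works; a foreign one fails.
val ok = toHblPartition(HblPartition(0))
val failedOnForeignSplit =
  try { toHblPartition(ForeignPartition(0)); false }
  catch { case _: ClassCastException => true }
```

So the question is why the scheduler would ever pair this RDD's compute() with a split produced by a different RDD.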
I managed to reproduce this and captured the stack trace below with the
local task scheduler, but I am really at a loss to explain what's going
on here.
- "pool-2-thread-3"@8,430 in group "main": RUNNING
- org$hbl$whale$plan$spark$HblRDD$$toHblPartition():41, HblRDD$ {org.hbl.whale.plan.spark}
- compute():19, HblRDD {org.hbl.whale.plan.spark}
- computeOrReadCheckpoint():237, RDD {org.apache.spark.rdd}
- getOrCompute():70, CacheManager {org.apache.spark}
- iterator():224, RDD {org.apache.spark.rdd}
- compute():32, FilteredRDD {org.apache.spark.rdd}
- computeOrReadCheckpoint():237, RDD {org.apache.spark.rdd}
- iterator():226, RDD {org.apache.spark.rdd}
- compute():29, MappedRDD {org.apache.spark.rdd}
- computeOrReadCheckpoint():237, RDD {org.apache.spark.rdd}
- iterator():226, RDD {org.apache.spark.rdd}
- compute():36, MapPartitionsRDD {org.apache.spark.rdd}
- computeOrReadCheckpoint():237, RDD {org.apache.spark.rdd}
- iterator():226, RDD {org.apache.spark.rdd}
- run():149, ShuffleMapTask {org.apache.spark.scheduler}
- run():88, ShuffleMapTask {org.apache.spark.scheduler}
- runTask():198, LocalScheduler {org.apache.spark.scheduler.local}
- run():68, LocalActor$$anonfun$launchTask$1$$anon$1 {org.apache.spark.scheduler.local}
- call():471, Executors$RunnableAdapter {java.util.concurrent}
- run():262, FutureTask {java.util.concurrent}
- runWorker():1145, ThreadPoolExecutor {java.util.concurrent}
- run():615, ThreadPoolExecutor$Worker {java.util.concurrent}
- run():744, Thread {java.lang}
Thank you in advance for your help.
-Dmitriy