aha ok, thanks. If I create different RDDs from a parent RDD and force evaluation thread by thread, then it should presumably be fine, correct? Or do I need to checkpoint the child RDDs as a precaution, in case one is evicted from memory and needs to be recomputed?
On Sat, Feb 28, 2015 at 4:28 AM, Shixiong Zhu <zsxw...@gmail.com> wrote:

> RDD is not thread-safe. You should not use it in multiple threads.
>
> Best Regards,
> Shixiong Zhu
>
> 2015-02-27 23:14 GMT+08:00 rok <rokros...@gmail.com>:
>
>> I'm seeing this java.util.NoSuchElementException: key not found: exception
>> pop up sometimes when I run operations on an RDD from multiple threads in a
>> python application. It ends up shutting down the SparkContext, so I'm
>> assuming this is a bug -- from what I understand, I should be able to run
>> operations on the same RDD from multiple threads, or is this not
>> recommended?
>>
>> I can't reproduce it all the time, and I've tried eliminating caching
>> wherever possible to see if that would have an effect, but it doesn't seem
>> to. Each thread first splits the base RDD and then runs
>> LogisticRegressionWithSGD on the subset.
>>
>> Is there a workaround to this exception?
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/java-util-NoSuchElementException-key-not-found-tp21848.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.