Our system works with RDDs generated from Hadoop files. It processes each
record in a Hadoop file and for a subset of those records generates output
that is written to an external system via RDD.foreach. There are no
dependencies between the records that are processed.
If writing to the external system fails for a record, we would like to
retry just that write.
Hi Art,
I have some advice that isn't Spark-specific at all, so it doesn't
*exactly* address your questions, but you might still find it helpful. I
think using an implicit to add your retrying behavior might be useful. I
can think of two options:
1. enriching RDD itself, e.g. to add a retrying variant of foreach
2. wrapping the function you pass to foreach in a retrying wrapper, like
the RetryFunction below
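A minimal sketch of option 1, the enrichment idea. Spark isn't assumed to be on the classpath here, so `Seq` stands in for `RDD`, and `foreachWithRetry` is an illustrative name, not an existing API:

```scala
object EnrichDemo {
  // Implicit class adding a retrying foreach to Seq (a stand-in for RDD).
  implicit class RetryingOps[A](xs: Seq[A]) {
    def foreachWithRetry(nTries: Int)(f: A => Unit): Unit =
      xs.foreach { a =>
        var tries = 0
        var done = false
        while (!done && tries < nTries) {
          tries += 1
          try { f(a); done = true } // stop retrying this element on success
          catch { case e: Exception => if (tries == nTries) throw e }
        }
      }
  }

  // Exercise it with a writer that fails on its very first call only.
  def run(): Int = {
    var calls = 0
    Seq(1, 2, 3).foreachWithRetry(3) { _ =>
      calls += 1
      if (calls == 1) throw new RuntimeException("transient failure")
    }
    calls
  }
}
```

With a real RDD you would enrich `RDD[A]` the same way; the implicit just needs to be in scope at the call site.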
whoops! just realized I was retrying the function even on success. I
didn't pay enough attention to the output from my calls. Slightly updated
definitions:
class RetryFunction[-A](nTries: Int, f: A => Unit) extends Function[A, Unit] {
  def apply(a: A): Unit = {
    var tries = 0
    var success = false
    while (!success && tries < nTries) {
      tries += 1
      try { f(a); success = true } // stop once the call succeeds
      catch { case e: Exception => if (tries == nTries) throw e }
    }
  }
}
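The wrapper can be exercised without Spark at all. A self-contained sketch (the class is repeated so the snippet compiles standalone, and the flaky writer is invented for illustration):

```scala
object RetryDemo {
  // Retry wrapper: call f until it succeeds or nTries attempts are used,
  // rethrowing the last exception if every attempt fails.
  class RetryFunction[-A](nTries: Int, f: A => Unit) extends Function[A, Unit] {
    def apply(a: A): Unit = {
      var tries = 0
      var success = false
      while (!success && tries < nTries) {
        tries += 1
        try { f(a); success = true } // stop once the call succeeds
        catch { case e: Exception => if (tries == nTries) throw e }
      }
    }
  }

  // Simulated external write that fails twice before succeeding.
  def run(): Int = {
    var attempts = 0
    val flakyWrite: String => Unit = { _ =>
      attempts += 1
      if (attempts < 3) throw new RuntimeException("transient failure")
    }
    new RetryFunction(5, flakyWrite).apply("some-record")
    attempts
  }

  def main(args: Array[String]): Unit = println(run()) // prints 3
}
```

In the Spark setting you would pass the wrapped function straight to foreach, e.g. `rdd.foreach(new RetryFunction(3, writeRecord))`, where `writeRecord` is whatever does your external write.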