Sean,
Thanks for pointing this out. I’d have to experiment with the mapPartitions
method, but you’re right, this seems to address the issue directly. I’m also
connecting to Zookeeper to retrieve SparkConf parameters. I run into the same
issue with my Zookeeper driver; however, this is before a
The problem is not using the drivers per se, but writing your
functions in a way that you are trying to serialize them. You can't
serialize them, and indeed don't want to. Instead your code needs to
reopen connections and so forth when the function is instantiated on
the remote worker.
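
A rough sketch of that pattern, written in Python only to keep it short and not
taken from this thread. FakeClient is a hypothetical stand-in for a real
MongoDB, Kafka, or Zookeeper client, and write_partition is an illustrative
name. The idea is that the function creates its own connection when it runs on
the worker, so no client object ever has to be serialized with the closure.

```python
import pickle
import threading


class FakeClient:
    """Hypothetical stand-in for a third-party driver (not a real API)."""

    opened = 0  # how many clients have been constructed so far

    def __init__(self):
        FakeClient.opened += 1
        # A lock cannot be pickled, much like a live socket in a real driver.
        self._lock = threading.Lock()
        self.sent = []

    def send(self, record):
        with self._lock:
            self.sent.append(record)

    def close(self):
        pass


def write_partition(records):
    """Consume one partition's records through a connection opened here.

    Intended to be passed to something like rdd.foreachPartition(write_partition);
    the client is built inside the function, once per partition, instead of
    being created on the driver and captured in a closure.
    """
    client = FakeClient()
    try:
        count = 0
        for record in records:
            client.send(record)
            count += 1
        return count
    finally:
        client.close()


if __name__ == "__main__":
    # Trying to ship a live client fails, which is the error described above.
    try:
        pickle.dumps(FakeClient())
    except TypeError as exc:
        print("cannot serialize client:", exc)

    # Opening the client inside the per-partition function avoids the problem.
    print(write_partition(iter(["a", "b", "c"])))
```

Opening one connection per partition rather than per record keeps the
connection overhead manageable; a shared pool on each worker would be a
further refinement that this sketch does not attempt.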
static var
Hello,
I’m using Spark Streaming to aggregate data from a Kafka topic in sliding
windows. Usually we want to persist this aggregated data to a MongoDB cluster,
or republish it to a different Kafka topic. When I include these third-party
drivers, I usually get a NotSerializableException due to the