Re: Serialized 3rd party libs

2014-09-02 Thread Matt Narrell
Sean, thanks for pointing this out. I’d have to experiment with the mapPartitions method, but you’re right: this seems to address the issue directly. I’m also connecting to Zookeeper to retrieve SparkConf parameters, and I run into the same issue with my Zookeeper driver; however, this is before a
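The mapPartitions approach can be sketched without Spark. In this pure-Python simulation, the standard `pickle` module stands in for Spark's closure serialization, and `FakeMongoClient`, `SavePartition` and `run_partitions` are hypothetical names for illustration only. The function object carries nothing but a picklable host string, and the client is opened inside the call, once per partition, on the "worker" side. In real Spark the analogous hooks are `mapPartitions` / `foreachPartition`, which pass your function an iterator over one partition's records.

```python
import pickle
import threading


class FakeMongoClient:
    """Hypothetical driver client; the Lock makes it unpicklable, like a live connection."""

    opened = 0  # counts how many clients were created, for illustration

    def __init__(self, host):
        self.host = host
        self._lock = threading.Lock()
        self.saved = []
        FakeMongoClient.opened += 1

    def insert(self, record):
        with self._lock:
            self.saved.append(record)

    def close(self):
        pass  # a real driver would release sockets here


class SavePartition:
    """Per-partition writer: carries only configuration, never a live client."""

    def __init__(self, host):
        self.host = host  # a plain string serializes fine

    def __call__(self, records):
        client = FakeMongoClient(self.host)  # opened on the worker, once per partition
        try:
            for record in records:
                client.insert(record)
            return list(client.saved)
        finally:
            client.close()


def run_partitions(func, partitions):
    """Simulate Spark: serialize the function for each task, then run it on one partition."""
    results = []
    for partition in partitions:
        worker_func = pickle.loads(pickle.dumps(func))
        results.append(worker_func(iter(partition)))
    return results


partitions = [[1, 2, 3], [4, 5]]
results = run_partitions(SavePartition("mongodb-host"), partitions)
print(results)                 # [[1, 2, 3], [4, 5]]
print(FakeMongoClient.opened)  # 2
```

Opening one client per partition rather than per record keeps connection overhead proportional to the number of partitions, which is the main reason this pattern is usually preferred over creating a client inside a per-record `map`.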

Re: Serialized 3rd party libs

2014-09-02 Thread Sean Owen
The problem is not using the drivers per se, but writing your functions in a way that makes Spark try to serialize the drivers along with them. You can't serialize them, and indeed don't want to. Instead, your code needs to reopen connections and so forth when the function is instantiated on the remote worker. static var
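One common way to reopen connections on the worker instead of shipping them is a lazily initialized, process-wide holder (in Scala, typically an `object` with a `lazy val`; in Java, a static field). The Python sketch below is a simulation under stated assumptions: `pickle` stands in for Spark's serializer, `FakeMongoClient`, `get_client` and `SaveRecord` are hypothetical, and all tasks are assumed to run in one long-lived worker process. The serialized function holds only a host name and looks the client up on first use.

```python
import pickle
import threading

_client = None                   # process-wide ("static") client, created lazily
_client_lock = threading.Lock()  # guards first-time creation across threads


class FakeMongoClient:
    """Hypothetical driver client; the Lock makes it unpicklable, like a live connection."""

    instances = 0

    def __init__(self, host):
        self.host = host
        self._lock = threading.Lock()
        self.saved = []
        FakeMongoClient.instances += 1

    def insert(self, record):
        with self._lock:
            self.saved.append(record)


def get_client(host):
    """Return this process's client, opening it on first use only (host is ignored afterwards)."""
    global _client
    if _client is None:
        with _client_lock:
            if _client is None:  # double-checked so only one thread creates it
                _client = FakeMongoClient(host)
    return _client


class SaveRecord:
    """Serializable function: holds the host name, looks up the client when called."""

    def __init__(self, host):
        self.host = host

    def __call__(self, record):
        get_client(self.host).insert(record)


# Simulate several tasks being shipped to the same worker process.
for batch in ([1, 2], [3], [4, 5]):
    task = pickle.loads(pickle.dumps(SaveRecord("mongodb-host")))
    for record in batch:
        task(record)

print(get_client("mongodb-host").saved)  # [1, 2, 3, 4, 5]
print(FakeMongoClient.instances)         # 1
```

Compared with opening a client per partition, the holder reuses one connection across tasks, at the cost that nothing here ever closes it; real code would need a shutdown hook or would rely on the worker process exiting.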

Serialized 3rd party libs

2014-09-02 Thread Matt Narrell
Hello, I’m using Spark Streaming to aggregate data from a Kafka topic in sliding windows. Usually we want to persist this aggregated data to a MongoDB cluster or republish it to a different Kafka topic. When I include these 3rd-party drivers, I usually get a NotSerializableException due to the
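The failure mode can be reproduced in miniature. This Python sketch uses the standard `pickle` module as a stand-in for Spark's closure serialization; on the JVM, Spark uses Java serialization and raises NotSerializableException, whereas `pickle` raises a `TypeError` for the same situation. `FakeMongoClient`, `SaveWindow` and `ship_to_worker` are hypothetical names, and a `threading.Lock` stands in for a live socket or connection pool.

```python
import pickle
import threading


class FakeMongoClient:
    """Hypothetical stand-in for a third-party driver client.

    Real clients hold sockets, thread pools and locks; a threading.Lock
    is enough to make this object unpicklable, like a live connection.
    """

    def __init__(self, host):
        self.host = host
        self._lock = threading.Lock()  # unpicklable, like a socket
        self.saved = []

    def insert(self, record):
        with self._lock:
            self.saved.append(record)


class SaveWindow:
    """A function object that captures a client created on the driver."""

    def __init__(self, client):
        self.client = client  # the live client becomes part of the closure

    def __call__(self, record):
        self.client.insert(record)


def ship_to_worker(func):
    """Simulate Spark shipping a function to a worker: serialize, then deserialize."""
    return pickle.loads(pickle.dumps(func))


client = FakeMongoClient("mongodb-host")
try:
    ship_to_worker(SaveWindow(client))
    shipped = True
except (TypeError, pickle.PicklingError) as exc:
    shipped = False
    print("serialization failed:", exc)
```

Serialization fails not because the function itself is unusual but because it drags the client, and everything the client references, along with it.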