Afternoon. About 6 months ago I tried (and failed) to get Spark and Cassandra working together in production due to dependency hell.
I'm going to give it another try! Here's my general strategy:

1. Create a Maven module for my code with the Spark dependencies.
2. Get that running, with unit tests that read from files and write the data back out the way I want via Spark jobs.
3. Set up cassandra-unit to embed Cassandra in my project.
4. Point Spark at Cassandra so the same code works against C*, reading from and writing to C* instead of files.
5. Once the tests pass, set up Spark in cluster mode with the same dependencies.

Does this sound like a reasonable strategy? (Rough sketches of what I mean for each step are below, after my sig.)

Kevin

--

We’re hiring if you know of any awesome Java Devops or Linux Operations Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
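P.S. To make this concrete, here are rough sketches of what I have in mind. Class, keyspace, and table names are just placeholders, not real code from our tree.

For steps 1 and 2, something like this: a Spark job that reads a text file and writes it back out, runnable under local[*] so it can live inside a unit test.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class FileRoundTripJob {

        public static void main( String[] args ) {

            // local[*] runs everything in-process, which is what the unit tests need
            SparkConf conf = new SparkConf()
                .setAppName( "file-roundtrip" )
                .setMaster( "local[*]" );

            JavaSparkContext sc = new JavaSparkContext( conf );

            try {

                JavaRDD<String> lines = sc.textFile( args[0] );

                // ... the actual transformations go here ...

                lines.saveAsTextFile( args[1] );

            } finally {
                sc.stop();
            }

        }

    }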
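For step 3, my understanding is that cassandra-unit lets the tests spin up an embedded Cassandra roughly like this (ports come from the cu-cassandra.yaml it ships with):

    import org.cassandraunit.utils.EmbeddedCassandraServerHelper;
    import org.junit.AfterClass;
    import org.junit.BeforeClass;

    public class EmbeddedCassandraTestBase {

        @BeforeClass
        public static void startEmbeddedCassandra() throws Exception {
            // starts an in-process Cassandra that the tests (and Spark) can connect to
            EmbeddedCassandraServerHelper.startEmbeddedCassandra();
        }

        @AfterClass
        public static void cleanEmbeddedCassandra() {
            // clears the data so the next test class starts from a clean slate
            EmbeddedCassandraServerHelper.cleanEmbeddedCassandra();
        }

    }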
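For step 4, the same job pointed at C* through the DataStax Spark Cassandra connector's Java API, roughly like below (keyspace, table, and bean names are made up):

    import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
    import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapRowTo;
    import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;

    import java.io.Serializable;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class CassandraRoundTripJob {

        public static void main( String[] args ) {

            SparkConf conf = new SparkConf()
                .setAppName( "cassandra-roundtrip" )
                // swap local[*] for the real master URL when moving to cluster mode (step 5)
                .setMaster( "local[*]" )
                // point the connector at the embedded C* in tests, the real cluster in prod
                .set( "spark.cassandra.connection.host", "127.0.0.1" );

            JavaSparkContext sc = new JavaSparkContext( conf );

            try {

                // read from C* instead of a file
                JavaRDD<Record> records = javaFunctions( sc )
                    .cassandraTable( "my_keyspace", "source_table", mapRowTo( Record.class ) );

                // ... same transformations as the file-based job ...

                // write back to C* instead of a file
                javaFunctions( records )
                    .writerBuilder( "my_keyspace", "dest_table", mapToRow( Record.class ) )
                    .saveToCassandra();

            } finally {
                sc.stop();
            }

        }

        // simple bean whose properties line up with the table's columns
        public static class Record implements Serializable {
            private String id;
            private String value;
            public String getId() { return id; }
            public void setId( String id ) { this.id = id; }
            public String getValue() { return value; }
            public void setValue( String value ) { this.value = value; }
        }

    }

If that passes against the embedded instance, my hope is that step 5 is just changing the master URL and submitting the same assembly to the cluster.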