Afternoon.

About 6 months ago I tried (and failed) to get Spark and Cassandra working
together in production due to dependency hell.

I'm going to give it another try!

Here's my general strategy.

I'm going to create a Maven module for my code, with the Spark dependencies.
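
Roughly the dependency section I have in mind. The artifacts are the
stock Spark / DataStax / cassandra-unit ones; the version numbers are
just placeholders, since picking a combination whose transitive deps
don't clash is exactly the hell I hit last time:

    <dependencies>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.2.1</version> <!-- placeholder -->
      </dependency>
      <dependency>
        <groupId>com.datastax.spark</groupId>
        <artifactId>spark-cassandra-connector_2.10</artifactId>
        <version>1.2.0</version> <!-- placeholder -->
      </dependency>
      <dependency>
        <groupId>org.cassandraunit</groupId>
        <artifactId>cassandra-unit</artifactId>
        <version>2.1.3.1</version> <!-- placeholder -->
        <scope>test</scope>
      </dependency>
    </dependencies>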

Then I'm going to get that running, with unit tests that read from files
and write the data back out the way I want via Spark jobs.
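
Something like this for the file-based step, using a local[2] master so
it all runs in-process (the paths and the transform are just examples):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class FileRoundTrip {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("file-round-trip")
                    .setMaster("local[2]");  // in-process, no cluster needed
            JavaSparkContext sc = new JavaSparkContext(conf);

            // read -> transform -> write; the transform is a stand-in
            JavaRDD<String> lines = sc.textFile("src/test/resources/input.txt");
            JavaRDD<String> out = lines.map(s -> s.toUpperCase());
            out.saveAsTextFile("target/spark-output");

            sc.stop();
        }
    }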

Then I'm going to set up cassandra-unit to embed Cassandra in my project.
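
As far as I can tell cassandra-unit boils down to one helper class you
drive from JUnit; a minimal sketch (9142 is its default native port, if
I'm reading the docs right):

    import org.cassandraunit.utils.EmbeddedCassandraServerHelper;
    import org.junit.AfterClass;
    import org.junit.BeforeClass;

    public class EmbeddedCassandraTest {
        @BeforeClass
        public static void startCassandra() throws Exception {
            // boots an in-process Cassandra, native transport on 9142
            EmbeddedCassandraServerHelper.startEmbeddedCassandra();
        }

        @AfterClass
        public static void stopCassandra() {
            // wipes the keyspaces so tests don't leak state into each other
            EmbeddedCassandraServerHelper.cleanEmbeddedCassandra();
        }
    }
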
Then I'm going to point Spark at Cassandra and have the same code as
above work unchanged, except that instead of reading from and writing to
files it reads/writes to C*.
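
The swap should only touch the source and sink, something like this via
the connector's Java API (the keyspace/table names and the Record bean
are made up for illustration):

    import static com.datastax.spark.connector.japi.CassandraJavaUtil.*;

    import java.io.Serializable;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class CassandraRoundTrip {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("cassandra-round-trip")
                    .setMaster("local[2]")
                    // point the connector at the embedded node
                    .set("spark.cassandra.connection.host", "127.0.0.1")
                    .set("spark.cassandra.connection.native.port", "9142");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // read: same shape as the file test, different source
            JavaRDD<Record> rows = javaFunctions(sc)
                    .cassandraTable("my_ks", "input_table", mapRowTo(Record.class));

            // write the rows back out to a second table
            javaFunctions(rows)
                    .writerBuilder("my_ks", "output_table", mapToRow(Record.class))
                    .saveToCassandra();

            sc.stop();
        }

        // hypothetical bean matching the table's columns
        public static class Record implements Serializable {
            private String id;
            private String body;
            public Record() {}
            public String getId() { return id; }
            public void setId(String id) { this.id = id; }
            public String getBody() { return body; }
            public void setBody(String body) { this.body = body; }
        }
    }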

Then, once testing is working, I'm going to set up Spark in cluster mode
with the same dependencies.
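
My current thinking is to shade everything into one fat jar
(maven-shade-plugin) so the cluster runs exactly the dependency set the
tests ran against, then submit it along these lines (class, master URL,
and jar name are placeholders):

    spark-submit \
      --class com.example.CassandraRoundTrip \
      --master spark://spark-master:7077 \
      target/my-module-1.0.jar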

Does this sound like a reasonable strategy?

Kevin

-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
