Adding to the conversation: there are three great open-source options available.
1. Calliope http://tuplejump.github.io/calliope/ This was the first library out, some time late last year (as far as I can recall), and I have been using it for a while. It is mostly very stable and uses Cassandra's Hadoop I/O (note that it doesn't require Hadoop itself).
2. DataStax Spark Cassandra Connector https://github.com/datastax/spark-cassandra-connector The main difference is that this one uses CQL3. Again a great library, but it has a few issues. It is also very actively developed. It still uses Thrift for minor stuff, but all the heavy lifting is done in CQL3.
3. Stratio Deep https://github.com/Stratio/stratio-deep It has a lot more to offer if you use the whole Stratio stack: Deep is for Spark, Stratio Streaming is built on top of Spark Streaming, Stratio Meta is something similar to Shark or Spark SQL, and finally Stratio Cassandra is a fork of Cassandra with advanced Lucene-based indexing.