Code review - Spark SQL command-line client for Cassandra

Matthew Johnson Fri, 19 Jun 2015 02:21:34 -0700

Hi all,



I have been struggling with Cassandra’s lack of ad hoc query support. I know
ad hoc querying is an anti-pattern in Cassandra, but sometimes management
come over and ask me to run something, and it’s impossible to explain why it
will take me a while when it would take about 10 seconds in MySQL. So I have
put together the following code snippet, which bundles DataStax’s Spark
Cassandra Connector, lets you submit a Spark SQL query through it, and
writes the results out as text files in a folder under /tmp.



Does anyone spot any obvious flaws in this plan? (I have a lot more error
handling etc. in my real code, but have removed it here for brevity.)



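    package com.algomi.spark;

    import org.apache.spark.SparkConf;
    import org.apache.spark.SparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.cassandra.CassandraSQLContext;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class JavaSparkSQL {

        // Logger declaration assumed; slf4j ships with Spark
        private static final Logger LOG = LoggerFactory.getLogger(JavaSparkSQL.class);

        private final SparkConf conf;

        public JavaSparkSQL(SparkConf conf) {
            this.conf = conf;
        }

        private void run(String sqlQuery) {
            SparkContext sc = new SparkContext(conf);
            try {
                // CassandraSQLContext lets Spark SQL resolve keyspace.table names in Cassandra
                CassandraSQLContext csql = new CassandraSQLContext(sc);
                DataFrame result = csql.sql(sqlQuery);

                // Use a fresh folder per run: saveAsTextFile fails if the folder already exists
                String folderName = "/tmp/output_" + System.currentTimeMillis();
                LOG.info("Attempting to save SQL results in folder: " + folderName);
                result.rdd().saveAsTextFile(folderName);
                LOG.info("SQL results saved");
            } finally {
                // Release the cluster resources held by this application
                sc.stop();
            }
        }

        public static void main(String[] args) {
            String sparkMasterUrl = args[0];
            String cassandraHost = args[1];  // host the connector contacts first
            String sqlQuery = args[2];

            SparkConf conf = new SparkConf()
                    .setAppName("Java Spark SQL")
                    .setMaster(sparkMasterUrl)
                    .set("spark.cassandra.connection.host", cassandraHost);

            JavaSparkSQL app = new JavaSparkSQL(conf);
            app.run(sqlQuery);
        }
    }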
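One thing worth checking in the output: saveAsTextFile writes each Row using
its toString, which typically looks like [value1,value2], so text columns
that contain commas become ambiguous for whoever opens the file. Below is a
minimal sketch of a CSV-escaping helper, assuming the column values of each
row have already been copied into a java.util.List. The class name CsvLines
and the RFC 4180-style quoting are my own illustrative choices, not part of
Spark or the connector.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical helper: turns one result row (as a list of column values)
 * into a CSV line, quoting fields RFC 4180-style so commas, quotes and
 * line breaks inside text columns do not break the layout.
 */
public final class CsvLines {

    private CsvLines() {}

    /** Quotes a single value only when it contains a comma, quote or line break. */
    static String escape(Object value) {
        if (value == null) {
            return "";  // null columns become empty fields
        }
        String s = value.toString();
        boolean needsQuoting = s.contains(",") || s.contains("\"")
                || s.contains("\n") || s.contains("\r");
        if (!needsQuoting) {
            return s;
        }
        // Double any embedded quotes, then wrap the whole field in quotes
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }

    /** Joins the escaped values of one row with commas. */
    public static String toCsvLine(List<?> fields) {
        return fields.stream()
                .map(CsvLines::escape)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Prints: 42,login,"said ""hi"", left"
        System.out.println(toCsvLine(Arrays.asList(42, "login", "said \"hi\", left")));
    }
}
```

Inside the job, each Row could be turned into a List by collecting
row.get(i) for i from 0 to row.size() - 1, and result.javaRDD() mapped
through the helper before calling saveAsTextFile. Note also that
saveAsTextFile produces a folder of part-NNNNN files, one per partition,
rather than a single file; calling coalesce(1) on the JavaRDD first gives a
single part file, at the cost of pushing all results through one task.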



I can then submit this to Spark with ‘spark-submit’:



    ./spark-submit --class com.algomi.spark.JavaSparkSQL \
        --master spark://sales3:7077 \
        spark-on-cassandra-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
        spark://sales3:7077 sales3 "select * from mykeyspace.operationlog"
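In the command above, the master URL is given twice: once to spark-submit
via --master, and once as the first application argument, which is then
passed to setMaster. According to the Spark configuration docs, values set
directly on a SparkConf take precedence over spark-submit flags, and a new
SparkConf already picks up the --master value, so the first argument could
be made optional. A small sketch of that, assuming a hypothetical JobArgs
holder class (not part of Spark or the connector):

```java
import java.util.Optional;

/**
 * Hypothetical argument holder for the job. The Spark master is optional:
 * when launched through spark-submit --master, SparkConf already picks it
 * up, so only the Cassandra host and the query are required.
 */
public final class JobArgs {

    final Optional<String> masterUrl;
    final String cassandraHost;
    final String sqlQuery;

    private JobArgs(Optional<String> masterUrl, String cassandraHost, String sqlQuery) {
        this.masterUrl = masterUrl;
        this.cassandraHost = cassandraHost;
        this.sqlQuery = sqlQuery;
    }

    /** Accepts either [host, query] or [masterUrl, host, query]. */
    static JobArgs parse(String[] args) {
        if (args.length == 2) {
            return new JobArgs(Optional.empty(), args[0], args[1]);
        }
        if (args.length == 3) {
            return new JobArgs(Optional.of(args[0]), args[1], args[2]);
        }
        throw new IllegalArgumentException(
                "usage: [masterUrl] <cassandraHost> \"<sql query>\"");
    }

    public static void main(String[] args) {
        JobArgs parsed = parse(new String[] {"sales3", "select * from mykeyspace.operationlog"});
        // Prints: false sales3
        System.out.println(parsed.masterUrl.isPresent() + " " + parsed.cassandraHost);
    }
}
```

In main, parsed.masterUrl.ifPresent(conf::setMaster) would then override the
master only when one is passed explicitly, so the job could be submitted
with just sales3 and the query after the jar name.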



It seems to work pretty well, so I’m happy with it, but I’m wondering why
this isn’t common practice (at least, I haven’t been able to find much about
it on Google). Is there something terrible that I’m missing?



Thanks!

Matthew
