Hi Matthew,

It looks fine to me. I have built a similar service that allows a user to submit a query from a browser and returns the result in JSON format.

Another alternative is to leave a Spark shell or a notebook session (Spark Notebook, Zeppelin, etc.) open and run queries from there. This model works only if people give you the queries to execute.

Mohammed

From: Matthew Johnson [mailto:matt.john...@algomi.com]
Sent: Friday, June 19, 2015 2:20 AM
To: user@spark.apache.org
Subject: Code review - Spark SQL command-line client for Cassandra

Hi all,

I have been struggling with Cassandra’s lack of ad hoc query support (I know this is an anti-pattern in Cassandra, but sometimes management come over and ask me to run stuff, and it’s impossible to explain that it will take me a while when it would take about 10 seconds in MySQL), so I have put together the following code snippet. It bundles DataStax’s Cassandra Spark connector and allows you to submit Spark SQL to it, writing the results to a text file. Does anyone spot any obvious flaws in this plan? (I have a lot more error handling etc. in my code, but removed it here for brevity.)

    // conf and LOG are fields of JavaSparkSQL; their declarations are omitted here.
    private void run(String sqlQuery) {
        SparkContext scc = new SparkContext(conf);
        CassandraSQLContext csql = new CassandraSQLContext(scc);

        // Run the query through the Cassandra-aware SQL context.
        DataFrame sql = csql.sql(sqlQuery);

        // Write each run to its own timestamped folder.
        String folderName = "/tmp/output_" + System.currentTimeMillis();
        LOG.info("Attempting to save SQL results in folder: " + folderName);
        sql.rdd().saveAsTextFile(folderName);
        LOG.info("SQL results saved");
    }

    public static void main(String[] args) {
        String sparkMasterUrl = args[0];
        String sparkHost = args[1];
        String sqlQuery = args[2];

        SparkConf conf = new SparkConf();
        conf.setAppName("Java Spark SQL");
        conf.setMaster(sparkMasterUrl);
        conf.set("spark.cassandra.connection.host", sparkHost);

        JavaSparkSQL app = new JavaSparkSQL(conf);
        // run() takes only the query, so the stray printToConsole argument is dropped.
        app.run(sqlQuery);
    }

I can then submit this to Spark with ‘spark-submit’:

    ./spark-submit --class com.algomi.spark.JavaSparkSQL --master spark://sales3:7077 \
        spark-on-cassandra-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
        spark://sales3:7077 sales3 "select * from mykeyspace.operationlog"

It seems to work pretty well, so I’m pretty happy, but I am wondering why this isn’t common practice (at least I haven’t been able to find much about it on Google). Is there something terrible that I’m missing?

Thanks!
Matthew
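
A minimal sketch of the argument handling that main above leaves out, assuming the same three positional arguments (Spark master URL, Cassandra host, SQL query). The class name SqlClientArgs and its methods are hypothetical and not part of the original code; the sketch uses only the Java standard library, so it can be checked without a Spark cluster.

```java
/** Hypothetical helper; not part of the original JavaSparkSQL class. */
final class SqlClientArgs {
    final String sparkMasterUrl;
    final String cassandraHost;
    final String sqlQuery;

    private SqlClientArgs(String sparkMasterUrl, String cassandraHost, String sqlQuery) {
        this.sparkMasterUrl = sparkMasterUrl;
        this.cassandraHost = cassandraHost;
        this.sqlQuery = sqlQuery;
    }

    /** Validates the three positional arguments that main reads as args[0..2]. */
    static SqlClientArgs parse(String[] args) {
        if (args == null || args.length != 3) {
            throw new IllegalArgumentException(
                "Usage: <sparkMasterUrl> <cassandraHost> <sqlQuery>");
        }
        for (String arg : args) {
            if (arg == null || arg.trim().isEmpty()) {
                throw new IllegalArgumentException("Arguments must not be empty");
            }
        }
        return new SqlClientArgs(args[0].trim(), args[1].trim(), args[2].trim());
    }

    /** Builds the output folder name in the same shape as run(): /tmp/output_<millis>. */
    static String outputFolder(long nowMillis) {
        return "/tmp/output_" + nowMillis;
    }
}
```

main could call SqlClientArgs.parse(args) before building the SparkConf, so that a missing argument fails with a usage message rather than an ArrayIndexOutOfBoundsException.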