Hi Matthew,

You could add the dependencies yourself using the %dep command in
Zeppelin (https://zeppelin.incubator.apache.org/docs/interpreter/spark.html).
I have not tried this with Zeppelin, but I have used spark-notebook
<https://github.com/andypetrella/spark-notebook> and got the Cassandra
connector working. I have provided samples below.

*In Zeppelin: (Not Tested)*

%dep
z.load("com.datastax.spark:spark-cassandra-connector_2.11:1.4.0-M1")


Note: for Spark and Cassandra to work together, the Spark,
Spark-Cassandra-Connector, and notebook Spark versions must all match. The
%dep line above loads the 1.4.0-M1 connector, which targets Spark 1.4.
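
So if your cluster runs Spark 1.2.0 instead (as in the tested setup
below), an untested sketch of the matching Zeppelin paragraph would be
(z.reset() clears previously loaded dependencies):

%dep
z.reset()
z.load("com.datastax.spark:spark-cassandra-connector_2.10:1.2.0")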

*If using spark-notebook: (Tested & works)*

Installed:

   1. Apache Spark 1.2.0
   2. Cassandra DSE - 1 node (just Cassandra, no analytics)
   3. Notebook:

wget https://s3.eu-central-1.amazonaws.com/spark-notebook/tgz/spark-notebook-0.4.3-scala-2.10.4-spark-1.2.0-hadoop-2.4.0.tgz



Once the notebook has started, open:

http://ec2-xx-x-xx-xxx.us-west-x.compute.amazonaws.com:9000/#clusters



Select Standalone.

In SparkConf, update the Spark master IP to the EC2 internal DNS name.



In Spark Notebook:

:dp "com.datastax.spark" % "spark-cassandra-connector_2.10" % "1.2.0-rc3"



import com.datastax.spark.connector._
import com.datastax.spark.connector.rdd.CassandraRDD

// Point the connector at the Cassandra node, then rebuild the
// SparkContext with the new setting (spark-notebook's reset helper).
val cassandraHost: String = "localhost"
reset(lastChanges = _.set("spark.cassandra.connection.host", cassandraHost))

// Read the table excelsior.test as an RDD and print every row.
val rdd = sparkContext.cassandraTable("excelsior", "test")
rdd.toArray.foreach(println)
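
Writing back works the same way; below is an untested sketch using the
connector's saveToCassandra (the Entry case class and column names are
made up for illustration):

case class Entry(key: String, value: String) // hypothetical table schema
val data = sparkContext.parallelize(Seq(Entry("k1", "v1"), Entry("k2", "v2")))
data.saveToCassandra("excelsior", "test", SomeColumns("key", "value"))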



Note: again, the Spark, Spark-Cassandra-Connector, and spark-notebook
Spark versions must all match. In the tested setup above, everything is
on 1.2.0.






On Mon, Jun 22, 2015 at 9:52 AM, Matthew Johnson <matt.john...@algomi.com>
wrote:

> Hi Pawan,
>
>
>
> Looking at the changes for that git pull request, it looks like it just
> pulls in the dependency (and transitives) for “spark-cassandra-connector”.
> Since I am having to build Zeppelin myself anyway, would it be OK to just
> add the 1.4.0-M1 connector dependency myself (as found here
> http://search.maven.org/#artifactdetails%7Ccom.datastax.spark%7Cspark-cassandra-connector_2.11%7C1.4.0-M1%7Cjar)?
> What exactly is it that does not currently exist for Spark 1.4?
>
>
>
> Thanks,
>
> Matthew
>
>
>
> *From:* pawan kumar [mailto:pkv...@gmail.com]
> *Sent:* 22 June 2015 17:19
> *To:* Silvio Fiorito
> *Cc:* Mohammed Guller; Matthew Johnson; shahid ashraf;
> user@spark.apache.org
> *Subject:* Re: Code review - Spark SQL command-line client for Cassandra
>
>
>
> Hi,
>
>
>
> Zeppelin has a cassandra-spark-connector built into the build. I have not
> tried it yet; maybe you could let us know how it goes.
>
>
>
> https://github.com/apache/incubator-zeppelin/pull/79
>
>
>
> To build a Zeppelin version with the *DataStax Spark/Cassandra connector
> <https://github.com/datastax/spark-cassandra-connector>*:
>
> mvn clean package *-Pcassandra-spark-1.x* -Dhadoop.version=xxx
> -Phadoop-x.x -DskipTests
>
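> For example, a hypothetical concrete invocation for a Spark 1.2 / Hadoop
> 2.4 cluster might look like this (the profile and version values are
> illustrative, not taken from the pull request):
>
> mvn clean package -Pcassandra-spark-1.2 -Dhadoop.version=2.4.0
> -Phadoop-2.4 -DskipTests
>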
> Right now the Spark/Cassandra connector is available for *Spark 1.1* and
> *Spark 1.2*. Support for *Spark 1.3* is not released yet (but you can
> build your own Spark/Cassandra connector version *1.3.0-SNAPSHOT*).
> Support for *Spark 1.4* does not exist yet.
>
> Please do not forget to add -Dspark.cassandra.connection.host=xxx to the
> *ZEPPELIN_JAVA_OPTS* parameter in the *conf/zeppelin-env.sh* file.
> Alternatively, you can add this parameter in the parameter list of the
> *Spark interpreter* in the GUI.
>
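> A minimal sketch of that line in *conf/zeppelin-env.sh* (the host address
> is a placeholder for your own Cassandra node):
>
> export ZEPPELIN_JAVA_OPTS="-Dspark.cassandra.connection.host=10.0.0.5"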
>
>
> -Pawan
>
>
>
>
>
>
>
>
>
>
>
> On Mon, Jun 22, 2015 at 9:04 AM, Silvio Fiorito <
> silvio.fior...@granturing.com> wrote:
>
> Yes, just put the Cassandra connector on the Spark classpath and set the
> connector config properties in the interpreter settings.
>
>
>
> *From: *Mohammed Guller
> *Date: *Monday, June 22, 2015 at 11:56 AM
> *To: *Matthew Johnson, shahid ashraf
>
>
> *Cc: *"user@spark.apache.org"
> *Subject: *RE: Code review - Spark SQL command-line client for Cassandra
>
>
>
> I haven’t tried using Zeppelin with Spark on Cassandra, so can’t say for
> sure, but it should not be difficult.
>
>
>
> Mohammed
>
>
>
> *From:* Matthew Johnson [mailto:matt.john...@algomi.com
> <matt.john...@algomi.com>]
> *Sent:* Monday, June 22, 2015 2:15 AM
> *To:* Mohammed Guller; shahid ashraf
> *Cc:* user@spark.apache.org
> *Subject:* RE: Code review - Spark SQL command-line client for Cassandra
>
>
>
> Thanks Mohammed, it’s good to know I’m not alone!
>
>
>
> How easy is it to integrate Zeppelin with Spark on Cassandra? It looks
> like it would only support Hadoop out of the box. Is it just a case of
> dropping the Cassandra Connector onto the Spark classpath?
>
>
>
> Cheers,
>
> Matthew
>
>
>
> *From:* Mohammed Guller [mailto:moham...@glassbeam.com]
> *Sent:* 20 June 2015 17:27
> *To:* shahid ashraf
> *Cc:* Matthew Johnson; user@spark.apache.org
> *Subject:* RE: Code review - Spark SQL command-line client for Cassandra
>
>
>
> It is a simple Play-based web application. It exposes a URI for
> submitting a SQL query. It then executes that query using the
> CassandraSQLContext provided by the Spark Cassandra Connector. Since it is
> web-based, I added an authentication and authorization layer to make sure
> that only users with the right authorization can use it.
>
>
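> As a rough illustration only (not Mohammed's actual code), the core of
> such a service, assuming a Spark 1.3+ setup where sql() returns a
> DataFrame and sc and sqlQuery are already in scope, boils down to:
>
> val csc = new CassandraSQLContext(sc)
> val json = csc.sql(sqlQuery).toJSON.collect().mkString("[", ",", "]")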
>
> I am happy to open-source that code if there is interest. Just need to
> carve out some time to clean it up and remove all the other services that
> this web application provides.
>
>
>
> Mohammed
>
>
>
> *From:* shahid ashraf [mailto:sha...@trialx.com <sha...@trialx.com>]
> *Sent:* Saturday, June 20, 2015 6:52 AM
> *To:* Mohammed Guller
> *Cc:* Matthew Johnson; user@spark.apache.org
> *Subject:* RE: Code review - Spark SQL command-line client for Cassandra
>
>
>
> Hi Mohammed,
> Can you provide more info about the service you developed?
>
> On Jun 20, 2015 7:59 AM, "Mohammed Guller" <moham...@glassbeam.com> wrote:
>
> Hi Matthew,
>
> It looks fine to me. I have built a similar service that allows a user to
> submit a query from a browser and returns the result in JSON format.
>
>
>
> Another alternative is to leave a Spark shell or a notebook session (Spark
> Notebook, Zeppelin, etc.) open and run queries from there. This model works
> only if people give you the queries to execute.
>
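> For the interactive shell model, one way is to start the shell with the
> connector on the classpath (the jar name and host are placeholders):
>
> spark-shell --jars spark-cassandra-connector-assembly-1.2.0.jar \
>   --conf spark.cassandra.connection.host=cassandra-host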
>
>
> Mohammed
>
>
>
> *From:* Matthew Johnson [mailto:matt.john...@algomi.com]
> *Sent:* Friday, June 19, 2015 2:20 AM
> *To:* user@spark.apache.org
> *Subject:* Code review - Spark SQL command-line client for Cassandra
>
>
>
> Hi all,
>
>
>
> I have been struggling with Cassandra’s lack of ad hoc query support (I
> know this is an anti-pattern for Cassandra, but sometimes management come
> over and ask me to run stuff, and it is impossible to explain that it will
> take me a while when it would take about 10 seconds in MySQL), so I have
> put together the following code snippet that bundles DataStax’s Cassandra
> Spark connector and allows you to submit Spark SQL to it, outputting the
> results to a text file.
>
>
>
> Does anyone spot any obvious flaws in this plan?? (I have a lot more error
> handling etc in my code, but removed it here for brevity)
>
>
>
>     private void run(String sqlQuery) {
>         // Build a Cassandra-aware SQL context on top of the SparkContext
>         SparkContext scc = new SparkContext(conf);
>         CassandraSQLContext csql = new CassandraSQLContext(scc);
>         DataFrame sql = csql.sql(sqlQuery);
>
>         // Write the query results out as text, one folder per run
>         String folderName = "/tmp/output_" + System.currentTimeMillis();
>         LOG.info("Attempting to save SQL results in folder: " + folderName);
>         sql.rdd().saveAsTextFile(folderName);
>         LOG.info("SQL results saved");
>     }
>
>     public static void main(String[] args) {
>         String sparkMasterUrl = args[0];
>         String cassandraHost = args[1]; // address of the Cassandra node
>         String sqlQuery = args[2];
>
>         SparkConf conf = new SparkConf();
>         conf.setAppName("Java Spark SQL");
>         conf.setMaster(sparkMasterUrl);
>         conf.set("spark.cassandra.connection.host", cassandraHost);
>
>         JavaSparkSQL app = new JavaSparkSQL(conf);
>         app.run(sqlQuery);
>     }
>
>
>
> I can then submit this to Spark with ‘spark-submit’:
>
>
>
> ./spark-submit --class com.algomi.spark.JavaSparkSQL --master
> spark://sales3:7077
> spark-on-cassandra-0.0.1-SNAPSHOT-jar-with-dependencies.jar
> spark://sales3:7077 sales3 "select * from mykeyspace.operationlog"
>
>
>
> It seems to work pretty well, so I’m pretty happy, but wondering why this
> isn’t common practice (at least I haven’t been able to find much about it
> on Google) – is there something terrible that I’m missing?
>
>
>
> Thanks!
>
> Matthew
>
>
>
>
>
>
>
