Hi Matthew, you could add the dependencies yourself by using the %dep command in Zeppelin (https://zeppelin.incubator.apache.org/docs/interpreter/spark.html). I have not tried this with Zeppelin, but I have used spark-notebook <https://github.com/andypetrella/spark-notebook> and got the Cassandra connector working. I have provided samples below.

*In Zeppelin: (Not Tested)*

    %dep
    z.load("com.datastax.spark:spark-cassandra-connector_2.11:1.4.0-M1")

*If using spark-notebook: (Tested & works)*

Installed:

1. Apache Spark 1.2.0
2. Cassandra DSE - 1 node (just Cassandra and no analytics)
3. Notebook:
   wget https://s3.eu-central-1.amazonaws.com/spark-notebook/tgz/spark-notebook-0.4.3-scala-2.10.4-spark-1.2.0-hadoop-2.4.0.tgz

Once the notebook has been started:

http://ec2-xx-x-xx-xxx.us-west-x.compute.amazonaws.com:9000/#clusters

Select Standalone. In SparkConf, update the Spark master IP to the EC2 internal DNS name.

In Spark Notebook:

    :dp "com.datastax.spark" % "spark-cassandra-connector_2.10" % "1.2.0-rc3"

    import com.datastax.spark.connector._
    import com.datastax.spark.connector.rdd.CassandraRDD

    val cassandraHost:String = "localhost"
    reset(lastChanges = _.set("spark.cassandra.connection.host", cassandraHost))
    val rdd = sparkContext.cassandraTable("excelsior","test")
    rdd.toArray.foreach(println)

Note (for either setup): In order for Spark and Cassandra to work, the Spark, Spark-Cassandra-Connector and spark-notebook Spark versions should match. In the spark-notebook case above it was 1.2.0.

On Mon, Jun 22, 2015 at 9:52 AM, Matthew Johnson <matt.john...@algomi.com> wrote:

> Hi Pawan,
>
> Looking at the changes for that git pull request, it looks like it just
> pulls in the dependency (and transitives) for “spark-cassandra-connector”.
> Since I am having to build Zeppelin myself anyway, would it be ok to just
> add this myself for the connector for 1.4.0 (as found here
> http://search.maven.org/#artifactdetails%7Ccom.datastax.spark%7Cspark-cassandra-connector_2.11%7C1.4.0-M1%7Cjar)?
> What exactly is it that does not currently exist for Spark 1.4?
>
> Thanks,
> Matthew
>
> *From:* pawan kumar [mailto:pkv...@gmail.com]
> *Sent:* 22 June 2015 17:19
> *To:* Silvio Fiorito
> *Cc:* Mohammed Guller; Matthew Johnson; shahid ashraf; user@spark.apache.org
> *Subject:* Re: Code review - Spark SQL command-line client for Cassandra
>
> Hi,
>
> Zeppelin has a cassandra-spark-connector built into the build. I have not
> tried it yet; maybe you could let us know.
>
> https://github.com/apache/incubator-zeppelin/pull/79
>
> To build a Zeppelin version with the *Datastax Spark/Cassandra connector
> <https://github.com/datastax/spark-cassandra-connector>*:
>
>     mvn clean package -Pcassandra-spark-1.x -Dhadoop.version=xxx -Phadoop-x.x -DskipTests
>
> Right now the Spark/Cassandra connector is available for *Spark 1.1* and
> *Spark 1.2*. Support for *Spark 1.3* is not released yet (*but you can
> build your own Spark/Cassandra connector version 1.3.0-SNAPSHOT*). Support
> for *Spark 1.4* does not exist yet.
>
> Please do not forget to add -Dspark.cassandra.connection.host=xxx to the
> *ZEPPELIN_JAVA_OPTS* parameter in the *conf/zeppelin-env.sh* file.
> Alternatively, you can add this parameter to the parameter list of the
> *Spark interpreter* in the GUI.
>
> -Pawan
>
> On Mon, Jun 22, 2015 at 9:04 AM, Silvio Fiorito <silvio.fior...@granturing.com> wrote:
>
> Yes, just put the Cassandra connector on the Spark classpath and set the
> connector config properties in the interpreter settings.
>
> *From:* Mohammed Guller
> *Date:* Monday, June 22, 2015 at 11:56 AM
> *To:* Matthew Johnson, shahid ashraf
> *Cc:* "user@spark.apache.org"
> *Subject:* RE: Code review - Spark SQL command-line client for Cassandra
>
> I haven’t tried using Zeppelin with Spark on Cassandra, so can’t say for
> sure, but it should not be difficult.
>
> Mohammed
>
> *From:* Matthew Johnson [mailto:matt.john...@algomi.com]
> *Sent:* Monday, June 22, 2015 2:15 AM
> *To:* Mohammed Guller; shahid ashraf
> *Cc:* user@spark.apache.org
> *Subject:* RE: Code review - Spark SQL command-line client for Cassandra
>
> Thanks Mohammed, it’s good to know I’m not alone!
>
> How easy is it to integrate Zeppelin with Spark on Cassandra? It looks
> like it would only support Hadoop out of the box. Is it just a case of
> dropping the Cassandra Connector onto the Spark classpath?
>
> Cheers,
> Matthew
>
> *From:* Mohammed Guller [mailto:moham...@glassbeam.com]
> *Sent:* 20 June 2015 17:27
> *To:* shahid ashraf
> *Cc:* Matthew Johnson; user@spark.apache.org
> *Subject:* RE: Code review - Spark SQL command-line client for Cassandra
>
> It is a simple Play-based web application. It exposes a URI for
> submitting a SQL query. It then executes that query using
> CassandraSQLContext provided by Spark Cassandra Connector. Since it is
> web-based, I added an authentication and authorization layer to make sure
> that only users with the right authorization can use it.
>
> I am happy to open-source that code if there is interest. Just need to
> carve out some time to clean it up and remove all the other services that
> this web application provides.
>
> Mohammed
>
> *From:* shahid ashraf [mailto:sha...@trialx.com]
> *Sent:* Saturday, June 20, 2015 6:52 AM
> *To:* Mohammed Guller
> *Cc:* Matthew Johnson; user@spark.apache.org
> *Subject:* RE: Code review - Spark SQL command-line client for Cassandra
>
> Hi Mohammed,
> Can you provide more info about the service you developed?
>
> On Jun 20, 2015 7:59 AM, "Mohammed Guller" <moham...@glassbeam.com> wrote:
>
> Hi Matthew,
>
> It looks fine to me. I have built a similar service that allows a user to
> submit a query from a browser and returns the result in JSON format.
>
> Another alternative is to leave a Spark shell or one of the notebooks
> (Spark Notebook, Zeppelin, etc.) session open and run queries from there.
> This model works only if people give you the queries to execute.
>
> Mohammed
>
> *From:* Matthew Johnson [mailto:matt.john...@algomi.com]
> *Sent:* Friday, June 19, 2015 2:20 AM
> *To:* user@spark.apache.org
> *Subject:* Code review - Spark SQL command-line client for Cassandra
>
> Hi all,
>
> I have been struggling with Cassandra’s lack of ad hoc query support (I
> know this is an anti-pattern of Cassandra, but sometimes management come
> over and ask me to run stuff and it’s impossible to explain that it will
> take me a while when it would take about 10 seconds in MySQL), so I have
> put together the following code snippet that bundles DataStax’s Cassandra
> Spark connector and allows you to submit Spark SQL to it, outputting the
> results in a text file.
>
> Does anyone spot any obvious flaws in this plan? (I have a lot more error
> handling etc. in my code, but removed it here for brevity.)
>
>     private void run(String sqlQuery) {
>         SparkContext scc = new SparkContext(conf);
>         CassandraSQLContext csql = new CassandraSQLContext(scc);
>         DataFrame sql = csql.sql(sqlQuery);
>         String folderName = "/tmp/output_" + System.currentTimeMillis();
>         LOG.info("Attempting to save SQL results in folder: " + folderName);
>         // saveAsTextFile writes a folder of part files, not a single file
>         sql.rdd().saveAsTextFile(folderName);
>         LOG.info("SQL results saved");
>     }
>
>     public static void main(String[] args) {
>         // args: Spark master URL, Cassandra host, SQL query
>         String sparkMasterUrl = args[0];
>         String sparkHost = args[1];
>         String sqlQuery = args[2];
>
>         SparkConf conf = new SparkConf();
>         conf.setAppName("Java Spark SQL");
>         conf.setMaster(sparkMasterUrl);
>         conf.set("spark.cassandra.connection.host", sparkHost);
>
>         JavaSparkSQL app = new JavaSparkSQL(conf);
>
>         app.run(sqlQuery);
>     }
>
> I can then submit this to Spark with ‘spark-submit’:
>
>     ./spark-submit --class com.algomi.spark.JavaSparkSQL --master spark://sales3:7077 spark-on-cassandra-0.0.1-SNAPSHOT-jar-with-dependencies.jar spark://sales3:7077 sales3 "select * from mykeyspace.operationlog"
>
> It seems to work pretty well, so I’m pretty happy, but wondering why this
> isn’t common practice (at least I haven’t been able to find much about it
> on Google) – is there something terrible that I’m missing?
>
> Thanks!
> Matthew
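
P.S. On the version-matching note near the top of this message: here is a minimal sketch of how that check could be scripted before wiring the pieces together. It is only an illustration, not anything shipped with Spark or the connector. The class name VersionLine and both methods are hypothetical, and it assumes that "matching" means sharing the same major.minor line (so Spark 1.2.0 pairs with connector 1.2.0-rc3), which is how I read the versions used above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Spark or the connector.
final class VersionLine {
    // Captures the leading "major.minor" of strings such as "1.2.0" or "1.2.0-rc3".
    private static final Pattern MAJOR_MINOR = Pattern.compile("^(\\d+)\\.(\\d+)");

    private VersionLine() {}

    // Returns the "major.minor" prefix, e.g. "1.2" for "1.2.0-rc3".
    static String majorMinor(String version) {
        Matcher m = MAJOR_MINOR.matcher(version.trim());
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognised version: " + version);
        }
        return m.group(1) + "." + m.group(2);
    }

    // Assumption: components are compatible when they share a major.minor line.
    static boolean sameLine(String first, String... others) {
        String expected = majorMinor(first);
        for (String other : others) {
            if (!expected.equals(majorMinor(other))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Spark 1.2.0, connector 1.2.0-rc3, notebook built against Spark 1.2.0
        System.out.println(VersionLine.sameLine("1.2.0", "1.2.0-rc3", "1.2.0"));
        // Spark 1.2.0 against the 1.4.0-M1 connector
        System.out.println(VersionLine.sameLine("1.2.0", "1.4.0-M1"));
    }
}
```

The check deliberately ignores the patch number and any qualifier such as -rc3 or -M1, since the combination I tested mixed 1.2.0 with 1.2.0-rc3. If your setup needs an exact match, compare the full strings instead.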