No worries. Happy to help.

I don’t think the 1.5 version of the Spark Cassandra connector has been 
officially released yet. In any case 1.5.0-M1 has been replaced by 1.5.0-M2. 
Moreover, this version is meant for Spark 1.5.x.

Since you are using Spark 1.4, why not use the 1.4 version of the SCC? Have 
you tried it?

--packages com.datastax.spark:spark-cassandra-connector_2.10:1.4.0

Mohammed

From: Bryan Jeffrey [mailto:bryan.jeff...@gmail.com]
Sent: Thursday, November 12, 2015 11:20 AM
To: Mohammed Guller
Cc: user
Subject: Re: Cassandra via SparkSQL/Hive JDBC

I hesitate to ask further questions, but your assistance is advancing my work 
much faster than extensive fiddling might.  I am seeing the following error 
when querying:

0: jdbc:hive2://localhost:10000> create temporary table cassandraeventcounts 
using org.apache.spark.sql.cassandra OPTIONS ( keyspace "c2", table 
"eventcounts" );
Error: java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.spark.sql.cassandra.DataTypeConverter$ (state=,code=0)

I started the Thrift server as follows:

root@sparkdev1:~# /spark/spark-1.4.1/sbin/start-thriftserver.sh --master 
spark://10.0.0.4:7077 --packages 
com.datastax.spark:spark-cassandra-connector_2.11:1.5.0-M1 --hiveconf 
"spark.cores.max=2" --hiveconf "spark.executor.memory=2g"

Do I perhaps need to include an additional library to do the default conversion?

Regards,

Bryan Jeffrey

On Thu, Nov 12, 2015 at 1:57 PM, Mohammed Guller 
<moham...@glassbeam.com> wrote:
Hi Bryan,

Yes, you can query a real Cassandra cluster. You just need to provide the 
address of the Cassandra seed node.

Looks like you figured out the answer. You can also put the C* seed node 
address in the spark-defaults.conf file under the SPARK_HOME/conf directory. 
Then you don’t need to manually SET it for each Beeline session.
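For example (a minimal sketch, using the seed-node address from earlier in this 
thread; substitute your own):

```properties
# SPARK_HOME/conf/spark-defaults.conf
spark.cassandra.connection.host    10.0.0.10
```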

Mohammed

From: Bryan Jeffrey [mailto:bryan.jeff...@gmail.com]
Sent: Thursday, November 12, 2015 10:26 AM

To: Mohammed Guller
Cc: user
Subject: Re: Cassandra via SparkSQL/Hive JDBC

Answer: In beeline run the following: SET 
spark.cassandra.connection.host="10.0.0.10"

On Thu, Nov 12, 2015 at 1:13 PM, Bryan Jeffrey 
<bryan.jeff...@gmail.com> wrote:
Mohammed,

While you're willing to answer questions, is there a trick to getting the Hive 
Thrift server to connect to remote Cassandra instances?

0: jdbc:hive2://localhost:10000> SET 
spark.cassandra.connection.host="cassandrahost";
SET spark.cassandra.connection.host="cassandrahost";
+-----------------------------------------------------------+
|                                                           |
+-----------------------------------------------------------+
| spark.cassandra.connection.host="cassandrahost"  |
+-----------------------------------------------------------+
1 row selected (0.018 seconds)
0: jdbc:hive2://localhost:10000> create temporary table cdr using 
org.apache.spark.sql.cassandra OPTIONS ( keyspace "c2", table "detectionresult" 
);
create temporary table cdr using org.apache.spark.sql.cassandra OPTIONS ( 
keyspace "c2", table "detectionresult" );
Error: java.io.IOException: Failed to open native connection to Cassandra at 
{10.0.0.4}:9042 (state=,code=0)

This seems to be connecting to localhost regardless of the value I set 
spark.cassandra.connection.host to.

Regards,

Bryan Jeffrey

On Thu, Nov 12, 2015 at 12:54 PM, Bryan Jeffrey 
<bryan.jeff...@gmail.com> wrote:
Yes, I do - I found your example of doing that later in your slides.  Thank you 
for your help!

On Thu, Nov 12, 2015 at 12:20 PM, Mohammed Guller 
<moham...@glassbeam.com> wrote:
Did you mean Hive or Spark SQL JDBC/ODBC server?

Mohammed

From: Bryan Jeffrey [mailto:bryan.jeff...@gmail.com]
Sent: Thursday, November 12, 2015 9:12 AM
To: Mohammed Guller
Cc: user
Subject: Re: Cassandra via SparkSQL/Hive JDBC

Mohammed,

That is great.  It looks like a perfect scenario. Would I be able to make the 
created DF queryable over the Hive JDBC/ODBC server?

Regards,

Bryan Jeffrey

On Wed, Nov 11, 2015 at 9:34 PM, Mohammed Guller 
<moham...@glassbeam.com> wrote:
Short answer: yes.

The Spark Cassandra Connector supports the data source API. So you can create a 
DataFrame that points directly to a Cassandra table. You can query it using the 
DataFrame API or the SQL/HiveQL interface.
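
A minimal sketch of the data source API route, using the Spark 1.4-era 
DataFrame reader and the keyspace/table names from earlier in this thread 
(this assumes the connector jar is on the classpath and a Cassandra cluster is 
reachable, so it won't run standalone):

```scala
// The DataFrame points directly at the Cassandra table, so no periodic
// refresh is needed - each query reads current data.
val df = sqlContext.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "c2", "table" -> "eventcounts"))
  .load()

// Register it so it is queryable via SQL/HiveQL (and, through the Thrift
// server, via JDBC/ODBC).
df.registerTempTable("eventcounts")
sqlContext.sql("SELECT * FROM eventcounts LIMIT 10").show()
```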

If you want to see an example, see slides 27 and 28 in this deck that I 
presented at the Cassandra Summit 2015:
http://www.slideshare.net/mg007/ad-hoc-analytics-with-cassandra-and-spark


Mohammed

From: Bryan [mailto:bryan.jeff...@gmail.com]
Sent: Tuesday, November 10, 2015 7:42 PM
To: Bryan Jeffrey; user
Subject: RE: Cassandra via SparkSQL/Hive JDBC

Anyone have thoughts or a similar use-case for SparkSQL / Cassandra?

Regards,

Bryan Jeffrey
________________________________
From: Bryan Jeffrey <bryan.jeff...@gmail.com>
Sent: 11/4/2015 11:16 AM
To: user <user@spark.apache.org>
Subject: Cassandra via SparkSQL/Hive JDBC
Hello.

I have been working to add SparkSQL HDFS support to our application.  We're 
able to process streaming data, append to a persistent Hive table, and have 
that table available via JDBC/ODBC.  Now we're looking to access data in 
Cassandra via SparkSQL.

In reading a number of previous posts, it appears that the way to do this is to 
instantiate a Spark Context, read the data into an RDD using the Cassandra 
Spark Connector, convert the data to a DataFrame, and register it as a 
temporary table. The data will then be accessible via SparkSQL - although I 
assume that you would need to refresh the table on a periodic basis.
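
A sketch of that RDD-based route, against the Spark 1.4 / connector 1.4 APIs 
(the schema below is hypothetical and a running Cassandra cluster is assumed, 
so this is illustrative only):

```scala
import com.datastax.spark.connector._

// Hypothetical row type matching the Cassandra table's columns.
case class EventCount(id: String, count: Long)

// Read the table into an RDD via the connector...
val rdd = sc.cassandraTable[EventCount]("c2", "eventcounts")

// ...convert to a DataFrame and register it as a temporary table.
val df = sqlContext.createDataFrame(rdd)
df.registerTempTable("eventcounts")
```

Note that a table registered this way is a snapshot of the RDD, which is why a 
periodic refresh would be needed; the data source API avoids that.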

Is there a more straightforward way to do this?  Is it possible to register the 
Cassandra table with Hive so that the SparkSQL thrift server instance can just 
read data directly?

Regards,

Bryan Jeffrey
