Hi, I am trying to read from Elasticsearch using Spark SQL and am getting the exception below. My environment is CDH 5.3 with Spark 1.2.0 and Elasticsearch 1.4.4. Since Spark SQL is not officially supported on CDH 5.3, I added the Hive jars to the Spark classpath in compute-classpath.sh. I also added elasticsearch-hadoop-2.1.0.Beta3.jar to the Spark classpath in compute-classpath.sh. I also tried adding the Hive, elasticsearch-hadoop, and elasticsearch-spark jars to the SPARK_CLASSPATH environment variable prior to running spark-submit, but got the same exception.
Exception in thread "main" java.lang.RuntimeException: Failed to load class for data source: org.elasticsearch.spark.sql
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.sources.CreateTableUsing.run(ddl.scala:99)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:67)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:67)
    at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:75)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
    at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
    at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:303)
    at com.informatica.sats.datamgtsrv.Percolator$.main(Percolator.scala:29)
    at com.informatica.sats.datamgtsrv.Percolator.main(Percolator.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

The code I am trying to run:

import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._
import org.elasticsearch.spark.sql._

object MyTest {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("MyTest")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(sc)
    sqlContext.sql("CREATE TEMPORARY TABLE INTERVALS " +
      "USING org.elasticsearch.spark.sql " +
      "OPTIONS (resource 'events/intervals') ")
    val allRDD = sqlContext.sql("SELECT * FROM INTERVALS")
    allRDD.foreach(row => row.foreach(elem => print(elem + "\n\n")))
  }
}

I checked the Spark source code (org/apache/spark/sql/sources/ddl.scala) and saw that the run method of the CreateTableUsing class expects a "DefaultSource" class for the data source being loaded. However, there is no such class in the org.elasticsearch.spark.sql package in the official Elasticsearch builds. I checked the following jars:

elasticsearch-spark_2.10-2.1.0.Beta3.jar
elasticsearch-spark_2.10-2.1.0.Beta2.jar
elasticsearch-spark_2.10-2.1.0.Beta1.jar
elasticsearch-hadoop-2.1.0.Beta3.jar

Can you please advise why this problem happens and how to resolve it?

Thanks,
Dmitriy Fingerman
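Based on my reading of ddl.scala, the resolution Spark 1.2 performs for "USING <provider>" boils down to something like the sketch below: it tries to load the provider name itself as a class, falls back to "<provider>.DefaultSource", and raises "Failed to load class for data source" when both lookups fail. The object and method names here are mine, for illustration only:

```scala
// Standalone sketch of Spark 1.2's data-source class lookup (sources/ddl.scala).
// Names (DataSourceResolveSketch, canLoad) are illustrative, not Spark's own.
object DataSourceResolveSketch {
  // Returns true if either the provider class itself or
  // "<provider>.DefaultSource" can be loaded from the current classpath.
  def canLoad(provider: String): Boolean = {
    val loader = getClass.getClassLoader
    def tryLoad(name: String): Boolean =
      try { loader.loadClass(name); true }
      catch { case _: ClassNotFoundException => false }
    tryLoad(provider) || tryLoad(provider + ".DefaultSource")
  }

  def main(args: Array[String]): Unit = {
    // scala.Option stands in for a provider name that resolves directly;
    // the Elasticsearch provider only resolves when a jar containing
    // org.elasticsearch.spark.sql.DefaultSource is actually on the classpath.
    println(canLoad("scala.Option"))
    println(canLoad("org.elasticsearch.spark.sql"))
  }
}
```

So if none of the jars on the classpath contain org.elasticsearch.spark.sql.DefaultSource, this lookup fails with exactly the error above.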