Hi, I am trying to read from Elasticsearch using Spark SQL and am getting the exception below. My environment is CDH 5.3 with Spark 1.2.0 and Elasticsearch 1.4.4. Since Spark SQL is not officially supported on CDH 5.3, I added the Hive jars to the Spark classpath in compute-classpath.sh. I also added elasticsearch-hadoop-2.1.0.Beta3.jar to the Spark classpath in compute-classpath.sh. I also tried adding the Hive, elasticsearch-hadoop, and elasticsearch-spark jars to the SPARK_CLASSPATH environment variable prior to running spark-submit, but got the same exception.
Exception in thread "main" java.lang.RuntimeException: Failed to load class for data source: org.elasticsearch.spark.sql
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.sources.CreateTableUsing.run(ddl.scala:99)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:67)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:67)
    at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:75)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
    at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
    at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:303)
    at com.informatica.sats.datamgtsrv.Percolator$.main(Percolator.scala:29)
    at com.informatica.sats.datamgtsrv.Percolator.main(Percolator.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

The code I am trying to run:

import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._
import org.elasticsearch.spark.sql._

object MyTest {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("MyTest")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(sc)
    sqlContext.sql("CREATE TEMPORARY TABLE INTERVALS " +
      "USING org.elasticsearch.spark.sql " +
      "OPTIONS (resource 'events/intervals') ")
    val allRDD = sqlContext.sql("SELECT * FROM INTERVALS")
    allRDD.foreach(row => row.foreach(elem => print(elem + "\n\n")))
  }
}

I checked the Spark source code (org/apache/spark/sql/sources/ddl.scala) and saw that the run method of the CreateTableUsing class expects a "DefaultSource" class for the data source being loaded. However, there is no such class in the org.elasticsearch.spark.sql package in the official Elasticsearch builds. I checked the following jars:

elasticsearch-spark_2.10-2.1.0.Beta3.jar
elasticsearch-spark_2.10-2.1.0.Beta2.jar
elasticsearch-spark_2.10-2.1.0.Beta1.jar
elasticsearch-hadoop-2.1.0.Beta3.jar

Can you please advise why this problem happens and how to resolve it?

Thanks,
Dmitriy Fingerman
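Based on my reading of ddl.scala, the resolution Spark 1.2 performs for "USING <provider>" boils down to something like the sketch below: it tries to load the provider name itself as a class, falls back to "<provider>.DefaultSource", and raises "Failed to load class for data source" when both lookups fail. The object and method names here are mine, for illustration only:

```scala
// Standalone sketch of Spark 1.2's data-source class lookup (sources/ddl.scala).
// Names (DataSourceResolveSketch, canLoad) are illustrative, not Spark's own.
object DataSourceResolveSketch {
  // Returns true if either the provider class itself or
  // "<provider>.DefaultSource" can be loaded from the current classpath.
  def canLoad(provider: String): Boolean = {
    val loader = getClass.getClassLoader
    def tryLoad(name: String): Boolean =
      try { loader.loadClass(name); true }
      catch { case _: ClassNotFoundException => false }
    tryLoad(provider) || tryLoad(provider + ".DefaultSource")
  }

  def main(args: Array[String]): Unit = {
    // scala.Option stands in for a provider name that resolves directly;
    // the Elasticsearch provider only resolves when a jar containing
    // org.elasticsearch.spark.sql.DefaultSource is actually on the classpath.
    println(canLoad("scala.Option"))
    println(canLoad("org.elasticsearch.spark.sql"))
  }
}
```

So if none of the jars on the classpath contain org.elasticsearch.spark.sql.DefaultSource, this lookup fails with exactly the error above.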