Hey all, I'm trying a very basic Spark SQL job (apologies, I'm new to a lot of this), but I'm getting this failure:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.SchemaRDD.take(I)[Lorg/apache/spark/sql/catalyst/expressions/Row;

I've tried a variety of uber-jar creation approaches, but it always comes down to this. Right now my jar has *NO* dependencies on other jars (other than Spark itself), and my "uber jar" contains essentially only the .class files of my own code. My code simply reads a Parquet file and does a count(*) on the contents. I'm sure this is very basic, but I'm at a loss. Thoughts and/or debugging tips welcome.

Here's my code, which I call from the "main" object, passing in the context, the Parquet file name, and some arbitrary SQL (which for this is just "select count(*) from flows"). I have run the equivalent successfully in spark-shell on the cluster.

def readParquetFile(sparkCtx: SparkContext, dataFileName: String, sql: String) = {
  val sqlContext = new SQLContext(sparkCtx)
  val parquetFile = sqlContext.parquetFile(dataFileName)
  parquetFile.registerAsTable("flows")
  println(s"About to run $sql")
  val start = System.nanoTime()
  val countRDD = sqlContext.sql(sql)
  val rows: Array[Row] = countRDD.take(1)  // DIES HERE
  val stop = System.nanoTime()
  println(s"result: ${rows(0)}")
  println(s"Query took ${(stop - start) / 1e9} seconds.")
  println(s"Query was $sql")
}
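For context, my build file looks roughly like this. This is a sketch rather than my exact build (the project name and version numbers are placeholders); the relevant part is that Spark is marked "provided" so it isn't bundled into the uber jar, and the cluster's own Spark classes are the only Spark on the classpath at runtime:

```scala
// build.sbt (sketch -- name and versions are placeholders)
name := "flows-count"

scalaVersion := "2.10.4"

// "provided" keeps Spark out of the assembled uber jar; the jar
// then runs against whatever Spark version the cluster supplies.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.1.0" % "provided"
)
```

If it matters, I'm assembling the jar with sbt-assembly and submitting it with spark-submit.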