For posterity's sake, I solved this.  The problem was that the Cloudera
cluster I was submitting to was running Spark 1.0, while I was compiling
against the latest 1.1 release.  Downgrading my compile-time dependency to
1.0 got me past this.
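
For anyone who hits the same NoSuchMethodError: it's the classic symptom of
compiling against one Spark release and running against another; the
bytecode signature of SchemaRDD.take evidently differs between the two
releases, so code built against 1.1 looks up a method the 1.0 runtime
doesn't have.  The fix amounts to pinning the Spark dependency to the
cluster's version and marking it "provided" so it never ends up in the uber
jar.  A minimal sketch of the sbt setup, assuming a 1.0.2 cluster (use
whichever 1.0.x your cluster actually runs; CDH users may want Cloudera's
repository and artifact versions instead):

  // build.sbt -- pin Spark to the version the cluster runs.
  // "provided" keeps spark-core/spark-sql out of the assembled jar,
  // since the cluster supplies them at runtime.
  scalaVersion := "2.10.4"

  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "1.0.2" % "provided",
    "org.apache.spark" %% "spark-sql"  % "1.0.2" % "provided"
  )

The spark-shell startup banner on the cluster prints the exact version, so
that's a quick way to check before building.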

On Tue, Oct 14, 2014 at 6:08 PM, Michael Campbell <michael.campb...@gmail.com> wrote:

> Hey all, I'm trying a very basic Spark SQL job (apologies, I'm new to a
> lot of this), but I'm getting this failure:
>
> Exception in thread "main" java.lang.NoSuchMethodError:
> org.apache.spark.sql.SchemaRDD.take(I)[Lorg/apache/spark/sql/catalyst/expressions/Row;
>
> I've tried a variety of uber-jar creation approaches, but it always comes
> down to this.  Right now my jar has *NO* dependencies on other jars (other
> than Spark itself), and my "uber jar" contains essentially only the .class
> file of my own code.
>
> My own code simply reads a parquet file and does a count(*) on the
> contents. I'm sure this is very basic, but I'm at a loss.
>
> Thoughts and/or debugging tips welcome.
>
> Here's my code, which I call from the "main" object; it passes in the
> context, the Parquet file name, and some arbitrary SQL (which in this case
> is just "select count(*) from flows").
>
> I have run the equivalent successfully in spark-shell in the cluster.
>
>   import org.apache.spark.SparkContext
>   import org.apache.spark.sql.{Row, SQLContext}
>
>   def readParquetFile(sparkCtx: SparkContext, dataFileName: String, sql: String) = {
>     val sqlContext = new SQLContext(sparkCtx)
>     val parquetFile = sqlContext.parquetFile(dataFileName)
>
>     parquetFile.registerAsTable("flows")
>
>     println(s"About to run $sql")
>
>     val start = System.nanoTime()
>     val countRDD = sqlContext.sql(sql)
>     val rows: Array[Row] = countRDD.take(1)  // DIES HERE
>     val stop = System.nanoTime()
>
>     println(s"result: ${rows(0)}")
>     println(s"Query took ${(stop - start) / 1e9} seconds.")
>     println(s"Query was $sql")
>
>   }
>
