For posterity's sake, I solved this. The problem was that the Cloudera cluster I was submitting to runs Spark 1.0, while I was compiling against the latest 1.1 release. Downgrading my compile-time dependency to 1.0 got me past this.
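For anyone hitting the same thing: the usual fix is to compile against the same Spark version the cluster runs, and mark Spark as "provided" so your assembly jar never ships its own Spark classes. A minimal sbt sketch (the exact version numbers here are assumptions; match them to your cluster):

```scala
// build.sbt — hedged sketch; adjust versions to match your cluster
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  // "provided" keeps Spark out of the uber jar, so the cluster's
  // own (1.0) classes are the only Spark classes on the classpath
  "org.apache.spark" %% "spark-core" % "1.0.2" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.0.2" % "provided"
)
```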
On Tue, Oct 14, 2014 at 6:08 PM, Michael Campbell <michael.campb...@gmail.com> wrote:
> Hey all, I'm trying a very basic Spark SQL job and apologies as I'm new to
> a lot of this, but I'm getting this failure:
>
> Exception in thread "main" java.lang.NoSuchMethodError:
> org.apache.spark.sql.SchemaRDD.take(I)[Lorg/apache/spark/sql/catalyst/expressions/Row;
>
> I've tried a variety of uber-jar creation, but it always comes down to
> this. Right now my jar has *NO* dependencies on other jars (other than
> Spark itself), and my "uber jar" contains essentially only the .class file
> of my own code.
>
> My own code simply reads a Parquet file and does a count(*) on the
> contents. I'm sure this is very basic, but I'm at a loss.
>
> Thoughts and/or debugging tips welcome.
>
> Here's my code, which I call from the "main" object, which passes in the
> context, Parquet file name, and some arbitrary SQL (which for this is just
> "select count(*) from flows").
>
> I have run the equivalent successfully in spark-shell on the cluster.
>
> def readParquetFile(sparkCtx: SparkContext, dataFileName: String, sql: String) = {
>   val sqlContext = new SQLContext(sparkCtx)
>   val parquetFile = sqlContext.parquetFile(dataFileName)
>
>   parquetFile.registerAsTable("flows")
>
>   println(s"About to run $sql")
>
>   val start = System.nanoTime()
>   val countRDD = sqlContext.sql(sql)
>   val rows: Array[Row] = countRDD.take(1) // DIES HERE
>   val stop = System.nanoTime()
>
>   println(s"result: ${rows(0)}")
>   println(s"Query took ${(stop - start) / 1e9} seconds.")
>   println(s"Query was $sql")
> }
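A general debugging tip for this class of error: NoSuchMethodError means the exact method signature your jar was linked against doesn't exist in the jars on the cluster's classpath. You can probe what signatures a deployed class actually exposes with plain reflection. A hedged sketch (illustrated on a JDK class so it runs anywhere; on the cluster you would point it at org.apache.spark.sql.SchemaRDD and "take"):

```scala
object SignatureCheck {
  // Returns the parameter-type lists of every public overload named `name`
  // on `className`, so you can compare them against what you compiled for.
  def signatures(className: String, name: String): Seq[String] =
    Class.forName(className).getMethods.toSeq
      .filter(_.getName == name)
      .map(_.getParameterTypes.map(_.getName).mkString("(", ",", ")"))

  def main(args: Array[String]): Unit = {
    // Probing java.lang.String#substring as a stand-in example.
    signatures("java.lang.String", "substring").foreach(println)
  }
}
```

Running a one-liner like this on the cluster (e.g. from spark-shell) makes the compile-time vs. runtime mismatch visible immediately.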