Hey all, I'm trying a very basic Spark SQL job (apologies, I'm new to a lot of this), but I'm getting this failure:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.SchemaRDD.take(I)[Lorg/apache/spark/sql/catalyst/expressions/Row;

I've tried a variety of uber-jar creation approaches, but it always comes down to this. Right now my jar has *NO* dependencies on other jars (other than Spark itself), and my "uber jar" contains essentially only the .class files of my own code. My code simply reads a Parquet file and does a count(*) on the contents. I'm sure this is very basic, but I'm at a loss. Thoughts and/or debugging tips welcome.

Here's my code, which I call from the "main" object, passing in the context, the Parquet file name, and some arbitrary SQL (which for this is just "select count(*) from flows"). I have run the equivalent successfully in spark-shell on the cluster.

def readParquetFile(sparkCtx: SparkContext, dataFileName: String, sql: String) = {
  val sqlContext = new SQLContext(sparkCtx)
  val parquetFile = sqlContext.parquetFile(dataFileName)
  parquetFile.registerAsTable("flows")
  println(s"About to run $sql")
  val start = System.nanoTime()
  val countRDD = sqlContext.sql(sql)
  val rows: Array[Row] = countRDD.take(1)  // DIES HERE
  val stop = System.nanoTime()
  println(s"result: ${rows(0)}")
  println(s"Query took ${(stop - start) / 1e9} seconds.")
  println(s"Query was $sql")
}
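For context, my build file looks roughly like this. This is a sketch rather than my exact build (the project name and version numbers are placeholders); the relevant part is that Spark is marked "provided" so it isn't bundled into the uber jar, and the cluster's own Spark classes are the only Spark on the classpath at runtime:

```scala
// build.sbt (sketch -- name and versions are placeholders)
name := "flows-count"

scalaVersion := "2.10.4"

// "provided" keeps Spark out of the assembled uber jar; the jar
// then runs against whatever Spark version the cluster supplies.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.1.0" % "provided"
)
```

If it matters, I'm assembling the jar with sbt-assembly and submitting it with spark-submit.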