Thanks for the advice, Diego; that was very helpful. How could I read the CSV as a Dataset, though? I need to do a map operation over the Dataset. I just coded up an example to illustrate the issue.
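To make it concrete, this is the shape of what I'm trying to get working. The toUpperCase map is just a stand-in for my real logic, Foo is illustrative, and the ??? marks exactly the step I'm stuck on (CSV to typed Dataset):

import org.apache.spark.sql.{Dataset, SparkSession}

// Top-level case class; my real record type has more fields.
case class Foo(text: String)

object MapOverDataset {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("MapOverDataset")
      .getOrCreate()
    import spark.implicits._

    // The step I can't get to compile: reading the CSV as a Dataset[Foo].
    val ds: Dataset[Foo] = ???

    // The kind of map I need to run over the typed Dataset.
    val mapped: Dataset[String] = ds.map(foo => foo.text.toUpperCase)
    mapped.show()
  }
}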
On Mar 22, 2017 6:43 PM, "Diego Fanesi" <diego.fan...@gmail.com> wrote:

> You are using Spark as a library, but it is much more than that. The book
> "Learning Spark" is very well done and it helped me a lot when I was
> starting out with Spark. Maybe you should start from there.
>
> These are the issues in your code:
>
> Basically, you generally don't execute Spark code like that. You could,
> but it is not officially supported and many functions don't work that
> way. You should start your local cluster, made of a master and a single
> worker, then build a jar with your code and use spark-submit to send it
> to the cluster.
>
> You generally never use args, because Spark is a multi-process,
> multi-threaded application, so args will not be available everywhere.
>
> All contexts have been merged into the same entry point in recent
> versions of Spark, so you will need to do something like this:
>
> import org.apache.spark.sql.{DataFrame, SparkSession}
>
> object DatasetTest {
>
>   val spark: SparkSession = SparkSession
>     .builder()
>     .master("local[8]")
>     .appName("Spark basic example")
>     .getOrCreate()
>
>   import spark.implicits._
>
>   def main(args: Array[String]) {
>     val x = spark.read.format("csv").load("/home/user/data.csv")
>     x.show()
>   }
> }
>
> Hope this helps.
>
> Diego
>
> On 22 Mar 2017 7:18 pm, "Keith Chapman" <keithgchap...@gmail.com> wrote:
>
> Hi,
>
> I'm trying to read a CSV file into a Dataset but keep running into
> compilation issues. I'm using Spark 2.1, and the following small program
> exhibits the problem. I've searched around but haven't found a solution
> that works; I've added "import sqlContext.implicits._" as suggested, but
> no luck. What am I missing? I'd appreciate some advice.
>
> import org.apache.spark.sql.functions._
> import org.apache.spark.{SparkContext, SparkConf}
> import org.apache.spark.sql.{Encoder, Encoders}
>
> object DatasetTest {
>
>   def main(args: Array[String]) {
>     val sparkConf = new SparkConf().setAppName("DatasetTest")
>     val sc = new SparkContext(sparkConf)
>     case class Foo(text: String)
>     val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>     import sqlContext.implicits._
>     val ds: org.apache.spark.sql.Dataset[Foo] =
>       sqlContext.read.csv(args(1)).as[Foo]
>     ds.show
>   }
> }
>
> Compiling the above program gives the error below. I'd expect it to work,
> since Foo is a simple case class; changing it to as[String] works, but I
> would like to get the case class to work.
>
> [error] /home/keith/dataset/DataSetTest.scala:13: Unable to find encoder
> for type stored in a Dataset. Primitive types (Int, String, etc) and
> Product types (case classes) are supported by importing spark.implicits._
> Support for serializing other types will be added in future releases.
> [error] val ds : org.apache.spark.sql.Dataset[Foo] =
> sqlContext.read.csv(args(1)).as[Foo]
>
> Regards,
> Keith.
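P.S. Re-reading the error message, I wonder whether the real problem is that Foo is declared inside main: if the Encoder for a case class is derived through the implicits import, a class that is local to a method presumably can't be picked up there. If that guess is right, hoisting the case class to the top level should fix the compile error; something like the sketch below (untested on my side; note that read.csv names its columns _c0, _c1, ..., so I also rename the column to match Foo's field):

import org.apache.spark.sql.SparkSession

case class Foo(text: String)  // hoisted out of main so an Encoder can be derived

object DatasetTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("DatasetTest").getOrCreate()
    import spark.implicits._

    // read.csv infers column names _c0, _c1, ...; rename so it lines up with Foo.
    val ds = spark.read.csv(args(1)).toDF("text").as[Foo]
    ds.show()
  }
}

Does that sound right?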