Thanks for the advice, Diego; that was very helpful. How could I read the CSV as a Dataset, though? I need to do a map operation over the Dataset. I just coded up an example to illustrate the issue.
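To make it concrete, this is the shape of what I'm trying to get working. The toUpperCase map is just a stand-in for my real logic, Foo is illustrative, and the ??? marks exactly the step I'm stuck on (CSV to typed Dataset):

import org.apache.spark.sql.{Dataset, SparkSession}

// Top-level case class; my real record type has more fields.
case class Foo(text: String)

object MapOverDataset {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("MapOverDataset")
      .getOrCreate()
    import spark.implicits._

    // The step I can't get to compile: reading the CSV as a Dataset[Foo].
    val ds: Dataset[Foo] = ???

    // The kind of map I need to run over the typed Dataset.
    val mapped: Dataset[String] = ds.map(foo => foo.text.toUpperCase)
    mapped.show()
  }
}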
On Mar 22, 2017 6:43 PM, "Diego Fanesi" <diego.fan...@gmail.com> wrote:

> You are using Spark as a library, but it is much more than that. The book
> "Learning Spark" is very well done and it helped me a lot when I was
> starting out with Spark. Maybe you should start from there.
>
> These are the issues in your code:
>
> Basically, you generally don't execute Spark code like that. You could,
> but it is not officially supported and many functions don't work that
> way. You should start your local cluster, made of a master and a single
> worker, then build a jar with your code and use spark-submit to send it
> to the cluster.
>
> You generally never use args, because Spark is a multi-process,
> multi-threaded application, so args will not be available everywhere.
>
> All contexts have been merged into the same entry point in recent
> versions of Spark, so you will need to do something like this:
>
> import org.apache.spark.sql.{DataFrame, SparkSession}
>
> object DatasetTest {
>
>   val spark: SparkSession = SparkSession
>     .builder()
>     .master("local[8]")
>     .appName("Spark basic example")
>     .getOrCreate()
>
>   import spark.implicits._
>
>   def main(args: Array[String]) {
>     val x = spark.read.format("csv").load("/home/user/data.csv")
>     x.show()
>   }
> }
>
> Hope this helps.
>
> Diego
>
> On 22 Mar 2017 7:18 pm, "Keith Chapman" <keithgchap...@gmail.com> wrote:
>
> Hi,
>
> I'm trying to read a CSV file into a Dataset but keep running into
> compilation issues. I'm using Spark 2.1, and the following small program
> exhibits the problem. I've searched around but haven't found a solution
> that works; I've added "import sqlContext.implicits._" as suggested, but
> no luck. What am I missing? I'd appreciate some advice.
>
> import org.apache.spark.sql.functions._
> import org.apache.spark.{SparkContext, SparkConf}
> import org.apache.spark.sql.{Encoder, Encoders}
>
> object DatasetTest {
>
>   def main(args: Array[String]) {
>     val sparkConf = new SparkConf().setAppName("DatasetTest")
>     val sc = new SparkContext(sparkConf)
>     case class Foo(text: String)
>     val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>     import sqlContext.implicits._
>     val ds: org.apache.spark.sql.Dataset[Foo] =
>       sqlContext.read.csv(args(1)).as[Foo]
>     ds.show
>   }
> }
>
> Compiling the above program gives the error below. I'd expect it to work,
> since Foo is a simple case class; changing it to as[String] works, but I
> would like to get the case class to work.
>
> [error] /home/keith/dataset/DataSetTest.scala:13: Unable to find encoder
> for type stored in a Dataset. Primitive types (Int, String, etc) and
> Product types (case classes) are supported by importing spark.implicits._
> Support for serializing other types will be added in future releases.
> [error] val ds : org.apache.spark.sql.Dataset[Foo] =
> sqlContext.read.csv(args(1)).as[Foo]
>
> Regards,
> Keith.
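P.S. Re-reading the error message, I wonder whether the real problem is that Foo is declared inside main: if the Encoder for a case class is derived through the implicits import, a class that is local to a method presumably can't be picked up there. If that guess is right, hoisting the case class to the top level should fix the compile error; something like the sketch below (untested on my side; note that read.csv names its columns _c0, _c1, ..., so I also rename the column to match Foo's field):

import org.apache.spark.sql.SparkSession

case class Foo(text: String)  // hoisted out of main so an Encoder can be derived

object DatasetTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("DatasetTest").getOrCreate()
    import spark.implicits._

    // read.csv infers column names _c0, _c1, ...; rename so it lines up with Foo.
    val ds = spark.read.csv(args(1)).toDF("text").as[Foo]
    ds.show()
  }
}

Does that sound right?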