You are using Spark as a library, but it is much more than that. The book
"Learning Spark" is very well done and it helped me a lot when I was
starting with Spark. Maybe you should start from there.

These are the issues in your code:

Basically, you generally don't execute Spark code like that. You could, but
it is not officially supported and many features don't work that way.
You should start a local cluster made of a master and a single worker,
package your code into a jar, and use spark-submit to send it to the cluster.

You generally never rely on args, because Spark is a multi-process,
multi-threaded application, so args will not be available everywhere.

In recent versions of Spark, the various contexts have been merged into a
single entry point, the SparkSession, so you will need to do something like this:

import org.apache.spark.sql.{DataFrame, SparkSession}

object DatasetTest {

  // Single entry point: SparkSession replaces SparkContext/SQLContext.
  val spark: SparkSession = SparkSession
    .builder()
    .master("local[8]")
    .appName("Spark basic example")
    .getOrCreate()

  import spark.implicits._

  def main(args: Array[String]): Unit = {

    // Read the CSV file into an untyped DataFrame.
    val x: DataFrame = spark.read.format("csv").load("/home/user/data.csv")

    x.show()
  }
}
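As a minimal sketch of how this maps onto your Dataset[Foo] case (assuming a
headerless, single-column CSV at a placeholder path): the case class has to
live outside main, otherwise spark.implicits._ cannot derive an encoder for
it, which is the compile error you are seeing, and the unnamed CSV column
needs to be renamed to match the case class field before calling as[Foo].

import org.apache.spark.sql.{Dataset, SparkSession}

// Defined at the top level, not inside main, so that spark.implicits._
// can derive an Encoder[Foo] at compile time.
case class Foo(text: String)

object DatasetTest {

  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession
      .builder()
      .master("local[8]")
      .appName("DatasetTest")
      .getOrCreate()

    import spark.implicits._

    // Without a header row the CSV column arrives as _c0, so rename it to
    // match the case class field before converting to a typed Dataset.
    // "/home/user/data.csv" is a placeholder path.
    val ds: Dataset[Foo] = spark.read
      .csv("/home/user/data.csv")
      .toDF("text")
      .as[Foo]

    ds.show()

    spark.stop()
  }
}

If your file has a header row, spark.read.option("header", "true").csv(...)
will pick up the column name from the file instead.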


Hope this helps.

Diego

On 22 Mar 2017 7:18 pm, "Keith Chapman" <keithgchap...@gmail.com> wrote:

Hi,

I'm trying to read a CSV file into a Dataset but keep having compilation
issues. I'm using Spark 2.1, and the following is a small program that
exhibits the issue I'm having. I've searched around but not found a solution
that worked; I've added "import sqlContext.implicits._" as suggested, but no
luck. What am I missing? I'd appreciate some advice.

import org.apache.spark.sql.functions._
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.sql.{Encoder,Encoders}

object DatasetTest{

  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("DatasetTest")
    val sc = new SparkContext(sparkConf)
    case class Foo(text: String)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._
    val ds: org.apache.spark.sql.Dataset[Foo] =
      sqlContext.read.csv(args(1)).as[Foo]
    ds.show
  }
}

Compiling the above program gives the error below. I'd expect it to work as
it's a simple case class; changing it to as[String] works, but I would like
to get the case class to work.

[error] /home/keith/dataset/DataSetTest.scala:13: Unable to find encoder
for type stored in a Dataset.  Primitive types (Int, String, etc) and
Product types (case classes) are supported by importing spark.implicits._
Support for serializing other types will be added in future releases.
[error]     val ds : org.apache.spark.sql.Dataset[Foo] =
sqlContext.read.csv(args(1)).as[Foo]


Regards,
Keith.
