How does Spark handle null values?
case class AvroSource(name: String, age: Integer, sal: Long, col_float: Float,
  col_double: Double, col_bytes: String, col_bool: Boolean)
val userDS = spark.read.format("com.databricks.spark.avro")
  .option("nullValue", "x")
  .load("./users.avro") //.as[AvroSource]
userDS.printSchema()
userDS.show()
userDS.createOrReplaceTempView("user")
spark.sql("select * from user where col_double is not null").show()
[image: screenshot of the printSchema() and show() output]
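As an aside, the untyped Column-API version of the same null filter also runs without error, presumably because no case-class decoding is involved (a sketch; col_double is the column name from the schema above):

import org.apache.spark.sql.functions.col
userDS.filter(col("col_double").isNotNull).show()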
Adding the following lines to the code returns an error, which seems to contradict the schema's nullable = true. How do I handle nulls here?
val filteredDS = userDS.filter(_.age > 30)
filteredDS.show(10)
java.lang.RuntimeException: Null value appeared in non-nullable field:
- field (class: "scala.Double", name: "col_double")
- root class: "com.model.AvroSource"
If the schema is inferred from a Scala tuple/case class, or a Java bean,
please try to use scala.Option[_] or other nullable types (e.g.
java.lang.Integer instead of int/scala.Int).
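For reference, a minimal sketch of what the exception suggests (field names copied from the case class above; this assumes the nulls appear only in the primitive-typed columns): wrapping each primitive field in scala.Option lets a null value decode as None instead of throwing when the row is materialized.

import spark.implicits._ // encoder for the case class

case class AvroSource(
  name: String,               // reference type, already nullable
  age: Option[Int],           // Option[_] instead of a primitive scala.Int
  sal: Option[Long],
  col_float: Option[Float],
  col_double: Option[Double], // the field that triggered the RuntimeException
  col_bytes: String,
  col_bool: Option[Boolean]
)

val typedDS = spark.read.format("com.databricks.spark.avro")
  .load("./users.avro")
  .as[AvroSource]

// Null ages decode as None and simply fail the predicate,
// instead of blowing up inside the encoder:
val filteredDS = typedDS.filter(_.age.exists(_ > 30))
filteredDS.show(10)

Boxed Java types (e.g. java.lang.Double instead of Double) also work, but then every lambda has to null-check by hand, so Option is usually the cleaner choice.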