How does Spark handle null values?
case class AvroSource(name: String, age: Integer, sal: Long, col_float: Float,
  col_double: Double, col_bytes: String, col_bool: Boolean)
val userDS = spark.read.format("com.databricks.spark.avro")
  .option("nullValue", "x")
  .load("./users.avro") //.as[AvroSource]
userDS.printSchema()
userDS.show()
userDS.createOrReplaceTempView("user")
spark.sql("select * from user where col_double is not null").show()
[image: screenshot of the printSchema() and show() output]
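As an aside, the untyped Column-API version of the same null filter also runs without error, presumably because no case-class decoding is involved (a sketch; col_double is the column name from the schema above):

import org.apache.spark.sql.functions.col
userDS.filter(col("col_double").isNotNull).show()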
Adding the following lines to the code returns an error, which seems to contradict the schema's nullable = true. How do I handle nulls here?
val filteredDS = userDS.filter(_.age > 30)
filteredDS.show(10)
java.lang.RuntimeException: Null value appeared in non-nullable field:
- field (class: "scala.Double", name: "col_double")
- root class: "com.model.AvroSource"
If the schema is inferred from a Scala tuple/case class, or a Java bean,
please try to use scala.Option[_] or other nullable types (e.g.
java.lang.Integer instead of int/scala.Int).
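For reference, a minimal sketch of what the exception suggests (field names copied from the case class above; this assumes the nulls appear only in the primitive-typed columns): wrapping each primitive field in scala.Option lets a null value decode as None instead of throwing when the row is materialized.

import spark.implicits._ // encoder for the case class

case class AvroSource(
  name: String,               // reference type, already nullable
  age: Option[Int],           // Option[_] instead of a primitive scala.Int
  sal: Option[Long],
  col_float: Option[Float],
  col_double: Option[Double], // the field that triggered the RuntimeException
  col_bytes: String,
  col_bool: Option[Boolean]
)

val typedDS = spark.read.format("com.databricks.spark.avro")
  .load("./users.avro")
  .as[AvroSource]

// Null ages decode as None and simply fail the predicate,
// instead of blowing up inside the encoder:
val filteredDS = typedDS.filter(_.age.exists(_ > 30))
filteredDS.show(10)

Boxed Java types (e.g. java.lang.Double instead of Double) also work, but then every lambda has to null-check by hand, so Option is usually the cleaner choice.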