Why do you want the array to have nullable = false? What is the benefit?

On Wed, Aug 3, 2016 at 10:45 AM, Kazuaki Ishizaki <ishiz...@jp.ibm.com>
wrote:

> Dear all,
> Could someone tell me how to change the nullable property in a
> Dataset?
>
> When I looked for how to change the nullable property in a DataFrame
> schema, I found the following approaches:
> http://stackoverflow.com/questions/33193958/change-nullable-property-of-column-in-spark-dataframe
> https://github.com/apache/spark/pull/13873 (not merged yet)
>
> However, I cannot find how to change the nullable property in a Dataset
> schema. Even when I wrote the following program, the nullable property
> for "value: array" in ds2.schema did not change.
> If my understanding is correct, Spark 2.0 currently uses an
> ExpressionEncoder that is generated based on Dataset[T] at
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala#L46
>
> class Test extends QueryTest with SharedSQLContext {
>   import testImplicits._
>   test("test") {
>     val ds1 = sparkContext.parallelize(
>       Seq(Array(1, 1), Array(2, 2), Array(3, 3)), 1).toDS
>     val schema = new StructType()
>       .add("array", ArrayType(IntegerType, false), false)
>     val inputObject = BoundReference(
>       0, ScalaReflection.dataTypeFor[Array[Int]], false)
>     val encoder = new ExpressionEncoder[Array[Int]](schema, true,
>       ScalaReflection.serializerFor[Array[Int]](inputObject).flatten,
>       ScalaReflection.deserializerFor[Array[Int]],
>       ClassTag[Array[Int]](classOf[Array[Int]]))
>     val ds2 = ds1.map(e => e)(encoder)
>     ds1.printSchema
>     ds2.printSchema
>   }
> }
>
> root
>  |-- value: array (nullable = true)
>  |    |-- element: integer (containsNull = false)
>
> root
>  |-- value: array (nullable = true)   // Expected (nullable = false)
>  |    |-- element: integer (containsNull = false)
>
>
> Kazuaki Ishizaki
>

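For reference, the DataFrame-level workaround described in the Stack Overflow link quoted above can be sketched roughly as follows. This is only a minimal sketch, assuming a local Spark 2.x `SparkSession`; the object name `NullableSketch` and the helper `withNonNullableValue` are made up for illustration. It rebuilds an untyped DataFrame from the underlying rows with a stricter schema, so it does not answer whether the same can be done through the typed encoder path.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.types.{ArrayType, IntegerType, StructType}

object NullableSketch {
  // Hypothetical helper: re-create the data as a DataFrame whose single
  // "value" column is declared nullable = false.
  def withNonNullableValue(spark: SparkSession, ds: Dataset[Array[Int]]): DataFrame = {
    // Same column name and element type as ds.toDF(), but with
    // containsNull = false and nullable = false.
    val schema = new StructType()
      .add("value", ArrayType(IntegerType, false), false)
    // createDataFrame(RDD[Row], StructType) applies the given schema as-is.
    spark.createDataFrame(ds.toDF().rdd, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nullable-sketch")
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(Array(1, 1), Array(2, 2), Array(3, 3)).toDS()
    ds.printSchema()                              // schema inferred by the encoder
    withNonNullableValue(spark, ds).printSchema() // schema supplied explicitly
    spark.stop()
  }
}
```

Spark does not validate the data against the declared schema here, so declaring `nullable = false` on a column that actually contains nulls may lead to wrong results or runtime errors.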