why do you want the array to have nullable = false? what is the benefit? On Wed, Aug 3, 2016 at 10:45 AM, Kazuaki Ishizaki <ishiz...@jp.ibm.com> wrote:
> Dear all, > Would it be possible to let me know how to change nullable property in > Dataset? > > When I looked for how to change nullable property in Dataframe schema, I > found the following approaches. > http://stackoverflow.com/questions/33193958/change- > nullable-property-of-column-in-spark-dataframe > https://github.com/apache/spark/pull/13873(Not merged yet) > > However, I cannot find how to change nullable property in Dataset schema. > Even when I wrote the following program, nullable property for "value: > array" in ds2.schema is not changed. > If my understanding is correct, current Spark 2.0 uses an > ExpressionEncoder that is generated based on Dataset[T] at > https://github.com/apache/spark/blob/master/sql/ > catalyst/src/main/scala/org/apache/spark/sql/catalyst/ > encoders/ExpressionEncoder.scala#L46 > > class Test extends QueryTest with SharedSQLContext { > import testImplicits._ > test("test") { > val ds1 = sparkContext.parallelize(Seq(Array(1, 1), Array(2, 2), > Array(3, 3)), 1).toDS > val schema = new StructType().add("array", ArrayType(IntegerType, > false), false) > val inputObject = BoundReference(0, > ScalaReflection.dataTypeFor[Array[Int]], > false) > val encoder = new ExpressionEncoder[Array[Int]](schema, true, > ScalaReflection.serializerFor[Array[Int]](inputObject).flatten, > ScalaReflection.deserializerFor[Array[Int]], > ClassTag[Array[Int]](classOf[Array[Int]])) > val ds2 = ds1.map(e => e)(encoder) > ds1.printSchema > ds2.printSchema > } > } > > root > |-- value: array (nullable = true) > | |-- element: integer (containsNull = false) > > root > |-- value: array (nullable = true) // Expected > (nullable = false) > | |-- element: integer (containsNull = false) > > > Kazuaki Ishizaki >