I am trying to get the result schema of aggregate functions using the DataFrame API. However, I find that the result fields for the groupBy columns are always nullable, even when the source field is not nullable.
I want to know if this is by design, thank you! Below is simple code that shows the issue.
======
import sqlContext.implicits._
import org.apache.spark.sql.functions._

case class Test(key: String, value: Long)
val df = sc.makeRDD(Seq(Test("k1", 2), Test("k1", 1))).toDF
val result = df.groupBy("key").agg($"key", sum("value"))

// From the output, you can see the "key" column is nullable, why??
result.printSchema
// root
//  |-- key: string (nullable = true)
//  |-- SUM(value): long (nullable = true)
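To make the comparison easier to reproduce, here is a minimal sketch that prints the nullable flag of every field in both the source and the aggregated schema. It assumes a spark-shell session where `sc` and `sqlContext` are already defined, and it only reports what Spark infers; it does not change the behaviour.

```scala
// Assumption: run inside spark-shell, where `sc` and `sqlContext` are predefined.
import sqlContext.implicits._
import org.apache.spark.sql.functions._

case class Test(key: String, value: Long)

val df = sc.makeRDD(Seq(Test("k1", 2), Test("k1", 1))).toDF
val result = df.groupBy("key").agg($"key", sum("value"))

// Print the nullable flag of each field, for the source first and then the result.
df.schema.fields.foreach { f =>
  println(s"source ${f.name}: nullable = ${f.nullable}")
}
result.schema.fields.foreach { f =>
  println(s"result ${f.name}: nullable = ${f.nullable}")
}
```

Checking the source side as well seems worthwhile, since the flags Spark infers from the case class fields determine what the groupBy output should be compared against.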