I'll look into it - not sure yet what I can get out of exprs :p

On Mon, May 11, 2015 at 22:35, Reynold Xin <r...@databricks.com> wrote:
> Thanks for catching this. I didn't read carefully enough.
>
> It'd make sense to have the udaf result be non-nullable, if the exprs are
> indeed non-nullable.
>
> On Mon, May 11, 2015 at 1:32 PM, Olivier Girardot <ssab...@gmail.com>
> wrote:
>
>> Hi Haopu,
>> actually here `key` is nullable because this is your input's schema:
>>
>> scala> result.printSchema
>> root
>> |-- key: string (nullable = true)
>> |-- SUM(value): long (nullable = true)
>>
>> scala> df.printSchema
>> root
>> |-- key: string (nullable = true)
>> |-- value: long (nullable = false)
>>
>> I tried it with a schema where the key is not flagged as nullable, and
>> the schema is actually respected. What you can argue, however, is that
>> SUM(value) should also be non-nullable since value is not nullable.
>>
>> @rxin do you think it would be reasonable to flag the Sum aggregation
>> function as nullable (or not) depending on the input expression's schema?
>>
>> Regards,
>>
>> Olivier.
>>
>> On Mon, May 11, 2015 at 22:07, Reynold Xin <r...@databricks.com> wrote:
>>
>>> Not by design. Would you be interested in submitting a pull request?
>>>
>>> On Mon, May 11, 2015 at 1:48 AM, Haopu Wang <hw...@qilinsoft.com> wrote:
>>>
>>>> I am trying to get the result schema of aggregate functions using the
>>>> DataFrame API.
>>>>
>>>> However, I find that the result fields of groupBy columns are always
>>>> nullable even if the source field is not nullable.
>>>>
>>>> I want to know if this is by design, thank you! Below is the simple code
>>>> to show the issue.
>>>>
>>>> ======
>>>>
>>>> import sqlContext.implicits._
>>>> import org.apache.spark.sql.functions._
>>>> case class Test(key: String, value: Long)
>>>> val df = sc.makeRDD(Seq(Test("k1",2),Test("k1",1))).toDF
>>>>
>>>> val result = df.groupBy("key").agg($"key", sum("value"))
>>>>
>>>> // From the output, you can see the "key" column is nullable, why??
>>>> result.printSchema
>>>> // root
>>>> // |-- key: string (nullable = true)
>>>> // |-- SUM(value): long (nullable = true)
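
For reference, the rule proposed in the thread (an aggregate's result is nullable only if its input expression is) could be sketched roughly as below. This is a simplified, hypothetical model: `Expr`, `Column`, and `Sum` here are illustrative stand-ins, not Spark's actual Catalyst classes, and the sketch only covers the grouped case. In SQL, a global SUM over empty input returns NULL, so a real implementation would probably need to handle that case separately.

```scala
// Hypothetical, simplified expression model; NOT Spark's Catalyst API.
sealed trait Expr {
  def nullable: Boolean
}

// A column reference carrying the nullability declared in the input schema.
final case class Column(name: String, nullable: Boolean) extends Expr

// Proposed rule from the thread: Sum inherits nullability from its child
// expression instead of always reporting nullable = true.
final case class Sum(child: Expr) extends Expr {
  def nullable: Boolean = child.nullable
}

object NullabilitySketch {
  def main(args: Array[String]): Unit = {
    val key   = Column("key", nullable = true)
    val value = Column("value", nullable = false)

    println(Sum(value).nullable) // false: value is declared non-nullable
    println(Sum(key).nullable)   // true: key is declared nullable
  }
}
```

With the schema from the example above (`value: long (nullable = false)`), this rule would report `SUM(value)` as non-nullable, which is the behaviour Olivier argues for.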