Yes, please do, and send me the link. @rxin I have trouble building master, but the code is done...
On Fri, May 15, 2015 at 1:27 AM, Haopu Wang <hw...@qilinsoft.com> wrote:

> Thank you, should I open a JIRA for this issue?
>
> ------------------------------
>
> From: Olivier Girardot [mailto:ssab...@gmail.com]
> Sent: Tuesday, May 12, 2015 5:12 AM
> To: Reynold Xin
> Cc: Haopu Wang; user
> Subject: Re: [SparkSQL 1.4.0] groupBy columns are always nullable?
>
> I'll look into it - not sure yet what I can get out of exprs :p
>
> On Mon, May 11, 2015 at 10:35 PM, Reynold Xin <r...@databricks.com> wrote:
>
> Thanks for catching this. I didn't read carefully enough.
>
> It'd make sense to have the UDAF result be non-nullable, if the exprs are
> indeed non-nullable.
>
> On Mon, May 11, 2015 at 1:32 PM, Olivier Girardot <ssab...@gmail.com>
> wrote:
>
> Hi Haopu,
> actually here `key` is nullable because this is your input's schema:
>
> scala> result.printSchema
> root
>  |-- key: string (nullable = true)
>  |-- SUM(value): long (nullable = true)
>
> scala> df.printSchema
> root
>  |-- key: string (nullable = true)
>  |-- value: long (nullable = false)
>
> I tried it with a schema where the key is not flagged as nullable, and the
> schema is actually respected. What you can argue, however, is that
> SUM(value) should also be non-nullable, since value is not nullable.
>
> @rxin do you think it would be reasonable to flag the Sum aggregation
> function as nullable (or not) depending on the input expression's schema?
>
> Regards,
>
> Olivier.
>
> On Mon, May 11, 2015 at 10:07 PM, Reynold Xin <r...@databricks.com> wrote:
>
> Not by design. Would you be interested in submitting a pull request?
>
> On Mon, May 11, 2015 at 1:48 AM, Haopu Wang <hw...@qilinsoft.com> wrote:
>
> I am trying to get the result schema of aggregate functions using the
> DataFrame API.
>
> However, I find that the result fields of groupBy columns are always
> nullable, even when the source field is not nullable.
>
> I want to know if this is by design. Thank you! Below is a simple snippet
> that reproduces the issue.
>
> ======
>
> import sqlContext.implicits._
> import org.apache.spark.sql.functions._
>
> case class Test(key: String, value: Long)
> val df = sc.makeRDD(Seq(Test("k1", 2), Test("k1", 1))).toDF
>
> val result = df.groupBy("key").agg($"key", sum("value"))
>
> // From the output, you can see the "key" column is nullable. Why?
> result.printSchema
> // root
> //  |-- key: string (nullable = true)
> //  |-- SUM(value): long (nullable = true)
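
For reference, a minimal, self-contained sketch of the nullability-propagation
idea discussed above. The Expr/Column/Sum names here are hypothetical
stand-ins, not Spark's actual Catalyst classes: the point is only that an
aggregate's output nullability could mirror its input expression's, rather
than being hard-coded to true.

// Hypothetical sketch, not Spark's Catalyst code: an aggregate whose
// nullability is derived from its child, as proposed in the thread.
sealed trait Expr { def nullable: Boolean }

case class Column(name: String, nullable: Boolean) extends Expr

case class Sum(child: Expr) extends Expr {
  // SUM(value) can only be null if `value` itself may be null
  // (ignoring the empty-group case for simplicity).
  def nullable: Boolean = child.nullable
}

// Usage: a non-nullable input yields a non-nullable SUM.
val value = Column("value", nullable = false)
assert(!Sum(value).nullable)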