Not by design. Would you be interested in submitting a pull request?

On Mon, May 11, 2015 at 1:48 AM, Haopu Wang <hw...@qilinsoft.com> wrote:
> I am trying to get the result schema of aggregate functions using the
> DataFrame API.
>
> However, I find that the result fields for groupBy columns are always
> nullable, even when the source field is not nullable.
>
> I want to know if this is by design. Thank you! Below is some simple code
> that shows the issue.
>
> ======
>
> import sqlContext.implicits._
> import org.apache.spark.sql.functions._
> case class Test(key: String, value: Long)
> val df = sc.makeRDD(Seq(Test("k1",2),Test("k1",1))).toDF
>
> val result = df.groupBy("key").agg($"key", sum("value"))
>
> // From the output, you can see the "key" column is nullable, why??
> result.printSchema
> // root
> // |-- key: string (nullable = true)
> // |-- SUM(value): long (nullable = true)
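
Until the behavior is changed in Spark itself, one possible workaround is to rebuild the aggregated DataFrame with an adjusted schema. The sketch below is not a fix and not the project's recommended approach. It assumes a spark-shell session of the same era as this thread, where `sc` and `sqlContext` are predefined, and it assumes you already know the grouping column contains no nulls, since Spark does not verify this when a schema is supplied. The names `fixedSchema` and `fixed` are illustrative only.

```scala
import sqlContext.implicits._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.StructType

// Same setup as in the quoted message.
case class Test(key: String, value: Long)
val df = sc.makeRDD(Seq(Test("k1", 2), Test("k1", 1))).toDF

val result = df.groupBy("key").agg($"key", sum("value"))

// Copy the result schema, marking only the "key" field as non-nullable.
// Assumption: the data really has no null keys; nothing here checks that.
val fixedSchema = StructType(result.schema.fields.map {
  case f if f.name == "key" => f.copy(nullable = false)
  case f                    => f
})

// Re-create the DataFrame from the same rows with the adjusted schema.
val fixed = sqlContext.createDataFrame(result.rdd, fixedSchema)

fixed.printSchema
```

This changes only the schema metadata. The rows are the same, so if a null key does show up later, the `nullable = false` flag will be wrong and downstream code that trusts it may misbehave.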