Re: [SparkSQL 1.4.0] groupBy columns are always nullable?

Reynold Xin Mon, 11 May 2015 13:08:08 -0700

Not by design. Would you be interested in submitting a pull request?

On Mon, May 11, 2015 at 1:48 AM, Haopu Wang wrote:


> I am trying to get the result schema of aggregate functions using the
> DataFrame API.
>
> However, I find that the result fields for groupBy columns are always
> nullable, even when the source field is not nullable.
>
> I would like to know whether this is by design. Thank you! Below is some
> simple code that shows the issue.
>
> ======
>
>   import sqlContext.implicits._
>   import org.apache.spark.sql.functions._
>   case class Test(key: String, value: Long)
>   val df = sc.makeRDD(Seq(Test("k1",2),Test("k1",1))).toDF
>
>   val result = df.groupBy("key").agg($"key", sum("value"))
>
>   // The output shows that the "key" column is nullable. Why?
>   result.printSchema
>   // root
>   //  |-- key: string (nullable = true)
>   //  |-- SUM(value): long (nullable = true)

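Until the nullability is propagated correctly, one possible workaround is to rebuild the DataFrame with an adjusted schema. The sketch below is not from the thread: `withNonNullable` is a hypothetical helper name, and it assumes the `sqlContext` and `result` values from the quoted example are in scope. It relies only on `StructField.copy` and `SQLContext.createDataFrame(rowRDD, schema)`. It is also worth running `df.printSchema` on the source first, since Spark's reflection-based schema inference marks `String` case-class fields as nullable to begin with.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.types.StructType

// Hypothetical helper (not part of Spark): returns a copy of `df` whose
// listed columns are declared non-nullable in the schema.
def withNonNullable(sqlContext: SQLContext,
                    df: DataFrame,
                    columns: Set[String]): DataFrame = {
  // Copy every field, switching off the nullable flag for the chosen columns.
  val fixedSchema = StructType(df.schema.map { field =>
    if (columns.contains(field.name)) field.copy(nullable = false) else field
  })
  // Re-attach the adjusted schema to the same rows. The data is not
  // validated here, so the caller must be sure the columns contain no nulls.
  sqlContext.createDataFrame(df.rdd, fixedSchema)
}

// Usage with the `result` DataFrame from the quoted example.
val fixed = withNonNullable(sqlContext, result, Set("key"))
fixed.printSchema
```

This only changes the declared schema; it does not change how groupBy derives nullability, so it is a stopgap rather than a fix for the underlying behavior.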