Yes, this is a bug. Would you mind creating a JIRA issue for it? I will fix it ASAP.
BTW, what's your Spark version?

From: Yana Kadiyska [mailto:yana.kadiy...@gmail.com]
Sent: Friday, July 10, 2015 12:16 AM
To: ayan guha
Cc: user
Subject: Re: [SparkSQL] Incorrect ROLLUP results

+---+---+---+
|cnt|_c1|grp|
+---+---+---+
|  1| 31|  0|
|  1| 31|  1|
|  1|  4|  0|
|  1|  4|  1|
|  1| 42|  0|
|  1| 42|  1|
|  1| 15|  0|
|  1| 15|  1|
|  1| 26|  0|
|  1| 26|  1|
|  1| 37|  0|
|  1| 10|  0|
|  1| 37|  1|
|  1| 10|  1|
|  1| 48|  0|
|  1| 21|  0|
|  1| 48|  1|
|  1| 21|  1|
|  1| 32|  0|
|  1| 32|  1|
+---+---+---+

On Thu, Jul 9, 2015 at 11:54 AM, ayan guha <guha.a...@gmail.com> wrote:

Can you please post the result of show()?

On 10 Jul 2015 01:00, "Yana Kadiyska" <yana.kadiy...@gmail.com> wrote:

Hi folks,

I just rewrote a query from using UNION ALL to using "with rollup", and I'm seeing some unexpected behavior. I'll open a JIRA if needed, but wanted to check whether this is user error. Here is my code:

    import sqlContext.implicits._  // provides .toDF (already imported in spark-shell)

    case class KeyValue(key: Int, value: String)
    val df = sc.parallelize(1 to 50).map(i => KeyValue(i, i.toString)).toDF
    df.registerTempTable("foo")

    // Rollup over a plain column
    sqlContext.sql("select count(*) as cnt, value as key, GROUPING__ID from foo group by value with rollup").show(100)
    // Rollup over an expression
    sqlContext.sql("select count(*) as cnt, key % 100 as key, GROUPING__ID from foo group by key % 100 with rollup").show(100)

Grouping by value does the right thing: I get a single group-0 row with the overall count. But grouping by an expression (key % 100) produces weird results; it appears that the group-1 results are replicated as group 0. Am I doing something wrong, or is this a bug?
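For comparison, here is a minimal sketch in plain Scala collections (no Spark) of what a single-column ROLLUP over `key % 100` would be expected to return. It treats the rollup as the UNION ALL of two grouping sets and assumes the GROUPING__ID convention described in the thread (1 for per-key rows, 0 for the single grand-total row). `RollupRow`, `RollupSketch` and `rollup` are hypothetical names for illustration only, not Spark APIs.

```scala
// KeyValue mirrors the case class from the thread.
case class KeyValue(key: Int, value: String)

// Hypothetical output row of a single-column ROLLUP; key is None on the grand-total row.
case class RollupRow(cnt: Long, key: Option[Int], groupingId: Int)

object RollupSketch {
  // Emulates `GROUP BY f(key) WITH ROLLUP` as the UNION ALL of two grouping sets:
  //   (f(key)) -> one row per distinct value, groupingId = 1 (assumed convention)
  //   ()       -> one grand-total row with a null key, groupingId = 0
  def rollup(rows: Seq[KeyValue], f: Int => Int): Seq[RollupRow] = {
    val perKey = rows
      .groupBy(kv => f(kv.key))
      .toSeq
      .sortBy(_._1)
      .map { case (k, group) => RollupRow(group.size.toLong, Some(k), 1) }
    val grandTotal = RollupRow(rows.size.toLong, None, 0)
    perKey :+ grandTotal
  }

  def main(args: Array[String]): Unit = {
    val data = (1 to 50).map(i => KeyValue(i, i.toString))
    // Print every expected row: 50 per-key rows followed by the grand total.
    rollup(data, _ % 100).foreach(println)
  }
}
```

With 50 distinct keys this yields 50 rows with groupingId 1 and cnt 1, plus exactly one row with key None, groupingId 0 and cnt 50, whereas the output posted above lists every key under both grp 0 and grp 1.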