I found the problem. Grouping by a constant column value is indeed impossible.
The reason it was "working" in my project is that I gave the constant
column an alias that already exists in the schema of the dataframe. The
dataframe contained an hourly "data_timestamp" column, and I added to the
select a constant, also aliased "data_timestamp", that represented the
timestamp of the day. That was the cause of my original bug: I thought I was
grouping by the day timestamp when I was actually grouping by each hour, and
therefore I got multiple rows for each of the group by combinations.
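
To illustrate the shadowing, here is a minimal sketch. It uses Python's
built-in sqlite3 module as a stand-in for SparkSQL, which is an assumption:
SQLite's name resolution is not Spark's, but it likewise binds a name in
GROUP BY to the input column rather than to a SELECT alias of the same name.
The table contents and the "day_timestamp" alias are hypothetical.

```python
import sqlite3

# Stand-in engine (assumption): SQLite instead of SparkSQL 1.4.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (primary_one INTEGER, data_timestamp INTEGER, measure INTEGER)"
)

# Hypothetical data: three hourly rows that all belong to the same day.
DAY = 1436918400  # midnight UTC of the day, in epoch seconds
rows = [(1, DAY, 5), (1, DAY + 3600, 7), (1, DAY + 7200, 11)]
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)

# The constant reuses the name of an existing column, so GROUP BY
# data_timestamp binds to the hourly column, not to the constant.
shadowed = conn.execute(
    f"SELECT primary_one, {DAY} AS data_timestamp, SUM(measure) AS total_measures "
    "FROM tbl GROUP BY primary_one, data_timestamp"
).fetchall()

# Group only by real columns and give the constant a name that does not
# collide with the schema (day_timestamp is a hypothetical alias).
fixed = conn.execute(
    f"SELECT primary_one, {DAY} AS day_timestamp, SUM(measure) AS total_measures "
    "FROM tbl GROUP BY primary_one"
).fetchall()

print(len(shadowed), "rows when the alias shadows the hourly column")
print(len(fixed), "row when grouping by the real column only")
```

In the first query every output row shows the same day timestamp, which is
what made the result look like duplicate groups in my case.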

On Wed, Jul 15, 2015 at 10:09 AM, Lior Chaga <lio...@taboola.com> wrote:

> Hi,
> Facing a bug with group by in SparkSQL (version 1.4).
> Registered a JavaRDD with object containing integer fields as a table.
> Then I'm trying to do a group by, with a constant value in the group by
> fields:
> SELECT primary_one, primary_two, 10 as num, SUM(measure) as total_measures
> FROM tbl
> GROUP BY primary_one, primary_two, num
> I get the following exception:
> org.apache.spark.sql.AnalysisException: cannot resolve 'num' given input
> columns measure, primary_one, primary_two
> Tried both with HiveContext and SQLContext.
> The odd thing is that this kind of query actually works for me in a
> project I'm working on, but I have there another bug (the group by does not
> yield expected results).
> The only reason I can think of is that maybe in my real project, the
> context configuration is different.
> In my above example the configuration of the HiveContext is empty.
> In my real project, the configuration is shown below.
> Any ideas?
> Thanks,
> Lior
> Hive context configuration in project:
