Thanks for the information, Jacques.
Based on the above formula from Jacques, the hash agg operator should not
be using ~33MB of memory when we have only 1 group (and 7 varchar columns in
the group). As per Aman's suggestion, I tried using fixed width columns in
the group by and the memory usage
I gave a presentation a year or so ago at the MapR sales kickoff that
covers the memory characteristics of operators. Unfortunately, I don't have
access to the content, but hopefully someone internal to MapR has it
(maybe Ellen or Neeraja).
Approximately (from memory):
Rahul, can you send me the query profile separately? Also, can you try a
group-by on fixed-width columns instead of varchar?
With a single group, the hash table itself should consume a relatively
small amount of memory.
On Fri, May 27, 2016 at 11:14 AM, Zelaine Fong wrote:
My guess would be that for hashing, a hash table is pre-allocated based on
the number of keys in the hash. That would explain why with more keys, the
memory usage grows. But that's just my guess. Someone who really
understands how this works should chime in :).
-- Zelaine
On Fri, May 27, 2016
Any inputs on this one?
On Wed, May 25, 2016 at 7:51 PM, rahul challapalli <
challapallira...@gmail.com> wrote:
It's using hash aggregation.
On May 25, 2016 7:48 PM, "Zelaine Fong" wrote:
> What does the explain plan show? I.e., is the group by being done via a
> hash agg or a streaming agg? If it's a streaming agg, then you still have
> to sort the entire data set before you reduce
Oops, my bad. I just noticed you did indicate that the query plan shows
usage of a hash agg.
-- Zelaine
On Wed, May 25, 2016 at 7:48 PM, Zelaine Fong wrote:
What does the explain plan show? I.e., is the group by being done via a
hash agg or a streaming agg? If it's a streaming agg, then you still have
to sort the entire data set before you reduce it down to a single group.
That would explain the increase in memory as you add group by keys.
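To make the distinction above concrete, here is a toy Python sketch of the two strategies. It is only an illustration and not Drill's implementation: the function names and the row layout are made up for this example. The hash version keeps one entry per distinct group, while the sort-based version has to materialize and sort the whole input before it can collapse runs of equal keys.

```python
from itertools import groupby


def hash_agg(rows, key_cols, value_col):
    """Hash aggregation: state grows with the number of distinct groups."""
    sums = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        sums[key] = sums.get(key, 0) + row[value_col]
    return sums


def streaming_agg(rows, key_cols, value_col):
    """Sort-based aggregation: the full input is sorted before any reduction."""
    def key_fn(row):
        return tuple(row[c] for c in key_cols)

    ordered = sorted(rows, key=key_fn)  # holds every row, with all key columns
    return {
        key: sum(r[value_col] for r in group)
        for key, group in groupby(ordered, key=key_fn)
    }


# Hypothetical input: five rows that all fall into a single group.
rows = [{"a": "x", "b": "y", "v": i} for i in range(5)]
single_group_hash = hash_agg(rows, ["a", "b"], "v")
single_group_sorted = streaming_agg(rows, ["a", "b"], "v")
```

With a single group, the hash version ends up holding one key, whereas the sorted version still had to hold all five rows, and each extra group-by column widens every row that gets sorted.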
--