Hi,

I've been playing with a custom COUNT(DISTINCT) aggregate based on hash
tables - it seems to be working as planned, except for memory
consumption which is much higher than I expected.

The code is available at https://github.com/tvondra/count_distinct and
in short it works like this:

1) There's a simple hash table for each group (so with HashAggregate
   it's effectively a hash table of hash tables). The hash table
   contains only distinct values for that particular group, so the
   result is simply the number of items in the table.

2) The table starts very small (4 buckets), and when it grows too large
   (more than 20 items per bucket on average) it gets resized on the fly
   to 4x the size. So the number of buckets grows 4 -> 16 -> 64 -> 256
   -> ... (and the maximum number of items in the table grows 80 -> 320
   -> 1280 -> ...); see the sketch below.
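
To make that concrete, the resize check looks roughly like this (an
illustrative sketch using my own names and the hash_table_t struct shown
further down, not necessarily the exact code in the repo):

    /*
     * Illustrative sketch of the growth policy described above -- my own
     * names, not necessarily the exact code in the count_distinct repo.
     */
    #define GROWTH_FACTOR     4     /* quadruple the bucket count on resize */
    #define AVG_BUCKET_ITEMS 20     /* resize when the average exceeds this */

    static void
    maybe_resize(hash_table_t *table)
    {
        /* too many items per bucket on average -> grow the table 4x */
        if (table->nitems > table->nbuckets * AVG_BUCKET_ITEMS)
        {
            table->nbits    += 2;               /* 4x = two more hash bits */
            table->nbuckets *= GROWTH_FACTOR;   /* 4 -> 16 -> 64 -> 256 -> ... */

            /* ... reallocate the bucket array and redistribute the items ... */
        }
    }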

Let's assume I have a table like this:

   CREATE TABLE test_table (a int, b int);

   INSERT INTO test_table SELECT mod(i,4000), (100000*random())::int
   FROM generate_series(1,80000000) s(i);

And let's run a query like this:

   SELECT a, count_distinct(b) FROM test_table GROUP BY a;

I.e. there are ~4k groups with ~20k distinct values in each group. This
query, however, consumes >5GB of memory, which is much more than I
expected. I've spent a fair amount of time looking for memory leaks in
the code, so I'm wondering if I'm missing something important.

The custom hash tables are built from these very simple structures:

    typedef struct hash_element_t {
        uint32  hash;
        uint32  length;
        char   *value; /* the actual value (say 4B for int4) */
    } hash_element_t;

    typedef struct hash_bucket_t {
        uint32  nitems;
        hash_element_t *items;   /* array of hash_element_t items */
    } hash_bucket_t;

    typedef struct hash_table_t {
        uint16  nbits;    /* number of significant bits of the hash */
        uint32  nbuckets; /* number of buckets (2^nbits) */
        uint32  nitems;   /* current number of elements in the table */
        hash_bucket_t  *buckets; /* array of hash_bucket_t */
    } hash_table_t;
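
For context, inserting a value works roughly like this -- a simplified
sketch with my own names, assuming the usual postgres.h/palloc
infrastructure, not the exact code from the repo. The important bit for
the memory question is that each bucket's item array grows in steps of
5, and every value gets its own small palloc'd chunk:

    /* Simplified sketch of adding a value (my naming, not the exact code
     * from the repo) -- mainly to show where the allocations happen. */
    static void
    add_element(hash_table_t *table, uint32 hash,
                const char *value, uint32 length)
    {
        hash_bucket_t  *bucket = &table->buckets[hash & (table->nbuckets - 1)];
        hash_element_t *item;

        /* (the real code first searches the bucket for a match, so that
         * only distinct values are stored -- omitted here) */

        /* the items array grows in steps of 5 elements */
        if (bucket->nitems % 5 == 0)
        {
            Size    newsize = (bucket->nitems + 5) * sizeof(hash_element_t);

            bucket->items = (bucket->nitems == 0)
                ? palloc(newsize)
                : repalloc(bucket->items, newsize);
        }

        item = &bucket->items[bucket->nitems++];
        item->hash   = hash;
        item->length = length;
        item->value  = palloc(length);    /* separate chunk per value */
        memcpy(item->value, value, length);

        table->nitems++;
    }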

I'm on a 64-bit architecture and the example works with int32, which means
the sizes should be about this:

    hash_element_t => 20B
    hash_bucket_t  => 4B + (20B * items in the bucket [in steps of 5])
    hash_table_t   => 4B + space for buckets
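
As a quick sanity check of those numbers, here's a tiny standalone
program (my addition, not from the repo) printing the struct sizes; on
x86-64 the element struct itself is 16B, and the separately allocated 4B
int4 value is what brings a single element to roughly the 20B above (the
pointers and padding in the bucket/table structs add a little on top,
but not enough to change the estimate):

    /* Standalone check of the struct sizes on a 64-bit build. */
    #include <stdio.h>
    #include <stdint.h>

    typedef uint16_t uint16;
    typedef uint32_t uint32;

    /* same layout as the definitions above */
    typedef struct { uint32 hash; uint32 length; char *value; } hash_element_t;
    typedef struct { uint32 nitems; hash_element_t *items; }    hash_bucket_t;
    typedef struct { uint16 nbits; uint32 nbuckets; uint32 nitems;
                     hash_bucket_t *buckets; }                  hash_table_t;

    int
    main(void)
    {
        /* prints 16 / 16 / 24 on x86-64 (pointers and padding included) */
        printf("element = %zu, bucket = %zu, table = %zu\n",
               sizeof(hash_element_t), sizeof(hash_bucket_t),
               sizeof(hash_table_t));
        return 0;
    }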

In the example above, there are ~20k unique values in each group. The
threshold is 20 items per bucket on average, so 20k items need at least
1000 buckets; with the 4x growth (4, 16, 64, 256, 1024) the table ends
up with 1024 buckets, and the buckets are almost full.

So for a single group, the hash table size is about

   4B + 1024 * (4B + 20 * 20B) = 413700B = ~ 400 kB

There are 4000 groups, so the total estimate is ~1.6GB.

However, when executed (9.2, 9.3 and HEAD behave exactly the same), the
query consumes almost 5GB of RAM (excluding shared buffers).

These are the memory context stats: http://pastebin.com/JHjnehvC

The interesting part is:

ExecutorState: 24576 total in 2 blocks; 16368 free (6 chunks); 8208 used
  ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  AggContext: 5106224640 total in 4612 blocks; 565779368 free
              (1072086 chunks); 4540445272 used
    TupleHashTable: 253952 total in 5 blocks; 55360 free
                    (16 chunks); 198592 used
  ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used

So there's ~5GB in the AggContext, of which ~500MB is free.

I'm aware that the HashAggregate itself will allocate some more memory,
and although I'm not sure how much that should be, I would expect a
"small amount" compared to the 1.6GB.

So I'm wondering what is using those extra ~3.5GB. Is this the overhead
of HashAggregate, or am I missing something (e.g. a memory leak in the code)?


Tomas

