Use-Case :

Every hour 100K invoices (entities) are created which are pushed to Storm.
These invoices belong to 'n' users.

So Storm does group aggregation on users, and creates aggregated buckets
per user. Along with sum of invoices, bucket also contains invoices IDs, so
that we know what all invoices got aggregated.
These buckets are getting updated continuously.

Problem : We need to consume these buckets on 2 rules :
                a) bucket has aggregated 1000 invoices, or
                b) bucket created 1 hour ago.

In both cases we need to consume this bucket and do some action on what
invoices are aggregated so far. Before we can consume we need to make sure
that storm should stop updating that bucket and create/update another
bucket for the same customer if new invoices comes.


Is there a way to achieve this in Storm ?


Thanks,
Rajat

Reply via email to