Here are my requirements.
We use Cassandra.
I get millions of invoice line items into the system. As I load them I
need to build up some data structures.
* Invoice line items by invoice id (each line item has an invoice id on
it), with total dollar value
* Invoice line items by customer id, with total dollar value
* Invoice line items by territory, with total dollar value
In all of those cases, all we want is to see the total dollar value for
a given attribute.
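To make the requirement concrete, here is a minimal in-memory sketch in Java of "total by a given attribute". The LineItem record and its field names are assumptions for illustration only (not our real schema), and it needs Java 16+ for records.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class LineItemTotals {
    // Hypothetical shape of a line item; the field names are assumptions.
    record LineItem(String itemId, String invoiceId, String customerId,
                    String territory, BigDecimal amount) {}

    // Sums the dollar amount of all items that share the same key value.
    static Map<String, BigDecimal> totalBy(List<LineItem> items,
                                           Function<LineItem, String> key) {
        return items.stream().collect(Collectors.groupingBy(
                key,
                Collectors.reducing(BigDecimal.ZERO, LineItem::amount, BigDecimal::add)));
    }

    public static void main(String[] args) {
        List<LineItem> items = List.of(
                new LineItem("li-1", "inv-1", "cust-1", "EAST", new BigDecimal("10.50")),
                new LineItem("li-2", "inv-1", "cust-1", "WEST", new BigDecimal("4.25")),
                new LineItem("li-3", "inv-2", "cust-2", "EAST", new BigDecimal("7.00")));
        // The same helper covers all three aggregations; only the key changes.
        System.out.println(totalBy(items, LineItem::invoiceId));
        System.out.println(totalBy(items, LineItem::customerId));
        System.out.println(totalBy(items, LineItem::territory));
    }
}
```

The three aggregations differ only in the key function, which is why a single helper is enough here.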
Line items may change daily, i.e. an item's territory may change or its
values may be corrected. In that case I need to update the aggregations
accordingly.
Here are my ideas:
- I can use counters and store the data in buckets
- I can just store the data in buckets and do the math in Java
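As a rough illustration of the counter idea, here is an in-memory Java stand-in for counter buckets. Cassandra counters hold 64-bit integers, so this sketch assumes dollar amounts are kept as whole cents. The class name and the bucket key format (e.g. "territory:EAST") are made up for the example, and nothing here talks to Cassandra.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

final class BucketCounters {
    // In-memory stand-in for a counter table keyed by bucket (assumed key format).
    private final Map<String, LongAdder> cents = new ConcurrentHashMap<>();

    // Converts a dollar amount to whole cents; throws ArithmeticException
    // if the amount has a fraction of a cent.
    static long toCents(BigDecimal dollars) {
        return dollars.movePointRight(2).longValueExact();
    }

    // Adds (or, for a negative amount, subtracts) a dollar amount to a bucket.
    void add(String bucket, BigDecimal dollars) {
        cents.computeIfAbsent(bucket, k -> new LongAdder()).add(toCents(dollars));
    }

    // Returns the bucket total in dollars; an unknown bucket totals 0.00.
    BigDecimal total(String bucket) {
        LongAdder adder = cents.get(bucket);
        long value = (adder == null) ? 0L : adder.sum();
        return BigDecimal.valueOf(value, 2);
    }
}
```

The second idea (store raw amounts per bucket and sum them in Java on read) would keep the individual amounts instead of a running total, trading cheaper writes for more work at read time.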
In both cases the challenge is that the items can be updated, which
means I need to look up the current version of an item and decide how to
proceed. That puts a huge performance penalty on the application (the
number of line items we receive is in the millions, and we need to
process them in a timely fashion).
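The "look up the current version and decide how to proceed" step can be sketched as a pure function that turns the previous and new versions of an item into per-bucket adjustments. This is only an illustration under assumptions: the Version record, its fields, and the bucket key format are hypothetical, amounts are in cents, and the sketch does not remove the lookup cost itself (the previous version still has to be read from somewhere).

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LineItemDeltas {
    // Hypothetical snapshot of a line item; amounts are whole cents.
    record Version(String invoiceId, String customerId, String territory,
                   long amountCents) {}

    // Returns the adjustment (in cents) to apply to each bucket total.
    // 'previous' is null when the item is seen for the first time.
    static Map<String, Long> deltas(Version previous, Version current) {
        Map<String, Long> out = new LinkedHashMap<>();
        if (previous != null) {
            apply(out, previous, -previous.amountCents()); // undo the old contribution
        }
        apply(out, current, current.amountCents());        // add the new contribution
        out.values().removeIf(v -> v == 0L);               // skip buckets that net to zero
        return out;
    }

    private static void apply(Map<String, Long> out, Version v, long cents) {
        out.merge("invoice:" + v.invoiceId(), cents, Long::sum);
        out.merge("customer:" + v.customerId(), cents, Long::sum);
        out.merge("territory:" + v.territory(), cents, Long::sum);
    }
}
```

For example, if an item moves from territory EAST to WEST and its amount is corrected from 1000 to 1200 cents, the old territory bucket gets -1000, the new one gets +1200, and the invoice and customer buckets each get +200.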
Help me out here -- any ideas on how I could design this in Cassandra?
Regards,
Oleg