Here are my requirements.

We use Cassandra.

I get millions of invoice line items into the system. As I load them I need to build up some data structures.

* Invoice line items by invoice id (each line item has an invoice id on it), with total dollar value
* Invoice line items by customer id, with total dollar value
* Invoice line items by territory, with total dollar value

In all of those cases, all we need is the total for a given attribute value.

Line items may change daily, e.g. a territory may change or the values may be corrected. When that happens, I need to update the aggregations accordingly.

Here are my ideas:

- I can use counters and store the data in buckets
- I can just store the data in buckets and do the math in Java

In both cases the challenge is that items can be updated, which means I need to look up the current version of an item and decide how to proceed. That puts a huge performance penalty on the application (the number of line items we receive is in the millions, and we need to process them in a timely fashion).
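
To make the update problem concrete, here is a minimal sketch of the "do the math in Java" idea. It is only an illustration under assumptions: in-memory maps stand in for the Cassandra tables, and every name in it (LineItemAggregator, LineItem, upsert, and so on) is hypothetical, not part of our code or of any Cassandra API.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: in-memory maps stand in for Cassandra tables.
class LineItemAggregator {

    // One version of a line item as it arrives in a load.
    static final class LineItem {
        final String id;
        final String invoiceId;
        final String customerId;
        final String territory;
        final BigDecimal amount;

        LineItem(String id, String invoiceId, String customerId,
                 String territory, BigDecimal amount) {
            this.id = id;
            this.invoiceId = invoiceId;
            this.customerId = customerId;
            this.territory = territory;
            this.amount = amount;
        }
    }

    // Stand-in for a "current version of each line item" table, keyed by item id.
    private final Map<String, LineItem> current = new HashMap<>();

    // Stand-ins for the three aggregate tables.
    private final Map<String, BigDecimal> byInvoice = new HashMap<>();
    private final Map<String, BigDecimal> byCustomer = new HashMap<>();
    private final Map<String, BigDecimal> byTerritory = new HashMap<>();

    // Insert or update: back out the previous version (if any), then add the new one.
    void upsert(LineItem item) {
        LineItem old = current.put(item.id, item); // the lookup that costs a read per item
        if (old != null) {
            apply(old, old.amount.negate());
        }
        apply(item, item.amount);
    }

    // Add delta to the totals of every bucket the item belongs to.
    private void apply(LineItem item, BigDecimal delta) {
        byInvoice.merge(item.invoiceId, delta, BigDecimal::add);
        byCustomer.merge(item.customerId, delta, BigDecimal::add);
        byTerritory.merge(item.territory, delta, BigDecimal::add);
    }

    BigDecimal totalByInvoice(String invoiceId) {
        return byInvoice.getOrDefault(invoiceId, BigDecimal.ZERO);
    }

    BigDecimal totalByCustomer(String customerId) {
        return byCustomer.getOrDefault(customerId, BigDecimal.ZERO);
    }

    BigDecimal totalByTerritory(String territory) {
        return byTerritory.getOrDefault(territory, BigDecimal.ZERO);
    }
}
```

The `current.put` call is the step I am worried about: against Cassandra it becomes a read of the previous version for every incoming line item, followed by writes to each affected bucket.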

Help me out here: any ideas on how I could design this in Cassandra?


Regards,
Oleg

