Data aggregation -- help me design a solution

Oleg Dulin Tue, 21 Aug 2012 13:09:00 -0700

Here are my requirements.

We use Cassandra.

I get millions of invoice line items into the system. As I load them, I need to build up some data structures:

* Invoice line items by invoice ID (each line item carries an invoice ID), with total dollar value
* Invoice line items by customer ID, with total dollar value
* Invoice line items by territory, with total dollar value

In all of those cases, what we want is the total dollar value for each value of a given attribute; that's all there is to it.
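The "total by a given attribute" requirement can be sketched as a plain group-by-and-sum. This is a minimal illustration, assuming a hypothetical `LineItem` class with the fields named below and amounts stored in cents as a `long`; none of these names come from the actual system.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical line-item shape; field names are assumptions.
// Amounts are kept in cents as a long to avoid floating-point rounding.
final class LineItem {
    final String itemId;
    final String invoiceId;
    final String customerId;
    final String territory;
    final long amountCents;

    LineItem(String itemId, String invoiceId, String customerId, String territory, long amountCents) {
        this.itemId = itemId;
        this.invoiceId = invoiceId;
        this.customerId = customerId;
        this.territory = territory;
        this.amountCents = amountCents;
    }
}

final class Totals {
    // Groups the items by the chosen attribute and sums their amounts per group.
    static Map<String, Long> totalBy(List<LineItem> items, Function<LineItem, String> attribute) {
        return items.stream()
                .collect(Collectors.groupingBy(attribute,
                        Collectors.summingLong(item -> item.amountCents)));
    }
}
```

The same helper covers all three cases, e.g. `Totals.totalBy(items, i -> i.customerId)`. Recomputing from scratch like this picks up corrections automatically, but it means reading every line item in scope each time the totals are needed.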

Line items may change daily; for example, a line item's territory may change, or the values may be corrected. In that case I need to update the aggregations accordingly.


Here are my ideas:

- I can use counters and store the data in buckets
- I can just store the data in buckets and do the math in Java

In both cases the challenge is that items can be updated, which means I need to look up the current version of an item and decide how to proceed. That puts a huge performance penalty on the application: we receive millions of line items and need to process them in a timely fashion.
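The "look up the current version and decide how to proceed" step amounts to a delta update: back out the old version's amount under its old keys, then add the new version's amount under its new keys. Below is a minimal in-memory sketch of that idea. The `current` map stands in for the read of the previous version (the costly lookup described above), and each `totals` key is where a counter column could plausibly sit, since Cassandra counters accept both increments and decrements. The class and key names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical line-item shape; field names are assumptions, amounts in cents.
final class LineItem {
    final String itemId, invoiceId, customerId, territory;
    final long amountCents;

    LineItem(String itemId, String invoiceId, String customerId, String territory, long amountCents) {
        this.itemId = itemId;
        this.invoiceId = invoiceId;
        this.customerId = customerId;
        this.territory = territory;
        this.amountCents = amountCents;
    }
}

// In-memory stand-in: `current` plays the role of the stored line items,
// `totals` the role of one counter per "attribute:value" key.
final class DeltaAggregator {
    private final Map<String, LineItem> current = new HashMap<>();
    private final Map<String, Long> totals = new HashMap<>();

    void apply(LineItem incoming) {
        // Read-before-write: fetch and replace the previously stored version, if any.
        LineItem previous = current.put(incoming.itemId, incoming);
        if (previous != null) {
            // Back out the old contribution under the OLD keys (e.g. old territory).
            adjust(previous, -previous.amountCents);
        }
        // Add the new contribution under the NEW keys.
        adjust(incoming, incoming.amountCents);
    }

    private void adjust(LineItem item, long delta) {
        totals.merge("invoice:" + item.invoiceId, delta, Long::sum);
        totals.merge("customer:" + item.customerId, delta, Long::sum);
        totals.merge("territory:" + item.territory, delta, Long::sum);
    }

    long total(String key) {
        return totals.getOrDefault(key, 0L);
    }
}
```

One caveat if the totals live in Cassandra counters: counter increments are not idempotent, so a retried write can double-count.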


Help me out here -- any ideas on how I could design this in Cassandra?


Regards,
Oleg
