1. Assuming that the majority of the line items are new, and
2. The lookup of an existing line item will dictate the performance of the system, because reads are slower than writes in C*.
3. Assuming that you are using counters in C*.

Therefore, eliminate that problem by implementing a Bloom filter or a similar structure (e.g. a stable Bloom filter) to figure out whether you need to go to C* at all to read an existing line item.

If you do need to go to C* for reads, handle that event (receiving a line item that already exists) in a separate set of threads, decrementing the chosen counters by the previous values of the invoice line items.

HTH

Regards
Milind

On Tue, Aug 21, 2012 at 1:08 PM, Oleg Dulin <oleg.du...@gmail.com> wrote:

> Here are my requirements.
>
> We use Cassandra.
>
> I get millions of invoice line items into the system. As I load them I
> need to build up some data structures.
>
> * Invoice line items by invoice id (each line item has an invoice id on
>   it), with total dollar value
> * Invoice line items by customer id, with total dollar value
> * Invoice line items by territory, with total dollar value
>
> In all of those cases, what we want is to see the total by a given
> attribute, that's all there is to it.
>
> Line items may change daily, i.e. a territory may change or they may
> correct the values. In this case I need to update the aggregations
> accordingly.
>
> Here are my ideas:
>
> - I can use counters and store the data in buckets
> - I can just store the data in buckets and do the math in Java
>
> In both cases the challenge is that the items can be updated. Which means
> I need to look up a current version of an item and decide how to proceed.
> That puts a huge performance penalty on the application (# of line items we
> receive is in the millions and we need to process them in a timely fashion).
>
> Help me out here -- any ideas on how I could design this in Cassandra?
>
> Regards,
> Oleg
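
To make the Bloom filter suggestion in the reply concrete, here is a minimal sketch in Java of the "have I possibly seen this line item before?" check. Everything in it is an assumption for illustration: the class name `LineItemBloomFilter`, the method `checkAndAdd`, the sizing parameters, and the choice of FNV-1a with double hashing are not from the thread. It is a plain (not stable) Bloom filter, it makes no Cassandra calls, and it only knows about ids added in the current process.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical in-memory Bloom filter over line-item ids (illustrative only). */
final class LineItemBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    LineItemBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 64-bit FNV-1a, with a seed mixed into the offset basis (an assumed hash choice).
    private static long fnv1a(byte[] data, long seed) {
        long h = 0xcbf29ce484222325L ^ seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Double hashing: the i-th bit position is derived from h1 + i * h2.
    private int index(byte[] key, int i) {
        long h1 = fnv1a(key, 0L);
        long h2 = fnv1a(key, 0x9e3779b97f4a7c15L);
        return (int) Math.floorMod(h1 + (long) i * h2, (long) numBits);
    }

    /** False means the id was definitely never added; true means it may have been. */
    synchronized boolean mightContain(String id) {
        byte[] key = id.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    /** Records the id in the filter. */
    synchronized void add(String id) {
        byte[] key = id.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    /**
     * Returns true if the id may have been seen before (take the slow read path),
     * false if it is definitely new (write only). Records the id either way.
     */
    synchronized boolean checkAndAdd(String id) {
        boolean possiblySeen = mightContain(id);
        add(id);
        return possiblySeen;
    }
}
```

A loader could call `checkAndAdd(lineItemId)` for each incoming item. On `false`, the item is new and can be written and counted without any read. On `true`, the item would be handed to the separate set of threads that read the previous version and adjust the counters. A false positive only costs one unnecessary read; it does not cause a missed update.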