On Tue, May 11, 2010 at 1:52 PM, Paulo Gabriel Poiati <paulogpoi...@gmail.com> wrote:
> - First of all, my first thought is to have two CFs: one for raw client
> requests (~10 million++ per day) and another for aggregated metrics over some
> defined time interval, like 1min, 5min, 15min... Is this a good approach?
Sure.

> - Is it a good idea to use an OrderPreservingPartitioner to maintain the
> order of my requests in the raw data CF? Or is the overhead too big?

The problem with OPP isn't overhead (it is lower-overhead than RP, the RandomPartitioner) but its tendency to create hotspots in sequentially-written data: keys that sort next to each other land on the same node, so writes in key order pile up in one place.

> - Initially the cluster will contain only three nodes. Is that a problem (too
> few, maybe)?

You'll have to do some load testing to see.

> - I think the best way to do the aggregation job is through a Hadoop
> MapReduce job. Right? Is there any other way to consider?

Map/Reduce is usually better than rolling your own because it parallelizes for you.

> - Is Cassandra really suitable for this? Maybe HBase is better in this case?

Nothing here makes me think "Cassandra is a poor choice."

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
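
For illustration only, the raw-to-aggregate rollup discussed in this thread could be sketched as a map step and a reduce step. This is a plain in-memory Python sketch, not Hadoop or Cassandra code: the names `INTERVALS`, `map_request`, `reduce_counts`, and `aggregate` are hypothetical, and it assumes each raw request carries a timestamp in whole epoch seconds and that the metric of interest is a simple request count.

```python
from collections import defaultdict

# Hypothetical bucket widths in seconds, mirroring the 1min/5min/15min
# intervals mentioned in the thread.
INTERVALS = {"1min": 60, "5min": 300, "15min": 900}


def map_request(epoch_seconds):
    """Map step: emit ((interval_name, bucket_start), 1) for each interval."""
    for name, width in INTERVALS.items():
        # Round the timestamp down to the start of its bucket.
        bucket_start = epoch_seconds - (epoch_seconds % width)
        yield (name, bucket_start), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per (interval_name, bucket_start)."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def aggregate(timestamps):
    """Run the map step over every raw request, then reduce the results."""
    pairs = (pair for ts in timestamps for pair in map_request(ts))
    return reduce_counts(pairs)


if __name__ == "__main__":
    raw_requests = [0, 59, 60, 299, 300, 900]  # made-up sample timestamps
    for key, count in sorted(aggregate(raw_requests).items()):
        print(key, count)
```

In a real Map/Reduce job the framework would shuffle the emitted pairs by key and run the reduce step in parallel across nodes, which is the parallelization benefit mentioned above. Each reduced total would then be written as one entry in the aggregated-metrics CF.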