As a general, rough guideline, I would suggest that a partition be kept to thousands or tens of thousands of rows, probably not more than 100K rows per partition, and that its physical size be kept to tens or hundreds of kilobytes, or at most a few megabytes, with ten megabytes as an absolute maximum per partition.
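One common way to stay within a row-count guideline like this is to fold a time bucket into the partition key, so no single partition grows without bound. Here is a minimal sketch in Python; the table layout, sensor naming, and one-reading-per-second rate are hypothetical assumptions for illustration:

```python
from datetime import datetime, timezone

# Hypothetical time-series model: bucket readings by day, so the CQL
# primary key would look like PRIMARY KEY ((sensor_id, day), ts).
# At one reading per second, a day bucket caps a partition at 86,400
# rows, comfortably under a ~100K rows-per-partition guideline.
def partition_key(sensor_id: str, ts: datetime) -> tuple:
    # The (sensor_id, day) pair is the composite partition key;
    # ts would remain as the clustering column within the partition.
    return (sensor_id, ts.strftime("%Y-%m-%d"))

ts = datetime(2015, 2, 27, 13, 44, tzinfo=timezone.utc)
print(partition_key("sensor-42", ts))  # → ('sensor-42', '2015-02-27')
```

The bucket granularity (day, hour, etc.) is a knob you would tune against your actual write rate and row size, per the caveats below.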
So, a partition with millions of rows or tens of megabytes of data is probably going to stress Cassandra too heavily.

Of course, all of this depends on your actual data model, your actual data values, and your actual access patterns. For example, if each row holds a very large blob, then a smaller row count is warranted. If each row is very tiny, perhaps even using compact storage, then a much larger row count is feasible. If you are doing heavy updates, then tombstone generation and compaction become issues, which argues for a smaller partition size. And if you tend to run repair a lot, you want to keep your row count per partition down, but not too tiny.

Moderation is the message: avoid the extremes, both too large and too small. This is not a matter of absolute limits per se, but of common sense and operational efficiency. And unfortunately, the performance and capacity of Cassandra are highly non-linear, as are the values of most data, so exact prediction of size and performance, particularly Java heap pressure, is quite unreliable; even rough recommendations for your specific app require a proof of concept.

-- Jack Krupansky

On Fri, Feb 27, 2015 at 1:44 AM, wateray <wate...@163.com> wrote:
> Hi all,
> My team is using Cassandra as our database. We have one question, as below.
> As we know, rows with the same partition key will be stored on the same node.
> But how many rows can one partition key hold? What does it depend on? The node's volume, the partition's data size, or the partition's row count?
> When one partition's data is extremely large, will writes/reads slow down?
> Can anyone show me some existing use cases?
> Thanks!