We use playorm to run 80,000 virtual column families (a playorm feature, though the pattern could be copied). We found out later, and are working on it now, that we want to map the 80,000 virtual CFs onto 10 real CFs so that leveled compaction can run more in parallel; otherwise we get stuck with single-threaded LCS at the last tier, which can take a while. We are about to map/reduce our dataset into our newest format.
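
For anyone who wants to copy the pattern without playorm, here is a rough sketch of the idea in Python. It is not playorm's actual implementation. The hashing scheme, the `data_N` CF names, and the `virtualcf:rowkey` key layout are all assumptions made up for illustration; only the counts (80,000 virtual CFs, 10 real CFs) come from the description above.

```python
import zlib

REAL_CF_COUNT = 10  # number of real CFs, as described above


def real_cf_for(virtual_cf: str, real_cf_count: int = REAL_CF_COUNT) -> str:
    """Deterministically pick a real CF for a virtual CF name.

    crc32 is used because it is stable across processes and runs
    (Python's built-in hash() for str is not). The "data_N" naming
    is hypothetical.
    """
    bucket = zlib.crc32(virtual_cf.encode("utf-8")) % real_cf_count
    return f"data_{bucket}"


def physical_row_key(virtual_cf: str, row_key: str) -> str:
    """Prefix the row key with the virtual CF name so that rows from
    different virtual CFs sharing a real CF cannot collide.
    Assumes virtual CF names do not contain ':'."""
    return f"{virtual_cf}:{row_key}"


def locate(virtual_cf: str, row_key: str) -> tuple[str, str]:
    """Return (real CF name, physical row key) for a logical row."""
    return real_cf_for(virtual_cf), physical_row_key(virtual_cf, row_key)


if __name__ == "__main__":
    # Hypothetical tenant/table names, just to show the mapping.
    for vcf in ("tenant42_users", "tenant42_orders", "tenant7_users"):
        print(vcf, "->", locate(vcf, "row1"))
```

Because every virtual CF always lands in the same real CF, reads and writes need no lookup table, and each of the 10 real CFs gets its own compaction work. The trade-off is that dropping one virtual CF becomes a range delete of its key prefix instead of a cheap drop of a real CF.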
Dean

From: Kirk True <kirktrue...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, July 1, 2013 10:19 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: 10,000s of column families/keyspaces

Hi all,

I know it's an old topic, but I want to see if anything's changed on the number of column families that C* supports, either in 1.2.x or 2.x.

For a number of reasons [1], we'd like to support multi-tenancy via separate column families. The "problem" is that there are around 5,000 tenants to support, and each one needs a small handful of column families.

The last I heard, C* supports 'a couple of hundred' column families before things start to bog down.

What will it take for C* to support 50,000 column families?

I'm about to dive into the code and run some tests, but I was curious about how to quantify the overhead of a column family. Is the reason performance? Memory? Does the off-heap work help here?

Thanks,
Kirk

[1] The main three reasons:

1. ability to wholesale drop data for a given tenant via drop keyspace/drop CFs
2. ability to have divergent schema for each tenant (partially affected by DSE Solr integration)
3. secondary indexes per tenant (given requirement #2)