We use playorm to run 80,000 virtual column families (a playorm feature, though 
the pattern could be copied). We found out later, and are working on this now, 
that we want to map the 80,000 virtual CFs onto 10 real CFs so that leveled 
compaction can run more in parallel. Otherwise we get stuck with single-threaded 
LCS at the last tier, which can take a while. We are about to map/reduce our 
dataset into our newest format.
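
For anyone wanting to copy the pattern, here is a minimal sketch of the idea. It is not playorm's actual code: the "data_N" naming, the CRC32 bucketing, and the ":" key separator are all assumptions made up for illustration. Each virtual CF is hashed onto one of a fixed set of real CFs, and the row key is prefixed with the virtual CF name so that rows from different virtual CFs sharing a real CF do not collide.

```python
import zlib

# Number of real column families; 10 is the figure mentioned above.
REAL_CF_COUNT = 10


def real_cf_for(virtual_cf: str) -> str:
    """Map a virtual CF name to one of the real CFs with a stable hash.

    The "data_N" names are hypothetical; zlib.crc32 is used only because it
    gives the same result on every run and on every machine.
    """
    bucket = zlib.crc32(virtual_cf.encode("utf-8")) % REAL_CF_COUNT
    return f"data_{bucket}"


def physical_row_key(virtual_cf: str, row_key: str) -> str:
    """Prefix the row key with the virtual CF name.

    Assumes virtual CF names never contain ":"; otherwise pick a different
    separator or a length-prefixed encoding.
    """
    return f"{virtual_cf}:{row_key}"


def locate(virtual_cf: str, row_key: str) -> tuple[str, str]:
    """Return (real CF name, physical row key) for a logical row."""
    return real_cf_for(virtual_cf), physical_row_key(virtual_cf, row_key)


if __name__ == "__main__":
    print(locate("tenant42_events", "row1"))
```

Because the mapping depends only on the virtual CF name, reads and writes for the same virtual CF always land in the same real CF, and compaction work is spread across the 10 real CFs.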

Dean

From: Kirk True <kirktrue...@gmail.com>
Reply-To: user@cassandra.apache.org
Date: Monday, July 1, 2013 10:19 AM
To: user@cassandra.apache.org
Subject: 10,000s of column families/keyspaces

Hi all,

I know it's an old topic, but I want to see if anything's changed on the number 
of column families that C* supports, either in 1.2.x or 2.x.

For a number of reasons [1], we'd like to support multi-tenancy via separate 
column families. The "problem" is that there are around 5,000 tenants to 
support and each one needs a small handful of column families.

The last I heard C* supports 'a couple of hundred' column families before 
things start to bog down.

What will it take for C* to support 50,000 column families?

I'm about to dive into the code and run some tests, but I was curious about how 
to quantify the overhead of a column family. Is the reason performance? Memory? 
Does the off-heap work help here?

Thanks,
Kirk

[1] The main three reasons:


 1.  ability to wholesale drop data for a given tenant via drop keyspace/drop 
CFs
 2.  ability to have divergent schema for each tenant (partially affected by 
DSE Solr integration)
 3.  secondary indexes per tenant (given requirement #2)
