Hi,

I've just received a requirement to make a Cassandra app
multi-tenanted, where we'll have up to 100 tenants.

Most of the tables are timestamped wide-row tables with a natural
application key as the partition key and a timestamp as the
clustering key.

So I was considering the options:

(a) Add a tenant column to each table and stick a secondary index on
that column;
(b) Add a tenant column to each table and maintain index tables that
use the tenant id as a partitioning key;
(c) Decompose the partitioning key of each table and add the tenant
as the leading component of the key;
(d) Add the tenant as a separate clustering key;
(e) Replicate the schema in separate tenant specific key spaces;
(f) Something I may have missed;

Option (a) seems the easiest, but I'm wary of just adding secondary
indexes without thinking about it.

Option (b) seems to have the least impact on the layout of the
storage, but at the cost of maintaining each index table, both
code-wise and in terms of performance.

Option (c) seems quite straightforward, but I feel it might have a
significant effect on the distribution of the rows if the cardinality
of the tenants is low.
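
A rough way to sanity-check the distribution worry in option (c) is to
simulate it. The sketch below assumes the token is derived from a hash
of the whole composite partition key (tenant plus application key), and
uses MD5 purely as a stand-in for the real partitioner. The node count,
tenant names and device keys are all made up for illustration, and a
simple modulo replaces the real token ring.

```python
import hashlib
from collections import Counter

NODES = 6  # hypothetical cluster size


def node_for(partition_key: tuple) -> int:
    """Map a (possibly composite) partition key to a node index.

    MD5 is only a stand-in for the partitioner's hash; the real token
    function and ring layout will differ.
    """
    raw = "\x00".join(str(part) for part in partition_key).encode("utf-8")
    token = int.from_bytes(hashlib.md5(raw).digest()[:8], "big")
    return token % NODES


def spread(keys) -> Counter:
    """Count how many partitions land on each node."""
    return Counter(node_for(k) for k in keys)


tenants = [f"tenant{i}" for i in range(3)]      # deliberately few tenants
app_keys = [f"device{i}" for i in range(2000)]  # made-up natural keys

# Option (c): tenant is the leading component of a composite partition key.
composite = spread((t, k) for t in tenants for k in app_keys)

# For contrast: tenant alone as the partition key.
tenant_only = spread((t,) for t in tenants)

print("composite key:", sorted(composite.items()))
print("tenant only  :", sorted(tenant_only.items()))
```

If that assumption holds, the composite key should spread across all
nodes about as evenly as the application key alone would, and the low
tenant cardinality would only bite if the tenant were the entire
partition key.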

Option (d) seems simple enough, but it would mean that you couldn't
query for a range of tenants without supplying a range of natural
application keys, through which you would need to iterate (under the
assumption that you don't use an ordered partitioner).

Option (e) appears relatively straightforward, but it does mean that
the application CQL client needs to maintain separate cluster
connections for each tenant. Also I'm not sure to what extent key
spaces were designed to partition identically structured data.

Does anybody have any experience with running a multi-tenanted
Cassandra app, or does this just depend too much on the specifics of
the application?

Cheers,

Ben
