OK, so in the end I elected to go for option (c), which makes my table definition look like this:
create table tenanted_foo_table (
  tenant ascii,
  application_key bigint,
  timestamp timestamp,
  .... other non-key columns
  PRIMARY KEY ((tenant, application_key), timestamp)
)

such that on disk the row keys are effectively tenant:application_key concatenations.

Thanks for your input,

Ben

On Wed, Nov 13, 2013 at 2:43 PM, Nate McCall <n...@thelastpickle.com> wrote:
> Astyanax and/or the DS Java client depending on your use case. (Emphasis on
> the "and" - really no reason you can't use both - even on the same schema -
> depending on what you are doing as they both have their strengths and
> weaknesses).
>
> To be clear, Hector is not going away. We are still accepting patches and
> updates, but there is no active feature development.
>
> Any other Hector-specific questions, please start a thread over on
> hector-us...@googlegroups.com
>
>
> On Wed, Nov 13, 2013 at 8:35 AM, Shahab Yunus <shahab.yu...@gmail.com>
> wrote:
>>
>> Nate,
>>
>> (slightly OT), what client API/library is recommended now that Hector is
>> sunsetting? Thanks.
>>
>> Regards,
>> Shahab
>>
>>
>> On Wed, Nov 13, 2013 at 9:28 AM, Nate McCall <n...@thelastpickle.com>
>> wrote:
>>>
>>> You basically want option (c). Option (d) might work, but you would be
>>> bending the paradigm a bit, IMO. Certainly do not use dedicated column
>>> families or keyspaces per tenant. That never works. The list history will
>>> show that with a few Google searches and we've seen it fail badly with
>>> several clients.
>>>
>>> Overall, option (c) would be difficult to do in CQL without some very
>>> well thought out abstractions and/or a deep hack on the Java driver (not
>>> inelegant or impossible, just lots of moving parts to get your head around
>>> if you are new to such). That said, depending on the size of your project
>>> and skill of your team, this direction might be worth considering.
>>>
>>> Usergrid (just accepted for incubation at Apache) functions this way via
>>> the Thrift API: https://github.com/apigee/usergrid-stack
>>>
>>> The commercial version of Usergrid has "tens of thousands" of active
>>> tenants on a single cluster (same code base at the service layer as the
>>> open source version). It uses Hector's built-in virtual keyspaces:
>>> https://github.com/hector-client/hector/wiki/Virtual-Keyspaces (NOTE: though
>>> Hector is sunsetting/in patch maintenance, the approach is certainly
>>> legitimate - but I'd recommend you *not* start a new project on Hector).
>>>
>>> In short, Usergrid is the only project I know of that has a well-proven
>>> tenant model that functions at scale, though I'm sure there are others
>>> around, just not open sourced or actually running large deployments.
>>>
>>> Astyanax can do this as well, albeit with a little more work required:
>>>
>>> https://github.com/Netflix/astyanax/wiki/Composite-columns#how-to-use-the-prefixedserializer-but-you-really-should-use-composite-columns
>>>
>>> Happy to clarify any of the above.
>>>
>>>
>>> On Tue, Nov 12, 2013 at 3:19 AM, Ben Hood <0x6e6...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I've just received a requirement to make a Cassandra app
>>>> multi-tenanted, where we'll have up to 100 tenants.
>>>>
>>>> Most of the tables are timestamped wide row tables with a natural
>>>> application key for the partitioning key and a timestamp key as a
>>>> clustering key.
>>>>
>>>> So I was considering the options:
>>>>
>>>> (a) Add a tenant column to each table and stick a secondary index on
>>>> that column;
>>>> (b) Add a tenant column to each table and maintain index tables that
>>>> use the tenant id as a partitioning key;
>>>> (c) Decompose the partitioning key of each table and add the tenant
>>>> as the leading component of the key;
>>>> (d) Add the tenant as a separate clustering key;
>>>> (e) Replicate the schema in separate tenant-specific keyspaces;
>>>> (f) Something I may have missed.
>>>>
>>>> Option (a) seems the easiest, but I'm wary of just adding secondary
>>>> indexes without thinking about it.
>>>>
>>>> Option (b) seems to have the least impact on the layout of the
>>>> storage, but at the cost of maintaining each index table, both
>>>> code-wise and in terms of performance.
>>>>
>>>> Option (c) seems quite straightforward, but I feel it might have a
>>>> significant effect on the distribution of the rows, if the cardinality
>>>> of the tenants is low.
>>>>
>>>> Option (d) seems simple enough, but it would mean that you couldn't
>>>> query for a range of tenants without supplying a range of natural
>>>> application keys, through which you would need to iterate (under the
>>>> assumption that you don't use an ordered partitioner).
>>>>
>>>> Option (e) appears relatively straightforward, but it does mean that
>>>> the application CQL client needs to maintain separate cluster
>>>> connections for each tenant. Also I'm not sure to what extent
>>>> keyspaces were designed to partition identically structured data.
>>>>
>>>> Does anybody have any experience with running a multi-tenanted
>>>> Cassandra app, or does this just depend too much on the specifics of
>>>> the application?
>>>>
>>>> Cheers,
>>>>
>>>> Ben
>>>
>>>
>>>
>>>
>>> --
>>> -----------------
>>> Nate McCall
>>> Austin, TX
>>> @zznate
>>>
>>> Co-Founder & Sr. Technical Consultant
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>
>>
>
>
>
> --
> -----------------
> Nate McCall
> Austin, TX
> @zznate
>
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
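
For anyone following the thread, here is a minimal sketch of how the option (c) table defined at the top of this message might be written to and read from. The tenant name 'acme', the application key 42 and the timestamps are made-up values for illustration only, and the non-key columns are left out because the definition above elides them.

```cql
-- Hypothetical values throughout; only the key columns from the
-- table definition are used here.
INSERT INTO tenanted_foo_table (tenant, application_key, timestamp)
VALUES ('acme', 42, '2013-11-13 14:43:00+0000');

-- Read one tenant's rows for one application key within a time window.
-- Both components of the partition key are given with equality;
-- the clustering column (timestamp) is restricted by range.
SELECT * FROM tenanted_foo_table
 WHERE tenant = 'acme'
   AND application_key = 42
   AND timestamp >= '2013-11-01 00:00:00+0000'
   AND timestamp <  '2013-11-14 00:00:00+0000';
```

Because tenant and application_key together form the partition key, both have to be supplied in the WHERE clause for a read like this; a query that names only the tenant cannot be routed to a single partition.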