Hi, thanks. The schemas are different. Putting a tenant ID as the first partition key would make the Spark logic more complex (filtering is needed in a search-all).
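[Editor's sketch] The tenant-ID-as-first-partition-key layout under discussion might look like the following; the keyspace, table, and column names are hypothetical, not from the thread:

```sql
-- Hypothetical shared table: tenant_id is the first partition-key
-- column, so every normal read/write is scoped to one tenant.
CREATE TABLE shared_ks.events (
    tenant_id text,
    event_id  timeuuid,
    payload   text,
    PRIMARY KEY ((tenant_id), event_id)
);

-- Per-tenant access is an ordinary partition read:
--   SELECT * FROM shared_ks.events WHERE tenant_id = 'acme';
-- but a Spark job now sees every tenant's rows in one table, so
-- tenant-level filtering is needed in a "search-all", which is the
-- added complexity Jason mentions.
```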
> There's also the issue of lots of memtables flushing to disk during commit
> log rotations. Can be problematic.

If this is the case, I think Cassandra cannot handle even more than 10 tables during commit log rotations. Does the number of tables affect schema modification (create, alter) performance?

Jonathan Haddad <j...@jonhaddad.com> wrote on Thu, Apr 7, 2016 at 5:13 AM:

> There's also the issue of lots of memtables flushing to disk during commit
> log rotations. Can be problematic.
>
> On Wed, Apr 6, 2016 at 2:08 PM, Michael Penick <michael.pen...@datastax.com> wrote:
>
>> Are the tenants using the same schema? If so, you might consider using
>> the tenant's ID as part of the primary key for the tables they have in
>> common.
>>
>> If they're all using different, largish schemas, I'm not sure that
>> Cassandra is well suited to that type of multi-tenancy. There's the
>> per-table memory overhead, and there's the fact that it's difficult to
>> tune a single cluster to handle the different (probably competing)
>> workloads effectively.
>>
>> Mike
>>
>> On Tue, Apr 5, 2016 at 8:40 PM, jason zhao yang <zhaoyangsingap...@gmail.com> wrote:
>>
>>> Hi Jack,
>>>
>>> Thanks for the reply.
>>>
>>> Each tenant will have around 50-100 tables for their applications:
>>> probably log collection, probably an account table; it's not fixed and
>>> depends on the tenants' needs.
>>>
>>> There will be a team in charge of helping tenants with data modeling
>>> and access patterns. Tenants will not administer the cluster directly;
>>> we will take care of that.
>>>
>>> Yes, multiple clusters are a solution, but the cost would be quite
>>> high, because each tenant's data is far less than the capacity of a
>>> 3-node cluster. So I want to put multiple tenants into one cluster.
>>>
>>> Jack Krupansky <jack.krupan...@gmail.com> wrote on Wed, Apr 6, 2016 at 10:41 AM:
>>>
>>>> What is the nature of these tenants? Are they each creating their own
>>>> data models?
>>>> Is there one central authority that will approve all data models and
>>>> who can adjust the cluster configuration to support those models?
>>>>
>>>> Generally speaking, multi-tenancy is an anti-pattern for Cassandra and
>>>> for most servers. The proper way to do multi-tenancy is not to do it
>>>> at all, and to use separate machines, or at least separate virtual
>>>> machines.
>>>>
>>>> In particular, there needs to be a central authority managing a
>>>> Cassandra cluster to assure its smooth operation. If each tenant is
>>>> going in their own direction, then nobody will be in charge and
>>>> capable of assuring that everybody is on the same page.
>>>>
>>>> Again, it depends on the nature of these tenants and how much control
>>>> the cluster administrator has over them.
>>>>
>>>> Think of a Cassandra cluster as managing the data for either a single
>>>> application or a collection of applications that share the same data.
>>>> If there are multiple applications that don't share the same data,
>>>> then they absolutely should be on separate clusters.
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> On Tue, Apr 5, 2016 at 5:40 PM, Kai Wang <dep...@gmail.com> wrote:
>>>>
>>>>> Once in a while the question of table count comes up on this list.
>>>>> The most recent instance is
>>>>> https://groups.google.com/forum/#!topic/nosql-databases/IblAhiLUXdk
>>>>>
>>>>> In short, C* is not designed to scale with the table count. For one,
>>>>> each table/CF has some fixed memory footprint on *ALL* nodes. The
>>>>> consensus is that you shouldn't have more than "a few hundred" tables.
>>>>>
>>>>> On Mon, Apr 4, 2016 at 10:17 AM, jason zhao yang <zhaoyangsingap...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> This is Jason.
>>>>>>
>>>>>> Currently I am using C* 2.1.10, and I want to ask: what is the
>>>>>> optimal number of tables I should create in one cluster?
>>>>>> My use case is that I will prepare a keyspace for each of my
>>>>>> tenants, and every tenant will create the tables they need. Assume
>>>>>> each tenant creates 50 tables with a normal workload (half read,
>>>>>> half write). How many tenants can I support in one cluster?
>>>>>>
>>>>>> I know there are a few issues related to a large number of tables:
>>>>>> * frequent GC
>>>>>> * frequent flushes due to insufficient memory
>>>>>> * high latency when modifying table schemas
>>>>>> * a large number of tombstones generated during table creation
>>>>>>
>>>>>> Are there any other issues with a large number of tables? Using a
>>>>>> 32 GB instance, I can easily create 4000 tables with off-heap
>>>>>> memtables.
>>>>>>
>>>>>> BTW, is this table limitation solved in 3.x?
>>>>>>
>>>>>> Thank you very much.
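[Editor's sketch] The fixed per-table footprint discussed in the thread can be put into rough numbers. The ~1 MB-of-heap-per-table figure below is a commonly quoted rule of thumb for Cassandra 2.x memtable arenas, not something stated in this thread, so treat it as an assumption:

```python
# Back-of-envelope: fixed per-table heap cost on each node.
# ASSUMPTION: ~1 MB of heap per table per node (a rough rule of thumb
# for Cassandra 2.x; the real cost varies by version and configuration).
def table_heap_overhead_mb(tenants, tables_per_tenant, mb_per_table=1.0):
    """Approximate heap (MB, per node) consumed by tables before any data."""
    return tenants * tables_per_tenant * mb_per_table

# 80 tenants x 50 tables = 4000 tables, i.e. roughly 4 GB of heap per
# node before a single row is written.
print(table_heap_overhead_mb(80, 50))  # -> 4000.0
```

Under this assumption the 4000-table figure from the thread already accounts for several gigabytes of heap on a 32 GB instance, which matches the GC and flush pressure Jason lists.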