The question is what matters and how big cardinality is. 1. MV updates are atomic 2. Updates on 2 tables are not. You'd require a logged batch to ensure atomicity and so the write performance is also a little bit lower than without batches 3. If you have a hand full of groups per user, collections are a way to go. If you have thousands of memberships per user, you should consider MVs. Collections are not made to store "tons of data". 4. MVs are not so super-production-stable. They work but there are still some issues. So if you have a good alternative, you probably want not to use MVs
2016-12-22 7:31 GMT+01:00 Torsten Bronger <bron...@physik.rwth-aachen.de>: > Hallöchen! > > In RDBMS terms, I have a n:m relationship between "users" and > "groups". I need to answer the questions "who's in that group" and > "in which groups is he". In my Cassandra DB, this looks like this: > > CREATE TABLE users ( > id uuid PRIMARY KEY, > groups_member set<uuid> > groups_admin set<uuid> > groups_pending set<uuid> > ); > > CREATE TABLE groups ( > id uuid PRIMARY KEY, > members set<uuid> > admins set<uuid> > pending set<uuid> > ); > > But someone suggested to me to express the membership relation by > this: > > CREATE TABLE group_status ( > group uuid, > user uuid, > status text, /* "member", "admin", "pending" */ > PRIMARY KEY ((group, user)) > ); > > CREATE MATERIALIZED VIEW group_status_group AS > SELECT user, status FROM group_status > WHERE user IS NOT NULL AND status IS NOT NULL and group IS NOT NULL > PRIMARY KEY (group, user); > > CREATE MATERIALIZED VIEW group_status_user AS > SELECT group, status FROM group_status > WHERE user IS NOT NULL AND status IS NOT NULL and group IS NOT NULL > PRIMARY KEY (user, group); > > The answer to "who's in that group" is here "SELECT * FROM > group_status_group WHERE group = <group-id>". > > > Let's analyse both, and please interrrupt me if I write something > wrong. > > Simplicity: Table layout is easier to understand in the first > variant, however, code is simpler with the second variant as you > only need one update instead of a batch with up to six sets. The > second variant is easier to extend to further states. > > Consistenty: Eventual consistency can be guaranteed in both cases. > > Performance: Read performance is much better for the first variant, > because the second variant has to go through many rows to collect > all non-deleted clustering key values. Write performance is > slightly better for the first variant because one table + two > materialised views is more expensive than two tables. > > What would you prefer? > > Tschö, > Torsten. > > -- > Torsten Bronger > > -- Benjamin Roth Prokurist Jaumo GmbH · www.jaumo.com Wehrstraße 46 · 73035 Göppingen · Germany Phone +49 7161 304880-6 · Fax +49 7161 304880-1 AG Ulm · HRB 731058 · Managing Director: Jens Kammerer