The question is what matters and how big cardinality is.

1. MV updates are atomic
2. Updates on 2 tables are not. You'd require a logged batch to ensure
atomicity and so the write performance is also a little bit lower than
without batches
3. If you have a hand full of groups per user, collections are a way to go.
If you have thousands of memberships per user, you should consider MVs.
Collections are not made to store "tons of data".
4. MVs are not so super-production-stable. They work but there are still
some issues. So if you have a good alternative, you probably want not to
use MVs


2016-12-22 7:31 GMT+01:00 Torsten Bronger <bron...@physik.rwth-aachen.de>:

> Hallöchen!
>
> In RDBMS terms, I have a n:m relationship between "users" and
> "groups".  I need to answer the questions "who's in that group" and
> "in which groups is he".  In my Cassandra DB, this looks like this:
>
>     CREATE TABLE users (
>       id uuid PRIMARY KEY,
>       groups_member set<uuid>
>       groups_admin set<uuid>
>       groups_pending set<uuid>
>     );
>
>     CREATE TABLE groups (
>       id uuid PRIMARY KEY,
>       members set<uuid>
>       admins set<uuid>
>       pending set<uuid>
>     );
>
> But someone suggested to me to express the membership relation by
> this:
>
>     CREATE TABLE group_status (
>       group uuid,
>       user uuid,
>       status text,  /* "member", "admin", "pending" */
>       PRIMARY KEY ((group, user))
>     );
>
>     CREATE MATERIALIZED VIEW group_status_group AS
>       SELECT user, status FROM group_status
>       WHERE user IS NOT NULL AND status IS NOT NULL and group IS NOT NULL
>       PRIMARY KEY (group, user);
>
>     CREATE MATERIALIZED VIEW group_status_user AS
>       SELECT group, status FROM group_status
>       WHERE user IS NOT NULL AND status IS NOT NULL and group IS NOT NULL
>       PRIMARY KEY (user, group);
>
> The answer to "who's in that group" is here "SELECT * FROM
> group_status_group WHERE group = <group-id>".
>
>
> Let's analyse both, and please interrrupt me if I write something
> wrong.
>
> Simplicity: Table layout is easier to understand in the first
> variant, however, code is simpler with the second variant as you
> only need one update instead of a batch with up to six sets.  The
> second variant is easier to extend to further states.
>
> Consistenty: Eventual consistency can be guaranteed in both cases.
>
> Performance: Read performance is much better for the first variant,
> because the second variant has to go through many rows to collect
> all non-deleted clustering key values.  Write performance is
> slightly better for the first variant because one table + two
> materialised views is more expensive than two tables.
>
> What would you prefer?
>
> Tschö,
> Torsten.
>
> --
> Torsten Bronger
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

Reply via email to