Hallöchen!

In RDBMS terms, I have a n:m relationship between "users" and
"groups".  I need to answer the questions "who's in that group" and
"in which groups is he".  In my Cassandra DB, this looks like this:

    CREATE TABLE users (
      id uuid PRIMARY KEY,
      groups_member set<uuid>
      groups_admin set<uuid>
      groups_pending set<uuid>
    );

    CREATE TABLE groups (
      id uuid PRIMARY KEY,
      members set<uuid>
      admins set<uuid>
      pending set<uuid>
    );

But someone suggested to me to express the membership relation by
this:

    CREATE TABLE group_status (
      group uuid,
      user uuid,
      status text,  /* "member", "admin", "pending" */
      PRIMARY KEY ((group, user))
    );

    CREATE MATERIALIZED VIEW group_status_group AS
      SELECT user, status FROM group_status
      WHERE user IS NOT NULL AND status IS NOT NULL and group IS NOT NULL
      PRIMARY KEY (group, user);

    CREATE MATERIALIZED VIEW group_status_user AS
      SELECT group, status FROM group_status
      WHERE user IS NOT NULL AND status IS NOT NULL and group IS NOT NULL
      PRIMARY KEY (user, group);

The answer to "who's in that group" is here "SELECT * FROM
group_status_group WHERE group = <group-id>".


Let's analyse both, and please interrrupt me if I write something
wrong.

Simplicity: Table layout is easier to understand in the first
variant, however, code is simpler with the second variant as you
only need one update instead of a batch with up to six sets.  The
second variant is easier to extend to further states.

Consistenty: Eventual consistency can be guaranteed in both cases.

Performance: Read performance is much better for the first variant,
because the second variant has to go through many rows to collect
all non-deleted clustering key values.  Write performance is
slightly better for the first variant because one table + two
materialised views is more expensive than two tables.

What would you prefer?

Tschö,
Torsten.

-- 
Torsten Bronger

Reply via email to