Re: Modeling contact list, plain table or List

I PVP Tue, 12 Jan 2016 05:03:58 -0800

--> Why don't you do: DELETE FROM user_contact WHERE userid=xxx AND 
contactname=yyyy ?
Answer : Because a contact name can be duplicated.   Or should I force unique 
contact names?


Overall , the challenge seems to be addressed , with some trade of on the 
"ordering by contact name”.

If, at the base table, the clustering column is the objectid(timeuuid) instead 
of the contactname , the DELETE will be based on userid = ? and objectid = ?.
This works fine. Generic SELECTs will also work fine on the base table.

The MV will serve SELECTs  targeting/filtering the favorite contacts.

Like this:

CREATE TABLE communication.user_contact (
userid int,
objectid timeuuid,
contactid int,
contactname text,
createdat timeuuid,
favoriteat timestamp,
isfavorite boolean,
PRIMARY KEY (userid, objectid)
);


CREATE MATERIALIZED VIEW communication.user_contact_by_favorite AS
SELECT userid, isfavorite, objectid, contactid, contactname, createdat, 
favoriteat
FROM user_contact
WHERE userid IS NOT NULL AND isfavorite IS NOT NULL AND objectid IS NOT NULL
PRIMARY KEY ( ( userid, isfavorite ), objectid )
WITH CLUSTERING ORDER BY ( objectid DESC ) ;


Unfortunately  this approach forces the model to cluster by objected(timeuuid) 
just to satisfy the need to DELETE a specific contact row,  and by doing that 
it wastes a opportunity on the MV, because all the PKs from the base table need 
to be at the MV and  it is not possible to set the MV with with more than 1 
non-PK column from the base table as the MV PK.  But still working fine.


That is my first Cassandra use case and the guidance provided by  you guys 
pretty important.

Thanks very much for the answers, questions and suggestions.


--
IPVP


From: DuyHai Doan <doanduy...@gmail.com><mailto:doanduy...@gmail.com>
Reply: user@cassandra.apache.org 
<user@cassandra.apache.org>><mailto:user@cassandra.apache.org>
Date: January 12, 2016 at 10:27:45 AM
To: user@cassandra.apache.org 
<user@cassandra.apache.org>><mailto:user@cassandra.apache.org>
Cc: Jack Krupansky <jack.krupan...@gmail.com>><mailto:jack.krupan...@gmail.com>
Subject:  Re: Modeling contact list, plain table or List

1)SELECT all rows from user_contact excluding the one  that the user wants to 
get rid of.
2) DELETE all the user_contact rows  for that particular user .
3) INSERT  the result of 1).

--> Why don't you do: DELETE FROM user_contact WHERE userid=xxx AND 
contactname=yyyy ?

The Materialized View will be automagically updated by Cassandra with a query 
similar to DELETE FROM user_contact_by_favorite WHERE userid=xxx AND 
is_favorite=zzz AND contactname=yyy

On Mon, Jan 11, 2016 at 10:40 PM, Jonathan Haddad 
<j...@jonhaddad.com<mailto:j...@jonhaddad.com>> wrote:
In general I advise people avoid lists and use Maps or Sets instead.

Using this data model, for instance, it's easy to remove a specific Address 
from a user:

CREATE TYPE address (
  street text,
  city text,
  zip_code int,
);

CREATE TABLE user (
    user_id int primary key,
    addresses map<text, frozen <address>>
);

When I want to remove one of the addresses from a user, I can do this:

cqlsh:test> delete addresses['home'] from user where user_id =  1;


Hope that helps,
Jon


On Mon, Jan 11, 2016 at 1:20 PM I PVP 
<i...@hotmail.com<mailto:i...@hotmail.com>> wrote:
Well…the way it is now  it is not possible to delete a specific contact row 
from the base table at all. Because a DELETE statement only works with  PK in 
the WHERE  clausule. Non PK columns cannot be in the DELETE WHERE clausule.
https://docs.datastax.com/en/cql/3.3/cql/cql_reference/delete_r.html

The way it is now  It is only possible to delete the entire contact list  for 
that specific user.
Looks like will need to:
1)SELECT all rows from user_contact excluding the one  that the user wants to 
get rid of.
2) DELETE all the user_contact rows  for that particular user .
3) INSERT  the result of 1).

Is that the proper way to achieve it or Am I missing some point in the modeling 
that would allow to delete a specific contact row  and still able to comply 
with the select requirements?

Thanks
--
IPVP


From: Jack Krupansky <jack.krupan...@gmail.com><mailto:jack.krupan...@gmail.com>
Reply: user@cassandra.apache.org<mailto:user@cassandra.apache.org> 
<user@cassandra.apache.org>><mailto:user@cassandra.apache.org>
Date: January 11, 2016 at 7:00:04 PM

To: user@cassandra.apache.org<mailto:user@cassandra.apache.org> 
<user@cassandra.apache.org>><mailto:user@cassandra.apache.org>
Subject:  Re: Modeling contact list, plain table or List

That's the beauty of MV - Cassandra automatically updates the MVs when the base 
table changes, including deletions, which is why all of the PK columns from the 
base table needed to be in the MV PK.

-- Jack Krupansky

On Mon, Jan 11, 2016 at 3:41 PM, I PVP 
<i...@hotmail.com<mailto:i...@hotmail.com>> wrote:
The below table and materialized view will solve the SELECT requirements of my 
current application .
The challenge now is when the user decides to DELETE one specific contact from 
his contact list. I could add the objectid to a composite partition key 
together with the userid. But that would make the SELECT inviable.

 Any ideas/suggestions?


CREATE TABLE communication.user_contact (
userid int,
contactname text,
contactid int,
createdat timeuuid,
favoriteat timestamp,
isfavorite boolean,
objectid timeuuid,
PRIMARY KEY (userid, contactname)
) WITH CLUSTERING ORDER BY ( contactname DESC )


CREATE MATERIALIZED VIEW communication.user_contact_by_favorite AS
SELECT userid, isfavorite, contactname, contactid, createdat, favoriteat, 
objectid
FROM user_contact
WHERE userid IS NOT NULL AND isfavorite IS NOT NULL AND contactname IS NOT NULL
PRIMARY KEY ( ( userid, isfavorite ), contactname )
WITH CLUSTERING ORDER BY ( contactname DESC )

Thanks

--
IPVP


From: DuyHai Doan <doanduy...@gmail.com><mailto:doanduy...@gmail.com>
Reply: user@cassandra.apache.org<mailto:user@cassandra.apache.org> 
<user@cassandra.apache.org>><mailto:user@cassandra.apache.org>
Date: January 11, 2016 at 11:14:10 AM

To: user@cassandra.apache.org<mailto:user@cassandra.apache.org> 
<user@cassandra.apache.org>><mailto:user@cassandra.apache.org>
Subject:  Re: Modeling contact list, plain table or List

In the current iteration of materialized view, it is still not possible to have 
WHERE clause other than IS NOT NULL so is_favourite IS TRUE won't work.

Still there is a JIRA created to support this feature : 
https://issues.apache.org/jira/browse/CASSANDRA-10368

About cardinality of favorite vs non-favorites, it doesn't matter in 
[https://ssl.gstatic.com/ui/v1/icons/mail/images/cleardot.gif] this case 
because the OP said "Less then one hundred contacts by user is the normal."

So even if all contacts are stuck in one unique favorite state, the 
materialized view partition for one user is at most 100. Even for extreme edge 
case with users having 10 000 contacts, it's still a manageable partition size 
for C*.

But I agree it is important to know before-hand the favorite/non-favorite 
update frequency since it will impact the write throughput on the MV.

For more details on materialized view impl and performance: 
http://www.doanduyhai.com/blog/?p=1930

On Mon, Jan 11, 2016 at 1:36 PM, Jack Krupansky 
<jack.krupan...@gmail.com<mailto:jack.krupan...@gmail.com>> wrote:
The new Materialized View feature is just an automated way of creating and 
maintaining what people used to call a "query table", which is the traditional 
Cassandra data modeling technique for performing queries on on than the primary 
key for a table - you store the same columns in different tables using 
different columns for the primary key.

One also needs to be careful to include all columns of the original primary key 
in each MV primary key - in addition to whatever column(s) are to be used for 
indexing in each MV (so that Cassandra can find the old row when it needs to 
update the MV when the base table row changes, such as on a deletion.)

But before creating MVs, you first need to answer questions about how the app 
needs to query the data. Even with MV, conceptualizing queries needs to precede 
data modeling.

For example, what is the cardinality of favorites vs. non-favorites, does the 
app even need to query by favorates, as opposed to querying all contacts and 
retrieving is_favorite as simply a non-key column value, whether favorites need 
to be retrieved separately from non-favorites, the frequency and latency 
requirements for query by favorite status, etc. Once these questions are 
answered, decisions can be made about data modeling.

-- Jack Krupansky

On Mon, Jan 11, 2016 at 5:13 AM, Carlos Alonso 
<i...@mrcalonso.com<mailto:i...@mrcalonso.com>> wrote:
I have never used Materialized Views so maybe this suggestion is not possible, 
but in this case, wouldn't it make sense to define the materialized view as

is_favourite IS TRUE
instead of
is_favourite IS NOT NULL?

Carlos Alonso | Software Engineer | @calonso<https://twitter.com/calonso>

On 10 January 2016 at 09:59, DuyHai Doan 
<doanduy...@gmail.com<mailto:doanduy...@gmail.com>> wrote:
Try this

CREATE TABLE communication.user_contact_list (
  user_id uuid,
  contact_id uuid,
  contact_name text,
  created_at timeuuid,
  is_favorite boolean,
  favorite_at timestamp,
  PRIMARY KEY (user_id, contact_name, contact_id)
);

CREATE MATERIALIZED VIEW communication.user_favorite_contact_list
AS SELECT * FROM communication.user_contact_list
WHERE user_id IS NOT NULL AND contact_name IS NOT NULL
AND contact_id IS NOT NULL AND is_favorite IS NOT NULL
PRIMARY KEY(user_id, is_favorite, contact_name, contact_id)

If the flag is_favorite is not updated very often the write perf hit due to 
materialized view is acceptable.

On Sat, Jan 9, 2016 at 11:57 PM, Isaac P. 
<i...@hotmail.com<mailto:i...@hotmail.com>> wrote:
Jack/ Michael,

Thanks for answering.

How big?: Less then one hundred contacts by user is the normal.

Update requirements: The UPDATE requirements are all around  each user 
“favoriting/unfavoriting” the contacts . Deleting is not very frequent.

Does that mean that in C* 3.02 , for this use case to work, the contact name  
must be part of a  composite partition key in order to allow sorting by 
contact_name like this ? :

CREATE TABLE communication.user_contact_list (
user_id uuid,
contact_name text,
is_favorite boolean,
contact_id uuid,
created_at timeuuid,
favorite_at timestamp,
PRIMARY KEY ((user_id, contact_name), is_favorite)
)  WITH CLUSTERING ORDER BY (contact_name ASC);

Query: Select * from user_contact_list where user_id = :userid and is_favorite 
= true order by contact_name asc;

Looks like each contact as a row/clustering key will be the way to go.

Thanks

IPVP


From: Laing, Michael 
<michael.la...@nytimes.com><mailto:michael.la...@nytimes.com>
Reply: user@cassandra.apache.org<mailto:user@cassandra.apache.org> 
<user@cassandra.apache.org>><mailto:user@cassandra.apache.org>
Date: January 9, 2016 at 11:51:27 AM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org> 
<user@cassandra.apache.org>><mailto:user@cassandra.apache.org>
Subject:  Re: Modeling contact list, plain table or List


Note that in C* 3.02 the second query is invalid:

cqlsh> Select * from communication.user_contact_list where user_id = 
98f50f00-b6d5-11e5-afec-6003089bf572 and is_favorite = true order by 
contact_name asc;

InvalidRequest: code=2200 [Invalid query] message="PRIMARY KEY column 
"is_favorite" cannot be restricted as preceding column "contact_name" is not 
restricted"

On Fri, Jan 8, 2016 at 6:50 PM, Jack Krupansky 
<jack.krupan...@gmail.com<mailto:jack.krupan...@gmail.com>> wrote:
How big is each contact list expected to be? Dozens? Hundreds? Thousands? If 
just dozens, a simple list column would seem sufficient. If thousands, the row 
(not partition) would get kind of bloated.

What requirements do you have for updating? If updating contacts and lots of 
contacts, I think I'd prefer each contact as a row/clustering key. Nice to be 
able to do selective queries to return slices of the clustering key values, 
which is not so easy if they are all just a single list column.

-- Jack Krupansky

On Fri, Jan 8, 2016 at 6:31 PM, Isaac P. 
<i...@hotmail.com<mailto:i...@hotmail.com>> wrote:
Hi everyone

What would perform better while modeling a simple user contact list  that will 
be used mainly to select the recipients for/from/to messages ?

a) Individual rows to each (user, contact) pair so a select would fetch all the 
rows  to retrieve all the contacts from a given user.

or

b) A single row for each user containing the List<Contact>  UDT.

Aside of the basic CRUD, the queries will be the following ones:

Select * from user_contact_list where user_id = :userid order by contact_name 
asc

Select * from user_contact_list where user_id = :userid and is_favorite = true 
order by contact_name asc

After reading this 
https://docs.datastax.com/en/cql/3.0/cql/ddl/ddl_compound_keys_c.html  the 
table is looking like this:

CREATE TABLE communication.user_contact_list (
user_id uuid,
contact_id uuid,
contact_name text,
created_at timeuuid,
is_favorite boolean,
favorite_at timestamp,
PRIMARY KEY (user_id, contact_name, is_favorite)
);

Any guidance will be appreciated.

Thanks

--
IPVP

Re: Modeling contact list, plain table or List

Reply via email to