RE: Union data type modeling in Cassandra

Ngoc Minh VO Mon, 05 May 2014 03:13:35 -0700

Thanks for your prompt answer. It works great! (simple and efficient!)

Since I could not bind a column name in a prepared statement, I created four 
separate statements, one per data type.

It would be nice to have “INSERT INTO data_table(key, ?) VALUES (?, ?)” ☺
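
For anyone reading along, here is a minimal sketch of how the four statement texts could be kept together, one per column. The class name UnionStatements and the enum Kind are made up for illustration; only the table and column names come from this thread. Each text would presumably be prepared once with the driver and then reused.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: one INSERT text per value column, because the
// column name itself cannot be a bind marker.
final class UnionStatements {
    // One constant per value column of data_table.
    enum Kind { NUMVALUE, NUMVALUES, STRVALUE, STRVALUES }

    private static final Map<Kind, String> CQL = new EnumMap<>(Kind.class);

    static {
        for (Kind kind : Kind.values()) {
            String column = kind.name().toLowerCase(Locale.ROOT);
            CQL.put(kind, "INSERT INTO data_table(key, " + column + ") VALUES (?, ?)");
        }
    }

    // Returns the statement text to prepare for the given kind of value.
    static String cqlFor(Kind kind) {
        return CQL.get(kind);
    }

    public static void main(String[] args) {
        for (Kind kind : Kind.values()) {
            System.out.println(kind + " -> " + cqlFor(kind));
        }
    }
}
```

The column names are fixed constants here, never user input, so building the text by concatenation does not open the door to injection.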

Regards,
Minh

From: doanduy...@gmail.com [mailto:doanduy...@gmail.com]
Sent: Friday, May 2, 2014 12:29
To: user@cassandra.apache.org
Subject: Re: *Union* data type modeling in Cassandra

Hello Ngoc Minh

 I'd go with the first data model. To solve the null <-> tombstone issue, 
simply do not insert a column at runtime if its value is null.

 If only numvalue (double) != null ->
     INSERT INTO data_table(key,numvalue) VALUES(...,...);
 If only numvalues (list<double>) != null ->
     INSERT INTO data_table(key,numvalues) VALUES(...,...);
 and so on ...

 It means that you'll need to perform a null check in your code at runtime, 
but that is the price to pay to avoid tombstones and heavy compaction.
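
 A minimal sketch of that null check, assuming the four values arrive as nullable Java references. The class and method names (NonNullColumn, pick) are hypothetical; the column names are the ones from the table below. It also enforces the "one and only one non-null value" rule.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: finds the single column that should appear in the
// INSERT, so the three null columns are never written.
final class NonNullColumn {
    // Returns the name of the only non-null column; throws if zero or
    // more than one of the four values is set.
    static String pick(Double numvalue, List<Double> numvalues,
                       String strvalue, List<String> strvalues) {
        String column = null;
        int count = 0;
        if (numvalue != null)  { column = "numvalue";  count++; }
        if (numvalues != null) { column = "numvalues"; count++; }
        if (strvalue != null)  { column = "strvalue";  count++; }
        if (strvalues != null) { column = "strvalues"; count++; }
        if (count != 1) {
            throw new IllegalArgumentException(
                "expected exactly one non-null value, got " + count);
        }
        return column;
    }

    public static void main(String[] args) {
        System.out.println(pick(null, Arrays.asList(1.0, 2.0), null, null));
    }
}
```

 The returned name decides which INSERT to run, with only the key and that one column listed.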

Regards

 Duy Hai DOAN

On Fri, May 2, 2014 at 11:40 AM, Ngoc Minh VO 
<ngocminh...@bnpparibas.com> wrote:
Hello all,

I don’t know whether this is the right place to discuss data modeling with 
Cassandra.

We would like to have your feedback/recommendations on our schema modeling:

1. Our data are stored in a CF by their unique key (K)

2. Data type could be one of the following: Double, List<Double>, String, 
   List<String>

3. Hence we create a data table with:

CREATE TABLE data_table (
     key text,
     numvalue double,
     numvalues list<double>,
     strvalue text,
     strvalues list<text>,
     PRIMARY KEY (key)
);

4. One and only one of the four columns contains a non-null value. The 
   three others always contain null.

5. Pros: easy to debug

This model has worked fine for us so far. But C* stores the explicitly 
inserted null values as tombstones, and we start getting tombstone-overwhelming 
errors once their number reaches the threshold.

We are planning to move to a simpler schema with only two columns:

CREATE TABLE data_table (
     key text,
     value blob, -- containing serialized data
     PRIMARY KEY (key)
);
Pros: no null values, more efficient in terms of storage?
Cons: deserialization is handled on the client side instead of in the Java 
driver (not sure which one is more efficient…)
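
To make the blob option concrete, here is a rough sketch of what the client-side serialization could look like. Everything in it is an assumption for illustration: the class name UnionBlob, the one-byte type tag, and the length-prefixed layout are not an existing format, just one possible encoding of the four types.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical tagged encoding: 1 tag byte, then a type-specific payload.
final class UnionBlob {
    static final byte DOUBLE = 0, DOUBLES = 1, STRING = 2, STRINGS = 3;

    static ByteBuffer encodeDouble(double v) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 8);
        buf.put(DOUBLE).putDouble(v);
        buf.flip();
        return buf;
    }

    static ByteBuffer encodeDoubles(List<Double> values) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + 8 * values.size());
        buf.put(DOUBLES).putInt(values.size());
        for (double v : values) buf.putDouble(v);
        buf.flip();
        return buf;
    }

    static ByteBuffer encodeString(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + bytes.length);
        buf.put(STRING).putInt(bytes.length).put(bytes);
        buf.flip();
        return buf;
    }

    static ByteBuffer encodeStrings(List<String> values) {
        List<byte[]> parts = new ArrayList<>();
        int size = 1 + 4;
        for (String s : values) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            parts.add(bytes);
            size += 4 + bytes.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.put(STRINGS).putInt(parts.size());
        for (byte[] bytes : parts) buf.putInt(bytes.length).put(bytes);
        buf.flip();
        return buf;
    }

    // Returns a Double, List<Double>, String or List<String> depending on the tag.
    static Object decode(ByteBuffer blob) {
        ByteBuffer buf = blob.duplicate(); // do not move the caller's position
        byte tag = buf.get();
        switch (tag) {
            case DOUBLE:
                return buf.getDouble();
            case DOUBLES: {
                int n = buf.getInt();
                List<Double> out = new ArrayList<>(n);
                for (int i = 0; i < n; i++) out.add(buf.getDouble());
                return out;
            }
            case STRING:
                return readString(buf);
            case STRINGS: {
                int n = buf.getInt();
                List<String> out = new ArrayList<>(n);
                for (int i = 0; i < n; i++) out.add(readString(buf));
                return out;
            }
            default:
                throw new IllegalArgumentException("unknown tag " + tag);
        }
    }

    private static String readString(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getInt()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

With such a layout the type information lives inside the blob, so the values are no longer readable from cqlsh, which is the debugging cost compared with the four-column table.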

Could you please confirm that using “null” values in a CF for non-expired 
“rows” is not good practice?

Thanks in advance for your help.
Best regards,
Minh

