Hello Ngoc Minh

I'd go with the first data model. To solve the null <-> tombstone issue,
just don't write a column at all at runtime when its value is null.

If only numvalue (double) != null -> INSERT INTO data_table(key, numvalue) VALUES(..., ...);
If only numvalues (list<double>) != null -> INSERT INTO data_table(key, numvalues) VALUES(..., ...);
and so on...

It means you'll need to perform a null check in your code at runtime,
but that's the price to pay to avoid tombstones and heavy compaction.
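
For illustration, here is a minimal sketch of that runtime null check in plain Java, with no driver dependency. It assumes the data_table schema from your message; the class NonNullInsert and its build() method are hypothetical names, not part of Cassandra or the Java driver. It returns CQL text with bind markers for the key and the single non-null column only, plus the matching values, and rejects input that breaks your "one and only one non-null column" rule.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: builds an INSERT for data_table that names only the
// non-null value columns, so a null (i.e. a tombstone) is never written.
final class NonNullInsert {
    final String cql;           // CQL text with one '?' marker per bound value
    final List<Object> values;  // bind values, in the same order as the markers

    private NonNullInsert(String cql, List<Object> values) {
        this.cql = cql;
        this.values = values;
    }

    static NonNullInsert build(String key, Double numvalue, List<Double> numvalues,
                               String strvalue, List<String> strvalues) {
        List<String> columns = new ArrayList<>();
        List<Object> values = new ArrayList<>();
        columns.add("key");
        values.add(key);
        // Runtime null checks: a column is included only when its value is non-null.
        addIfNotNull(columns, values, "numvalue", numvalue);
        addIfNotNull(columns, values, "numvalues", numvalues);
        addIfNotNull(columns, values, "strvalue", strvalue);
        addIfNotNull(columns, values, "strvalues", strvalues);
        // Enforce the model's rule: exactly one value column is non-null.
        if (columns.size() != 2) {
            throw new IllegalArgumentException(
                    "expected exactly one non-null value, got " + (columns.size() - 1));
        }
        String markers = String.join(", ", Collections.nCopies(columns.size(), "?"));
        String cql = "INSERT INTO data_table (" + String.join(", ", columns)
                + ") VALUES (" + markers + ")";
        return new NonNullInsert(cql, values);
    }

    private static void addIfNotNull(List<String> columns, List<Object> values,
                                     String column, Object value) {
        if (value != null) {
            columns.add(column);
            values.add(value);
        }
    }
}
```

For example, NonNullInsert.build("k1", 3.14, null, null, null).cql gives INSERT INTO data_table (key, numvalue) VALUES (?, ?). Since only four statement texts can ever come out of it, each one can be prepared once and reused with whichever driver you use.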

Regards

 Duy Hai DOAN


On Fri, May 2, 2014 at 11:40 AM, Ngoc Minh VO <ngocminh...@bnpparibas.com> wrote:

> Hello all,
>
> I don't know whether this is the right place to discuss data modeling
> with Cassandra.
>
> We would like your feedback/recommendations on our schema
> design:
>
> 1. Our data are stored in a CF (column family), keyed by a unique key (K).
> 2. The data type can be one of the following: Double, List<Double>,
>    String, List<String>.
> 3. Hence we created a data table with:
>
> CREATE TABLE data_table (
>     key text,
>     numvalue double,
>     numvalues list<double>,
>     strvalue text,
>     strvalues list<text>,
>     PRIMARY KEY (key)
> );
>
> 4. *One and only one* of the four columns contains a non-null value;
>    the other three always contain null.
> 5. Pros: easy to debug.
>
> This model works fine for us so far. But C* treats null values as
> tombstones, and we are starting to hit "tombstone overwhelming" errors
> once the number of tombstones reaches the threshold.
>
> We are planning to move to a simpler schema with only two columns:
>
> CREATE TABLE data_table (
>     key text,
>     value blob, -- containing serialized data
>     PRIMARY KEY (key)
> );
>
> Pros: no null values; more efficient in terms of storage?
>
> Cons: deserialization is handled on the client side instead of by the Java
> driver (not sure which one is more efficient…)
>
> Could you please confirm that using “null” values in a CF for non-expired
> “rows” is not a good practice?
>
> Thanks in advance for your help.
>
> Best regards,
>
> Minh
>
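
Regarding the blob-based alternative in the quoted message: with a single value blob column, (de)serialization moves entirely to the client. A minimal sketch of such a codec follows, assuming a home-grown format of one type-tag byte followed by the payload (strings and lists are length-prefixed). ValueCodec and the tag values are hypothetical; nothing here is a Cassandra or driver API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side codec for the single "value blob" column.
// Format (an assumption, not a standard): 1 tag byte, then the payload.
final class ValueCodec {
    private static final byte DOUBLE = 1, DOUBLE_LIST = 2, STRING = 3, STRING_LIST = 4;

    static byte[] encode(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            if (value instanceof Double) {
                out.writeByte(DOUBLE);
                out.writeDouble((Double) value);     // 8-byte IEEE 754 payload
            } else if (value instanceof String) {
                out.writeByte(STRING);
                writeString(out, (String) value);
            } else if (value instanceof List) {
                List<?> list = (List<?>) value;
                // Element type is taken from the first element (homogeneous list
                // assumed); an empty list is encoded as a List<Double>.
                boolean strings = !list.isEmpty() && list.get(0) instanceof String;
                out.writeByte(strings ? STRING_LIST : DOUBLE_LIST);
                out.writeInt(list.size());          // element count
                for (Object item : list) {
                    if (strings) {
                        writeString(out, (String) item);
                    } else {
                        out.writeDouble((Double) item);
                    }
                }
            } else {
                throw new IllegalArgumentException("unsupported value: " + value);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object decode(byte[] blob) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob));
            byte tag = in.readByte();
            switch (tag) {
                case DOUBLE:
                    return in.readDouble();
                case STRING:
                    return readString(in);
                case DOUBLE_LIST:
                case STRING_LIST:
                    int n = in.readInt();
                    List<Object> list = new ArrayList<>(n);
                    for (int i = 0; i < n; i++) {
                        if (tag == STRING_LIST) {
                            list.add(readString(in));
                        } else {
                            list.add(in.readDouble());
                        }
                    }
                    return list;
                default:
                    throw new IllegalArgumentException("unknown type tag: " + tag);
            }
        } catch (IOException e) {                   // e.g. a truncated blob
            throw new UncheckedIOException(e);
        }
    }

    // Strings are stored as a 4-byte length followed by their UTF-8 bytes.
    private static void writeString(DataOutputStream out, String s) throws IOException {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(utf8.length);
        out.write(utf8);
    }

    private static String readString(DataInputStream in) throws IOException {
        byte[] utf8 = new byte[in.readInt()];
        in.readFully(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }
}
```

The tag byte is what replaces the four typed columns. The trade-off is that the value becomes opaque to CQL: you can no longer, for example, append to a list in place the way CQL allows on a list<double> column. If your driver binds blob values as java.nio.ByteBuffer, wrap the result with ByteBuffer.wrap(...) before binding.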
