Thanks for your prompt answer. It works great (simple and efficient)! Since I could not “bind” a column name in a prepared statement, I created four separate prepared statements, one per data type.
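A minimal sketch of that per-type dispatch, written in Python for brevity (the thread itself uses the Java driver). The names INSERT_BY_COLUMN, column_for and pick_insert are hypothetical, and in real code each string would be prepared once through the driver rather than kept as plain text:

```python
# Hypothetical sketch: one INSERT per value column, so the column name
# never has to be a bind marker and the three unused columns are never written.
INSERT_BY_COLUMN = {
    "numvalue": "INSERT INTO data_table(key, numvalue) VALUES (?, ?)",
    "numvalues": "INSERT INTO data_table(key, numvalues) VALUES (?, ?)",
    "strvalue": "INSERT INTO data_table(key, strvalue) VALUES (?, ?)",
    "strvalues": "INSERT INTO data_table(key, strvalues) VALUES (?, ?)",
}


def column_for(value):
    """Return the data_table column that matches the value's type."""
    if isinstance(value, float):
        return "numvalue"
    if isinstance(value, str):
        return "strvalue"
    # Assumption: an empty list is rejected, since it has no element type to inspect.
    if isinstance(value, list) and value:
        if all(isinstance(v, float) for v in value):
            return "numvalues"
        if all(isinstance(v, str) for v in value):
            return "strvalues"
    raise TypeError(f"unsupported value: {value!r}")


def pick_insert(key, value):
    """Return (cql, params) for the single column that holds the value."""
    column = column_for(value)
    return INSERT_BY_COLUMN[column], (key, value)
```

Calling pick_insert("k1", [1.0, 2.0]) selects the numvalues statement; a None value raises TypeError instead of being written as a null.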
It would be nice to have “INSERT INTO data_table(key, ?) VALUES (?, ?)” ☺

Regards,
Minh

From: doanduy...@gmail.com [mailto:doanduy...@gmail.com]
Sent: Friday, 2 May 2014 12:29
To: user@cassandra.apache.org
Subject: Re: *Union* data type modeling in Cassandra

Hello Ngoc Minh

I'd go with the first data model. To solve the null <-> tombstone issue, just do not insert the columns whose value is null at runtime.

If only numvalue double != null -> INSERT INTO data_table(key,numvalue) VALUES(...,...);
If only numvalues list<double> != null -> INSERT INTO data_table(key,numvalues) VALUES(...,...);
and so on ...

It means that you'll need to perform a null check in your code at runtime, but that is the price to pay to avoid tombstones and heavy compaction.

Regards

Duy Hai DOAN

On Fri, May 2, 2014 at 11:40 AM, Ngoc Minh VO <ngocminh...@bnpparibas.com> wrote:

Hello all,

I don’t know whether this is the right place to discuss data modeling with Cassandra.

We would like to have your feedback/recommendations on our schema modeling:

1. Our data are stored in a CF by their unique key (K)
2. The data type can be one of the following: Double, List<Double>, String, List<String>
3. Hence we created a data table with:

    CREATE TABLE data_table (
        key text,
        numvalue double,
        numvalues list<double>,
        strvalue text,
        strvalues list<text>,
        PRIMARY KEY (key)
    );

4. One and only one of the four value columns contains a non-null value. The other three always contain null.
5. Pros: easy to debug

This modeling has worked fine for us so far. But C* treats the null values we insert as tombstones, and we are starting to be overwhelmed by tombstones once their number reaches the threshold.

We are planning to move to a simpler schema with only two columns:

    CREATE TABLE data_table (
        key text,
        value blob, -- containing serialized data
        PRIMARY KEY (key)
    );

Pros: no null values, more efficient in terms of storage?
Cons: deserialization is handled on the client side instead of in the Java driver (not sure which one is more efficient…)

Could you please confirm that using “null” values in a CF for non-expired “rows” is not good practice?

Thanks in advance for your help.

Best regards,
Minh
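For the two-column blob schema mentioned in the original question, a minimal sketch of what the client-side (de)serialization could look like, in Python for brevity. The one-byte type tag, the big-endian layout and the names encode/decode are assumptions made for illustration; nothing in the thread specifies a wire format:

```python
import struct

# Hypothetical type tags stored in the first byte of the blob.
TAG_DOUBLE, TAG_DOUBLES, TAG_TEXT, TAG_TEXTS = range(4)


def encode(value) -> bytes:
    """Serialize one of the four supported value types into a tagged blob."""
    if isinstance(value, float):
        return struct.pack(">Bd", TAG_DOUBLE, value)
    if isinstance(value, str):
        return bytes([TAG_TEXT]) + value.encode("utf-8")
    if isinstance(value, list) and all(isinstance(v, float) for v in value):
        # Note: an empty list is encoded here, as an empty list of doubles.
        return struct.pack(f">BI{len(value)}d", TAG_DOUBLES, len(value), *value)
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        out = struct.pack(">BI", TAG_TEXTS, len(value))
        for item in value:
            raw = item.encode("utf-8")
            out += struct.pack(">I", len(raw)) + raw  # length-prefixed string
        return out
    raise TypeError(f"unsupported value: {value!r}")


def decode(blob: bytes):
    """Rebuild the original value from a blob produced by encode()."""
    tag = blob[0]
    if tag == TAG_DOUBLE:
        return struct.unpack_from(">d", blob, 1)[0]
    if tag == TAG_TEXT:
        return blob[1:].decode("utf-8")
    if tag not in (TAG_DOUBLES, TAG_TEXTS):
        raise ValueError(f"unknown type tag: {tag}")
    (count,) = struct.unpack_from(">I", blob, 1)
    if tag == TAG_DOUBLES:
        return list(struct.unpack_from(f">{count}d", blob, 5))
    items, pos = [], 5
    for _ in range(count):
        (size,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        items.append(blob[pos:pos + size].decode("utf-8"))
        pos += size
    return items
```

With this layout a single double takes 9 bytes (1 tag byte plus 8 data bytes). The trade-off raised in the question still applies: the values are opaque to CQL, and every client has to agree on the same format.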