Hey everyone, I'm new to Cassandra, going my first steps, having a problem/question regarding sorting results and proper data modelling.
First of all, I read the article "We Shall Have Order!" by Aaron Ploetz (1) to get a first view on how Cassandra works. I reproduced the example in the article with my own table. DROP TABLE sensors; CREATE TABLE sensors ( timestamp BIGINT, name VARCHAR, value VARCHAR, unit VARCHAR, PRIMARY KEY (name, timestamp) ) WITH gc_grace_seconds = 0 AND CLUSTERING ORDER BY (timestamp DESC); I'm actual running Cassandra on a single node ([cqlsh 5.0.1 | Cassandra 3.11.0 | CQL spec 3.4.4 | Native protocol v4]). Now some background information about my project: I want to store all kinds of measuring-data from all kinds sensors. No matter if the sensor is measuring a temperature, water flow, or whatever. Sensors always give a single value. Interpretation has to be done afterwards by the user. So in my example, I 'm measuring temperatures of my house which leads me to the following data: timestamp name value unit 2017-07-24 14-11-00 entrance-a 20 Celsius 2017-07-24 14-11-04 living-room 24 Celsius 2017-07-24 14-11-07 bath-room 22 Celsius 2017-07-24 14-11-15 bed-room 23 Celsius 2017-07-24 14-11-22 entrance-b 20 Celsius I'm measuring time-triggered each 15 minutes. In order to have some kind of start and end for each process, I decided to measure the entrance twice with different named sensors (entrance a and b). So above is one set of measuring-data, created by a single process. I'd say this is just another perfect example of what Aaron Ploetz describes in his article. When I query Cassandra the result set is not sorted by timestamp as long as I won't use the primary key in my WHERE clause. When I ask myself: "What will I query Cassandra for?" I'm always coming up with the same typical thoughts: * LIST all measuring's in a specific timespan ORDERED BY timestamp ASC/DESC o Requires ALLOW FILTERING o Won't be sorted * LIST all measuring's for a specific sensor ORDERED BY timestamp ASC/DESC o Sorted result. OK. * And stuff the future will bring which I simply don't know now. So in order to query Cassandra for measuring's in a specific timestamp I can't find a solid solution. My first idea was: * Add a column sequence which can be used to bundle a set of measuring's DROP TABLE sensors; CREATE TABLE sensors ( timestamp BIGINT, name VARCHAR, value VARCHAR, unit VARCHAR, sequence INT, PRIMARY KEY (sequence, timestamp) ) WITH gc_grace_seconds = 0 AND CLUSTERING ORDER BY (timestamp DESC); o I won't need to measure the entrance twice o I can query for a timespan as long as the timespan is within a sequence. ? But when I query a timespan containing more than a single sequence, then the result set is not correct sorted again sequence timestamp name value unit 123 2017-07-24 14-11-22 entrance-b 20 Celsius 123 2017-07-24 14-11-15 bed-room 23 Celsius 123 2017-07-24 14-11-07 bath-room 22 Celsius 123 2017-07-24 14-11-04 living-room 24 Celsius 123 2017-07-24 14-11-00 entrance-a 20 Celsius 124 2017-07-24 15-11-22 entrance-b 22 Celsius 124 2017-07-24 15-11-15 bed-room 25 Celsius 124 2017-07-24 15-11-07 bath-room 24 Celsius 124 2017-07-24 15-11-04 living-room 26 Celsius 124 2017-07-24 15-11-00 entrance-a 22 Celsius o Besides: it's not recommended to use a "dummy" column especially not as primary or clustering key. How to solve this problem? I believe, I can't be the only one who got this requirement. Imho "Sort it on the client-side" can't be the solution. As soon as data gets bigger we simply can't "just" sort on a client side. So my next idea was to use the table as overall data storage and create another table and periodically transfer data from the main to the child table. But I believe I'll get the same problem because Cassandra simply don't sort as an RDBMS. So here must be an idea behind the philosophy of Cassandra. Can anyone help me out? Best regards Mike Wenzel (1) https://www.datastax.com/dev/blog/we-shall-have-order