Re: Issues, understanding how CQL works

Reid Pinchback Tue, 21 Apr 2020 09:58:26 -0700

Marc, have you had any exposure to DynamoDB at all?  The API approach is 
different, but the fundamental concepts are similar.  That’s actually a better 
reference point to have than an RDBMS, because only a small subset of RDBMS 
usage patterns overlaps with CQL.  If you had, for example, worked with 
databases that relied heavily on table partitions and supported apps that 
focused bulk loads and analytics at the partition level, then you would be in a 
space somewhat similar to C*.


C* is at its best when your common usage pattern, at least on reads, is 
effectively “I want a bunch of stuff, so you may as well give it to me by the 
bunch… what I do with the bunch after is my problem”.  That’s very different 
from an RDBMS, which historically has always tried to find some balance between 
minimizing disk I/O and network I/O… and if it took developers a lot more head 
scratching to get there, that was considered an acceptable investment to help 
scale the usage of an expensive resource.

As a result, language features for the two cases are quite different.
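
To make the "give it to me by the bunch" idea concrete, here is a minimal sketch in plain Python. It is not Cassandra code: the dict TABLE is a made-up in-memory stand-in for tagdata.central, and fetch_partition and newest_insertdate are hypothetical helper names. It only illustrates the division of labour, where the store hands back one whole partition and the client does the rest.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, int]
PartitionKey = Tuple[int, int]  # (signalid, monthyear)

# Toy stand-in for tagdata.central: one dict entry per partition.
# All values are invented for illustration.
TABLE: Dict[PartitionKey, List[Row]] = {
    (4002, 201908): [
        {"fromtime": 100, "totime": 200, "insertdate": 1566000000},
        {"fromtime": 200, "totime": 300, "insertdate": 1566003600},
    ],
    (4002, 201909): [
        {"fromtime": 300, "totime": 400, "insertdate": 1568000000},
    ],
}


def fetch_partition(signalid: int, monthyear: int) -> List[Row]:
    """Return the whole 'bunch': every row stored under one partition key."""
    return TABLE.get((signalid, monthyear), [])


def newest_insertdate(rows: List[Row]) -> Optional[int]:
    """Client-side post-processing; the store is not asked to sort anything."""
    return max((row["insertdate"] for row in rows), default=None)


bunch = fetch_partition(4002, 201908)
print(newest_insertdate(bunch))
```

The point of the split is that the store only ever answers "which rows live under this key", and anything like sorting by a non-key column happens after the bunch has arrived.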

From: Elliott Sims <elli...@backblaze.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, April 21, 2020 at 12:13 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Issues, understanding how CQL works

The short answer is that CQL isn't SQL.  It looks a bit like it, but the 
structure of the data is totally different.  Essentially (ignoring secondary 
indexes, which have some issues in practice and I think are generally not 
recommended) the only way to look the data up is by the partition key.  
Anything else is a full-table scan, and if you need more querying flexibility, 
Cassandra is probably not your best option.  With only 260GB, I think I'd lean 
towards suggesting PostgreSQL or MySQL.
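
A rough way to picture the difference between a partition-key lookup and a full-table scan is the plain-Python sketch below. It is a toy model, not Cassandra code: PARTITIONS holds invented sample data, and lookup, scan_for and partition_keys are hypothetical names. Each function also reports how many rows it had to touch.

```python
from typing import Dict, List, Tuple

PartitionKey = Tuple[int, int]  # (signalid, monthyear)

# Hypothetical sample data: partition key -> insertdate values of its rows.
PARTITIONS: Dict[PartitionKey, List[int]] = {
    (4002, 201908): [1566000000, 1566003600],
    (4002, 201909): [1568000000],
    (4711, 201908): [1565000000, 1565500000, 1566900000],
}


def lookup(key: PartitionKey) -> Tuple[List[int], int]:
    """Partition-key lookup: touches only the rows of one partition."""
    rows = PARTITIONS.get(key, [])
    return rows, len(rows)


def scan_for(insertdate: int) -> Tuple[List[PartitionKey], int]:
    """Filter on a non-key column: every row of every partition is read."""
    touched = 0
    hits: List[PartitionKey] = []
    for key, rows in PARTITIONS.items():
        for value in rows:
            touched += 1
            if value == insertdate:
                hits.append(key)
    return hits, touched


def partition_keys() -> List[PartitionKey]:
    """List the distinct partition keys. In CQL, SELECT DISTINCT on the full
    partition key serves a similar purpose, though it still has to visit
    every partition."""
    return sorted(PARTITIONS)


print(lookup((4002, 201909)))
print(scan_for(1568000000))
print(partition_keys())
```

In this toy data the lookup reads one row while the scan reads all six to find the same entry; on a 260GB table that gap is what makes the scan impractical.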

On Tue, Apr 21, 2020 at 7:20 AM Marc Richter <m...@marc-richter.info> wrote:
Hi everyone,

I'm very new to Cassandra. I have, however, some experience with SQL.

I need to extract some information from a Cassandra database that has
the following table definition:

CREATE TABLE tagdata.central (
signalid int,
monthyear int,
fromtime bigint,
totime bigint,
avg decimal,
insertdate bigint,
max decimal,
min decimal,
readings text,
PRIMARY KEY (( signalid, monthyear ), fromtime, totime)
)

The database is already roughly 260 GB in size.
I now need to find the most recent entry in it; the correct
column to determine this would be "insertdate".

In SQL I would do something like this:

SELECT insertdate FROM tagdata.central
ORDER BY insertdate DESC LIMIT 1;

In CQL, however, I just can't get it to work.

What I have tried already is this:

SELECT insertdate FROM "tagdata.central"
ORDER BY insertdate DESC LIMIT 1;

But this gives me an error:
ERROR: ORDER BY is only supported when the partition key is restricted
by an EQ or an IN.

So, after some trial and error and a lot of Googling, I learned that I
must include all columns of the PRIMARY KEY from left to right in my
query. Thus, this is the "best" I can get to work:


SELECT
        *
FROM
        "tagdata.central"
WHERE
        "signalid" = 4002
        AND "monthyear" = 201908
ORDER BY
        "fromtime" DESC
LIMIT 10;


The "monthyear" column, I crafted like a fool by incrementing the date
one month after another until no results could be found anymore.
The "signalid" I grabbed from one of the unrestricted "SELECT * FROM" -
query results. But these can't be as easily guessed as the "monthyear"
values could.

This is where I'm stuck!

1. This does not really feel like the ideal way to go. I think there is
something more mature in modern IT systems. Can anyone tell me a
better way to get this information?

2. I need a way to learn all values that are in the "monthyear" and
"signalid" columns in order to be able to craft that query.
How can I achieve that in a reasonable way? As I said: The DB is roughly
260 GB, which makes it next to impossible to just "have a look" at
the output of "SELECT *".

Thanks for your help!

Best regards,
Marc Richter

