Re: Issues, understanding how CQL works

Elliott Sims Tue, 21 Apr 2020 09:13:07 -0700

The short answer is that CQL isn't SQL.  It looks a bit like it, but the
structure of the data is totally different.  Essentially (ignoring
secondary indexes, which have some issues in practice and I think are
generally not recommended) the only way to look the data up is by the
partition key.  Anything else is a full-table scan and if you need more
querying flexibility Cassandra is probably not your best option.   With
only 260GB, I think I'd lean towards suggesting PostgreSQL or MySQL.


On Tue, Apr 21, 2020 at 7:20 AM Marc Richter <m...@marc-richter.info> wrote:

> Hi everyone,
>
> I'm very new to Cassandra. I have, however, some experience with SQL.
>
> I need to extract some information from a Cassandra database that has
> the following table definition:
>
> CREATE TABLE tagdata.central (
> signalid int,
> monthyear int,
> fromtime bigint,
> totime bigint,
> avg decimal,
> insertdate bigint,
> max decimal,
> min decimal,
> readings text,
> PRIMARY KEY (( signalid, monthyear ), fromtime, totime)
> )
>
> The database is already of round about 260 GB in size.
> I now need to know what is the most recent entry in it; the correct
> column to learn this would be "insertdate".
>
> In SQL I would do something like this:
>
> SELECT insertdate FROM tagdata.central
> ORDER BY insertdate DESC LIMIT 1;
>
> In CQL, however, I just can't get it to work.
>
> What I have tried already is this:
>
> SELECT insertdate FROM "tagdata.central"
> ORDER BY insertdate DESC LIMIT 1;
>
> But this gives me an error:
> ERROR: ORDER BY is only supported when the partition key is restricted
> by an EQ or an IN.
>
> So, after some trial and error and a lot of Googling, I learned that I
> must include all rows from the PRIMARY KEY from left to right in my
> query. Thus, this is the "best" I can get to work:
>
>
> SELECT
>         *
> FROM
>         "tagdata.central"
> WHERE
>         "signalid" = 4002
>         AND "monthyear" = 201908
> ORDER BY
>         "fromtime" DESC
> LIMIT 10;
>
>
> The "monthyear" column, I crafted like a fool by incrementing the date
> one month after another until no results could be found anymore.
> The "signalid" I grabbed from one of the unrestricted "SELECT * FROM" -
> query results. But these can't be as easily guessed as the "monthyear"
> values could.
>
> This is where I'm stuck!
>
> 1. This does not really feel like the ideal way to go. I think there is
> something more mature in modern IT systems. Can anyone tell me what is a
> better way to get these informations?
>
> 2. I need a way to learn all values that are in the "monthyear" and
> "signalid" columns in order to be able to craft that query.
> How can I achieve that in a reasonable way? As I said: The DB is round
> about 260 GB which makes it next to impossible to just "have a look" at
> the output of "SELECT *"..
>
> Thanks for your help!
>
> Best regards,
> Marc Richter
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>

Re: Issues, understanding how CQL works

Reply via email to