I did see a reference to deletions: “overall approaches considering volumes, 
deletion, compaction etc.” Did I merely misunderstand the reference? That’s all 
I was responding to... sorry if my misunderstanding added any confusion!

-- Jack Krupansky

From: Laing, Michael 
Sent: Monday, September 1, 2014 11:34 AM
To: user@cassandra.apache.org 
Subject: Re: Help with select IN query in cassandra

Did the OP propose that?



On Mon, Sep 1, 2014 at 10:53 AM, Jack Krupansky <j...@basetechnology.com> wrote:

  One comment on deletions – aren’t deletions kind of an anti-pattern for 
modern data processing, such as sensor data, time series data, and social 
media? I mean, isn’t it usually better to return a full history of the data, 
with some aging scheme, and manage the tracking of which values are “current” 
(or “recent”)? Shouldn’t we be looking for and promoting “write once” 
approaches as a much stronger preference/pattern? Or maybe I should say “write 
once and bulk delete on aging” rather than the exercise in futility of doing a 
massive number of deletes and updates in place?

  -- Jack Krupansky

  From: Laing, Michael 
  Sent: Monday, September 1, 2014 9:33 AM
  To: user@cassandra.apache.org 
  Subject: Re: Help with select IN query in cassandra

  This should work for your query requirements - 2 tables w same info because 
disk is cheap and writes are fast so optimize for reads: 

  CREATE TABLE sensor_asset (
    asset_id text,
    event_time timestamp,
    tuuid timeuuid,
    sensor_reading map<text, text>,
    sensor_serial_number text,
    sensor_type int,
    PRIMARY KEY ((asset_id), event_time)
  );

  CREATE TABLE sensor_tuuid (
    asset_id text,
    event_time timestamp,
    tuuid timeuuid,
    sensor_reading map<text, text>,
    sensor_serial_number text,
    sensor_type int,
    PRIMARY KEY (tuuid)
  );

  1. Give me all sensor data for an asset:

  select * from sensor_asset where asset_id = <asset>;

  2. Give me sensor data that matches a set of timeuuids:

  select * from sensor_tuuid where tuuid in (<tuuid1>, <tuuid2>, ...);

  3. Give me all sensor data for an asset collected after | before | between 
event_time(s):

  select * from sensor_asset where asset_id = <asset> and event_time > <ts1>;
  select * from sensor_asset where asset_id = <asset> and event_time < <ts1>;
  select * from sensor_asset where asset_id = <asset> and event_time < <ts1> 
and event_time > <ts2>;

  ***

  Many people (not me) handle sensor data, so there may be better overall 
approaches considering volumes, deletion, compaction etc.

  But the above is simple and should make your current approach workable as you 
iterate toward a complete solution.

  Cheers,
  ml




  On Sun, Aug 31, 2014 at 11:08 PM, Subodh Nijsure <subodh.nijs...@gmail.com> 
wrote:

    Thanks for your help Michael.

    If specifying asset_id would help I can construct queries that can
    include asset_id

    So I have been "playing" around with PRIMARY KEY definition and
    following table definition


    CREATE TABLE sensor_info_table (
      asset_id text,
      event_time timestamp,
      "timestamp" timeuuid,
      sensor_reading map<text, text>,
      sensor_serial_number text,
      sensor_type int,

      PRIMARY KEY ((asset_id, "timestamp"), event_time)
    );

    It does what I want to do, and I removed the index for timestamp item
    since now it is part of primary key and thus my query like this works.

    SELECT * from sigsense.sensor_info_table where  asset_id='3' AND
    timestamp IN ( 
17830bb0-316a-11e4-800f-b888e30f5d17,16ddbdfe-316a-11e4-9f50-b888e30f5d17
    );

    But now this doesn't work it give

    SELECT * from sensor_info_table where  asset_id='3' ;

    Bad Request: Partition key part timestamp must be restricted since
    preceding part is

    I am keeping index on event_time as I sometime need to query something
    "give me all data since time x" i.e. something like this works.

    SELECT * from sensor_info_table where  event_time > '2014-08-31
    16:54:02-0700' ALLOW FILTERING;

    However if I do this things then this don't work:

    SELECT * from sensor_info_table where  asset_id='3' AND event_time >
    '2014-08-31 16:54:02-0700';

    Bad Request: Partition key part timestamp must be restricted since
    preceding part is

    Also  I am not conformable with fact that I need to specify ALLOW FILTERING.

    I guess cassandra schema design task asks designer to write down
    queries before designing schema.

    For the above table definition I want to do following queries:

    - Give me all sensor data for given asset.
    - Give me sensor data that matches given set of timeuuids
    - Give me all sendor data for a given asset, that were collected after
    | before | between  certain event_time.

    Given these query criteria how should  I construct my schema? One
    thought has occurred to me is make three tables with each item
    asset_id , event_time, timeuuid as primary keys and depending on type
    of query choose the table to do query upon. That seems like a waste of
    resources (disk, cpu ), also increasing insert times(!) but thats the
    way things need to happen in cassandra world its okay. ( I am
    two-three weeks into learning about cassandra).

    -Subodh

    On Sun, Aug 31, 2014 at 6:44 PM, Laing, Michael

    <michael.la...@nytimes.com> wrote:
    > Oh it must be late - I missed the fact that you didn't want to specify
    > asset_id. The above queries will still work but you have to use 'allow
    > filtering' - generally not a good idea. I'll look again in the morning.
    >
    >
    > On Sun, Aug 31, 2014 at 9:41 PM, Laing, Michael 
<michael.la...@nytimes.com>
    > wrote:
    >>
    >> Hmm. Because the clustering key is (event_time, "timestamp"), event_time
    >> must be specified as well - hopefully that info is available to the ux.
    >>
    >> Unfortunately you will then hit another problem with your query: you are
    >> selecting a collection field... this will not work with IN on 
"timestamp".
    >>
    >> So you could select all the "timestamp"s for an asset_id/event_time:
    >>>
    >>> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
    >>> 1231234;
    >>
    >>
    >> Or you could apply a range of "timestamp"s:
    >>>
    >>> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
    >>> 1231234 and "timestamp" > 1d934af3-3178-11e4-ba8d-406c8f1838fa and
    >>> "timestamp" < 20b82021-3178-11e4-abc2-406c8f1838fa;
    >>
    >>
    >> BTW the secondary indices are not a good idea: high cardinality and of no
    >> use in this query that I can see.
    >>
    >> ml
    >>
    >>
    >> On Sun, Aug 31, 2014 at 8:40 PM, Subodh Nijsure 
<subodh.nijs...@gmail.com>
    >> wrote:
    >>>
    >>> Not really event time stamp is created by the sensor when it reads data
    >>> and  timestamp is something server creates when inserting data into
    >>> cassandra db.  At later point in time my django ux allows users to 
browse
    >>> this data and reference interesting data points via the timestamp 
field. The
    >>> timestamp field is my bridge between Sal and nosql world.
    >>>
    >>> Subodh
    >>>
    >>> On Aug 31, 2014 5:33 PM, "Laing, Michael" <michael.la...@nytimes.com>
    >>> wrote:
    >>>>
    >>>> Are event_time and timestamp essentially representing the same 
datetime?
    >>>>
    >>>> On Sunday, August 31, 2014, Subodh Nijsure <subodh.nijs...@gmail.com>
    >>>> wrote:
    >>>>>
    >>>>> I have following database schema
    >>>>>
    >>>>> CREATE TABLE sensor_info_table (
    >>>>>   asset_id text,
    >>>>>   event_time timestamp,
    >>>>>   "timestamp" timeuuid,
    >>>>>   sensor_reading map<text, text>,
    >>>>>   sensor_serial_number text,
    >>>>>   sensor_type int,
    >>>>>   PRIMARY KEY ((asset_id), event_time, "timestamp")
    >>>>> );
    >>>>>
    >>>>> CREATE INDEX event_time_index ON sensor_info_table (event_time);
    >>>>>
    >>>>> CREATE INDEX timestamp_index ON sensor_info_table ("timestamp");
    >>>>>
    >>>>> Now I am able to insert the data into this table, however I am unable
    >>>>> to do following query where I want to select items with specific
    >>>>> timeuuid values.
    >>>>>
    >>>>> It gives me following error.
    >>>>>
    >>>>> SELECT * from mydb.sensor_info_table where timestamp IN (
    >>>>> bfdfa614-3166-11e4-a61d-b888e30f5d17 ,
    >>>>> bf4521ac-3166-11e4-87a3-b888e30f5d17) ;
    >>>>>
    >>>>> Bad Request: PRIMARY KEY column "timestamp" cannot be restricted
    >>>>> (preceding column "event_time" is either not restricted or by a non-EQ
    >>>>> relation)
    >>>>>
    >>>>> What do I have to do to make this work?
    >>>>>
    >>>>> For what its worth I am using django for my front end development and
    >>>>> I am using "timestamp timeuuid" field as unique indentifier to
    >>>>> reference specific sensor reading from django framework -- since
    >>>>> cassandra doesn't have way to generate unique id upon insert (like
    >>>>> old-style rdms's auto-fields).
    >>>>>
    >>>>>
    >>>>> Below is software version info.
    >>>>>
    >>>>> show VERSION ; [cqlsh 4.1.1 | Cassandra 2.0.9 | CQL spec 3.1.1 |
    >>>>> Thrift protocol 19.39.0]
    >>>>>
    >>>>> I really don't understand what the error message preceeding column
    >>>>> "event_time" is either not restricted or by no-EQ relation?
    >>>>>
    >>>>> -Subodh Nijsure
    >>
    >>
    >


Reply via email to