Re: DataModelling to query date range

Vidur Malik Thu, 24 Mar 2016 05:52:50 -0700

Hi Chris,

I had something slightly different in mind. You would treat it as time
series data, and have one record for each of the days the route was valid.
In your case:
start           | end                | valid
New York   Washington     2016-01-01
New York   Washington     2016-01-02
New York   Washington     ...
New York   Washington     ...
New York   Washington     2016-01-31


Now, your queries will work, I imagine. Again, this may look wasteful, but
the whole philosophy behind Cassandra is that data duplication is all good.

On Thursday, 24 March 2016, Chris Martin <ch...@cmartinit.co.uk> wrote:

> Hi Vidur,
>
> I had a go at your solution but the problem is that it doesn't match
> routes which are valid all throughtout the range queried.  For example if I
> have  route that is valid for all of Jan 2016. I will have a table that
> looks something like this:
>
> start           | end                | valid
> New York   Washington     2016-01-01
> New York   Washington     2016-01-31
>
> So if I query for ranges that have at least one bound outside Jan (e.g Jan
> 15 - Feb 15) then the query you gave will work fine.  If, however, I query
> for a range that is completely inside Jan e.g all routes valid on Jan 15th,
>  The I think I'll end up with a query like:
>
> SELECT * from routes where start = 'New York' and end = 'Washington' and 
> valid <= 2016-01-15 and valid >= 2016-01-15.
>
> which will return 0 results as it would only match routes that have a
> valid of 2016-01-15 exactly.
>
>  thanks,
>
> Chris
>
>
> On Wed, Mar 23, 2016 at 11:19 PM, Vidur Malik <vi...@shopkeep.com
> <javascript:_e(%7B%7D,'cvml','vi...@shopkeep.com');>> wrote:
>
>> Flip the problem over. Instead of storing validTo and validFrom, simply
>> store a valid field and partition by (start, end). This may sound wasteful,
>> but disk is cheap:
>>
>> CREATE TABLE routes (
>> start text,
>> end text,
>> valid timestamp,
>> PRIMARY KEY ((start, end), valid)
>> );
>>
>> Now, you can execute something like:
>>
>> SELECT * from routes where start = 'New York' and end = 'Washington' and 
>> valid <= 2016-01-31 and valid >= 2016-01-01.
>>
>>
>> On Wed, Mar 23, 2016 at 5:08 PM, Chris Martin <ch...@cmartinit.co.uk
>> <javascript:_e(%7B%7D,'cvml','ch...@cmartinit.co.uk');>> wrote:
>>
>>> Hi all,
>>>
>>> I have a table that represents a train timetable and looks a bit like
>>> this:
>>>
>>> CREATE TABLE routes (
>>> start text,
>>> end text,
>>> validFrom timestamp,
>>> validTo timestamp,
>>> PRIMARY KEY (start, end, validFrom, validTo)
>>> );
>>>
>>> In this case validFrom is the date that the route becomes valid and
>>> validTo is the date that the route that stops becoming valid.
>>>
>>> If this was SQL I could write a query to find all valid routes between
>>> New York and Washington from Jan 1st 2016 to Jan 31st 2016 using something
>>> like:
>>>
>>> SELECT * from routes where start = 'New York' and end = 'Washington' and 
>>> validFrom <= 2016-01-31 and validTo >= 2016-01-01.
>>>
>>> As far as I can tell such a query is impossible with CQL and my current
>>> table structure.  I'm considering running a query like:
>>>
>>> SELECT * from routes where start = 'New York' and end = 'Washington' and 
>>> validFrom <= 2016-01-31
>>>
>>> And then filtering the rest of the data app side.  This doesn't seem
>>> ideal though as I'm going to end up fetching much more data (probably
>>> around an order of magnitude more) from Cassandra than I really want.
>>>
>>> Is there a better way to model the data?
>>>
>>> thanks,
>>>
>>> Chris
>>>
>>>
>>>
>>>
>>
>>
>> --
>>
>> Vidur Malik
>>
>> [image: ShopKeep] <http://www.shopkeep.com>
>>
>> 800.820.9814
>> <8008209814>
>> [image: ShopKeep] <https://www.facebook.com/ShopKeepPOS> [image:
>> ShopKeep] <https://twitter.com/shopkeep> [image: ShopKeep]
>> <https://instagram.com/shopkeep/>
>>
>
>

-- 

Vidur Malik

[image: ShopKeep] <http://www.shopkeep.com>

800.820.9814
<8008209814>
[image: ShopKeep] <https://www.facebook.com/ShopKeepPOS> [image: ShopKeep]
<https://twitter.com/shopkeep> [image: ShopKeep]
<https://instagram.com/shopkeep/>

Re: DataModelling to query date range

Reply via email to