Hi Chris, I had something slightly different in mind. You would treat it as time series data, and have one record for each of the days the route was valid. In your case: start | end | valid New York Washington 2016-01-01 New York Washington 2016-01-02 New York Washington ... New York Washington ... New York Washington 2016-01-31
Now, your queries will work, I imagine. Again, this may look wasteful, but the whole philosophy behind Cassandra is that data duplication is all good. On Thursday, 24 March 2016, Chris Martin <ch...@cmartinit.co.uk> wrote: > Hi Vidur, > > I had a go at your solution but the problem is that it doesn't match > routes which are valid all throughtout the range queried. For example if I > have route that is valid for all of Jan 2016. I will have a table that > looks something like this: > > start | end | valid > New York Washington 2016-01-01 > New York Washington 2016-01-31 > > So if I query for ranges that have at least one bound outside Jan (e.g Jan > 15 - Feb 15) then the query you gave will work fine. If, however, I query > for a range that is completely inside Jan e.g all routes valid on Jan 15th, > The I think I'll end up with a query like: > > SELECT * from routes where start = 'New York' and end = 'Washington' and > valid <= 2016-01-15 and valid >= 2016-01-15. > > which will return 0 results as it would only match routes that have a > valid of 2016-01-15 exactly. > > thanks, > > Chris > > > On Wed, Mar 23, 2016 at 11:19 PM, Vidur Malik <vi...@shopkeep.com > <javascript:_e(%7B%7D,'cvml','vi...@shopkeep.com');>> wrote: > >> Flip the problem over. Instead of storing validTo and validFrom, simply >> store a valid field and partition by (start, end). This may sound wasteful, >> but disk is cheap: >> >> CREATE TABLE routes ( >> start text, >> end text, >> valid timestamp, >> PRIMARY KEY ((start, end), valid) >> ); >> >> Now, you can execute something like: >> >> SELECT * from routes where start = 'New York' and end = 'Washington' and >> valid <= 2016-01-31 and valid >= 2016-01-01. >> >> >> On Wed, Mar 23, 2016 at 5:08 PM, Chris Martin <ch...@cmartinit.co.uk >> <javascript:_e(%7B%7D,'cvml','ch...@cmartinit.co.uk');>> wrote: >> >>> Hi all, >>> >>> I have a table that represents a train timetable and looks a bit like >>> this: >>> >>> CREATE TABLE routes ( >>> start text, >>> end text, >>> validFrom timestamp, >>> validTo timestamp, >>> PRIMARY KEY (start, end, validFrom, validTo) >>> ); >>> >>> In this case validFrom is the date that the route becomes valid and >>> validTo is the date that the route that stops becoming valid. >>> >>> If this was SQL I could write a query to find all valid routes between >>> New York and Washington from Jan 1st 2016 to Jan 31st 2016 using something >>> like: >>> >>> SELECT * from routes where start = 'New York' and end = 'Washington' and >>> validFrom <= 2016-01-31 and validTo >= 2016-01-01. >>> >>> As far as I can tell such a query is impossible with CQL and my current >>> table structure. I'm considering running a query like: >>> >>> SELECT * from routes where start = 'New York' and end = 'Washington' and >>> validFrom <= 2016-01-31 >>> >>> And then filtering the rest of the data app side. This doesn't seem >>> ideal though as I'm going to end up fetching much more data (probably >>> around an order of magnitude more) from Cassandra than I really want. >>> >>> Is there a better way to model the data? >>> >>> thanks, >>> >>> Chris >>> >>> >>> >>> >> >> >> -- >> >> Vidur Malik >> >> [image: ShopKeep] <http://www.shopkeep.com> >> >> 800.820.9814 >> <8008209814> >> [image: ShopKeep] <https://www.facebook.com/ShopKeepPOS> [image: >> ShopKeep] <https://twitter.com/shopkeep> [image: ShopKeep] >> <https://instagram.com/shopkeep/> >> > > -- Vidur Malik [image: ShopKeep] <http://www.shopkeep.com> 800.820.9814 <8008209814> [image: ShopKeep] <https://www.facebook.com/ShopKeepPOS> [image: ShopKeep] <https://twitter.com/shopkeep> [image: ShopKeep] <https://instagram.com/shopkeep/>