Thanks for the clarification. Let's say I have a partition in an SSTable where the values of time range from 100 to 10 and everything < 50 is expired. If I do a query with time < 100 and time >= 50, are there scenarios in which Cassandra will have to read cells where time < 50? In particular, I am wondering if compression might have any effect.
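
To make the compression part of the question concrete, here is a toy Python sketch. It is not Cassandra's actual read path. It only assumes that cells in a partition are stored in descending time order and that they are compressed in fixed-size chunks that have to be decompressed as a whole. The chunk size of 16 cells and the name `cells_touched` are made up for illustration.

```python
# Toy model only: this is NOT how Cassandra is implemented.
# Assumptions (hypothetical): unique time values, descending order on disk,
# and a fixed number of cells per compressed chunk.

CHUNK_CELLS = 16  # hypothetical number of cells per compressed chunk


def cells_touched(times_desc, upper_exclusive, lower_inclusive,
                  chunk_cells=CHUNK_CELLS):
    """Return (wanted, touched) for lower_inclusive <= time < upper_exclusive.

    wanted  -- the cells the query actually asks for
    touched -- every cell in any chunk that overlaps the slice, because a
               chunk is decompressed as a unit in this model
    """
    wanted = [t for t in times_desc if lower_inclusive <= t < upper_exclusive]
    if not wanted:
        return wanted, []
    first = times_desc.index(wanted[0])   # position of newest wanted cell
    last = times_desc.index(wanted[-1])   # position of oldest wanted cell
    start = (first // chunk_cells) * chunk_cells
    end = min(len(times_desc), (last // chunk_cells + 1) * chunk_cells)
    return wanted, times_desc[start:end]


times = list(range(100, 9, -1))           # 100, 99, ..., 10
wanted, touched = cells_touched(times, 100, 50)
extra_expired = [t for t in touched if t < 50]
print(len(wanted), len(touched), extra_expired)
```

In this model the slice boundary at time = 50 falls in the middle of a chunk, so some cells with time < 50 are decompressed even though the query never asked for them. Whether and how much this happens in a real cluster depends on the configured chunk and index sizes, which is what I am trying to understand.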

On Sun, Jan 29, 2017 at 3:01 PM DuyHai Doan <doanduy...@gmail.com> wrote:

> "Should the data be sorted by my time column regardless of the compaction
> strategy" --> It does
>
> What I mean is that an old "chunk" of expired data in SSTABLE-12 may be
> compacted together with a new chunk of SSTABLE-2 containing fresh data, so
> the new resulting SSTable will contain tombstones AND fresh data inside
> the same partition, but of course sorted by clustering column "time".
>
> On Sun, Jan 29, 2017 at 8:55 PM, John Sanda <john.sa...@gmail.com> wrote:
>
> > Since STCS does not sort data based on timestamp, your wide partition may
> > span over multiple SSTables and inside each SSTable, old data (+
> > tombstones) may sit on the same partition as newer data.
>
> Should the data be sorted by my time column regardless of the compaction
> strategy? I didn't think that the column timestamp came into play with
> respect to sorting. I have been able to review some SSTables with
> sstablemetadata and I can see that old/expired data is definitely living
> with live data.
>
> On Sun, Jan 29, 2017 at 2:38 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
> Ok so give it a try with TWCS. Since STCS does not sort data based on
> timestamp, your wide partition may span over multiple SSTables and inside
> each SSTable, old data (+ tombstones) may sit on the same partition as
> newer data.
>
> When reading by slice, even if you request fresh data, Cassandra has
> to scan over a lot of tombstones to fetch the correct range of data, thus
> your issue.
>
> On Sun, Jan 29, 2017 at 8:19 PM, John Sanda <john.sa...@gmail.com> wrote:
>
> It was with STCS. It was on a 2.x version before TWCS was available.
>
> On Sun, Jan 29, 2017 at 10:58 AM DuyHai Doan <doanduy...@gmail.com> wrote:
>
> Did you get this overwhelming tombstone behavior with STCS or with TWCS?
>
> If you're using DTCS, beware of its weird behavior and tricky
> configuration.
>
> On Sun, Jan 29, 2017 at 3:52 PM, John Sanda <john.sa...@gmail.com> wrote:
>
> > Your partitioning key is text. If you have multiple entries per id you are
> > likely hitting older cells that have expired. Descending only affects how
> > the data is stored on disk, if you have to read the whole partition to find
> > whichever time you are querying for you could potentially hit tombstones in
> > other SSTables that contain the same "id". As mentioned previously, you
> > need to add a time bucket to your partitioning key and definitely use
> > DTCS/TWCS.
>
> As I mentioned previously, the UI only queries recent data, e.g., the past
> hour, past two hours, past day, past week. The UI does not query for
> anything older than the TTL, which is 7 days. My understanding and
> expectation was that Cassandra would only scan live cells. The UI is a
> separate application that I do not maintain, so I am not 100% certain about
> the queries. I have been told that it does not query for anything older
> than 7 days.
>
> On Sun, Jan 29, 2017 at 4:14 AM, kurt greaves <k...@instaclustr.com> wrote:
>
> Your partitioning key is text. If you have multiple entries per id you are
> likely hitting older cells that have expired. Descending only affects how
> the data is stored on disk, if you have to read the whole partition to find
> whichever time you are querying for you could potentially hit tombstones in
> other SSTables that contain the same "id". As mentioned previously, you
> need to add a time bucket to your partitioning key and definitely use
> DTCS/TWCS.
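
For reference, the "time bucket in the partitioning key" suggestion quoted above could look roughly like the Python sketch below. It is only an illustration of the idea: the one-day bucket size, the `(id, bucket)` key shape, and the helper names `partition_key` and `buckets_for_range` are assumptions, not anything taken from the actual schema or from Cassandra itself.

```python
# Hypothetical sketch of time bucketing; bucket size and key shape are
# assumptions chosen for illustration, not the real schema.

BUCKET_SIZE = 86_400  # one day, assuming "time" is in seconds


def partition_key(metric_id, time, bucket_size=BUCKET_SIZE):
    """Map (id, time) to a composite partition key (id, bucket)."""
    return (metric_id, time // bucket_size)


def buckets_for_range(start, end, bucket_size=BUCKET_SIZE):
    """List the buckets a query for start <= time < end has to visit."""
    if end <= start:
        return []
    return list(range(start // bucket_size, (end - 1) // bucket_size + 1))


# A query for the past two days only needs the partitions for those buckets,
# so partitions holding only older, expired data are never opened.
print(partition_key("m1", 3 * BUCKET_SIZE + 5))
print(buckets_for_range(0, 2 * BUCKET_SIZE))
```

With a key like this, each partition only ever holds data from one bucket, so a read for recent data does not land in a partition that also carries a long tail of expired cells.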