This is expected due to tombstones. This wiki page explains them pretty well:
http://wiki.apache.org/cassandra/DistributedDeletes

If the row never existed (so no SSTable holds data or tombstones for it),
the bloom filter will let Cassandra avoid doing any disk reads at all
roughly 99% of the time.
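
To make the difference concrete, here is a toy Python sketch of the two read
paths. It is not Cassandra's actual code: ToySSTable, read_row, TOMBSTONE and
the Bloom filter sizing are all made up for illustration, and compaction,
gc_grace and column-level detail are ignored. The only point is that a deleted
row still passes the bloom filter in every SSTable that holds either its old
data or its tombstone, while a row that never existed is usually rejected
before any disk read.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored in a single Python int

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


TOMBSTONE = object()  # hypothetical marker standing in for a deletion record


class ToySSTable:
    """Hypothetical immutable table: row key -> (timestamp, value or TOMBSTONE)."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)
        self.disk_reads = 0  # counts simulated disk accesses

    def read(self, key):
        if not self.bloom.might_contain(key):
            return None  # filter says "definitely absent": no disk access
        self.disk_reads += 1
        return self.rows.get(key)


def read_row(sstables, key):
    """Consult every table, keep the newest entry, and hide tombstoned rows."""
    newest = None
    for table in sstables:
        entry = table.read(key)
        if entry is not None and (newest is None or entry[0] > newest[0]):
            newest = entry
    if newest is None or newest[1] is TOMBSTONE:
        return None
    return newest[1]


old_data = ToySSTable({"row1": (1, "x" * 1000)})   # written long ago
deletes = ToySSTable({"row1": (2, TOMBSTONE)})     # the recent delete

print(read_row([old_data, deletes], "row1"))       # deleted row
print(old_data.disk_reads + deletes.disk_reads)    # simulated reads so far
print(read_row([old_data, deletes], "no-such-row"))
print(old_data.disk_reads + deletes.disk_reads)    # unchanged unless a false positive hit
```

Both lookups return nothing, but the deleted row costs one simulated disk read
per SSTable that mentions it, which matches the slow verification pass
described in the original question.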

On Tue, Jul 10, 2012 at 10:50 AM, Thorsten von Eicken
<t...@rightscale.com> wrote:

> We're finding that reading deleted columns can be very slow and I'm
> trying to get confirmation for our analysis of what happens. We wrote
> lots of data eons ago into fairly large rows (up to 1MB). We recently
> read those rows and then deleted them. After this, we ran a
> verification-type pass that attempts to re-read these rows and verifies
> that they are indeed deleted. The interval between the deletion and
> verification pass was far less than gc_grace. We noticed that the
> verification pass took as much time as the read&delete pass(!), while
> verifying the non-existence of rows that never existed is blindingly
> fast in comparison. So it seems that cassandra is reading the old data,
> reading the new tombstones, and then returning "there is no data".
> Functionally correct, but rather unexpected performance
> characteristics... Am I missing something or is this expected?
> Thanks!
> Thorsten
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>
