I've done more experimentation and the behavior persists. I start with a normal dataset that is searchable by a secondary index. Using that index, I select the entries that match a certain criterion, then delete them. I tried two methods of deletion -- individual cf.remove() calls as well as batch removal in Pycassa. What happens after that is as follows: attempts to read the same CF, using the same index values, start to time out in the Pycassa client (there is a Thrift message about the timeout). Entries not touched by the attempted deletion can still be read just fine.
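For reference, the workflow I'm describing can be sketched in plain Python, with dicts standing in for the column family and its secondary index (no live cluster involved; purge_by_index and the other names here are hypothetical stand-ins, not Pycassa API):

```python
# Stand-in sketch: select rows by a secondary-index value, then delete
# them in batches. Plain dicts replace the live column family;
# batch_size mirrors the batches of 1000 used against the real cluster.

def purge_by_index(rows, column, value, batch_size=1000):
    """Delete every row whose `column` equals `value`; return count removed."""
    # Emulate the indexed lookup: collect the matching row keys first,
    # the way an indexed-slices query would.
    matches = [key for key, cols in rows.items() if cols.get(column) == value]

    removed = 0
    # Delete in fixed-size batches, analogous to sending batch mutations.
    for start in range(0, len(matches), batch_size):
        for key in matches[start:start + batch_size]:
            del rows[key]          # the cf.remove(key) step
            removed += 1
    return removed

# Toy dataset: half the rows carry the DATE value being retired.
rows = {"job%d" % i: {"DATE": "20111110" if i % 2 else "20111109"}
        for i in range(10)}
removed = purge_by_index(rows, "DATE", "20111109")
# Rows with the other DATE value remain readable afterward.
survivors = [k for k, c in rows.items() if c["DATE"] == "20111110"]
```

Against the real cluster, it is the follow-up reads by the same index value that time out after this pattern runs, which is what I can't explain.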

Has anyone seen such behavior?

Thanks,
Maxim

On 11/10/2011 8:30 PM, Maxim Potekhin wrote:
Hello,

My data load comes in batches representing one day in the life of a large computing facility. I index the data by the day it was produced, to be able to quickly pull data for a specific day
within the last year or two. There are 6 other indexes.

When it comes to retiring the data, I intend to delete the oldest date's batch and then add a fresh batch of data, so that I control the disk space. Therein lies a problem -- and it may be Pycassa-related, so I also filed an issue on GitHub -- when I select by 'DATE=blah' and then do a batch remove, it works fine for a while, but after a few thousand deletions (done in batches of 1000) it grinds to a halt, i.e. I can no longer iterate over the result, which manifests as a timeout error.

Is that a behavior seen before? Cassandra version is 0.8.6, Pycassa 1.3.0.

TIA,

Maxim
