Hi all, I'm trying to debug some pretty weird behavior when paginating through a ColumnFamily with get_slice(). It looks as if Cassandra sometimes does not respect the count parameter in the SlicePredicate's SliceRange, returning more than count columns, and sometimes it silently drops columns. All reads and writes are done at QUORUM. The client is written in Ruby, talking to Cassandra over Thrift.
I am seeing essentially non-deterministic behavior when paginating through a column family (which is not being written to concurrently). Here is example output from my pagination method (code below) with some extra debug prints added:

irb(main):005:0> blah = get_entire_column_family(@cassandra, "some_key", "some_cf", 100)
... 100/6648 ... 199/6648 ... 354/6648 ... 453/6648 ... 552/6648 ... 689/6648 ... 788/6648
... 887/6648 ... 1048/6648 ... 1147/6648 ... 1246/6648 ... 1377/6648 ... 1476/6648
... 1575/6648 ... 1674/6648 ... 1773/6648 ... 1908/6648 ... 2051/6648 ... 2150/6648
... 2249/6648 ... 2348/6648 ... <snip> ... 6127/6648 ... 6127/6648 ... 6127/6648
... 6127/6648 ... 6127/6648 ...

The N/6648 is just the number of retrieved columns / the total column count at each step of the pagination loop. After the first iteration it should go up by exactly 99 each time, because the start of the next slice == the last column of the current slice, and that duplicate is dropped. But sometimes it jumps, indicating that a single get_slice() call returned more than 100 columns (e.g. 199/6648 followed by 354/6648). And when it reaches the end of the column family, the client ends up with fewer than 6648 columns, so the loop condition never becomes false and the code spins forever in an infinite loop.

I've tried this several times from an interactive Ruby session and got a different number of columns each time: 6536/6648, 6127/6648, 6514/6648.

However, once I set the limit to be greater than num_columns and read the entire row as a single page, everything worked, and follow-up paginated reads also returned the entire row successfully. I'm not sure whether that's because the entire row is now cached, or because something was wrong and read repair has since fixed it. But since all of our reads and writes are done at QUORUM, read repair shouldn't matter, right?
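To convince myself the totals really should go up by 99 per iteration, I wrote a tiny Thrift-free simulation of inclusive-start slicing over a sorted array of column names (slice_sim and paginate_sim are hypothetical helpers of mine, not part of the Cassandra API):

```ruby
# Simulate get_slice() with a SliceRange: inclusive start, at most
# `count` columns returned, over a sorted array of column names.
def slice_sim(columns, start, count)
  from = start.empty? ? 0 : columns.index { |c| c >= start }
  from.nil? ? [] : columns[from, count]
end

# The same pagination strategy as my real code: start the next page at
# the last column seen, drop the inclusive-start duplicate, repeat.
def paginate_sim(columns, limit)
  result = slice_sim(columns, "", limit)
  return result if result.empty?
  loop do
    page = slice_sim(columns, result.last, limit)
    # Inclusive start: the first column of each new page duplicates
    # the last column we already have.
    page = page.drop(1) if !page.empty? && page.first == result.last
    break if page.empty?
    result.concat(page)
  end
  result
end

cols = (1..6648).map { |i| format("col%05d", i) }
# First page contributes 100 columns, every later page limit - 1 = 99,
# so the running totals should be 100, 199, 298, ... with no jumps.
paginate_sim(cols, 100).size  # => 6648
```

The simulation always recovers every column exactly once, which is why the jumps and the missing columns in the real output look like a server-side issue to me rather than a bookkeeping bug in the loop.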
Here is the pagination code:

def get_entire_column_family(cassandra, row_key, column_family, limit_per_slice)
  column_parent = CassandraThrift::ColumnParent.new(:column_family => column_family,
                                                    :super_column => nil)
  num_columns = cassandra.get_count(@keyspace, row_key, column_parent,
                                    CassandraThrift::ConsistencyLevel::QUORUM)

  predicate = CassandraThrift::SlicePredicate.new
  predicate.slice_range = CassandraThrift::SliceRange.new(:start => "", :finish => "",
                                                          :reversed => false,
                                                          :count => limit_per_slice)
  result = cassandra.get_slice(@keyspace, row_key, column_parent, predicate,
                               CassandraThrift::ConsistencyLevel::QUORUM)

  while result.size < num_columns
    predicate = CassandraThrift::SlicePredicate.new
    predicate.slice_range = CassandraThrift::SliceRange.new(:start => result.last.column.name,
                                                            :finish => "",
                                                            :reversed => false,
                                                            :count => limit_per_slice)
    slice = cassandra.get_slice(@keyspace, row_key, column_parent, predicate,
                                CassandraThrift::ConsistencyLevel::QUORUM)

    # Because the start parameter to get_slice() is inclusive, we should already
    # have the first column of the new slice in our result. We don't want two
    # copies of it, so drop it from the slice before concatenating.
    unless slice.nil? || slice.empty?
      if result.last.column.name == slice.first.column.name
        result.concat(slice[1..-1])
      else
        result.concat(slice)
      end
    end
  end

  result
end

I guess I have several questions:

1) Is this the proper way to paginate through a large column family for a single row key? If not, what is the proper way? Some of our rows are very big (hundreds of thousands of columns in the worst case), so pagination is a must.

2) Could this behavior be expected under some conditions (e.g. the presence of tombstones, hints from when a node was down, or other weirdness)?

3) Is this a known bug? (Maybe related to https://issues.apache.org/jira/browse/CASSANDRA-1145 and/or https://issues.apache.org/jira/browse/CASSANDRA-1042?)

4) If this is not a known bug, how should I proceed with investigating it?

Thanks,
-- Ilya
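P.S. Regarding question 1, one alternative I'm considering, in case get_count() itself is part of the problem: stop paginating when a slice comes back short, instead of comparing against a precomputed total. A rough sketch, untested against a real cluster — fetch_page is a hypothetical lambda standing in for the Thrift get_slice() call, and for simplicity it returns plain column names rather than ColumnOrSuperColumn objects (the real version would compare column.name):

```ruby
# Paginate until get_slice() returns a short page, rather than trusting
# a separate get_count() for the loop condition. fetch_page.call(start, count)
# stands in for the Thrift get_slice() call with an inclusive start.
def paginate_until_short_page(limit, fetch_page)
  result = fetch_page.call("", limit)
  return result if result.empty?
  loop do
    page = fetch_page.call(result.last, limit)
    # Drop the inclusive-start duplicate of the last column we already have.
    page = page.drop(1) if !page.empty? && page.first == result.last
    break if page.empty?
    result.concat(page)
    # A short page (fewer than limit - 1 new columns) means we have
    # walked off the end of the row, so stop without consulting a count.
    break if page.size < limit - 1
  end
  result
end
```

This at least can't loop forever when the server returns fewer columns than expected, though of course it would silently under-read in exactly the failure mode I'm describing, so it's a mitigation rather than a fix.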