Hi all,

I'm trying to debug some pretty weird behavior when paginating through
a ColumnFamily with get_slice(). It looks like Cassandra does not
respect the count limit in the SlicePredicate's SliceRange, sometimes
returning more than the requested number of columns, and it also
sometimes silently drops columns. All reads and writes are done at
QUORUM. The client is in Ruby.

I am seeing non-deterministic behavior when paginating through a
column family (which is not being written to concurrently). Here is
example output from my pagination method (code below) with some extra
debug prints added:

irb(main):005:0> blah = get_entire_column_family(@cassandra,
"some_key", "some_cf", 100)
get_entire_column_family(@cass, "some_key", "some_cf", 100) ...
100/6648 ... 199/6648 ... 354/6648 ... 453/6648 ... 552/6648 ...
689/6648 ... 788/6648 ... 887/6648 ... 1048/6648 ... 1147/6648 ...
1246/6648 ... 1377/6648 ... 1476/6648 ... 1575/6648 ... 1674/6648 ...
1773/6648 ... 1908/6648 ... 2051/6648 ... 2150/6648 ... 2249/6648 ...
2348/6648 ... <snip> ... 6127/6648 ... 6127/6648 ... 6127/6648 ...
6127/6648 ... 6127/6648 ...

The N/6648 shows retrieved columns / total columns at each step of the
pagination loop. It should go up by 99 on each iteration after the
first, because the start of the next slice is the last column of the
current slice and that duplicate gets dropped. But sometimes it jumps,
indicating that more than 100 columns were returned from a single
get_slice() call (e.g. ... 199/6648 ... 354/6648 ...). And when it
gets to the end of the column family, we end up with fewer than 6648
columns on the client side and the code gets stuck in an infinite loop.
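
For what it's worth, the debug prints boil down to checking the two
invariants the loop relies on. A sketch of that check (the check_slice
name and arguments are just for illustration, not the exact debug code):

    # Invariants each get_slice() call should satisfy during pagination:
    # 1) it returns at most the requested number of columns, and
    # 2) since :start is inclusive, its first column equals the last
    #    column we already have (prev_last; nil on the first call).
    def check_slice(slice, prev_last, limit_per_slice)
      if slice.size > limit_per_slice
        raise "got #{slice.size} columns, asked for #{limit_per_slice}"
      end
      if prev_last && !slice.empty? && slice.first.column.name != prev_last
        raise "slice starts at #{slice.first.column.name.inspect}, " +
              "expected #{prev_last.inspect}"
      end
    end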

I've tried this several times from an interactive Ruby session and got
a different number of columns each time:
6536/6648
6127/6648
6514/6648
However, once I set the limit to > num_columns and read the entire row
as a single page, everything worked, and follow-up paginated reads also
returned the entire row successfully. I'm not sure whether that's
because the entire row is now in cache, or because something was wrong
and read-repair has since fixed it. But since all of our reads and
writes are done using QUORUM, read-repair shouldn't matter, right?
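
Concretely, the single-page read that works is along these lines (a
sketch using the same Thrift calls as the code below, with row_key and
column_parent set up the same way):

    # Read the whole row in one slice by asking for more columns than exist.
    num_columns = cassandra.get_count(@keyspace, row_key, column_parent,
        CassandraThrift::ConsistencyLevel::QUORUM)
    predicate = CassandraThrift::SlicePredicate.new
    predicate.slice_range = CassandraThrift::SliceRange.new(:start => "",
        :finish => "", :reversed => false, :count => num_columns + 1)
    whole_row = cassandra.get_slice(@keyspace, row_key, column_parent,
        predicate, CassandraThrift::ConsistencyLevel::QUORUM)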

Here is the pagination code:

    def get_entire_column_family(cassandra, row_key, column_family, limit_per_slice)
      column_parent = CassandraThrift::ColumnParent.new(
          :column_family => column_family, :super_column => nil)
      num_columns = cassandra.get_count(@keyspace, row_key, column_parent,
          CassandraThrift::ConsistencyLevel::QUORUM)
      predicate = CassandraThrift::SlicePredicate.new
      predicate.slice_range = CassandraThrift::SliceRange.new(
          :start => "", :finish => "", :reversed => false,
          :count => limit_per_slice)
      result = cassandra.get_slice(@keyspace, row_key, column_parent,
          predicate, CassandraThrift::ConsistencyLevel::QUORUM)
      while result.size < num_columns
        predicate = CassandraThrift::SlicePredicate.new
        predicate.slice_range = CassandraThrift::SliceRange.new(
            :start => result.last.column.name, :finish => "",
            :reversed => false, :count => limit_per_slice)
        slice = cassandra.get_slice(@keyspace, row_key, column_parent,
            predicate, CassandraThrift::ConsistencyLevel::QUORUM)
        # Because the start parameter to get_slice() is inclusive, we should
        # already have the first column of the new slice in our result. We
        # don't want two copies of it, so drop it before concatenating.
        unless slice.nil? || slice.empty?
          if result.last.column.name == slice.first.column.name
            result.concat slice[1..-1]
          else
            result.concat slice
          end
        end
      end  # while
      return result
    end
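
As a stopgap against the infinite loop, I'm thinking of adding a
progress check to the loop body, something like this (a sketch of the
concat step with a guard added, not what's running above):

    # Guard sketch: if a get_slice() round trip adds no new columns,
    # fail loudly instead of spinning forever. Meant to replace the
    # concat step inside the while loop.
    size_before = result.size
    unless slice.nil? || slice.empty?
      if result.last.column.name == slice.first.column.name
        result.concat slice[1..-1]
      else
        result.concat slice
      end
    end
    if result.size == size_before
      raise "pagination stalled at #{result.size}/#{num_columns} columns"
    end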

I guess I have several questions:
1) Is this the proper way to paginate through a large column family
for a single row key? If not, what is? Some of our rows are very big
(hundreds of thousands of columns in the worst case), so pagination is
a must.
2) Could this behavior be expected under some conditions (e.g. the
presence of tombstones, or hints from when a node was down, or other
weirdness)?
3) Is this a known bug? (Maybe related to
https://issues.apache.org/jira/browse/CASSANDRA-1145 and/or
https://issues.apache.org/jira/browse/CASSANDRA-1042?)
4) If it's not a known bug, how should I proceed with investigating it?

Thanks,

-- Ilya
