Re: strange get_range_slices behaviour v0.6.1

Jonathan Ellis Mon, 03 May 2010 08:28:30 -0700

Util.range returns a Range object which is end-exclusive.  (You want
"Bounds" for end-inclusive.)

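A minimal sketch of the difference, in plain Python rather than the Cassandra classes. `keys_in` is a hypothetical helper, and the keys are compared as plain strings purely for illustration (the real comparison is on partitioner tokens). It models the range as end-exclusive, as described above, and shows why the two quoted tests each come up one row short.

```python
def keys_in(keys, start, end, include_end):
    """Return the sorted keys between start and end.

    include_end=False models an end-exclusive range;
    include_end=True models end-inclusive bounds.
    Hypothetical helper for illustration, not part of Cassandra.
    """
    if include_end:
        return [k for k in sorted(keys) if start <= k <= end]
    return [k for k in sorted(keys) if start <= k < end]


keys = ["key1", "key2", "key3"]

# End-exclusive: "key3" is left out, so the tests see 2 and 1 rows.
print(len(keys_in(keys, "key1", "key3", include_end=False)))
print(len(keys_in(keys, "key2", "key3", include_end=False)))

# End-inclusive: the counts the tests were asserting, 3 and 2.
print(len(keys_in(keys, "key1", "key3", include_end=True)))
print(len(keys_in(keys, "key2", "key3", include_end=True)))
```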

On Sun, May 2, 2010 at 7:19 AM, aaron morton <aa...@thelastpickle.com> wrote:
> Hi there, I'm still getting odd behavior with get_range_slices. I've created
> a JUnit test that illustrates the case.
> Could someone take a look and either let me know where my understanding is
> wrong or whether this is a real issue?
>
>
> I added the following to ColumnFamilyStoreTest.java
>
>
>    private ColumnFamilyStore insertKey1Key2Key3() throws IOException,
>            ExecutionException, InterruptedException
>    {
>        List<RowMutation> rms = new LinkedList<RowMutation>();
>        RowMutation rm;
>        rm = new RowMutation("Keyspace2", "key1".getBytes());
>        rm.add(new QueryPath("Standard1", null, "Column1".getBytes()),
>               "asdf".getBytes(), 0);
>        rms.add(rm);
>
>        rm = new RowMutation("Keyspace2", "key2".getBytes());
>        rm.add(new QueryPath("Standard1", null, "Column1".getBytes()),
>               "asdf".getBytes(), 0);
>        rms.add(rm);
>
>        rm = new RowMutation("Keyspace2", "key3".getBytes());
>        rm.add(new QueryPath("Standard1", null, "Column1".getBytes()),
>               "asdf".getBytes(), 0);
>        rms.add(rm);
>        return Util.writeColumnFamily(rms);
>    }
>
>
>    @Test
>    public void testThreeKeyRangeAll() throws IOException,
>            ExecutionException, InterruptedException
>    {
>        ColumnFamilyStore cfs = insertKey1Key2Key3();
>
>        IPartitioner p = StorageService.getPartitioner();
>        RangeSliceReply result = cfs.getRangeSlice(
>                ArrayUtils.EMPTY_BYTE_ARRAY,
>                Util.range(p, "key1", "key3"),
>                10,
>                null,
>                Arrays.asList("Column1".getBytes()));
>        assertEquals(3, result.rows.size());
>    }
>
>    @Test
>    public void testThreeKeyRangeSkip1() throws IOException,
>            ExecutionException, InterruptedException
>    {
>        ColumnFamilyStore cfs = insertKey1Key2Key3();
>
>        IPartitioner p = StorageService.getPartitioner();
>        RangeSliceReply result = cfs.getRangeSlice(
>                ArrayUtils.EMPTY_BYTE_ARRAY,
>                Util.range(p, "key2", "key3"),
>                10,
>                null,
>                Arrays.asList("Column1".getBytes()));
>        assertEquals(2, result.rows.size());
>    }
>
> Running this with "ant test" the partial output is....
>
>    [junit] Testsuite: org.apache.cassandra.db.ColumnFamilyStoreTest
>    [junit] Tests run: 7, Failures: 2, Errors: 0, Time elapsed: 1.405 sec
>    [junit]
>    [junit] Testcase:
> testThreeKeyRangeAll(org.apache.cassandra.db.ColumnFamilyStoreTest):
>  FAILED
>    [junit] expected:<3> but was:<2>
>    [junit] junit.framework.AssertionFailedError: expected:<3> but was:<2>
>    [junit]     at
> org.apache.cassandra.db.ColumnFamilyStoreTest.testThreeKeyRangeAll(ColumnFamilyStoreTest.java:170)
>    [junit]
>    [junit]
>    [junit] Testcase:
> testThreeKeyRangeSkip1(org.apache.cassandra.db.ColumnFamilyStoreTest):
>  FAILED
>    [junit] expected:<2> but was:<1>
>    [junit] junit.framework.AssertionFailedError: expected:<2> but was:<1>
>    [junit]     at
> org.apache.cassandra.db.ColumnFamilyStoreTest.testThreeKeyRangeSkip1(ColumnFamilyStoreTest.java:184)
>    [junit]
>    [junit]
>    [junit] Test org.apache.cassandra.db.ColumnFamilyStoreTest FAILED
>
>
> Any help appreciated.
>
> Aaron
>
>
> On 27 Apr 2010, at 09:38, aaron wrote:
>
>>
>> I've broken this case down further to some Python code that works against
>> the Thrift-generated client and am still getting the same odd results. With
>> keys object1, object2 and object3, an open-ended get_range_slices starting
>> with "object1" only returns object1 and 2.
>>
>> I'm guessing that I've got something wrong or my expectation of how
>> get_range_slices works is wrong, but I cannot see where I've gone wrong.
>> Any help would be appreciated.
>>
>> The Python code to add and read keys is below; it assumes a
>> Cassandra.Client connection.
>>
>> import time
>> from cassandra import Cassandra,ttypes
>> from thrift import Thrift
>> from thrift.protocol import TBinaryProtocol
>> from thrift.transport import TSocket, TTransport
>>
>>
>> def add_data(conn):
>>
>>   col_path = ttypes.ColumnPath(column_family="Standard1",
>> column="col_name")
>>   consistency = ttypes.ConsistencyLevel.QUORUM
>>
>>   for key in ["object1", "object2", "object3"]:
>>       conn.insert("Keyspace1", key, col_path, "col_value",
>>           int(time.time() * 1e6), consistency)
>>   return
>>
>> def read_range(conn, start_key, end_key):
>>
>>   col_parent = ttypes.ColumnParent(column_family="Standard1")
>>
>>   predicate = ttypes.SlicePredicate(column_names=["col_name"])
>>   range = ttypes.KeyRange(start_key=start_key, end_key=end_key,
>> count=1000)
>>   consistency = ttypes.ConsistencyLevel.QUORUM
>>
>>   return conn.get_range_slices("Keyspace1", col_parent,
>>               predicate, range, consistency)
>>
>>
>> Below is the result of calling read_range with different start values. I've
>> also included the debug log for each call; the line starting with "reading
>> RangeSliceCommand" seems to show that the key hash for "object2" is greater
>> than "object3".
>>
>> #expect to return objects 1,2 and 3
>>
>> In [37]: cass_test.read_range(conn, "object1", "")
>> Out[37]:
>>
>> [KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595268837,
>> name='col_name', value='col_value'), super_column=None)], key='object1'),
>>
>> KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595272693,
>> name='col_name', value='col_value'), super_column=None)], key='object3')]
>>
>> DEBUG 09:29:59,791 range_slice
>> DEBUG 09:29:59,791 RangeSliceCommand{keyspace='Keyspace1',
>> column_family='Standard1', super_column=null,
>> predicate=SlicePredicate(column_names:[...@257b40fe]),
>> range=[121587881847328893689247922008234581399,0], max_keys=1000}
>> DEBUG 09:29:59,791 Adding to restricted ranges
>> [121587881847328893689247922008234581399,0] for
>>
>> (75349581786326521367945210761838448174,75349581786326521367945210761838448174]
>> DEBUG 09:29:59,791 reading RangeSliceCommand{keyspace='Keyspace1',
>> column_family='Standard1', super_column=null,
>> predicate=SlicePredicate(column_names:[...@257b40fe]),
>> range=[121587881847328893689247922008234581399,0], max_keys=1000} from
>> 1...@localhost/127.0.0.1
>> DEBUG 09:29:59,791 Sending RangeSliceReply{rows=Row(key='object1',
>> cf=ColumnFamily(Standard1
>> [636f6c5f6e616d65:false:9...@1272315595268837,])),Row(key='object3',
>> cf=ColumnFamily(Standard1 [636f6c5f6e616d65:false:9...@1272315595272693,]))}
>> to 1...@localhost/127.0.0.1
>> DEBUG 09:29:59,791 Processing response on a callback from
>> 1...@localhost/127.0.0.1
>> DEBUG 09:29:59,791 range slices read object1
>> DEBUG 09:29:59,791 range slices read object3
>>
>>
>> In [38]: cass_test.read_range(conn, "object2", "")
>> Out[38]:
>>
>> [KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595271798,
>> name='col_name', value='col_value'), super_column=None)], key='object2'),
>>
>> KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595268837,
>> name='col_name', value='col_value'), super_column=None)], key='object1'),
>>
>> KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595272693,
>> name='col_name', value='col_value'), super_column=None)], key='object3')]
>>
>> DEBUG 09:34:48,133 range_slice
>> DEBUG 09:34:48,133 RangeSliceCommand{keyspace='Keyspace1',
>> column_family='Standard1', super_column=null,
>> predicate=SlicePredicate(column_names:[...@7966340c]),
>> range=[28312518014678916505369931620527723964,0], max_keys=1000}
>> DEBUG 09:34:48,133 Adding to restricted ranges
>> [28312518014678916505369931620527723964,0] for
>>
>> (75349581786326521367945210761838448174,75349581786326521367945210761838448174]
>> DEBUG 09:34:48,133 reading RangeSliceCommand{keyspace='Keyspace1',
>> column_family='Standard1', super_column=null,
>> predicate=SlicePredicate(column_names:[...@7966340c]),
>> range=[28312518014678916505369931620527723964,0], max_keys=1000} from
>> 1...@localhost/127.0.0.1
>> DEBUG 09:34:48,133 Sending RangeSliceReply{rows=Row(key='object2',
>> cf=ColumnFamily(Standard1
>> [636f6c5f6e616d65:false:9...@1272315595271798,])),Row(key='object1',
>> cf=ColumnFamily(Standard1
>> [636f6c5f6e616d65:false:9...@1272315595268837,])),Row(key='object3',
>> cf=ColumnFamily(Standard1 [636f6c5f6e616d65:false:9...@1272315595272693,]))}
>> to 1...@localhost/127.0.0.1
>> DEBUG 09:34:48,133 Processing response on a callback from
>> 1...@localhost/127.0.0.1
>> DEBUG 09:34:48,133 range slices read object2
>> DEBUG 09:34:48,133 range slices read object1
>> DEBUG 09:34:48,133 range slices read object3
>>
>>
>> In [39]: cass_test.read_range(conn, "object3", "")
>> Out[39]:
>>
>> [KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595272693,
>> name='col_name', value='col_value'), super_column=None)], key='object3')]
>>
>> DEBUG 09:35:26,090 range_slice
>> DEBUG 09:35:26,090 RangeSliceCommand{keyspace='Keyspace1',
>> column_family='Standard1', super_column=null,
>> predicate=SlicePredicate(column_names:[...@24e33e18]),
>> range=[123092639156685888118746480803115294277,0], max_keys=1000}
>> DEBUG 09:35:26,090 Adding to restricted ranges
>> [123092639156685888118746480803115294277,0] for
>>
>> (75349581786326521367945210761838448174,75349581786326521367945210761838448174]
>> DEBUG 09:35:26,090 reading RangeSliceCommand{keyspace='Keyspace1',
>> column_family='Standard1', super_column=null,
>> predicate=SlicePredicate(column_names:[...@24e33e18]),
>> range=[123092639156685888118746480803115294277,0], max_keys=1000} from
>> 1...@localhost/127.0.0.1
>> DEBUG 09:35:26,090 Sending RangeSliceReply{rows=Row(key='object3',
>> cf=ColumnFamily(Standard1 [636f6c5f6e616d65:false:9...@1272315595272693,]))}
>> to 1...@localhost/127.0.0.1
>> DEBUG 09:35:26,090 Processing response on a callback from
>> 1...@localhost/127.0.0.1
>> DEBUG 09:35:26,090 range slices read object3
>>
>>
>>
>> thanks
>> Aaron
>>
>>
>>
>>
>> On Sun, 25 Apr 2010 20:23:05 -0700, aaron <aa...@the-mortons.org> wrote:
>>>
>>> I've been looking at the get_range_slices feature and have found some odd
>>> behaviour I do not understand. Basically the keys returned in a range query
>>> do not match what I would expect to see. I think it may have something to
>>> do with the ordering of keys that I don't know about, but I'm just
>>> guessing.
>>>
>>> On Cassandra v 0.6.1, single node local install; RandomPartitioner. Using
>>> Python and my own thin wrapper around the Thrift Python API.
>>>
>>> Step 1.
>>>
>>> Insert 3 keys into the "Standard1" column family, called "object1",
>>> "object2" and "object3", each with a single column called 'name' with a
>>> value like 'object1'.
>>>
>>> Step 2.
>>>
>>> Do a get_range_slices call in the "Standard1" CF, for column names
>>> ["name"] with start_key "object1" and end_key "object3". I expect to see
>>> three results, but I only see results for object1 and object2. Below are
>>> the Thrift types I'm passing into the Cassandra.Client object...
>>>
>>> - ColumnParent(column_family='Standard1', super_column=None)
>>> - SlicePredicate(column_names=['name'], slice_range=None)
>>> - KeyRange(end_key='object3', start_key='object1', count=4000,
>>> end_token=None, start_token=None)
>>>
>>> and the output
>>>
>>>
>>> [KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250258810439,
>>> name='name', value='object1'), super_column=None)], key='object1'),
>>>
>>> KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250271620362,
>>> name='name', value='object3'), super_column=None)], key='object3')]
>>>
>>> Step 3.
>>>
>>> Modify the get_range_slices call, so the start_key is object2. In this case
>>> I expect to see 2 rows returned, but I get 3. Thrift args and return are
>>> below...
>>>
>>> - ColumnParent(column_family='Standard1', super_column=None)
>>> - SlicePredicate(column_names=['name'], slice_range=None)
>>> - KeyRange(end_key='object3', start_key='object2', count=4000,
>>> end_token=None, start_token=None)
>>>
>>> and the output
>>>
>>>
>>> [KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250265190715,
>>> name='name', value='object2'), super_column=None)], key='object2'),
>>>
>>> KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250258810439,
>>> name='name', value='object1'), super_column=None)], key='object1'),
>>>
>>> KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250271620362,
>>> name='name', value='object3'), super_column=None)], key='object3')]
>>>
>>>
>>>
>>> Can anyone explain these odd results? As I said I've got my own python
>>> wrapper around the client, so I may be doing something wrong. But I've
>>> pulled out the thrift objects and they go in and out of the thrift
>>> Cassandra.Client, so I think I'm ok. (I have not noticed a systematic
>>> problem with my wrapper).
>>>
>>> On a more general note, is there information on the sort order of keys when
>>> using key ranges? I'm guessing the hash of the keys is compared, and I'm
>>> wondering if the hashes of the keys maintain the order of the original
>>> values? Also I assume the order is byte order, rather than ASCII or UTF-8.
>>>
>>> I was experimenting with the difference between column slicing and key
>>> slicing. In my case I could write the keys in as column names (they are in
>>> buckets) as well and slice there first, then use the results to make a
>>> multi-key get. I'm trying to support features like "get me all the data
>>> where the key starts with 'foo.bar'".
>>>
>>> Thanks for the fun project.
>>>
>>> Aaron
>
>
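
Sorting the three keys by the tokens shown in the quoted debug logs reproduces the results reported there. The sketch below is plain Python with the token values copied from the `range=[...]` log entries. `open_ended_scan` is a hypothetical helper, and it assumes an open-ended query returns every key whose token is at or after the start key's token, in token order.

```python
# Tokens copied from the "range=[<token>,0]" entries in the quoted debug logs.
TOKENS = {
    "object1": 121587881847328893689247922008234581399,
    "object2": 28312518014678916505369931620527723964,
    "object3": 123092639156685888118746480803115294277,
}


def open_ended_scan(start_key):
    """Keys whose token is >= the start key's token, in token order.

    Hypothetical helper for illustration; not a Cassandra API.
    """
    start = TOKENS[start_key]
    ordered = sorted(TOKENS, key=TOKENS.get)
    return [k for k in ordered if TOKENS[k] >= start]


for key in ("object1", "object2", "object3"):
    print(key, "->", open_ended_scan(key))
```

Under these assumptions the keys order as object2, object1, object3 by token, which matches the rows returned for each start key in the quoted output.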



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
