Re: strange get_range_slices behaviour v0.6.1

aaron morton Sun, 02 May 2010 05:20:11 -0700

Hi there, I'm still getting odd behavior with get_range_slices. I've created a JUnit test that illustrates the case. Could someone take a look and let me know either where my understanding is wrong or whether this is a real issue?


I added the following to ColumnFamilyStoreTest.java

    private ColumnFamilyStore insertKey1Key2Key3() throws IOException, ExecutionException, InterruptedException
    {
        List<RowMutation> rms = new LinkedList<RowMutation>();
        RowMutation rm;

        rm = new RowMutation("Keyspace2", "key1".getBytes());
        rm.add(new QueryPath("Standard1", null, "Column1".getBytes()), "asdf".getBytes(), 0);
        rms.add(rm);

        rm = new RowMutation("Keyspace2", "key2".getBytes());
        rm.add(new QueryPath("Standard1", null, "Column1".getBytes()), "asdf".getBytes(), 0);
        rms.add(rm);

        rm = new RowMutation("Keyspace2", "key3".getBytes());
        rm.add(new QueryPath("Standard1", null, "Column1".getBytes()), "asdf".getBytes(), 0);
        rms.add(rm);

        return Util.writeColumnFamily(rms);
    }


    @Test
    public void testThreeKeyRangeAll() throws IOException, ExecutionException, InterruptedException
    {
        ColumnFamilyStore cfs = insertKey1Key2Key3();
        IPartitioner p = StorageService.getPartitioner();

        RangeSliceReply result = cfs.getRangeSlice(ArrayUtils.EMPTY_BYTE_ARRAY,
                                                   Util.range(p, "key1", "key3"),
                                                   10,
                                                   null,
                                                   Arrays.asList("Column1".getBytes()));

        assertEquals(3, result.rows.size());
    }

    @Test
    public void testThreeKeyRangeSkip1() throws IOException, ExecutionException, InterruptedException
    {
        ColumnFamilyStore cfs = insertKey1Key2Key3();
        IPartitioner p = StorageService.getPartitioner();

        RangeSliceReply result = cfs.getRangeSlice(ArrayUtils.EMPTY_BYTE_ARRAY,
                                                   Util.range(p, "key2", "key3"),
                                                   10,
                                                   null,
                                                   Arrays.asList("Column1".getBytes()));

        assertEquals(2, result.rows.size());
    }

Running this with "ant test", the partial output is:

    [junit] Testsuite: org.apache.cassandra.db.ColumnFamilyStoreTest
    [junit] Tests run: 7, Failures: 2, Errors: 0, Time elapsed: 1.405 sec
    [junit]
    [junit] Testcase: testThreeKeyRangeAll(org.apache.cassandra.db.ColumnFamilyStoreTest): FAILED
    [junit] expected:<3> but was:<2>
    [junit] junit.framework.AssertionFailedError: expected:<3> but was:<2>
    [junit]     at org.apache.cassandra.db.ColumnFamilyStoreTest.testThreeKeyRangeAll(ColumnFamilyStoreTest.java:170)
    [junit]
    [junit]
    [junit] Testcase: testThreeKeyRangeSkip1(org.apache.cassandra.db.ColumnFamilyStoreTest): FAILED
    [junit] expected:<2> but was:<1>
    [junit] junit.framework.AssertionFailedError: expected:<2> but was:<1>
    [junit]     at org.apache.cassandra.db.ColumnFamilyStoreTest.testThreeKeyRangeSkip1(ColumnFamilyStoreTest.java:184)
    [junit]
    [junit]
    [junit] Test org.apache.cassandra.db.ColumnFamilyStoreTest FAILED


Any help appreciated.

Aaron


On 27 Apr 2010, at 09:38, aaron wrote:

I've broken this case down further to some Python code that works
against the Thrift-generated client, and I am still getting the same odd
results. With keys object1, object2 and object3, an open-ended
get_range_slices starting with "object1" only returns object1 and
object3.

I'm guessing that I've got something wrong or that my expectation of how
get_range_slices works is wrong, but I cannot see where I've gone wrong.
Any help would be appreciated.

The Python code to add and read keys is below; it assumes a
Cassandra.Client connection.

import time
from cassandra import Cassandra, ttypes
from thrift import Thrift
from thrift.protocol import TBinaryProtocol
from thrift.transport import TSocket, TTransport


def add_data(conn):
    # Insert one "col_name" column under each of the three test keys.
    col_path = ttypes.ColumnPath(column_family="Standard1",
                                 column="col_name")
    consistency = ttypes.ConsistencyLevel.QUORUM

    for key in ["object1", "object2", "object3"]:
        conn.insert("Keyspace1", key, col_path, "col_value",
                    int(time.time() * 1e6), consistency)

def read_range(conn, start_key, end_key):
    # Fetch the "col_name" column for the keys between start_key and end_key.
    col_parent = ttypes.ColumnParent(column_family="Standard1")

    predicate = ttypes.SlicePredicate(column_names=["col_name"])
    # Named key_range so the builtin range() is not shadowed.
    key_range = ttypes.KeyRange(start_key=start_key, end_key=end_key,
                                count=1000)
    consistency = ttypes.ConsistencyLevel.QUORUM

    return conn.get_range_slices("Keyspace1", col_parent,
                                 predicate, key_range, consistency)

Below is the result of calling read_range with different start values.
I've also included the debug log for each call. The line starting with
"reading RangeSliceCommand" seems to show that the key hashes do not
follow the order of the keys themselves (the hash for "object2" is the
lowest of the three).

# expect to return objects 1, 2 and 3

In [37]: cass_test.read_range(conn, "object1", "")
Out[37]:
[KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595268837, name='col_name', value='col_value'), super_column=None)], key='object1'),
 KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595272693, name='col_name', value='col_value'), super_column=None)], key='object3')]
DEBUG 09:29:59,791 range_slice
DEBUG 09:29:59,791 RangeSliceCommand{keyspace='Keyspace1',
column_family='Standard1', super_column=null,
predicate=SlicePredicate(column_names:[...@257b40fe]),
range=[121587881847328893689247922008234581399,0], max_keys=1000}
DEBUG 09:29:59,791 Adding to restricted ranges
[121587881847328893689247922008234581399,0] for
(75349581786326521367945210761838448174,75349581786326521367945210761838448174]
DEBUG 09:29:59,791 reading RangeSliceCommand{keyspace='Keyspace1',
column_family='Standard1', super_column=null,
predicate=SlicePredicate(column_names:[...@257b40fe]),
range=[121587881847328893689247922008234581399,0], max_keys=1000} from
1...@localhost/127.0.0.1
DEBUG 09:29:59,791 Sending RangeSliceReply{rows=Row(key='object1',
cf=ColumnFamily(Standard1
[636f6c5f6e616d65:false:9...@1272315595268837,])),Row(key='object3',
cf=ColumnFamily(Standard1 [636f6c5f6e616d65:false:9...@1272315595272693,]))}
to 1...@localhost/127.0.0.1
DEBUG 09:29:59,791 Processing response on a callback from
1...@localhost/127.0.0.1
DEBUG 09:29:59,791 range slices read object1
DEBUG 09:29:59,791 range slices read object3


In [38]: cass_test.read_range(conn, "object2", "")
Out[38]:
[KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595271798, name='col_name', value='col_value'), super_column=None)], key='object2'),
 KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595268837, name='col_name', value='col_value'), super_column=None)], key='object1'),
 KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595272693, name='col_name', value='col_value'), super_column=None)], key='object3')]
DEBUG 09:34:48,133 range_slice
DEBUG 09:34:48,133 RangeSliceCommand{keyspace='Keyspace1',
column_family='Standard1', super_column=null,
predicate=SlicePredicate(column_names:[...@7966340c]),
range=[28312518014678916505369931620527723964,0], max_keys=1000}
DEBUG 09:34:48,133 Adding to restricted ranges
[28312518014678916505369931620527723964,0] for
(75349581786326521367945210761838448174,75349581786326521367945210761838448174]
DEBUG 09:34:48,133 reading RangeSliceCommand{keyspace='Keyspace1',
column_family='Standard1', super_column=null,
predicate=SlicePredicate(column_names:[...@7966340c]),
range=[28312518014678916505369931620527723964,0], max_keys=1000} from
1...@localhost/127.0.0.1
DEBUG 09:34:48,133 Sending RangeSliceReply{rows=Row(key='object2',
cf=ColumnFamily(Standard1
[636f6c5f6e616d65:false:9...@1272315595271798,])),Row(key='object1',
cf=ColumnFamily(Standard1
[636f6c5f6e616d65:false:9...@1272315595268837,])),Row(key='object3',
cf=ColumnFamily(Standard1 [636f6c5f6e616d65:false:9...@1272315595272693,]))}
to 1...@localhost/127.0.0.1
DEBUG 09:34:48,133 Processing response on a callback from
1...@localhost/127.0.0.1
DEBUG 09:34:48,133 range slices read object2
DEBUG 09:34:48,133 range slices read object1
DEBUG 09:34:48,133 range slices read object3


In [39]: cass_test.read_range(conn, "object3", "")
Out[39]:
[KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272315595272693, name='col_name', value='col_value'), super_column=None)], key='object3')]
DEBUG 09:35:26,090 range_slice
DEBUG 09:35:26,090 RangeSliceCommand{keyspace='Keyspace1',
column_family='Standard1', super_column=null,
predicate=SlicePredicate(column_names:[...@24e33e18]),
range=[123092639156685888118746480803115294277,0], max_keys=1000}
DEBUG 09:35:26,090 Adding to restricted ranges
[123092639156685888118746480803115294277,0] for
(75349581786326521367945210761838448174,75349581786326521367945210761838448174]
DEBUG 09:35:26,090 reading RangeSliceCommand{keyspace='Keyspace1',
column_family='Standard1', super_column=null,
predicate=SlicePredicate(column_names:[...@24e33e18]),
range=[123092639156685888118746480803115294277,0], max_keys=1000} from
1...@localhost/127.0.0.1
DEBUG 09:35:26,090 Sending RangeSliceReply{rows=Row(key='object3',
cf=ColumnFamily(Standard1 [636f6c5f6e616d65:false:9...@1272315595272693,]))}
to 1...@localhost/127.0.0.1
DEBUG 09:35:26,090 Processing response on a callback from
1...@localhost/127.0.0.1
DEBUG 09:35:26,090 range slices read object3
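
One way to read the three logged ranges is to treat the first number in
each "range=[...]" as the token of the start key and to assume that an
open-ended range returns every key whose token is at or above it, in
token order. The sketch below is only that model, not Cassandra code:
the token values are copied from the DEBUG lines above, and the names
LOGGED_TOKENS and keys_in_token_range are made up for illustration.

```python
# Start tokens copied from the "RangeSliceCommand" DEBUG lines above.
# Assumption: the first number in each logged range is the start key's token.
LOGGED_TOKENS = {
    "object1": 121587881847328893689247922008234581399,
    "object2": 28312518014678916505369931620527723964,
    "object3": 123092639156685888118746480803115294277,
}


def keys_in_token_range(start_key, end_key=""):
    """Hypothetical model of a range read under a hashing partitioner.

    Returns the keys whose token lies between the tokens of start_key and
    end_key (both inclusive), in token order. An empty end_key is treated
    as "no upper bound".
    """
    start = LOGGED_TOKENS[start_key]
    end = LOGGED_TOKENS[end_key] if end_key else None
    ordered = sorted(LOGGED_TOKENS, key=LOGGED_TOKENS.get)
    return [k for k in ordered
            if LOGGED_TOKENS[k] >= start
            and (end is None or LOGGED_TOKENS[k] <= end)]


for start in ("object1", "object2", "object3"):
    print(start, "->", keys_in_token_range(start))
# object1 -> ['object1', 'object3']
# object2 -> ['object2', 'object1', 'object3']
# object3 -> ['object3']
```

Under these assumptions the model gives the same keys, in the same
order, as the three calls above, which would suggest the results follow
token order rather than key order.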



thanks
Aaron
On Sun, 25 Apr 2010 20:23:05 -0700, aaron <aa...@the-mortons.org> wrote:
I've been looking at the get_range_slices feature and have found some
odd behaviour I do not understand. Basically, the keys returned in a
range query do not match what I would expect to see. I think it may have
something to do with the ordering of keys that I don't know about, but
I'm just guessing.

I'm on Cassandra v0.6.1, single-node local install, RandomPartitioner,
using Python and my own thin wrapper around the Thrift Python API.

Step 1.

Insert 3 keys into the "Standard1" column family, called "object1",
"object2" and "object3", each with a single column called 'name' with a
value like 'object1'.

Step 2.

Do a get_range_slices call in the "Standard1" CF, for column names
["name"] with start_key "object1" and end_key "object3". I expect to see
three results, but I only see results for object1 and object3. Below are
the thrift types I'm passing into the Cassandra.Client object...

- ColumnParent(column_family='Standard1', super_column=None)
- SlicePredicate(column_names=['name'], slice_range=None)
- KeyRange(end_key='object3', start_key='object1', count=4000,
end_token=None, start_token=None)

and the output
[KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250258810439,
name='name', value='object1'), super_column=None)], key='object1'),
KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250271620362,
name='name', value='object3'), super_column=None)], key='object3')]

Step 3.

Modify the get_range_slices call so the start_key is object2. In this
case I expect to see 2 rows returned, but I get 3. Thrift args and return
are below...

- ColumnParent(column_family='Standard1', super_column=None)
- SlicePredicate(column_names=['name'], slice_range=None)
- KeyRange(end_key='object3', start_key='object2', count=4000,
end_token=None, start_token=None)

and the output
[KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250265190715,
name='name', value='object2'), super_column=None)], key='object2'),
KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250258810439,
name='name', value='object1'), super_column=None)], key='object1'),
KeySlice(columns=[ColumnOrSuperColumn(column=Column(timestamp=1272250271620362,
name='name', value='object3'), super_column=None)], key='object3')]
Can anyone explain these odd results? As I said, I've got my own Python
wrapper around the client, so I may be doing something wrong. But I've
pulled out the thrift objects and they go in and out of the thrift
Cassandra.Client, so I think I'm ok. (I have not noticed a systematic
problem with my wrapper.)
On a more general note, is there information on the sort order of keys
when using key ranges? I'm guessing the hashes of the keys are compared,
and I'm wondering whether the hashes maintain the order of the original
values. Also, I assume the order is byte order, rather than ASCII or
UTF-8.
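
A quick way to probe that question outside Cassandra is to hash the keys
locally and compare the two orderings. The sketch below assumes that
RandomPartitioner tokens are the MD5 digest of the key read as a signed
big-endian integer, with the absolute value taken; that is an assumption
on my part, not something confirmed here, and md5_token is a made-up
helper name. It is written for Python 3.

```python
import hashlib


def md5_token(key: bytes) -> int:
    """Assumed token: abs() of the MD5 digest read as a signed 128-bit int."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


keys = [b"object1", b"object2", b"object3"]

by_key = sorted(keys)                   # plain byte order of the keys
by_token = sorted(keys, key=md5_token)  # order of the hashed values

for k in by_token:
    print(k.decode(), md5_token(k))

print("same order as the keys:", by_key == by_token)
```

A hash like MD5 is not designed to preserve the order of its inputs, so
there is no reason to expect the two orderings to agree in general. The
printed values can be compared with whatever tokens the server logs to
check the assumption.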
I was experimenting with the difference between column slicing and key
slicing. In my case I could write the keys in as column names (they are
in buckets) as well and slice there first, then use the results to make
a multi-key get. I'm trying to support features like "get me all the
data where the key starts with 'foo.bar'".
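
The prefix idea can be sketched without a server. Assuming the column
names are compared as raw bytes, every name starting with a prefix falls
between the prefix itself and the prefix with its last byte incremented.
The helper names prefix_bounds and slice_by_prefix are hypothetical, and
a sorted Python list stands in for the column names of one bucket row
(Python 3).

```python
import bisect


def prefix_bounds(prefix: bytes):
    """Return (start, end) with start <= name < end for every name that
    begins with prefix. end is None when there is no finite upper bound."""
    # Trailing 0xff bytes cannot be incremented, so drop them first.
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return prefix, None
    end = trimmed[:-1] + bytes([trimmed[-1] + 1])
    return prefix, end


def slice_by_prefix(sorted_names, prefix):
    """Slice a byte-ordered list of names down to those with the prefix."""
    start, end = prefix_bounds(prefix)
    lo = bisect.bisect_left(sorted_names, start)
    hi = len(sorted_names) if end is None else bisect.bisect_left(sorted_names, end)
    return sorted_names[lo:hi]


names = sorted([b"foo.ba", b"foo.bar", b"foo.bar.a",
                b"foo.bar.b", b"foo.baz", b"fop"])
print(slice_by_prefix(names, b"foo.bar"))
# [b'foo.bar', b'foo.bar.a', b'foo.bar.b']
```

The names that come back could then be used as the keys of a multi-key
get. If the same bounds were passed to a column slice whose finish is
inclusive, a column named exactly like the upper bound would also be
returned and would need to be filtered out.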

Thanks for the fun project.

Aaron
