Hi Fellows, 

I have the following design for a system that holds key->value pairs 
(Columns) for each user (SuperColumn key) in different namespaces 
(SuperColumnFamily row key). 

Like this: 

Namespace->user->column_name = column_value; 
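
To make the layout concrete, here is a minimal in-memory sketch in Python. It uses plain dicts, not Cassandra, and the names `Store`, `put_column`, `get_columns` and the sample data are made up for illustration only:

```python
# In-memory stand-in for the layout above; plain dicts, NOT a Cassandra client.
from typing import Dict, List

# row key (namespace) -> super column (user) -> column name -> column value
Store = Dict[str, Dict[str, Dict[str, str]]]


def put_column(store: Store, namespace: str, user: str, name: str, value: str) -> None:
    """Set one column for a user inside a namespace, creating levels as needed."""
    store.setdefault(namespace, {}).setdefault(user, {})[name] = value


def get_columns(store: Store, namespace: str, user: str, names: List[str]) -> Dict[str, str]:
    """Return the requested columns that exist for this user in this namespace."""
    columns = store.get(namespace, {}).get(user, {})
    return {n: columns[n] for n in names if n in columns}


# Hypothetical sample data.
store: Store = {}
put_column(store, "settings", "user42", "theme", "dark")
put_column(store, "settings", "user42", "lang", "en")
```

Asking for one name or for several goes through the same `get_columns` call, which mirrors how one read call is used for both cases.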

keyspaces:
    - name: NKVP
      replica_placement_strategy: org.apache.cassandra.locator.RackUnawareStrategy
      replication_factor: 3
      column_families:
        - name: Namespaces
          column_type: Super
          compare_with: BytesType
          compare_subcolumns_with: BytesType
          rows_cached: 20000
          keys_cached: 100

The cluster uses the random partitioner. 

I use multiget_slice() to fetch one or many columns inside the child 
supercolumn in a single call. These are the puzzling performance results I get: 

100 sequential reads completed in : 0.383 
  (this uses multiget_slice() with 1 key and 1 column name inside 
  predicate->column_names) 
100 batch loaded completed in : 0.786 
  (this uses multiget_slice() with 1 key and multiple column names inside 
  predicate->column_names) 
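
A minimal Python sketch of how two such measurements can be structured. It assumes the batch case is a single call carrying all 100 column names, which the numbers above do not state explicitly. `stub_fetch` is a made-up stand-in for the real read call, not the Thrift API, and the row key and column names are hypothetical:

```python
import time
from typing import Callable, Dict, List, Tuple

# A fetch takes (namespace, user, column names) and returns name -> value.
Fetch = Callable[[str, str, List[str]], Dict[str, str]]


def time_sequential(fetch: Fetch, namespace: str, user: str,
                    names: List[str]) -> Tuple[float, Dict[str, str]]:
    """One call per column name; returns (elapsed seconds, merged result)."""
    start = time.perf_counter()
    result: Dict[str, str] = {}
    for name in names:
        result.update(fetch(namespace, user, [name]))
    return time.perf_counter() - start, result


def time_batch(fetch: Fetch, namespace: str, user: str,
               names: List[str]) -> Tuple[float, Dict[str, str]]:
    """A single call with every column name; returns (elapsed seconds, result)."""
    start = time.perf_counter()
    result = fetch(namespace, user, list(names))
    return time.perf_counter() - start, result


# Hypothetical stub: records how many names each call carried.
calls: List[int] = []


def stub_fetch(namespace: str, user: str, names: List[str]) -> Dict[str, str]:
    calls.append(len(names))
    return {n: "value-" + n for n in names}


names = ["col%d" % i for i in range(100)]
seq_elapsed, seq_result = time_sequential(stub_fetch, "ns", "user42", names)
batch_elapsed, batch_result = time_batch(stub_fetch, "ns", "user42", names)
```

With a real client in place of the stub, both paths should return the same columns, so any timing gap comes from how the request is served rather than from what is requested.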

Read and write consistency levels are both ONE. 

Questions: 

Why is doing 100 sequential reads faster than doing the 100 in a batch? 
Is this a good design for my problem? 
Does my issue relate to https://issues.apache.org/jira/browse/CASSANDRA-598? 

Now on a single node with replication factor 1 I get this: 

100 sequential reads completed in : 0.438 
100 batch loaded completed in : 0.800 

Please advise as to why this is happening. 

These nodes are VMs, each with 1 CPU and 1 GB of RAM. 

Best Regards, 
=Arya 