Re: Efficient Paging Option in Wide Rows

Anuj Wadehra Sat, 23 Apr 2016 02:00:28 -0700

I think I complicated the question, so let me put it more crisply.
We have a table defined with a clustering key/column. We have 50000 different 
clustering key values. 
If we want to fetch all 50000 rows, which query option would be faster, and why?
1. Given a single primary key/partition key with 50000 clustering keys, we will 
page through the single partition 500 records at a time. Thus, we will do 
50000/500 = 100 db hits, all for the same partition key.
2. Given 100 different primary keys, with each primary key having just 500 
clustering key columns. Here also we will need 100 db hits, but for different 
partitions.
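
To make the arithmetic above concrete, here is a minimal Python sketch that only counts page requests for the two layouts. The function name `page_requests` is hypothetical and used purely for illustration; the sketch says nothing about server-side cost per request, which is the actual question.

```python
import math

def page_requests(partition_sizes, page_size=500):
    """Count page fetches needed to read every row of every partition.

    partition_sizes: number of clustering keys in each partition.
    Assumes each fetch returns at most page_size rows from one partition.
    """
    return sum(math.ceil(rows / page_size) for rows in partition_sizes)

# Option 1: one partition holding all 50000 clustering keys.
single_partition = page_requests([50000])

# Option 2: 100 partitions holding 500 clustering keys each.
many_partitions = page_requests([500] * 100)

print(single_partition, many_partitions)
```

Both options come out to the same number of round trips, so any difference would have to come from how each request is served, not from the request count.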


Basically I want to understand any optimizations built into CQL/Cassandra which 
make paging through a single partition more efficient than querying data from 
different partitions.

Thanks
Anuj
  On Fri, 22 Apr, 2016 at 8:27 PM, Anuj Wadehra<anujw_2...@yahoo.co.in> wrote:  
 Hi,
I have a wide row index table so that I can fetch all row keys corresponding to 
a column value. 
Rows of index_table will look like:
ColValue1:bucket1 >> rowkey1, rowkey2, .. rowkeyn
...
ColValue1:bucketn >> rowkey1, rowkey2, .. rowkeyn
We will have buckets to avoid hotspots. Row keys of the main table are random 
numbers, and we will never do a column slice like:

SELECT * FROM index_table WHERE key = xxx AND col > rowkey1 AND col < rowkey10
Also, we will ALWAYS fetch all data for a given value of the index column. Thus 
all buckets have to be read.
Each index column value can map to thousands to millions of row keys in the 
main table.
Based on our use case, there are two design choices in front of me:
1. Have a large number of buckets/rows for an index column value, with less 
data (around a few thousand entries) in each row.
Thus, every time we want to fetch all row keys for an index col value, we will 
query more rows, and for each row we will have to page through the data 500 
records at a time.
2. Have fewer buckets/rows for an index column value.
Every time we want to fetch all row keys for an index col value, we will query 
a smaller number of wider rows and then page through each wide row, reading 500 
columns at a time.

Which approach is more efficient?
Approach 1: a larger number of rows with less data in each row

OR

Approach 2: a smaller number of rows with more data in each row

Either way, we are fetching only 500 records at a time in a query. Even in 
approach 2 (wider rows), we query only a small slice of 500 at a time.
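
As a rough illustration of the two approaches, the following Python sketch models the bucketed index in memory and counts the queries needed to read every bucket 500 entries at a time. It does not talk to Cassandra. The helper names (`bucket_key`, `build_index`, `read_all`), the round-robin bucket assignment, and the figures of 100000 row keys with 50 versus 2 buckets are all assumptions made for the example.

```python
def bucket_key(col_value, bucket):
    # Row key of the index table, e.g. "ColValue1:bucket1".
    return f"{col_value}:bucket{bucket}"

def build_index(col_value, row_keys, num_buckets):
    """Spread row keys over num_buckets index rows (round-robin, an assumption)."""
    index = {bucket_key(col_value, b): [] for b in range(1, num_buckets + 1)}
    for i, row_key in enumerate(row_keys):
        index[bucket_key(col_value, i % num_buckets + 1)].append(row_key)
    return index

def read_all(index, col_value, num_buckets, page_size=500):
    """Read every bucket, page_size entries per query; return (keys, query count)."""
    keys, queries = [], 0
    for b in range(1, num_buckets + 1):
        row = index[bucket_key(col_value, b)]
        for start in range(0, len(row), page_size):
            keys.extend(row[start:start + page_size])
            queries += 1
    return keys, queries

row_keys = list(range(100000))

# Approach 1: many narrow rows (50 buckets of 2000 entries each).
keys_1, queries_1 = read_all(build_index("ColValue1", row_keys, 50), "ColValue1", 50)

# Approach 2: few wide rows (2 buckets of 50000 entries each).
keys_2, queries_2 = read_all(build_index("ColValue1", row_keys, 2), "ColValue1", 2)

print(queries_1, queries_2)
```

With these assumed numbers both approaches issue the same number of 500-entry queries, so the comparison comes down to what each query costs inside Cassandra.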

Thanks
Anuj
