Hi Anuj,

That's a very good question, and I'd like to hear from anyone who can give a detailed answer, but in the meantime here are my two cents.

First of all, I'd rather split the values across different partition keys, for two reasons:

1. If you're sure you will always read all the data at the same time, you can parallelize the queries and hit more nodes in your cluster, rather than creating a hotspot on the owner(s) of the data.
2. It is a recommended good practice to keep partitions reasonably small. Check whether your partition would fit within that practice by applying the formulae from this video: https://academy.datastax.com/courses/ds220-data-modeling/physical-partition-size

Cheers!

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 23 April 2016 at 20:25, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

> Hi,
>
> Can anyone take this question?
>
> Thanks
> Anuj
>
> On Sat, 23 Apr, 2016 at 2:30 PM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
> I think I complicated the question, so I am trying to put it more crisply.
>
> We have a table defined with a clustering key/column. We have 50000 different clustering key values.
>
> If we want to fetch all 50000 rows, which query option would be faster, and why?
>
> 1. Given a single primary key/partition key with 50000 clustering keys, we will page through the single partition 500 records at a time. Thus, we will do 50000/500 = 100 db hits, but for the same partition key.
>
> 2. Given 100 different primary keys, with each primary key having just 500 clustering key columns. Here also we will need 100 db hits, but for different partitions.
>
> Basically I want to understand any optimizations built into CQL/Cassandra which make paging through a single partition more efficient than querying data from different partitions.
>
> Thanks
> Anuj
>
> On Fri, 22 Apr, 2016 at 8:27 PM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
> Hi,
>
> I have a wide row index table so that I can fetch all row keys corresponding to a column value.
>
> A row of index_table will look like:
>
> ColValue1:bucket1 >> rowkey1, rowkey2.. rowkeyn
> ......
> ColValue1:bucketn >> rowkey1, rowkey2.. rowkeyn
>
> We will have buckets to avoid hotspots. Row keys of the main table are random numbers, and we will never do a column slice like:
>
> Select * from index_table where key=xxx and Col > rowkey1 and col < rowkey10
>
> Also, we will ALWAYS fetch all data for a given value of the index column. Thus all buckets have to be read.
>
> Each index column value can map to thousands or millions of row keys in the main table.
>
> Based on our use case, there are two design choices in front of me:
>
> 1. Have a large number of buckets/rows for an index column value and have less data (around a few thousand entries) in each row.
>
> Thus, every time we want to fetch all row keys for an index col value, we will query more rows, and for each row we will have to page through data 500 records at a time.
>
> 2. Have fewer buckets/rows for an index column value.
>
> Every time we want to fetch all row keys for an index col value, we will query a smaller number of wider rows and then page through each wide row, reading 500 columns at a time.
>
> Which approach is more efficient?
>
> Approach 1: More rows with less data in each row.
>
> OR
>
> Approach 2: Fewer rows with more data in each row.
>
> Either way, we are fetching only 500 records at a time in a query. Even in approach 2 (wider rows), we can query only a small chunk of 500 at a time.
>
> Thanks
> Anuj
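
As a rough illustration of point 1 at the top of this message (spreading the reads over many partitions so they can run in parallel), here is a minimal Python sketch. It does not talk to Cassandra at all: `fetch_bucket` is a hypothetical stand-in for one partition read, the `col_value` and `bucket` column names in its comment are assumptions, and the numbers (100 buckets of 500 rows) are just the ones from the question.

```python
from concurrent.futures import ThreadPoolExecutor

ROWS_PER_BUCKET = 500  # rows per partition, as in option 2 of the question
NUM_BUCKETS = 100      # 100 partitions x 500 rows = 50000 rows in total


def fetch_bucket(bucket):
    """Hypothetical stand-in for reading one partition.

    A real version would run something along the lines of
      SELECT rowkey FROM index_table WHERE col_value = ? AND bucket = ?
    through a Cassandra driver (column names are assumptions).
    Here the row keys are simply generated locally.
    """
    return [f"rowkey-{bucket}-{i}" for i in range(ROWS_PER_BUCKET)]


def fetch_all(num_buckets=NUM_BUCKETS, max_workers=16):
    """Issue one read per bucket concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in bucket order even though the reads overlap
        per_bucket = pool.map(fetch_bucket, range(num_buckets))
        return [key for keys in per_bucket for key in keys]


if __name__ == "__main__":
    keys = fetch_all()
    print(len(keys))
```

With a single 50000-row partition, each page has to wait for the previous one, so the 100 requests are sequential and all land on the same replicas. With 100 buckets, the requests are independent and can be in flight together, which is the effect the thread pool imitates here. With a real driver you would more likely use its asynchronous API than a thread pool.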