Application level pagination in Cassandra

Manu Chadha Thu, 30 Jul 2020 10:22:04 -0700

Hi

This question is part Cassandra and part ScalarDB. I am using ScalarDB, which 
provides ACID support on top of `Cassandra`. The library seems to be working 
well! Unfortunately, ScalarDB doesn't support pagination, so I have to 
implement it in the application.


Consider this scenario, in which `P` is the partition key, `C` is the 
clustering key and `E` is the other data within the partition:

    Partition => {
    P,C1,E1
    P,C2,E1
    P,C2,E2
    P,C2,E3
    P,C2,E4
    P,C3,E1
    ...
    P,Cm,En
    }

In ScalarDB, I can specify start and end values of the clustering keys, so I 
suppose ScalarDB will get data only from the specified rows. I can also limit 
the number of entries fetched.

https://scalar-labs.github.io/scalardb/javadoc/com/scalar/db/api/Scan.html

Say I want to get entries `E3` and `E4` from `P,C2`. For a small number of 
records, I can specify both the start and end clustering keys as `C2`, set the 
fetch limit to, say, 4, and ignore `E1` and `E2`. But if there are several 
hundred records then this method will not scale.
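
A minimal sketch of the scan described above, assuming an in-memory list stands 
in for the partition. The `scan` helper and the row tuples are hypothetical and 
only imitate a range scan with a limit; they are not ScalarDB's API.

```python
from typing import List, Optional, Tuple

# (partition key, clustering key, other data), as in the layout above
Row = Tuple[str, str, str]


def scan(rows: List[Row], partition: str, start: Optional[str] = None,
         end: Optional[str] = None, limit: Optional[int] = None) -> List[Row]:
    """Hypothetical stand-in for a range scan with an optional limit.

    Returns the rows of one partition whose clustering key lies in
    [start, end] (both inclusive), in sorted order, capped at `limit`.
    """
    selected = [
        r for r in sorted(rows)
        if r[0] == partition
        and (start is None or r[1] >= start)
        and (end is None or r[1] <= end)
    ]
    return selected if limit is None else selected[:limit]


rows = [
    ("P", "C1", "E1"),
    ("P", "C2", "E1"), ("P", "C2", "E2"), ("P", "C2", "E3"), ("P", "C2", "E4"),
    ("P", "C3", "E1"),
]

# Start and end are both C2 and the limit is 4, so all of P,C2 comes back.
fetched = scan(rows, "P", start="C2", end="C2", limit=4)

# E1 and E2 are fetched anyway and then dropped in the application.
wanted = [r for r in fetched if r[2] in ("E3", "E4")]
```

Half of the fetched rows are thrown away here, which is the part that stops 
scaling once `P,C2` holds hundreds of records.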

For example, say `P,C1` has 10 records, `P,C2` has 100 records and I want to 
implement pagination of 20 records per query. Then, to implement this, I'll 
have to do the following.

Query 1 – Scan – partition key will be P, clustering start will be C1, 
clustering end will be Cn, as I don’t know how many records there are.
- get `P,C1`. This will give 10 records
- get `P,C2`. This will give me 20 records. I'll ignore the last 10 and combine 
`P,C1`'s 10 with `P,C2`'s first 10 and return the result.

I'll also have to remember that the last clustering key queried was `C2` and 
that 10 records were fetched from it.

Query 2 (for the next pagination request) – Scan – partition key will be P, 
clustering start will be C2, clustering end will be Cn, as I don’t know how 
many records there are.
Now I'll fetch `P,C2` and get 20, ignore the first 10 (as they were sent last 
time), take the remaining 10, do another fetch using the same Scan and take the 
first 10 from that.
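
The two queries above can be sketched roughly as follows. This is a variation 
on the description, not the exact steps: instead of a second fetch, the scan 
limit is enlarged by the number of rows already sent. The `scan` and 
`next_page` helpers and the cursor tuple are hypothetical names, and an 
in-memory list stands in for the partition.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, str]     # (clustering key, other data) inside partition P
Cursor = Tuple[str, int]  # (last clustering key read, rows already sent from it)


def scan(partition: List[Row], start: str, limit: int) -> List[Row]:
    """Hypothetical range scan: rows with clustering key >= start, capped at limit."""
    return [r for r in partition if r[0] >= start][:limit]


def next_page(partition: List[Row], cursor: Optional[Cursor],
              page_size: int) -> Tuple[List[Row], Optional[Cursor]]:
    """Return one page plus the cursor to pass in for the following page."""
    if cursor is None:
        if not partition:
            return [], None
        cursor = (partition[0][0], 0)
    start, skip = cursor
    # The scan restarts at the beginning of `start`, so over-fetch by `skip`
    # rows and drop the ones that were already returned.
    fetched = scan(partition, start, skip + page_size)
    page = fetched[skip:]
    if not page:
        return [], cursor
    last_key = page[-1][0]
    seen = sum(1 for r in page if r[0] == last_key)
    if last_key == start:
        seen += skip  # still inside the same clustering key as before
    return page, (last_key, seen)


# 10 records under C1 and 100 under C2, as in the example above.
partition = [("C1", f"E{i}") for i in range(1, 11)]
partition += [("C2", f"E{i}") for i in range(1, 101)]

page1, cursor1 = next_page(partition, None, 20)     # 10 from C1 + 10 from C2
page2, cursor2 = next_page(partition, cursor1, 20)  # fetches 30, keeps last 20
```

The cursor is the state that has to be kept between requests: the last 
clustering key and how many of its records were already returned.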

Is this how it should be done, or is there a better way? My concern with the 
above implementation is that every time I'll have to fetch loads of records and 
discard them. For example, say I want to get records 70-90 from `P,C2`; then 
I'll still query up to record 60 and discard the result!
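
To put a rough number on that concern, here is a small calculation. It assumes 
every page restarts the scan at the beginning of the clustering key and 
over-fetches to skip the rows already sent. The function name is made up for 
illustration.

```python
from typing import List


def rows_fetched_per_page(total_rows: int, page_size: int) -> List[int]:
    """Rows each page request reads when it must re-read all earlier rows."""
    fetched = []
    skip = 0
    while skip < total_rows:
        # Read the `skip` rows already sent plus one new page (or what is left).
        fetched.append(min(skip + page_size, total_rows))
        skip += page_size
    return fetched


# The 100 records of P,C2 in pages of 20.
per_page = rows_fetched_per_page(100, 20)
total_fetched = sum(per_page)
discarded = total_fetched - 100
```

Under that assumption the five requests read 20, 40, 60, 80 and 100 rows, so 
300 rows are read to return 100 and the work grows quadratically with the 
number of pages.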

Thanks
Manu
