Re: spark optimized pagination

Teemu Heikkilä Mon, 11 Jun 2018 01:00:03 -0700

So you are currently serving the data on demand through Spark?

I suggest you change your API to query Cassandra directly and have Spark write 
its results back there. That way you only have to process the whole dataset 
once, and Cassandra is well suited to that kind of workload.
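
A rough sketch of that idea, under some assumptions: the dict below is only a 
stand-in for a Cassandra table partitioned by (report_id, page_no), and the 
names store_paged, fetch_page and results_table are made up for illustration. 
In a real setup the Spark job would write the pages and the REST handler would 
read a single partition.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a Cassandra table whose partition key is
# (report_id, page_no); a real setup would write here from the Spark job.
PageKey = Tuple[str, int]
results_table: Dict[PageKey, List[dict]] = {}


def store_paged(report_id: str, rows: List[dict], page_size: int) -> int:
    """Split precomputed rows into pages and store each under its own key."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    page_count = 0
    for start in range(0, len(rows), page_size):
        results_table[(report_id, page_count)] = rows[start:start + page_size]
        page_count += 1
    return page_count


def fetch_page(report_id: str, page_no: int) -> List[dict]:
    """What the REST handler would do: one key lookup, no Spark involved."""
    return results_table.get((report_id, page_no), [])


# Pretend these rows are the output of the (slow) Spark aggregation.
aggregated = [{"user": f"u{i}", "total": i * 10} for i in range(7)]
pages = store_paged("daily_totals", aggregated, page_size=3)
```

The aggregation then runs once per report, and every page request is a cheap 
lookup by key.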


-T

> On 10 Jun 2018, at 8.12, onmstester onmstester <onmstes...@zoho.com> wrote:
> 
> Hi,
> I'm using Spark on top of Cassandra as the CRUD backend of a RESTful 
> application. Most of the REST APIs retrieve a huge amount of data from 
> Cassandra and do a lot of aggregation on it in Spark, which takes a few 
> seconds.
> 
> Problem: sometimes the result is a big list, which makes the client 
> browser throw a "stop script" warning, so we should paginate the result on 
> the server side. But it would be annoying for the user to wait a few seconds 
> on each page for the Cassandra-Spark processing.
> 
> Current dummy solution: for now I was thinking about assigning a UUID to 
> each request, which would be sent back and forth between the server and the 
> client. The first time a REST API is invoked, the result would be saved in a 
> temp table, and on subsequent similar requests (requests for the next pages) 
> the result would be fetched from the temp table, instead of going through the 
> usual flow of retrieving from Cassandra and aggregating in Spark, which takes 
> some time. When the memory limit is reached, the old results would be 
> deleted.
> 
> Is there any built-in, clean caching strategy in Spark to handle such 
> scenarios?
> 
> Sent using Zoho Mail <https://www.zoho.com/mail/>
> 
>
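
For reference, the UUID scheme described in the quoted message could be 
sketched roughly as follows. This is plain Python, not a Spark feature: the 
class name ResultCache and its methods are hypothetical, and eviction is by 
number of cached results (least recently used first) as a simple proxy for a 
real memory limit.

```python
import uuid
from collections import OrderedDict
from typing import List, Optional


class ResultCache:
    """Keeps full result lists keyed by a per-request UUID token."""

    def __init__(self, max_results: int) -> None:
        self._max = max_results
        self._store: "OrderedDict[str, List]" = OrderedDict()

    def put(self, rows: List) -> str:
        """Cache a freshly computed result and return its token."""
        token = str(uuid.uuid4())
        self._store[token] = rows
        # Evict the least recently used results once over the limit.
        while len(self._store) > self._max:
            self._store.popitem(last=False)
        return token

    def page(self, token: str, page_no: int, page_size: int) -> Optional[List]:
        """Return one page, or None if the result was evicted or never cached."""
        rows = self._store.get(token)
        if rows is None:
            return None
        self._store.move_to_end(token)  # mark as recently used
        start = page_no * page_size
        return rows[start:start + page_size]


cache = ResultCache(max_results=2)
first = cache.put(list(range(10)))   # token goes back to the client
second = cache.put(["a"])
page_one = cache.page(first, page_no=1, page_size=4)
third = cache.put(["b"])             # pushes out the least recently used entry
```

When page() returns None, the server would have to rerun the 
Cassandra + Spark flow and hand the client a new token.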
