Could you please clarify what you mean by 1)? The driver is only responsible for submitting the Spark job, not for performing the computation itself.

-- ND

On 1/9/21 9:35 AM, András Kolbert wrote:
Hi,
I would like to get your advice on my use case.
I have a few Spark Streaming applications where I need to keep updating a DataFrame after each batch. Each batch typically affects only a small fraction of the DataFrame (around 5k out of 200k records).

The options I have been considering so far:
1) keep the DataFrame on the driver, and update it after each batch
2) keep the DataFrame distributed, and use checkpointing to mitigate lineage growth

I solved previous use cases with option 2 (see the sketch below), but I am not sure it is optimal, as checkpointing is relatively expensive. I also wondered about HBase or some other quick-access in-memory storage, but that is not currently in my stack.
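To make option 2 concrete, here is a rough sketch of the merge-then-checkpoint loop I have in mind (Scala; the key column "id", the 10-batch checkpoint interval, and all names are placeholders, not from a real job):

import org.apache.spark.sql.{DataFrame, SparkSession}

object StateUpdateSketch {
  // Merge one micro-batch into the running state DataFrame.
  def updatedState(stateDf: DataFrame, batchDf: DataFrame, batchId: Long): DataFrame = {
    // Drop the rows being replaced (the ~5k affected records),
    // then append their fresh versions from the batch.
    val merged = stateDf
      .join(batchDf.select("id"), Seq("id"), "left_anti")
      .unionByName(batchDf)

    // checkpoint() materialises the DataFrame to the configured checkpoint
    // dir and returns one with truncated lineage; doing it every N batches
    // instead of every batch amortises the cost.
    if (batchId % 10 == 0) merged.checkpoint() else merged
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("state-sketch").getOrCreate()
    spark.sparkContext.setCheckpointDir("/tmp/state-checkpoints") // required before checkpoint()
    import spark.implicits._

    var state = Seq((1L, "a"), (2L, "b")).toDF("id", "value")
    val batch = Seq((2L, "b2"), (3L, "c")).toDF("id", "value")
    state = updatedState(state, batch, batchId = 10)
    state.show() // ids 1, 2 (updated), 3
    spark.stop()
  }
}

Note that checkpoint() needs setCheckpointDir to be called first; localCheckpoint() would be cheaper, since it writes to executor storage rather than reliable storage, at the cost of losing the state if an executor dies.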

Curious to hear your thoughts

Andras

