Hi,
I was wondering whether a DataFrame is considered thread safe. I know that
SparkSession and SparkContext are thread safe (and actually have tools to manage
jobs from different threads), but the question is: can I use the same DataFrame
from multiple threads?
The idea would be to create a DataFrame in the main thread and then, in two
sub-threads, run different transformations and actions on it.
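To make the question concrete, here is a minimal sketch of the pattern I have
in mind, in Python with PySpark. The helper name run_in_threads, the demo
function, and the two example jobs are illustrative assumptions and not part of
the Spark API; whether this usage is actually safe is exactly what I am asking.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_threads(df, jobs):
    """Run each job (a callable that takes df) in its own thread.

    `jobs` maps a label to a callable; the result maps each label to the
    value that callable returned. Hypothetical helper, not a Spark API.
    """
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        # Every thread receives the very same df object.
        futures = {name: pool.submit(job, df) for name, job in jobs.items()}
        # result() blocks until the job finishes and re-raises its exception.
        return {name: future.result() for name, future in futures.items()}


def demo():
    # pyspark is imported here so the helper above has no Spark dependency.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[2]")
        .appName("shared-dataframe")
        .getOrCreate()
    )
    df = spark.range(100)  # created once, in the main thread

    results = run_in_threads(
        df,
        {
            # Thread 1: a filter followed by an action.
            "even_count": lambda d: d.filter(d.id % 2 == 0).count(),
            # Thread 2: a different aggregation on the same DataFrame.
            "id_sum": lambda d: d.groupBy().sum("id").collect()[0][0],
        },
    )
    spark.stop()
    return results
```

Calling demo() needs a local PySpark installation. Each thread only derives new
DataFrames from the shared one and runs an action; neither calls unpersist or
checkpoint on it.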
I understand that some operations might not be thread safe: for example, if I
unpersist in one thread it would affect the other, and checkpointing would cause
similar issues. However, I can't find any documentation on which operations (if
any) are thread safe.
Thanks,
Assaf.