The RDD or the DataFrame is distributed and partitioned by Spark so that it
can use all your workers (CPUs) effectively. Every DataFrame operation
therefore already runs in parallel, with each task processing one partition
of the data. Why do you want to add your own threading here?

Thanks,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>

On Tue, Nov 12, 2019 at 7:18 AM Chang Chen <baibaic...@gmail.com> wrote:

>
> Hi all
>
> I have a case where I need to cache a source RDD and then create different
> DataFrames from it in different threads to speed up queries.
>
> I know that SparkSession is thread-safe
> (https://issues.apache.org/jira/browse/SPARK-15135), but I am not sure
> whether an RDD is thread-safe or not.
>
> Thanks
> Chang
>
