Hi all,
I am trying to run an image-analytics-type workload using Spark. The
images are read in JPEG format and then converted to the raw format in
map functions, which causes the partition sizes to grow by roughly an
order of magnitude. In addition to this, I am caching some of the data
because my …
2017-10-13 14:50 GMT-07:00 Vadim Semenov:
>>
>>> When you do `Dataset.rdd` you actually create a new job
>>>
>>> here you can see what it does internally:
>>> https://github.com/apache/spark/blob/master/sql/core/src/mai
>>> n/scala/org/apache/spark/sql/
>>> … cache the new RDD.
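If I understand the point above correctly, the RDD returned by `.rdd` then
has to be persisted in its own right; a minimal sketch of that pattern (the
parquet input is just a placeholder):

val df = spark.read.parquet("/path/to/data")  // placeholder input
df.cache()
df.count()        // materializes the Dataframe cache

val rdd = df.rdd  // each call to .rdd builds a new RDD with its own lineage,
                  // separate from the Dataframe's cached representation
rdd.cache()       // so the RDD has to be cached explicitly as well
rdd.count()       // materializes the RDD cache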
>
> On Fri, Oct 13, 2017 at 3:35 PM, Supun Nakandala <
> supun.nakand...@gmail.com> wrote:
>
>> Hi all,
>>
>> I have been experimenting with cache/persist/unpersist methods with
>> respect to both Dataframes and RDD APIs. However, I am experie
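For reference, a toy version of the kind of check involved (the range
dataset is illustrative). Note that `Dataset.storageLevel` only reports the
level that was requested; the Storage tab of the web UI shows whether the
blocks were actually materialized:

val df = spark.range(1000000).toDF("id")

df.cache()                   // only *marks* the Dataframe for caching (lazy)
println(df.storageLevel)     // reports MEMORY_AND_DISK as soon as cache() is called

df.count()                   // action that should materialize the cached data
// confirm in the Storage tab of the web UI that the InMemoryRelation
// blocks were actually built

val rdd = df.rdd             // a new RDD, not covered by the Dataframe cache
println(rdd.getStorageLevel) // NONE unless rdd.cache() is called separately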