from:"Supun Nakandala"

Spark 2.2.0 GC Overhead Limit Exceeded and OOM errors in the executors

2017-10-27 Thread Supun Nakandala

Hi all, I am trying to run an image-analytics workload using Spark. The images are read in JPEG format and then converted to raw format in map functions, which causes the partitions to grow by an order of magnitude. In addition, I am caching some of the data because my

Re: Is there a difference between df.cache() vs df.rdd.cache()

2017-10-13 Thread Supun Nakandala

0-13 14:50 GMT-07:00 Vadim Semenov:
>>> When you do `Dataset.rdd` you actually create a new job
>>>
>>> here you can see what it does internally:
>>> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/

Re: Is there a difference between df.cache() vs df.rdd.cache()

2017-10-13 Thread Supun Nakandala

he the new RDD.
>
> On Fri, Oct 13, 2017 at 3:35 PM, Supun Nakandala <supun.nakand...@gmail.com> wrote:
>
>> Hi all,
>>
>> I have been experimenting with cache/persist/unpersist methods with respect to both Dataframes and RDD APIs. However, I am experie

Is there a difference between df.cache() vs df.rdd.cache()

2017-10-13 Thread Supun Nakandala

Hi all, I have been experimenting with the cache/persist/unpersist methods in both the DataFrame and RDD APIs. However, I am seeing different behavior in the DataFrame API compared to the RDD API, such as DataFrames not getting cached when count() is called. Is there a difference between how thes


4 matches
