Re: flatMap() returning large class

2017-12-18 Thread Richard Garris
Hi Don, It’s not so much map() vs flatMap(). You can return a collection and have Spark flatten the result. My point was more to change from Seq[BigDataStructure] to Seq[SmallDataStructure]. If the use case is really storing image data, I would try to use Seq[Vector] and store the values as a

Re: flatMap() returning large class

2017-12-17 Thread Don Drake
Hey Richard, Good to hear from you as well. I thought I would ask if there was something Scala specific I was missing in handling these large classes. I can tweak my job to do a map() and then only one large object will be created at a time and returned, which should allow me to lower my

Re: flatMap() returning large class

2017-12-14 Thread Richard Garris
Hi Don, Good to hear from you. I think the problem is that regardless of whether you use yield or a generator - Spark internally will produce the entire result as a single large JVM object which will blow up your heap space. Would it be possible to shrink the overall size of the image object

Re: flatMap() returning large class

2017-12-14 Thread Marcelo Vanzin
This sounds like something mapPartitions should be able to do; I'm not sure if there's an easier way. On Thu, Dec 14, 2017 at 10:20 AM, Don Drake wrote: > I'm looking for some advice when I have a flatMap on a Dataset that is > creating and returning a sequence of a new case

flatMap() returning large class

2017-12-14 Thread Don Drake
I'm looking for some advice when I have a flatMap on a Dataset that is creating and returning a sequence of a new case class (Seq[BigDataStructure]) that contains a very large amount of data, much larger than the single input record (think images). In Python, you can use generators (yield) to
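
For readers following the thread, here is a minimal plain-Scala sketch of the generator-style idea raised in the question, combined with the mapPartitions suggestion in the replies. It is not code from any of the posters. InputRecord, the fields of BigDataStructure, LazyExpand, expand and expandPartition are all hypothetical names chosen for illustration, and the 1 KB payload stands in for real image data. The sketch only shows that a function of shape Iterator[T] => Iterator[U] (the shape Dataset.mapPartitions accepts) can build each large object on demand instead of building a whole Seq up front. Whether this actually lowers heap usage inside a Spark job is not confirmed by the thread, and one reply suggests Spark may still materialize the result internally.

```scala
// Hypothetical stand-ins for the thread's input record and BigDataStructure.
final case class InputRecord(id: Int, pieces: Int)
final case class BigDataStructure(sourceId: Int, index: Int, payload: Array[Byte])

object LazyExpand {
  // Counts how many outputs have been built, so the laziness is observable.
  var built = 0

  // Returns an Iterator rather than a Seq, so each large object is created
  // only when the consumer asks for it (the Scala analogue of Python's yield).
  def expand(rec: InputRecord): Iterator[BigDataStructure] =
    Iterator.range(0, rec.pieces).map { i =>
      built += 1
      BigDataStructure(rec.id, i, new Array[Byte](1024)) // small demo payload
    }

  // Same shape as the function mapPartitions takes: Iterator[T] => Iterator[U].
  // Iterator.flatMap is itself lazy, so nothing is built until it is consumed.
  def expandPartition(recs: Iterator[InputRecord]): Iterator[BigDataStructure] =
    recs.flatMap(expand)
}
```

In a Spark job, and assuming encoders for both case classes are in scope, a function like expandPartition could be passed to Dataset.mapPartitions; that wiring is left out here so the sketch runs without a Spark dependency.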