And Pedro has made sense of a world running amok, scared, and in a drunken stupor.
Regards,
Gourav

On Tue, Jul 26, 2016 at 2:01 PM, Pedro Rodriguez <ski.rodrig...@gmail.com> wrote:

> I am not 100% sure, as I haven't tried this out, but there is a huge
> difference between the two. Both foreach and collect are actions,
> regardless of whether or not the data frame is empty.
>
> Doing a collect will bring all the results back to the driver, possibly
> forcing it to run out of memory. Foreach will apply your function to each
> element of the DataFrame, but will do so across the cluster. This behavior
> is useful when you need to do something custom for each element
> (perhaps save to a db for which there is no driver, or something custom like
> making an http request per element; careful here though, due to overhead cost).
>
> In your example, I am going to assume that hrecords is something like a
> list buffer. The reason it will be empty is that each worker will get
> sent an empty list (it's captured in the closure for foreach) and append to
> it. The instance of the list at the driver doesn't know about what happened
> at the workers, so it's empty.
>
> I don't know why Chanh's comment applies here, since I am guessing the df
> is not empty.
>
> On Tue, Jul 26, 2016 at 1:53 AM, kevin <kiss.kevin...@gmail.com> wrote:
>
>> thank you Chanh
>>
>> 2016-07-26 15:34 GMT+08:00 Chanh Le <giaosu...@gmail.com>:
>>
>>> Hi Ken,
>>>
>>> *blacklistDF -> just DataFrame*
>>> Spark is lazy until you call something like *collect, take, write*; then it
>>> will execute the whole process *like you do map or filter before you
>>> collect*.
>>> That means until you call collect, Spark *does nothing*, so your df would not
>>> have any data -> can’t call foreach.
>>> Calling collect executes the process -> get data -> foreach is ok.
>>>
>>>
>>> On Jul 26, 2016, at 2:30 PM, kevin <kiss.kevin...@gmail.com> wrote:
>>>
>>> blacklistDF.collect()
>>>
>>>
>>>
>>
>
>
> --
> Pedro Rodriguez
> PhD Student in Distributed Machine Learning | CU Boulder
> UC Berkeley AMPLab Alumni
>
> ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
> Github: github.com/EntilZha | LinkedIn:
> https://www.linkedin.com/in/pedrorodriguezscience
>
>
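
Pedro's point about the closure can be illustrated without a cluster. The following is a minimal plain-Python sketch, not Spark code: it assumes that shipping a closure to a worker behaves roughly like a pickle round-trip, so each worker appends to its own copy of the list. Only `hrecords` comes from the thread; `run_on_worker`, `partitions`, `shipped`, `worker_copies`, and `collected` are hypothetical names made up for this illustration.

```python
import pickle


def run_on_worker(serialized_state, partition):
    """Simulate a worker: deserialize its own copy of the captured list,
    then append every row of its partition to that copy."""
    local_records = pickle.loads(serialized_state)  # a copy, not the driver's list
    for row in partition:
        local_records.append(row)
    return local_records


# "Driver" side: the list the user hoped foreach would fill.
hrecords = []

# Hypothetical data, already split into two partitions.
partitions = [[1, 2], [3, 4]]

# Simulate foreach: the captured (empty) list is serialized and sent to each worker.
shipped = pickle.dumps(hrecords)
worker_copies = [run_on_worker(shipped, p) for p in partitions]

# The workers filled their copies, but the driver's list never changed.
print("driver hrecords:", hrecords)
print("worker copies:  ", worker_copies)

# Simulate collect: the rows themselves are returned to the driver.
collected = [row for p in partitions for row in p]
print("collected:      ", collected)
```

In this simulation the driver's `hrecords` stays empty while each worker copy holds only its own partition, which is the behavior described above. Gathering the rows on the driver, as collect does, gives the full list, at the cost of holding everything in driver memory.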