Correct me if I'm wrong, but he can actually run this code without broadcasting the users map; the code will just be less efficient.
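
To make the difference concrete, here is a minimal sketch of both variants. The sample data, the object name BroadcastSketch, the compare() helper and the local[2] master are my own assumptions for illustration; only the collectAsMap() and map(...).collect() lines come from the thread. In the first variant usersMap is captured by the closure, so it gets serialized along with the task code for every job that references it. In the second, sc.broadcast ships it to each executor once and the executor keeps that copy for reuse.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastSketch {
  type Row = (String, (String, String))

  // Runs both variants on a small, made-up data set and returns both results.
  def compare(): (Set[Row], Set[Row]) = {
    // Assumption: a local master with two threads, just to keep the sketch runnable.
    val conf = new SparkConf().setAppName("broadcast-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical stand-in for the contacts RDD from the thread.
      val contacts = sc.parallelize(Seq("alice" -> "a@example.com", "bob" -> "b@example.com"))

      // Runs a job; the driver ends up with a plain local Map.
      // The RDD itself is not "deleted" anywhere by this call.
      val usersMap = contacts.collectAsMap()

      // Variant 1: usersMap is captured by the closure and serialized with the task code.
      val viaClosure = contacts.map(v => (v._1, (usersMap(v._1), v._2))).collect().toSet

      // Variant 2: explicit broadcast; tasks read the executor-side copy through .value.
      val usersBc = sc.broadcast(usersMap)
      val viaBroadcast = contacts.map(v => (v._1, (usersBc.value(v._1), v._2))).collect().toSet

      (viaClosure, viaBroadcast)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (viaClosure, viaBroadcast) = compare()
    // Both variants should produce the same rows; only the shipping cost differs.
    println(viaClosure == viaBroadcast)
  }
}
```

For a map this small it makes no practical difference; the broadcast starts to pay off when the map is large or is reused across several jobs.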

On Thu, Feb 26, 2015, 12:31 PM Sean Owen <so...@cloudera.com> wrote:

> Yes, but there is no concept of executors 'deleting' an RDD. And you
> would want to broadcast the usersMap if you're using it this way.
>
> On Thu, Feb 26, 2015 at 11:26 AM, Guillermo Ortiz <konstt2...@gmail.com> wrote:
> > One last time to be sure I got it right, the executing sequence here
> > goes like this?:
> >
> > val usersMap = contacts.collectAsMap()
> > #The contacts RDD is collected by the executors and sent to the
> > driver, the executors delete the rdd
> > contacts.map(v => (v._1, (usersMap(v._1), v._2))).collect()
> > #The userMap object is sent again to the executors to run the code,
> > and with the collect(), the result is sent again back to the driver
> >
> > 2015-02-26 11:57 GMT+01:00 Sean Owen <so...@cloudera.com>:
> >> Yes, in that code, usersMap has been serialized to every executor.
> >> I thought you were referring to accessing the copy in the driver.
> >>
> >> On Thu, Feb 26, 2015 at 10:47 AM, Guillermo Ortiz <konstt2...@gmail.com> wrote:
> >>> Isn't it "contacts.map(v => (v._1, (usersMap(v._1), v._2))).collect()"
> >>> executed in the executors? why is it executed in the driver?
> >>> contacts are not a local object, right?