It totally depends on your database. If it's a NoSQL database like MongoDB/HBase etc., then you can use the native .saveAsNewAPIHadoopFile or .saveAsHadoopDataset etc., which write directly from the executors.
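For example, a write to HBase through the Hadoop OutputFormat path might look roughly like this. This is only a sketch, not from the thread: it assumes a running HBase cluster, a pair RDD of (row key, value) strings, and the table/column names ("mytable", "cf", "col") are made up; exact Put method names vary by HBase version.

```scala
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.SparkContext

// Sketch only: table and column family names are hypothetical.
val sc = new SparkContext("local[*]", "hbase-write-sketch")

val jobConf = new JobConf()
jobConf.setOutputFormat(classOf[TableOutputFormat])
jobConf.set(TableOutputFormat.OUTPUT_TABLE, "mytable")

val rdd = sc.parallelize(Seq(("row1", "a"), ("row2", "b")))

rdd.map { case (key, value) =>
  // Build one HBase Put per record; the OutputFormat handles the writes
  // on the executors, so no connections go through the driver.
  val put = new Put(Bytes.toBytes(key))
  put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
  (new ImmutableBytesWritable, put)
}.saveAsHadoopDataset(jobConf)
```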
For SQL databases, I think people usually put the overhead on the driver like you did.

Thanks
Best Regards

On Wed, Mar 18, 2015 at 10:52 PM, Praveen Balaji <prav...@soundhound.com> wrote:

> I was wondering what people generally do about doing database operations
> from executor nodes. I'm (at least for now) avoiding doing database updates
> from executor nodes to avoid proliferation of database connections on the
> cluster. The general pattern I adopt is to collect queries (or tuples) on
> the executors and write to the database on the driver.
>
> // Executes on the executors
> rdd.foreach(s => {
>   val query = s"insert into .... ${s}";
>   accumulator += query;
> });
>
> // Executes on the driver
> accumulator.value.foreach(query => {
>   // get connection
>   // update database
> });
>
> I'm obviously trading database connections for driver heap. How do other
> spark users do it?
>
> Cheers
> Praveen
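A common middle ground worth mentioning is rdd.foreachPartition, which opens one connection per partition on the executors rather than one per record, and avoids funneling everything through the driver. A rough JDBC sketch, not from the thread: the connection URL, credentials, table name, and column are all hypothetical, and `rdd` is assumed to be an RDD[String] of values to insert.

```scala
import java.sql.DriverManager

// Sketch only: URL, credentials, and SQL are made-up placeholders.
rdd.foreachPartition { records =>
  // One connection per partition, created on the executor.
  val conn = DriverManager.getConnection("jdbc:postgresql://host/db", "user", "pass")
  val stmt = conn.prepareStatement("insert into mytable (col) values (?)")
  try {
    records.foreach { s =>
      stmt.setString(1, s)
      stmt.addBatch()
    }
    // Flush the whole partition as one batch.
    stmt.executeBatch()
  } finally {
    stmt.close()
    conn.close()
  }
}
```

This bounds the number of open connections by the number of concurrently running tasks, instead of growing with either record count or driver heap.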