Vida,

What kind of database are you trying to write to?

For example, I found that for loading into Redshift, by far the easiest thing to do was to save my output from Spark as a CSV to S3, and then load it from there into Redshift. This is not as slow as you might think, because Spark can write the output to S3 in parallel, and Redshift, too, can load data from multiple files in parallel <http://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-single-copy-command.html>.

Nick

On Thu, Aug 7, 2014 at 1:52 PM, Vida Ha <v...@databricks.com> wrote:

> The use case I was thinking of was outputting calculations made in Spark
> into a SQL database for the presentation layer to access. So in other
> words, having a Spark backend in Java that writes to a SQL database and
> then having a Rails front-end that can display the data nicely.
>
>
> On Thu, Aug 7, 2014 at 8:42 AM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> On Thu, Aug 7, 2014 at 11:25 AM, Cheng Lian <lian.cs....@gmail.com>
>> wrote:
>>
>>> Maybe a little off topic, but would you mind to share your motivation of
>>> saving the RDD into an SQL DB?
>>
>>
>> Many possible reasons (Vida, please chime in with yours!):
>>
>> - You have an existing database you want to load new data into so
>> everything's together.
>> - You want very low query latency, which you can probably get with
>> Spark SQL but currently not with the ease you can get it from your average
>> DBMS.
>> - Tooling around traditional DBMSs is currently much more mature than
>> tooling around Spark SQL, especially in the JDBC area.
>>
>> Nick
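
P.S. Here is a rough sketch of the Redshift route described at the top of this message, in case it helps. It is not my actual job, just an illustration: the helper names, the table name, the bucket, and the IAM role ARN are all made up. The first helper formats one record as a CSV line, which is the kind of function you would map over an RDD before calling saveAsTextFile (Spark then writes one part file per partition). The second builds a single COPY statement whose FROM value is an S3 key prefix, so Redshift loads every file under that prefix in one command. IAM_ROLE is only one of the authorization clauses COPY accepts; swap in whatever your cluster uses.

```python
import csv
import io


def to_csv_line(row):
    """Format one record as a single CSV line, without a trailing newline.

    Hypothetical helper: in a Spark job this would be mapped over the RDD
    before saving the result as text files on S3.
    """
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(row)
    return buf.getvalue()[:-1]  # drop the line terminator the writer added


def build_copy_statement(table, s3_prefix, iam_role):
    """Build one Redshift COPY statement for all files under s3_prefix.

    Values are interpolated naively for illustration only; do not pass
    untrusted input to this function.
    """
    return (
        f"COPY {table} "
        f"FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT AS CSV;"
    )


if __name__ == "__main__":
    # All names below are placeholders, not real resources.
    print(to_csv_line(["a,b", 1, 'x"y']))
    print(build_copy_statement(
        "results",
        "s3://my-bucket/spark-output/part-",
        "arn:aws:iam::123456789012:role/HypotheticalRole",
    ))
```

You would run the generated statement through whatever SQL client you already use against the cluster. Issuing one COPY over the whole prefix, rather than one COPY per file, is what lets Redshift spread the load across files.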