Isn't sqoop export meant for that?
http://hadooped.blogspot.it/2013/06/apache-sqoop-part-3-data-transfer.html?m=1
(A sketch of such an invocation follows the quoted thread below.)

On Aug 7, 2014 7:59 PM, "Nicholas Chammas" <nicholas.cham...@gmail.com> wrote:
> Vida,
>
> What kind of database are you trying to write to?
>
> For example, I found that for loading into Redshift, by far the easiest
> thing to do was to save my output from Spark as a CSV to S3, and then load
> it from there into Redshift. This is not as slow as you think, because
> Spark can write the output in parallel to S3, and Redshift, too, can load
> data from multiple files in parallel
> <http://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-single-copy-command.html>.
>
> Nick
>
> On Thu, Aug 7, 2014 at 1:52 PM, Vida Ha <v...@databricks.com> wrote:
>
>> The use case I was thinking of was outputting calculations made in Spark
>> into a SQL database for the presentation layer to access. So in other
>> words, having a Spark backend in Java that writes to a SQL database, and
>> then having a Rails front-end that can display the data nicely.
>>
>> On Thu, Aug 7, 2014 at 8:42 AM, Nicholas Chammas
>> <nicholas.cham...@gmail.com> wrote:
>>
>>> On Thu, Aug 7, 2014 at 11:25 AM, Cheng Lian <lian.cs....@gmail.com>
>>> wrote:
>>>
>>>> Maybe a little off topic, but would you mind sharing your motivation
>>>> for saving the RDD into an SQL DB?
>>>
>>> Many possible reasons (Vida, please chime in with yours!):
>>>
>>> - You have an existing database you want to load new data into so
>>>   everything's together.
>>> - You want very low query latency, which you can probably get with
>>>   Spark SQL, but currently not with the ease you can get it from your
>>>   average DBMS.
>>> - Tooling around traditional DBMSs is currently much more mature than
>>>   tooling around Spark SQL, especially in the JDBC area.
>>>
>>> Nick
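
Regarding the sqoop export suggestion at the top of the thread: a minimal
invocation might look like the following. The connection string, credentials,
table, and HDFS path are all placeholders; sqoop export pushes delimited
files from HDFS into an existing relational table.

    sqoop export \
      --connect jdbc:mysql://db.example.com/reports \
      --username report_user -P \
      --table daily_metrics \
      --export-dir /user/spark/output/daily_metrics \
      --input-fields-terminated-by ','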
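
As a sketch of the save-to-S3-then-COPY approach Nick describes (Scala; the
bucket, schema, and table names are made up for illustration, and the
Redshift COPY is shown only as a comment):

    import org.apache.spark.{SparkConf, SparkContext}

    object CsvToS3 {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("csv-to-s3"))

        // Hypothetical results: (metric, day, value).
        val results = sc.parallelize(Seq(("clicks", "2014-08-07", 1234L)))

        // Each partition is written as its own part-file, in parallel.
        results
          .map { case (metric, day, value) => s"$metric,$day,$value" }
          .saveAsTextFile("s3n://my-bucket/spark-output/daily/")

        // Redshift can then load all the part-files with one parallel COPY:
        //   COPY daily_metrics FROM 's3://my-bucket/spark-output/daily/'
        //   CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...'
        //   DELIMITER ',';
        sc.stop()
      }
    }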
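
And for Vida's use case of writing results directly into a SQL database, one
common pattern is to open a JDBC connection per partition and batch the
inserts. A minimal sketch, assuming a MySQL target with the driver on the
classpath; the table and connection details are placeholders:

    import java.sql.DriverManager
    import org.apache.spark.{SparkConf, SparkContext}

    object RddToSql {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("rdd-to-sql"))
        val results = sc.parallelize(Seq(("clicks", 1234L)))

        // One connection per partition, not per record, to limit overhead.
        results.foreachPartition { rows =>
          val conn = DriverManager.getConnection(
            "jdbc:mysql://db.example.com/reports", "report_user", "secret")
          val stmt = conn.prepareStatement(
            "INSERT INTO daily_metrics (metric, value) VALUES (?, ?)")
          try {
            rows.foreach { case (metric, value) =>
              stmt.setString(1, metric)
              stmt.setLong(2, value)
              stmt.addBatch()
            }
            stmt.executeBatch() // one round trip for the whole partition
          } finally {
            stmt.close()
            conn.close()
          }
        }
        sc.stop()
      }
    }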