Hm, others can correct me if I'm wrong, but is this what SPARK_CLASSPATH is for?
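
If so, a minimal sketch would be adding something like this to
conf/spark-env.sh on each node (the jar paths are placeholders for
wherever your copies live):

  export SPARK_CLASSPATH=/path/to/bonecp-0.8.0.jar:/path/to/postgresql.jar

I haven't verified that this actually takes precedence over the jars Spark
ships with, though.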

On Fri, Feb 20, 2015 at 6:04 PM, Mohammed Guller <moham...@glassbeam.com> wrote:
> It looks like spark.files.userClassPathFirst gives precedence to user 
> libraries only on the worker nodes. Is there something similar to achieve the 
> same behavior on the master?
>
> BTW, I am running Spark in stand-alone mode.
>
> Mohammed
>
>
> -----Original Message-----
> From: Sean Owen [mailto:so...@cloudera.com]
> Sent: Friday, February 20, 2015 9:42 AM
> To: Mohammed Guller
> Cc: Kelvin Chu; user@spark.apache.org
> Subject: Re: using a database connection pool to write data into an RDBMS 
> from a Spark application
>
> Have a look at spark.yarn.user.classpath.first and 
> spark.files.userClassPathFirst for a possible way to give your copy of the 
> libs precedence.
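>
> For example (an untested sketch; property name as documented for 1.2),
> you could set it on the SparkConf before creating the context:
>
>   import org.apache.spark.{SparkConf, SparkContext}
>
>   val conf = new SparkConf().set("spark.files.userClassPathFirst", "true")
>   val sc = new SparkContext(conf)
>
> or pass it at submit time with --conf spark.files.userClassPathFirst=true.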
>
> On Fri, Feb 20, 2015 at 5:20 PM, Mohammed Guller <moham...@glassbeam.com> 
> wrote:
>> Sean,
>> I know that Class.forName is not required since Java 1.4 :-) It was just
>> a desperate attempt to make sure that the Postgres driver is getting
>> loaded. Since Class.forName("org.postgresql.Driver") is not throwing an
>> exception, I assume the driver is available on the classpath. Is that
>> not true?
>>
>> I did some more troubleshooting and here is what I found:
>> 1) The Hive libraries used by Spark use BoneCP 0.7.1
>> 2) When the Spark master is started, it initializes BoneCP, which does
>> not load any database driver at that point (that makes sense)
>> 3) When my application initializes BoneCP, it thinks it is already
>> initialized and does not load the Postgres driver (this is a known bug
>> in 0.7.1, fixed in the BoneCP 0.8.0 release)
>>
>> So I linked my app with the BoneCP 0.8.0 release, but when I run it with
>> spark-submit, Spark continues to use BoneCP 0.7.1. How do I override that
>> behavior? How do I make the spark-submit script unload BoneCP 0.7.1 and
>> load BoneCP 0.8.0? I tried the --jars and --driver-class-path flags, but
>> they didn't help.
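>>
>> For reference, this is roughly how I am submitting the app (the class
>> name, master URL, and paths are placeholders):
>>
>>   spark-submit --class com.example.MyApp \
>>     --master spark://master:7077 \
>>     --jars /path/to/bonecp-0.8.0.jar,/path/to/postgresql.jar \
>>     --driver-class-path /path/to/bonecp-0.8.0.jar:/path/to/postgresql.jar \
>>     /path/to/myapp-uber.jar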
>>
>> Thanks,
>> Mohammed
>>
>>
>> -----Original Message-----
>> From: Sean Owen [mailto:so...@cloudera.com]
>> Sent: Friday, February 20, 2015 2:06 AM
>> To: Mohammed Guller
>> Cc: Kelvin Chu; user@spark.apache.org
>> Subject: Re: using a database connection pool to write data into an
>> RDBMS from a Spark application
>>
>> Although I don't know if it's related, the Class.forName() method of loading 
>> drivers is very old. You should be using DataSource and javax.sql; this has 
>> been the usual practice since about Java 1.4.
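>>
>> For example, the Postgres driver ships a DataSource implementation, so
>> something like this untested sketch (host, db, and credentials are
>> placeholders):
>>
>>   import org.postgresql.ds.PGSimpleDataSource
>>
>>   val ds = new PGSimpleDataSource()
>>   ds.setServerName("hostname")
>>   ds.setDatabaseName("dbname")
>>   ds.setUser("user")
>>   ds.setPassword("password")
>>   val conn = ds.getConnection()   // no Class.forName needed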
>>
>> Why do you say a different driver is being loaded? That's not the error here.
>>
>> Try instantiating the driver directly to test whether it's available in the 
>> classpath. Otherwise you would have to check whether the jar exists, the 
>> class exists in it, and it's really on your classpath.
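>>
>> For example, a one-line check:
>>
>>   val driver = new org.postgresql.Driver()   // fails fast if the class is missing
>>
>> If that line throws where your code actually runs, the jar is not on
>> that JVM's classpath.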
>>
>> On Fri, Feb 20, 2015 at 5:27 AM, Mohammed Guller <moham...@glassbeam.com> 
>> wrote:
>>> Hi Kelvin,
>>>
>>>
>>>
>>> Yes. I am creating an uber jar with the Postgres driver included, but
>>> nevertheless tried both the --jars and --driver-class-path flags. It
>>> didn't help.
>>>
>>>
>>>
>>> Interestingly, I can’t use BoneCP even in the driver program when I
>>> run my application with spark-submit. I am getting the same exception
>>> when the application initializes BoneCP before creating SparkContext.
>>> It looks like Spark is loading a different version of the Postgres
>>> JDBC driver than the one I am linking against.
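>>>
>>> One diagnostic sketch I could try: print which jar the driver class is
>>> actually loaded from:
>>>
>>>   val src = classOf[org.postgresql.Driver].getProtectionDomain.getCodeSource
>>>   println(if (src == null) "unknown source" else src.getLocation)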
>>>
>>>
>>>
>>> Mohammed
>>>
>>>
>>>
>>> From: Kelvin Chu [mailto:2dot7kel...@gmail.com]
>>> Sent: Thursday, February 19, 2015 7:56 PM
>>> To: Mohammed Guller
>>> Cc: user@spark.apache.org
>>> Subject: Re: using a database connection pool to write data into an
>>> RDBMS from a Spark application
>>>
>>>
>>>
>>> Hi Mohammed,
>>>
>>>
>>>
>>> Did you use --jars to specify your JDBC driver when you submitted your job?
>>> Take a look at this link:
>>> http://spark.apache.org/docs/1.2.0/submitting-applications.html
>>>
>>>
>>>
>>> Hope this helps!
>>>
>>>
>>>
>>> Kelvin
>>>
>>>
>>>
>>> On Thu, Feb 19, 2015 at 7:24 PM, Mohammed Guller
>>> <moham...@glassbeam.com>
>>> wrote:
>>>
>>> Hi –
>>>
>>> I am trying to use BoneCP (a database connection pooling library) to
>>> write data from my Spark application to an RDBMS. The database
>>> inserts are inside a foreachPartition code block. I am getting this
>>> exception when the code tries to insert data using BoneCP:
>>>
>>>
>>>
>>> java.sql.SQLException: No suitable driver found for
>>> jdbc:postgresql://hostname:5432/dbname
>>>
>>>
>>>
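>>> For context, the write path looks roughly like this (simplified sketch;
>>> the RDD, table, column, and credentials are placeholders):
>>>
>>>   import com.jolbox.bonecp.{BoneCP, BoneCPConfig}
>>>
>>>   myRDD.foreachPartition { rows =>
>>>     val config = new BoneCPConfig()
>>>     config.setJdbcUrl("jdbc:postgresql://hostname:5432/dbname")
>>>     config.setUsername("user")
>>>     config.setPassword("password")
>>>     val pool = new BoneCP(config)   // the SQLException surfaces here
>>>     val conn = pool.getConnection()
>>>     try {
>>>       val stmt = conn.prepareStatement("INSERT INTO mytable (col1) VALUES (?)")
>>>       rows.foreach { row => stmt.setString(1, row.toString); stmt.executeUpdate() }
>>>       stmt.close()
>>>     } finally {
>>>       conn.close()      // returns the connection to the pool
>>>       pool.shutdown()
>>>     }
>>>   }
>>>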
>>> I tried explicitly loading the Postgres driver on the worker nodes by
>>> adding the following line inside the foreachPartition code block:
>>>
>>>
>>>
>>> Class.forName("org.postgresql.Driver")
>>>
>>>
>>>
>>> It didn’t help.
>>>
>>>
>>>
>>> Has anybody been able to get a database connection pool library to work
>>> with Spark? If you got it working, can you please share the steps?
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Mohammed
>>>
>>>
>>>
>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
