Re: SparkSQL parallelism

Rishi Mishra Thu, 11 Feb 2016 22:30:07 -0800

I am not sure why all 3 nodes would run the query. If you have not
specified any partitioning options, the JDBCRDD should have only one
partition, and the whole dataset should reside in it.
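
A minimal sketch of the difference, in plain Scala so it runs without a
cluster. The column name employee_id and the bounds are hypothetical and do
not come from the original post. partitionColumn, lowerBound, upperBound and
numPartitions are the partitioning options of Spark's JDBC data source; the
object and method names are made up for illustration.

```scala
object JdbcOptionsSketch {
  // Connection options copied from the quoted post (placeholders, not real values).
  val base: Map[String, String] = Map(
    "url" -> "jdbc:oracle:thin:@xxxx:1525:SID",
    "dbtable" -> "(select * from employee where name like '%18%')",
    "user" -> "username",
    "password" -> "password")

  // Adds the four partitioning options. Without them the JDBC read is
  // expected to use a single partition, hence a single query.
  // `column` is assumed to be a numeric column of the table (hypothetical here).
  def partitioned(column: String, lower: Long, upper: Long, parts: Int): Map[String, String] =
    base ++ Map(
      "partitionColumn" -> column,
      "lowerBound" -> lower.toString,
      "upperBound" -> upper.toString,
      "numPartitions" -> parts.toString)
}
```

Passing JdbcOptionsSketch.base to sqlContext.read.format("jdbc").options(...)
as in the quoted code should give a DataFrame whose rdd.partitions.length is
1. Passing JdbcOptionsSketch.partitioned("employee_id", 1L, 3000L, 3) should
give 3 partitions, each issuing its own range query against Oracle. Checking
rdd.partitions.length on the loaded DataFrame is a quick way to see which
case a job is in.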



On Fri, Feb 12, 2016 at 10:15 AM, Madabhattula Rajesh Kumar <
mrajaf...@gmail.com> wrote:

> Hi,
>
> I have a Spark cluster with one master and 3 worker nodes. I have written
> the code below to fetch records from Oracle using Spark SQL:
>
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> val employees = sqlContext.read.format("jdbc").options(
>     Map("url" -> "jdbc:oracle:thin:@xxxx:1525:SID",
>     "dbtable" -> "(select * from employee where name like '%18%')",
>     "user" -> "username",
>     "password" -> "password")).load
>
> I have submitted this job to the Spark cluster using the spark-submit command.
>
>
>
> *It looks like all 3 workers are executing the same query and fetching the
> same data, which means the job is making 3 JDBC calls to Oracle.*
> *How can I make this code issue a single JDBC call to Oracle (when there is
> more than one worker)?*
>
> Please help me resolve this.
>
> Regards,
> Rajesh
>
>
>


-- 
Regards,
Rishitesh Mishra,
SnappyData . (http://www.snappydata.io/)

https://in.linkedin.com/in/rishiteshmishra
