Hello Spark experts! I am new to Spark and have the following question. What I am trying to do: run a Spark 1.5.1 job with master local[*] on a 4-core CPU. The job queries an Oracle database through a JdbcRDD, fetching 5000 records per partition, and I increase the number of partitions by 1 for every additional 5000 records to be fetched. I have taken care that all partitions get the same number of records.
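To make the setup concrete, here is roughly what my fetch looks like (a minimal sketch only: the connection URL, table name, column names, and key range are placeholders, and I choose numPartitions so that each partition covers 5000 keys; sc is the SparkContext created with master local[*]):

```scala
import java.sql.{DriverManager, ResultSet}
import org.apache.spark.rdd.JdbcRDD

val recordsPerPartition = 5000
val numPartitions = 8                                      // grows by 1 for every 5000 records
val lowerBound = 1L                                        // placeholder key range
val upperBound = numPartitions.toLong * recordsPerPartition

// JdbcRDD splits [lowerBound, upperBound] into numPartitions sub-ranges and
// substitutes each sub-range into the two '?' placeholders, so every partition
// fetches the same number of rows when the keys are dense and evenly spread.
// The Oracle JDBC driver is assumed to be on the classpath.
val rows = new JdbcRDD(
  sc,
  () => DriverManager.getConnection("jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "pass"),
  "SELECT id, payload FROM my_table WHERE id >= ? AND id <= ?",
  lowerBound,
  upperBound,
  numPartitions,
  (rs: ResultSet) => (rs.getLong(1), rs.getString(2))
)
```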
What I expected to happen: all tasks start at the same time T0, query the Oracle database in parallel, store their results in the JdbcRDD, and finish together at T1. What I observed: there was one task per partition, but on the Web UI the tasks were staggered; some were spawned or scheduled well after the first task. Is there a configuration that changes how many tasks can run simultaneously on an executor core? In other words, can one core be given more than one task to run at the same time? Thanks. Warm regards, Sunil M.
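For reference, the only related settings I have come across so far are the local[N] thread count and spark.task.cpus, and as far as I understand neither lets a single core run several tasks at once; please correct me if I am misreading them. A sketch of how I create the context:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// local[N] sets the number of scheduler threads in local mode; local[*] picks
// one per logical core, so on a 4-core CPU at most 4 tasks run concurrently.
// spark.task.cpus is the number of core slots each task occupies (default 1),
// so raising it would only reduce concurrency further.
val conf = new SparkConf()
  .setAppName("oracle-fetch")
  .setMaster("local[*]")
  .set("spark.task.cpus", "1")
val sc = new SparkContext(conf)
```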