Well, I had to write some Scala code and compile it with Maven to make it work.
Still doing it. The good thing is that, as I expected, it is doing a Direct Path Read
(as opposed to a Conventional Path Read) from the source Oracle database.
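Roughly along these lines, much simplified (the connection URL, credentials and
table name are placeholders; whether Oracle serves the scan as a Direct Path Read
is decided on the Oracle side, not by these options):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object OracleExtract {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("OracleExtract"))
    val sqlContext = new HiveContext(sc)

    // Read the source table from Oracle over JDBC into a DataFrame
    val df = sqlContext.read.format("jdbc").options(Map(
      "url"      -> "jdbc:oracle:thin:@//oracle-host:1521/MYDB",  // placeholder URL
      "driver"   -> "oracle.jdbc.OracleDriver",
      "dbtable"  -> "SCOTT.SALES",                                // placeholder table
      "user"     -> "scott",
      "password" -> "xxxx"
    )).load()

    df.printSchema()
  }
}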
+--
Yes, I was thinking of that: use Spark to load the data from Oracle over JDBC and
flush it into an ORC table in Hive.
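The write half is just the standard DataFrame writer, something like the sketch
below (the table name is a placeholder, and saveAsTable needs a HiveContext to go
through the Hive metastore):

import org.apache.spark.sql.{DataFrame, SaveMode}

// Write the extracted DataFrame into an ORC-backed Hive table.
def writeToHive(df: DataFrame): Unit =
  df.write
    .format("orc")                  // store the data as ORC
    .mode(SaveMode.Overwrite)       // replace the table if it already exists
    .saveAsTable("test.sales_orc")  // placeholder <database>.<table>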
Now I am using Spark 1.6.1, and as I recall the JDBC driver is throwing an error
(I raised a thread for it). This was working under Spark 1.5.2.
Cheers
Dr Mich Talebzadeh
LinkedIn *
https://www.lin
I do not think you make it faster by setting the execution engine to Spark,
especially with such an old Spark version.
For simple things such as "dump" bulk imports and exports, it matters much less,
if at all, which execution engine you use.
There was recently a discussion on that on the
No, the execution engines are not in general interchangeable. The Hive
project uses an abstraction layer to be able to plug in different execution
engines. I don't know whether Sqoop uses Hive code, or whether it uses an old
version, or what.
As with many things in the Hadoop world, if you want to know if ther
Hi Marcin,
It is the speed, really: the speed at which data is ingested into Hive.
Sqoop is two-stage, as I understand it (a typical invocation is sketched below):
1. Take the data out of the RDBMS via JDBC and put it in an external HDFS file.
2. Read that file and insert it into a Hive table.
The issue is the second part. In general I u
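For illustration, a typical hive-import invocation looks something like this
(connection details and names are placeholders); with --hive-import, sqoop first
lands the data in an HDFS directory and then loads that file into the Hive table:

sqoop import \
  --connect jdbc:oracle:thin:@//oracle-host:1521/MYDB \
  --username scott \
  --password xxxx \
  --table SALES \
  --num-mappers 4 \
  --hive-import \
  --hive-table test.sales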
They're not simply interchangeable: Sqoop is written to use MapReduce.
I actually implemented my own replacement for sqoop-export in Spark, which
was extremely simple. It wasn't any faster, because the bottleneck was the
receiving database.
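Something along these lines, much simplified (the URL, table and credentials
below are placeholders, not the actual setup):

import java.util.Properties
import org.apache.spark.sql.{DataFrame, SaveMode}

// Push a DataFrame back out to the receiving database over JDBC,
// appending to the target table.
def exportToDb(df: DataFrame): Unit = {
  val props = new Properties()
  props.setProperty("user", "scott")        // placeholder credentials
  props.setProperty("password", "xxxx")
  df.write
    .mode(SaveMode.Append)
    .jdbc("jdbc:oracle:thin:@//target-host:1521/TARGETDB",  // placeholder URL
          "TARGET_SCHEMA.SALES_COPY",                        // placeholder table
          props)
}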
Is your motivation here speed? Or correctness?
On Sat,
Hi,
What is the simplest way of making sqoop import use the Spark engine, as opposed
to the default MapReduce, when putting data into a Hive table? I did not see
any parameter for this in the sqoop command-line documentation.
Thanks
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2