Re: Making sqoop import use Spark engine as opposed to MapReduce for Hive

2016-04-30 Thread Mich Talebzadeh
Well, I had to write some Scala code and compile it with Maven to make it work. Still working on it. The good thing, as I expected, is that it is doing a Direct Path Read (as opposed to a Conventional Path Read) from the source Oracle database.

Re: Making sqoop import use Spark engine as opposed to MapReduce for Hive

2016-04-30 Thread Mich Talebzadeh
Yes, I was thinking of that: use Spark to load JDBC data from Oracle and flush it into an ORC table in Hive. I am now using Spark 1.6.1, and the JDBC driver, as I recall (I raised a thread for it), is throwing an error. This was working under Spark 1.5.2. Cheers Dr Mich Talebzadeh LinkedIn * https://www.lin
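The approach described in this message (read from Oracle over JDBC with Spark, write to an ORC table in Hive) could look roughly like the PySpark sketch below. It is not the poster's code, which was Scala and is not shown in the thread. The function names, the connection URL, the table names and the partitioning column are all hypothetical. `sql_context` is assumed to be a Hive-enabled context (a `HiveContext` on Spark 1.x, a `SparkSession` with Hive support on later versions), and the Oracle JDBC driver jar is assumed to be on the classpath.

```python
def jdbc_read_options(url, table, user, password,
                      partition_column, lower, upper, num_partitions):
    """Build the option map for Spark's JDBC data source.

    All values are strings, since older Spark releases only accept
    string-valued reader options.
    """
    return {
        "url": url,                        # e.g. jdbc:oracle:thin:@//host:1521/service (hypothetical)
        "dbtable": table,                  # source table, or a parenthesised subquery with an alias
        "user": user,
        "password": password,
        "driver": "oracle.jdbc.OracleDriver",
        # The four options below split the read into parallel range scans.
        "partitionColumn": partition_column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
    }


def load_to_orc(sql_context, options, hive_table):
    """Read the source table over JDBC and save it as an ORC-backed Hive table."""
    df = sql_context.read.format("jdbc").options(**options).load()
    # "overwrite" is an assumption for this sketch; use "append" for incremental loads.
    df.write.format("orc").mode("overwrite").saveAsTable(hive_table)
```

With a real context this would be called as `load_to_orc(sql_context, jdbc_read_options(...), "test.sales_orc")`. As noted later in the thread, the read side may still be bounded by the source database rather than by the engine.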

Re: Making sqoop import use Spark engine as opposed to MapReduce for Hive

2016-04-30 Thread Jörn Franke
I do not think you will make it faster by setting the execution engine to Spark, especially with such an old Spark version. For simple things such as "dump" bulk imports and exports, it matters much less, if at all, which execution engine you use. There was recently a discussion on that on the

Re: Making sqoop import use Spark engine as opposed to MapReduce for Hive

2016-04-30 Thread Marcin Tustin
No, the execution engines are not in general interchangeable. The Hive project uses an abstraction layer to be able to plug in different execution engines. I don't know if Sqoop uses Hive code, or if it uses an old version, or what. As with many things in the Hadoop world, if you want to know if ther

Re: Making sqoop import use Spark engine as opposed to MapReduce for Hive

2016-04-30 Thread Mich Talebzadeh
Hi Marcin, It is the speed really: the speed at which data is ingested into Hive. Sqoop is two-stage, as I understand it. 1. Take the data out of the RDBMS via JDBC and put it in an external HDFS file 2. Read that file and insert into a Hive table. The issue is the second part. In general I u
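To make the second stage described above concrete, here is a minimal sketch of the HiveQL such a step might issue when the target is an ORC table, with the engine selected through Hive's `hive.execution.engine` setting (`mr`, `tez` or `spark`). This is an illustration under assumptions, not what Sqoop itself generates: the helper name, table names, staging directory and comma delimiter are hypothetical, and nothing in the thread establishes that Sqoop's own Hive step honours the engine setting.

```python
def stage_two_statements(staging_table, staging_dir, target_table,
                         columns, engine="spark"):
    """Return HiveQL statements that load staged text files into an ORC table.

    columns is a list of (name, hive_type) pairs shared by both tables.
    """
    if engine not in ("mr", "tez", "spark"):
        raise ValueError("unknown Hive execution engine: %s" % engine)
    col_defs = ", ".join("%s %s" % (name, typ) for name, typ in columns)
    return [
        # Choose the engine for the statements that follow in this session.
        "SET hive.execution.engine=%s" % engine,
        # Expose the files written by stage 1 as an external text table.
        "CREATE EXTERNAL TABLE IF NOT EXISTS %s (%s) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        "STORED AS TEXTFILE LOCATION '%s'" % (staging_table, col_defs, staging_dir),
        # Target table stored as ORC.
        "CREATE TABLE IF NOT EXISTS %s (%s) STORED AS ORC" % (target_table, col_defs),
        # The insert is the part that actually runs as a job on the chosen engine.
        "INSERT INTO TABLE %s SELECT * FROM %s" % (target_table, staging_table),
    ]
```

Only the final `INSERT ... SELECT` runs as a distributed job, so that is the step where the choice of engine could make a difference.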

Re: Making sqoop import use Spark engine as opposed to MapReduce for Hive

2016-04-30 Thread Marcin Tustin
They're not simply interchangeable. Sqoop is written to use MapReduce. I actually implemented my own replacement for sqoop-export in Spark, which was extremely simple. It wasn't any faster, because the bottleneck was the receiving database. Is your motivation here speed? Or correctness? On Sat,

Making sqoop import use Spark engine as opposed to MapReduce for Hive

2016-04-30 Thread Mich Talebzadeh
Hi, What is the simplest way of making sqoop import use the Spark engine, as opposed to the default MapReduce, when putting data into a Hive table? I did not see any parameter for this in the sqoop command-line doc. Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2