Yes, I was thinking of that: use Spark to load JDBC data from Oracle and flush it into an ORC table in Hive.

I am now on Spark 1.6.1, and as I recall the JDBC driver throws an error there (I raised a thread for it). It was working under Spark 1.5.2.

Cheers,

Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 30 April 2016 at 15:20, Marcin Tustin <mtus...@handybook.com> wrote:

> No, the execution engines are not in general interchangeable. The Hive
> project uses an abstraction layer to be able to plug in different
> execution engines. I don't know whether Sqoop uses Hive code, an old
> version of it, or something else.
>
> As with many things in the Hadoop world, if you want to know whether
> there is something undocumented, your best bet is to read the source
> code.
>
> My suggestions: (1) make sure you're executing close to the data, i.e.
> on NodeManagers colocated with DataNodes; (2) profile to confirm the
> slowness really is where you think it is; and (3) if you still can't get
> the speed you need, try writing a small Spark job to do the export.
> Newer versions of Spark seem faster.
>
> On Sat, Apr 30, 2016 at 10:05 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Hi Marcin,
>>
>> It is the speed really: the speed at which data is ingested into Hive.
>>
>> Sqoop is two-stage, as I understand it:
>>
>> 1. Take the data out of the RDBMS via JDBC and put it in an external
>>    HDFS file.
>> 2. Read that file and insert it into a Hive table.
>>
>> The issue is the second part. In general I use Hive 2 with the Spark
>> 1.3.1 engine to put data into a Hive table. I wondered whether there
>> was a parameter in Sqoop to use the Spark engine.
>>
>> Well, I gather this is easier said than done.
>> I am importing a 1 billion row table from Oracle:
>>
>> sqoop import --connect "jdbc:oracle:thin:@rhes564:1521:mydb12" \
>>   --username scratchpad -P \
>>   --query "select * from scratchpad.dummy where \$CONDITIONS" \
>>   --split-by ID \
>>   --hive-import --hive-table "oraclehadoop.dummy" --target-dir "dummy"
>>
>> Now the fact that I have set hive.execution.engine=spark in
>> hive-site.xml does not matter: Sqoop seems to set
>> hive.execution.engine=mr internally anyway.
>>
>> Maybe there should be an option --hive-execution-engine='mr/tez/spark'
>> or similar in the above command?
>>
>> Cheers,
>>
>> Mich
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> http://talebzadehmich.wordpress.com
>>
>> On 30 April 2016 at 14:51, Marcin Tustin <mtus...@handybook.com> wrote:
>>
>>> They're not simply interchangeable; Sqoop is written to use MapReduce.
>>>
>>> I actually implemented my own replacement for sqoop-export in Spark,
>>> which was extremely simple. It wasn't any faster, because the
>>> bottleneck was the receiving database.
>>>
>>> Is your motivation here speed, or correctness?
>>>
>>> On Sat, Apr 30, 2016 at 8:45 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> What is the simplest way of making sqoop import use the Spark engine,
>>>> as opposed to the default MapReduce, when putting data into a Hive
>>>> table? I did not see any parameter for this in the Sqoop command-line
>>>> documentation.
>>>>
>>>> Thanks
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>> LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>
>>> Want to work at Handy? Check out our culture deck and open roles
>>> <http://www.handy.com/careers>
>>> Latest news <http://www.handy.com/press> at Handy
>>> Handy just raised $50m
>>> <http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/>
>>> led by Fidelity
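[Editor's note] Marcin's suggestion of a small Spark job in place of the Sqoop import boils down to a parallel JDBC read followed by an ORC write. A minimal sketch follows; the range-splitting helper mirrors what Sqoop's `--split-by ID` does, and Spark's JDBC source exposes the same idea through its `partitionColumn`/`lowerBound`/`upperBound`/`numPartitions` options. The connection details, row counts, and the `oracle_to_orc` function name are hypothetical, and the Spark code is written against the newer SparkSession API (Spark 2.x+), not the 1.x versions discussed in the thread.

```python
def split_ranges(lower, upper, num_partitions):
    """Carve the inclusive ID range [lower, upper] into per-task bounds,
    the way Sqoop's --split-by (and Spark's partitionColumn options)
    turn one big query into N bounded queries run in parallel."""
    step = (upper - lower) // num_partitions + 1
    bounds = []
    lo = lower
    while lo <= upper:
        hi = min(lo + step - 1, upper)
        bounds.append((lo, hi))
        lo = hi + 1
    return bounds


def jdbc_read_options(url, table, split_col, lower, upper, num_partitions):
    """Options dict for Spark's JDBC data source; Spark generates one
    query per partition, each with a bounded predicate on split_col."""
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": split_col,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
    }


def oracle_to_orc(url, table, split_col, lower, upper,
                  num_partitions, target_table):
    """Not invoked here: needs a Spark runtime and the Oracle JDBC
    driver jar on the executor classpath."""
    from pyspark.sql import SparkSession
    spark = (SparkSession.builder
             .appName("oracle_to_orc")
             .enableHiveSupport()
             .getOrCreate())
    df = (spark.read.format("jdbc")
          .options(**jdbc_read_options(url, table, split_col,
                                       lower, upper, num_partitions))
          .load())
    # Write straight into a Hive-registered ORC table, replacing the
    # two-stage HDFS-file-then-insert path described above.
    df.write.format("orc").mode("overwrite").saveAsTable(target_table)
```

On a cluster this would be submitted with something like `spark-submit --jars ojdbc7.jar ...` and called with the thread's values, e.g. `oracle_to_orc("jdbc:oracle:thin:@rhes564:1521:mydb12", "scratchpad.dummy", "ID", 1, 1000000000, 100, "oraclehadoop.dummy")`. Whether this is actually faster than Sqoop depends, as Marcin notes, on where the bottleneck really is.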