I do not think you will make it faster by setting the execution engine to Spark, 
especially with such an old Spark version. 
For simple things such as "dump" bulk imports and exports, it matters 
much less, if at all, which execution engine you use.
There was recently a discussion on this on the Spark (or possibly Sqoop) mailing list.
With 1 billion rows it will be more the Oracle database which is the bottleneck.
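If you still want to try a different engine for the Hive load, one workaround is to split the two stages yourself: let Sqoop land the data on HDFS only, then load the files into Hive in a session where you choose the engine. This is only a sketch, not tested against your setup; the staging path is made up, the target Hive table must already exist, and its row format must match what Sqoop writes (comma-delimited text by default):

```shell
# Stage 1: Sqoop the table to plain HDFS files (no --hive-import),
# so Sqoop's internally forced hive.execution.engine=mr never applies.
sqoop import --connect "jdbc:oracle:thin:@rhes564:1521:mydb12" \
        --username scratchpad -P \
        --query "select * from scratchpad.dummy where \$CONDITIONS" \
        --split-by ID \
        --target-dir /tmp/dummy_staging

# Stage 2: load the staged files into the existing Hive table,
# setting whatever engine this session should use.
hive -e "
set hive.execution.engine=spark;
LOAD DATA INPATH '/tmp/dummy_staging' INTO TABLE oraclehadoop.dummy;
"
```

Note that LOAD DATA is just a file move, so the engine setting only matters once you start querying or transforming the table afterwards.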

> On 30 Apr 2016, at 16:05, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> 
> Hi Marcin,
>  
> It is the speed really. The speed in which data is digested into Hive.
>  
> Sqoop is a two-stage process as I understand it:
>  
> 1. Take the data out of the RDBMS via JDBC and put it on an external HDFS file
> 2. Read that file and insert it into a Hive table
>  
> The issue is the second part. In general I use Hive 2 with the Spark 1.3.1 
> engine to put data into Hive tables. I wondered if there was such a parameter 
> in Sqoop to use the Spark engine.
> 
> Well, I gather this is easier said than done. I am importing a 1 billion row 
> table from Oracle:
> 
> sqoop import --connect "jdbc:oracle:thin:@rhes564:1521:mydb12" --username 
> scratchpad -P \
>         --query "select * from scratchpad.dummy where \
>         \$CONDITIONS" \
>         --split-by ID \
>         --hive-import  --hive-table "oraclehadoop.dummy" --target-dir "dummy"
> 
> 
> Now the fact that I have set hive.execution.engine=spark in hive-site.xml 
> does not matter. Sqoop seems to set hive.execution.engine=mr internally 
> anyway.
> 
> Maybe there should be an option --hive-execution-engine='mr/tez/spark' etc. 
> in the above command?
> 
> Cheers,
> 
> Mich
> 
> 
>  
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  
> 
>> On 30 April 2016 at 14:51, Marcin Tustin <mtus...@handybook.com> wrote:
>> They're not simply interchangeable. Sqoop is written to use MapReduce.
>> 
>> I actually implemented my own replacement for sqoop-export in spark, which 
>> was extremely simple. It wasn't any faster, because the bottleneck was the 
>> receiving database.
>> 
>> Is your motivation here speed? Or correctness?
>> 
>>> On Sat, Apr 30, 2016 at 8:45 AM, Mich Talebzadeh 
>>> <mich.talebza...@gmail.com> wrote:
>>> Hi,
>>> 
>>> What is the simplest way of making sqoop import use the Spark engine, as 
>>> opposed to the default MapReduce, when putting data into a Hive table? I did 
>>> not see any parameter for this in the sqoop command-line docs.
>>> 
>>> Thanks
>>> 
>> 
>> 
>> 
>> 
> 
