No, the execution engines are not, in general, interchangeable. The Hive
project uses an abstraction layer so that different execution engines can
be plugged in. I don't know whether Sqoop calls Hive code directly, whether
it bundles an old version of it, or something else.

As with many things in the Hadoop world, if you want to know whether
something undocumented exists, your best bet is to look at the source code.

My suggestion would be to (1) make sure you're executing somewhere close to
the data - i.e. on NodeManagers colocated with DataNodes; (2) profile to
make sure the slowness really is where you think it is; and (3) if you
really can't get the speed you need, try writing a small Spark job to do
the export. Newer versions of Spark seem faster.
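
For (3), here's roughly what I mean - a minimal sketch rather than
production code, assuming Spark 1.4+'s DataFrame JDBC reader; the
credentials, bounds, and partition count below are placeholders you would
tune for a billion-row table:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object OracleToHive {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("oracle-to-hive"))
        val sqlContext = new HiveContext(sc)

        // parallel JDBC read of the source table, split on the numeric ID column
        val df = sqlContext.read.format("jdbc").options(Map(
          "url"             -> "jdbc:oracle:thin:@rhes564:1521:mydb12",
          "driver"          -> "oracle.jdbc.OracleDriver",
          "dbtable"         -> "scratchpad.dummy",
          "user"            -> "scratchpad",
          "password"        -> "***",              // placeholder
          "partitionColumn" -> "ID",
          "lowerBound"      -> "1",
          "upperBound"      -> "1000000000",       // rough row-count estimate
          "numPartitions"   -> "16"
        )).load()

        // write straight into the Hive table - no intermediate HDFS staging file
        df.write.mode("overwrite").saveAsTable("oraclehadoop.dummy")
      }
    }

The point is that this writes straight into the Hive table and skips the
intermediate HDFS staging file; as before, profile first to confirm the
database itself isn't your bottleneck.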


On Sat, Apr 30, 2016 at 10:05 AM, Mich Talebzadeh <mich.talebza...@gmail.com
> wrote:

> Hi Marcin,
>
> It is the speed really. The speed at which data is ingested into Hive.
>
> Sqoop is a two-stage process, as I understand it:
>
>
>    1. Take the data out of the RDBMS via JDBC and put it in an external
>    file on HDFS
>    2. Read that file and insert the rows into a Hive table
>
>  The issue is the second part. In general I use Hive 2 with the Spark
> 1.3.1 engine to put data into Hive tables. I wondered whether there was a
> parameter in Sqoop to use the Spark engine.
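>
> For reference, inside a Hive session I switch engines like this, and I was
> hoping Sqoop could pass something similar through to its Hive step:
>
>         set hive.execution.engine=spark;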
>
> Well, I gather this is easier said than done. I am importing a
> 1-billion-row table from Oracle:
>
> sqoop import --connect "jdbc:oracle:thin:@rhes564:1521:mydb12" \
>         --username scratchpad -P \
>         --query "select * from scratchpad.dummy where \$CONDITIONS" \
>         --split-by ID \
>         --hive-import --hive-table "oraclehadoop.dummy" \
>         --target-dir "dummy"
>
>
> The fact that I have set hive.execution.engine=spark in hive-site.xml
> does not matter: Sqoop seems to set hive.execution.engine=mr internally
> anyway.
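>
> For reference, the entry in my hive-site.xml (which the Sqoop-driven load
> appears to override) is:
>
>         <property>
>           <name>hive.execution.engine</name>
>           <value>spark</value>
>         </property>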
>
> Maybe there should be an option --hive-execution-engine='mr/tez/spark'
> etc. in the above command?
>
> Cheers,
>
> Mich
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 30 April 2016 at 14:51, Marcin Tustin <mtus...@handybook.com> wrote:
>
>> They're not simply interchangeable. Sqoop is written to use MapReduce.
>>
>> I actually implemented my own replacement for sqoop-export in Spark, and
>> it was extremely simple. It wasn't any faster, because the bottleneck
>> was the receiving database.
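>>
>> Roughly what that looked like - a sketch, not the actual code (run e.g.
>> in the spark-shell with a HiveContext as sqlContext; the table names and
>> credentials are placeholders):
>>
>>     import java.util.Properties
>>
>>     // read the Hive table and push it to the database over JDBC
>>     val df = sqlContext.table("some_hive_table")
>>     val props = new Properties()
>>     props.setProperty("user", "dbuser")      // placeholder
>>     props.setProperty("password", "***")     // placeholder
>>     df.write.jdbc("jdbc:oracle:thin:@host:1521:db", "target_table", props)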
>>
>> Is your motivation here speed? Or correctness?
>>
>> On Sat, Apr 30, 2016 at 8:45 AM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> What is the simplest way of making sqoop import use the Spark engine,
>>> as opposed to the default MapReduce, when putting data into a Hive
>>> table? I did not see any parameter for this in the Sqoop command-line
>>> docs.
>>>
>>> Thanks
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn:
>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>
>>
>

