Yes, I was thinking of that: use Spark to load the data from Oracle over
JDBC and write it into an ORC table in Hive.
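
A minimal sketch of that route on Spark 1.6, assuming a HiveContext (the
connection details, split bounds and table names are illustrative, taken
from the Sqoop example further down the thread):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object OracleToHiveOrc {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("OracleToHiveOrc"))
    val hc = new HiveContext(sc)

    // Read the Oracle table over JDBC, split into parallel partitions on ID
    val df = hc.read.format("jdbc").options(Map(
      "url"             -> "jdbc:oracle:thin:@rhes564:1521:mydb12",
      "dbtable"         -> "scratchpad.dummy",
      "user"            -> "scratchpad",
      "password"        -> "xxxx",          // placeholder
      "partitionColumn" -> "ID",
      "lowerBound"      -> "1",             // illustrative split bounds
      "upperBound"      -> "1000000000",
      "numPartitions"   -> "100"
    )).load()

    // Write the result into a Hive-managed ORC table
    df.write.format("orc").saveAsTable("oraclehadoop.dummy")
  }
}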

However, I am now on Spark 1.6.1 and, as I recall, the JDBC driver throws
an error (I raised a thread about it).

This was working under Spark 1.5.2.

Cheers

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 30 April 2016 at 15:20, Marcin Tustin <mtus...@handybook.com> wrote:

> No, the execution engines are not in general interchangeable. The Hive
> project uses an abstraction layer so that different execution engines can
> be plugged in. I don't know whether Sqoop uses Hive code, an old version
> of it, or something else entirely.
>
> As with many things in the Hadoop world, if you want to know whether there's
> an undocumented option, your best bet is to look at the source code.
>
> My suggestion would be to (1) make sure you're executing somewhere close
> to the data - i.e. on NodeManagers colocated with DataNodes; (2) profile to
> make sure the slowness really is where you think it is; and (3) if you really
> can't get the speed you need, try writing a small Spark job to do the
> export. Newer versions of Spark seem faster.
>
>
> On Sat, Apr 30, 2016 at 10:05 AM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Hi Marcin,
>>
>> It is the speed, really - the speed at which data is ingested into Hive.
>>
>> Sqoop is a two-stage process, as I understand it:
>>
>>
>>    1. Take the data out of the RDBMS via JDBC and stage it in an HDFS
>>    file
>>    2. Read that file and insert the data into a Hive table
>>
>> The issue is the second part. In general I use Hive 2 with Spark 1.3.1 as
>> the execution engine to put data into Hive tables, and I wondered whether
>> Sqoop has a parameter to use the Spark engine.
>>
>> Well, I gather this is easier said than done. I am importing a
>> 1-billion-row table from Oracle:
>>
>> sqoop import --connect "jdbc:oracle:thin:@rhes564:1521:mydb12" \
>>         --username scratchpad -P \
>>         --query "select * from scratchpad.dummy where \$CONDITIONS" \
>>         --split-by ID \
>>         --hive-import --hive-table "oraclehadoop.dummy" \
>>         --target-dir "dummy"
>>
>>
>> The fact that I have set hive.execution.engine=spark in hive-site.xml
>> does not matter; Sqoop seems to set hive.execution.engine=mr internally
>> anyway.
>>
>> Maybe there should be an option such as --hive-execution-engine='mr/tez/spark'
>> in the above command?
>>
>> Cheers,
>>
>> Mich
>>
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 30 April 2016 at 14:51, Marcin Tustin <mtus...@handybook.com> wrote:
>>
>>> They're not simply interchangeable; Sqoop is written to use MapReduce.
>>>
>>> I actually implemented my own replacement for sqoop-export in Spark,
>>> which was extremely simple. It wasn't any faster, because the bottleneck
>>> was the receiving database.
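>>>
>>> For what it's worth, the whole job is not much more than this sketch
>>> (the JDBC URL, credentials and table names are illustrative):
>>>
>>> import java.util.Properties
>>> import org.apache.spark.{SparkConf, SparkContext}
>>> import org.apache.spark.sql.hive.HiveContext
>>>
>>> object SparkExport {
>>>   def main(args: Array[String]): Unit = {
>>>     val sc = new SparkContext(new SparkConf().setAppName("SparkExport"))
>>>     val hc = new HiveContext(sc)
>>>
>>>     val props = new Properties()
>>>     props.setProperty("user", "scratchpad")  // illustrative credentials
>>>     props.setProperty("password", "xxxx")
>>>
>>>     // Read the Hive table and write it to the target database over JDBC
>>>     hc.table("oraclehadoop.dummy")
>>>       .write
>>>       .jdbc("jdbc:oracle:thin:@rhes564:1521:mydb12",
>>>             "scratchpad.dummy_export", props)
>>>   }
>>> }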
>>>
>>> Is your motivation here speed? Or correctness?
>>>
>>> On Sat, Apr 30, 2016 at 8:45 AM, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> What is the simplest way of making sqoop import use the Spark engine,
>>>> as opposed to the default MapReduce, when putting data into a Hive
>>>> table? I did not see any parameter for this in the Sqoop command-line
>>>> documentation.
>>>>
>>>> Thanks
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>
