SAP Sybase IQ does that, and I believe SAP HANA does as well.

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 6 April 2016 at 23:49, Peyman Mohajerian <mohaj...@gmail.com> wrote:

> For some MPP relational stores (not operational ones) it may be feasible to
> run Spark jobs and also have data locality. I know QueryGrid (Teradata) and
> PolyBase (Microsoft) use data locality to move data between their MPP
> stores and Hadoop.
> I would guess (though I have no idea) that someone like IBM is already
> doing that for Spark. Maybe a bit off topic!
>
> On Wed, Apr 6, 2016 at 3:29 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> Well, I am not sure, but using a database as storage for Spark, such as a
>> relational database or certain NoSQL databases (e.g. MongoDB), is generally
>> a bad idea: there is no data locality, it cannot handle really big data
>> volumes for compute, and you may overload an operational database.
>> And if your job fails for whatever reason (e.g. scheduling), then you have
>> to pull everything out again. Sqoop and HDFS seem to me the more elegant
>> solution together with Spark. These "assumptions" about parallelism have to
>> be made with any solution anyway.
>> Of course you can always redo things, but why? What benefit do you expect?
>> A real big data platform has to support many different tools anyway;
>> otherwise people doing analytics will be limited.
>>
>> On 06 Apr 2016, at 20:05, Michael Segel <msegel_had...@hotmail.com>
>> wrote:
>>
>> I don’t think it’s necessarily a bad idea.
>>
>> Sqoop is an ugly tool, and it requires you to make some assumptions as a
>> way to gain parallelism. (To be fair, most of those assumptions are valid
>> for most of the use cases…)
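>>
>> For comparison, a minimal sketch of the equivalent in Spark's JDBC source,
>> which makes the same parallelism assumption explicit (the connection
>> details and the split column below are placeholders):
>>
>>   import org.apache.spark.sql.SQLContext
>>
>>   val sqlContext = new SQLContext(sc) // sc: an existing SparkContext
>>
>>   // Spark splits the read into numPartitions parallel range scans on
>>   // partitionColumn, the same assumption Sqoop makes with --split-by.
>>   val df = sqlContext.read.format("jdbc")
>>     .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL") // placeholder
>>     .option("dbtable", "SCHEMA.MY_TABLE")                  // placeholder
>>     .option("user", "user")
>>     .option("password", "password")
>>     .option("partitionColumn", "ID") // assumes a roughly uniform numeric key
>>     .option("lowerBound", "1")
>>     .option("upperBound", "10000000")
>>     .option("numPartitions", "8")
>>     .load()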
>>
>> Depending on what you want to do… your data may not be persisted on
>> HDFS.  There are use cases where your cluster is used for compute and not
>> storage.
>>
>> I’d say that spending time re-inventing the wheel can be a good thing.
>> It would be a good idea for many to rethink their ingestion process so
>> that they can have a nice ‘data lake’ and not a ‘data sewer’. (Stealing
>> that term from Dean Wampler. ;-)
>>
>> Just saying. ;-)
>>
>> -Mike
>>
>> On Apr 5, 2016, at 10:44 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>
>> I do not think you can be more resource efficient. In the end you have to
>> store the data on HDFS anyway. Doing something like Sqoop yourself means a
>> lot of development effort, especially for error handling.
>> You could create a ticket with the Sqoop guys to support Spark as an
>> execution engine; maybe it is less effort to plug it in there.
>> If your cluster is loaded, then you may want to add more machines or
>> improve the existing programs.
>>
>> On 06 Apr 2016, at 07:33, ayan guha <guha.a...@gmail.com> wrote:
>>
>> One of the reasons in my mind is to avoid MapReduce applications completely
>> during ingestion, if possible. Also, I could then use a Spark standalone
>> cluster to ingest, even if my Hadoop cluster is heavily loaded. What do you
>> guys think?
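>>
>> A rough sketch of what I mean by the standalone part (the master URL is a
>> placeholder, and in practice it would be set via spark-submit rather than
>> in code):
>>
>>   import org.apache.spark.{SparkConf, SparkContext}
>>   import org.apache.spark.sql.hive.HiveContext
>>
>>   // Point the ingest job at a standalone Spark cluster rather than YARN,
>>   // so the busy Hadoop cluster is only touched for the HDFS/Hive writes.
>>   val conf = new SparkConf()
>>     .setAppName("oracle-ingest")
>>     .setMaster("spark://spark-master:7077") // placeholder master URL
>>   val sc = new SparkContext(conf)
>>   val sqlContext = new HiveContext(sc) // needed to write into Hive tables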
>>
>> On Wed, Apr 6, 2016 at 3:13 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>
>>> Why do you want to reimplement something which is already there?
>>>
>>> On 06 Apr 2016, at 06:47, ayan guha <guha.a...@gmail.com> wrote:
>>>
>>> Hi
>>>
>>> Thanks for the reply. My use case is to query ~40 tables from Oracle
>>> (incremental loads only, using indexes) and add the data to existing Hive
>>> tables. Also, it would be good to have an option to create the Hive
>>> tables, driven by job-specific configuration.
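>>>
>>> A config-driven sketch of what I mean (the table names, watermark column,
>>> and connection details below are all made up for illustration, and sc is
>>> an existing SparkContext):
>>>
>>>   import org.apache.spark.sql.hive.HiveContext
>>>
>>>   // In practice this list would come from a job-specific config file.
>>>   case class TableConf(
>>>     oracleTable: String, hiveTable: String, watermarkCol: String)
>>>   val tables = Seq(
>>>     TableConf("SCHEMA.ORDERS",    "stage.orders",    "LAST_UPDATED"),
>>>     TableConf("SCHEMA.CUSTOMERS", "stage.customers", "LAST_UPDATED"))
>>>
>>>   val sqlContext = new HiveContext(sc)
>>>
>>>   for (t <- tables) {
>>>     val lastLoaded = "2016-04-01 00:00:00" // would be looked up per table
>>>     // Incremental predicate is pushed down to Oracle so its index is used.
>>>     val src = s"(SELECT * FROM ${t.oracleTable} WHERE ${t.watermarkCol} > " +
>>>       s"TO_TIMESTAMP('$lastLoaded', 'YYYY-MM-DD HH24:MI:SS')) src"
>>>     val df = sqlContext.read.format("jdbc")
>>>       .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL") // placeholder
>>>       .option("dbtable", src)
>>>       .option("user", "user")
>>>       .option("password", "password")
>>>       .load()
>>>     df.write.mode("append").insertInto(t.hiveTable) // table must exist
>>>   }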
>>>
>>> What do you think?
>>>
>>> Best
>>> Ayan
>>>
>>> On Wed, Apr 6, 2016 at 2:30 PM, Takeshi Yamamuro <linguin....@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> It depends on your use case for Sqoop.
>>>> What's it like?
>>>>
>>>> // maropu
>>>>
>>>> On Wed, Apr 6, 2016 at 1:26 PM, ayan guha <guha.a...@gmail.com> wrote:
>>>>
>>>>> Hi All
>>>>>
>>>>> Asking for opinions: is it possible/advisable to use Spark to replace
>>>>> what Sqoop does? Are there any existing projects along similar lines?
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Ayan Guha
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> ---
>>>> Takeshi Yamamuro
>>>>
>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
>>>
>>>
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>>
>>
>
