SAP Sybase IQ does that, and I believe SAP HANA as well.

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 6 April 2016 at 23:49, Peyman Mohajerian <mohaj...@gmail.com> wrote:

> For some MPP relational stores (not operational) it may be feasible to run
> Spark jobs and also have data locality. I know QueryGrid (Teradata) and
> PolyBase (Microsoft) use data locality to move data between their MPP
> systems and Hadoop.
> I would guess (though I have no idea) that someone like IBM is already
> doing that for Spark. Maybe a bit off topic!
>
> On Wed, Apr 6, 2016 at 3:29 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> Well, I am not sure, but using a database as storage, such as a
>> relational database or certain NoSQL databases (e.g. MongoDB), for Spark
>> is generally a bad idea: there is no data locality, it cannot handle
>> really big data volumes for compute, and you may potentially overload an
>> operational database.
>> And if your job fails for whatever reason (e.g. scheduling), then you
>> have to pull everything out again. Sqoop and HDFS seem to me the more
>> elegant solution together with Spark. These "assumptions" about
>> parallelism have to be made with any solution anyway.
>> Of course you can always redo things, but why? What benefit do you
>> expect? A real big data platform has to support many different tools
>> anyway; otherwise people doing analytics will be limited.
>>
>> On 06 Apr 2016, at 20:05, Michael Segel <msegel_had...@hotmail.com>
>> wrote:
>>
>> I don’t think it’s necessarily a bad idea.
>>
>> Sqoop is an ugly tool and it requires you to make some assumptions as a
>> way to gain parallelism. (Not that most of the assumptions are not valid
>> for most of the use cases…)
>>
>> Depending on what you want to do, your data may not be persisted on
>> HDFS. There are use cases where your cluster is used for compute and not
>> storage.
>> I’d say that spending time re-inventing the wheel can be a good thing.
>> It would be a good idea for many to rethink their ingestion process so
>> that they can have a nice ‘data lake’ and not a ‘data sewer’. (Stealing
>> that term from Dean Wampler. ;-)
>>
>> Just saying. ;-)
>>
>> -Mike
>>
>> On Apr 5, 2016, at 10:44 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>
>> I do not think you can be more resource-efficient. In the end you have
>> to store the data on HDFS anyway. Building something like Sqoop yourself
>> means a lot of development effort, especially for error handling.
>> You may create a ticket with the Sqoop guys to support Spark as an
>> execution engine; maybe it is less effort to plug it in there.
>> If your cluster is loaded, then you may want to add more machines or
>> improve the existing programs.
>>
>> On 06 Apr 2016, at 07:33, ayan guha <guha.a...@gmail.com> wrote:
>>
>> One of the reasons in my mind is to avoid Map-Reduce applications
>> completely during ingestion, if possible. Also, I could then use a Spark
>> standalone cluster to ingest, even if my Hadoop cluster is heavily
>> loaded. What do you guys think?
>>
>> On Wed, Apr 6, 2016 at 3:13 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>
>>> Why do you want to reimplement something which is already there?
>>>
>>> On 06 Apr 2016, at 06:47, ayan guha <guha.a...@gmail.com> wrote:
>>>
>>> Hi
>>>
>>> Thanks for the reply. My use case is to query ~40 tables from Oracle
>>> (using indexed and incremental reads only) and add the data to existing
>>> Hive tables. Also, it would be good to have an option to create the
>>> Hive table, driven by job-specific configuration.
>>>
>>> What do you think?
>>>
>>> Best
>>> Ayan
>>>
>>> On Wed, Apr 6, 2016 at 2:30 PM, Takeshi Yamamuro <linguin....@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> It depends on your use case for Sqoop.
>>>> What's it like?
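Ayan's "~40 tables from Oracle, incremental only" use case is usually driven by a high watermark per table: remember the last ingested key or timestamp and pull only rows beyond it (the same idea as Sqoop's `--incremental append` with `--check-column`/`--last-value`). A hedged sketch in plain Python; the table names, columns, and watermark values are invented for illustration:

```python
def incremental_query(table, check_column, last_value):
    """Build the pull query for one table, fetching only rows past the
    recorded high watermark. repr() quoting below is a stand-in for
    proper bind parameters in a real job."""
    return f"SELECT * FROM {table} WHERE {check_column} > {last_value!r}"

# A job-specific configuration could drive all ~40 tables from one
# mapping; the tables and columns here are invented examples.
watermarks = {
    ("ORDERS", "ORDER_ID"): 1048576,
    ("EVENTS", "EVENT_TS"): "2016-04-01 00:00:00",
}

queries = [incremental_query(t, c, v) for (t, c), v in watermarks.items()]
```

After each successful load, the job would write the new maximum of the check column back to the watermark store, so a failed run simply re-reads from the old watermark on retry.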
>>>>
>>>> // maropu
>>>>
>>>> On Wed, Apr 6, 2016 at 1:26 PM, ayan guha <guha.a...@gmail.com> wrote:
>>>>
>>>>> Hi All
>>>>>
>>>>> Asking for opinions: is it possible/advisable to use Spark to replace
>>>>> what Sqoop does? Any existing projects done along similar lines?
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Ayan Guha
>>>>
>>>> --
>>>> ---
>>>> Takeshi Yamamuro
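For the original question, Spark's JDBC data source can indeed cover the core of what Sqoop does. A hedged sketch of the Oracle-to-Hive path in PySpark; the connection string, table, and bounds are placeholders, and the options-builder helper is illustrative plumbing, not part of any API:

```python
def jdbc_read_options(url, table, partition_column, lower, upper, num_partitions):
    """Options for Spark's JDBC source. partitionColumn/lowerBound/
    upperBound/numPartitions give the same parallel range scan that
    Sqoop gets from --split-by."""
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": partition_column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
    }

def ingest(spark, opts, hive_table):
    """Read one table in parallel and append into an existing Hive table.
    Not executed here: it needs a live SparkSession with Hive support and
    an Oracle JDBC driver on the classpath."""
    df = spark.read.format("jdbc").options(**opts).load()
    df.write.mode("append").saveAsTable(hive_table)

opts = jdbc_read_options(
    "jdbc:oracle:thin:@//dbhost:1521/ORCL",  # placeholder connection string
    "SCOTT.EMP", "EMPNO", 1, 10000, 8)
```

Jörn's caveats from the thread still apply to this sketch: the parallel scan hits the operational database directly, and retry/error handling around partial loads is yours to build, which is much of what Sqoop already provides.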