Thanks Bryan for the answer.

Sent from my iPhone
> On May 15, 2018, at 19:06, Bryan Cutler <cutl...@gmail.com> wrote:
>
> Hi Xavier,
>
> Regarding Arrow usage in Spark, using the Arrow format to transfer data between Python and Java has been the focus so far, because this area stood to benefit the most. It's possible that the scope of Arrow could broaden in the future, but there still need to be discussions about this.
>
> Bryan
>
>> On Mon, May 14, 2018 at 9:55 AM, Pierce Lamb <richard.pierce.l...@gmail.com> wrote:
>>
>> Hi Xavier,
>>
>> Along the lines of connecting to multiple data sources and replacing ETL tools, you may want to check out Confluent's blog on building a real-time streaming ETL pipeline on Kafka, as well as SnappyData's blog on real-time streaming ETL with SnappyData, where Spark is central to connecting to multiple data sources, executing SQL on streams, etc. These should provide nice comparisons to your ideas about Dremio + Spark as ETL tools.
>>
>> Disclaimer: I am a SnappyData employee.
>>
>> Hope this helps,
>>
>> Pierce
>>
>>> On Mon, May 14, 2018 at 2:24 AM, xmehaut <xavier.meh...@gmail.com> wrote:
>>>
>>> Hi Michaël,
>>>
>>> I'm not an expert on Dremio; I'm just trying to evaluate the potential of this technology, what impact it could have on Spark, how the two could work together, and how Spark could use Arrow even further internally alongside its existing algorithms.
>>>
>>> Dremio already has a quite rich API set giving access to, for instance, metadata and SQL queries, or even allowing virtual datasets to be created programmatically. It also has a lot of predefined functions, and I imagine there will be more and more functions in the future, e.g. machine learning functions like the ones found in Azure SQL Server, which make it possible to mix SQL and ML functions.
>>> Access to Dremio is made through JDBC, and we could imagine accessing virtual datasets through Spark, and dynamically creating new datasets from the API, connected to Parquet files stored dynamically by Spark on HDFS, Azure Data Lake, or S3... Of course, a tighter integration between the two would be better, with a Spark read/write connector for Dremio :)
>>>
>>> Regards,
>>> Xavier
>>>
>>> --
>>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
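The JDBC hookup Xavier describes can be sketched in PySpark. This is a hedged illustration, not a tested integration: the driver class name `com.dremio.jdbc.Driver`, the `jdbc:dremio:direct=host:port` URL scheme, and port 31010 follow Dremio's standard JDBC setup but should be checked against your Dremio version; `dremio-host`, the credentials, the dataset name, and the S3 path are all placeholders.

```python
def dremio_jdbc_options(host: str, user: str, password: str,
                        port: int = 31010) -> dict:
    """Build the Spark JDBC options for a Dremio coordinator.

    Driver class and URL scheme are assumptions based on Dremio's
    JDBC driver; adjust if your deployment differs.
    """
    return {
        "url": f"jdbc:dremio:direct={host}:{port}",
        "driver": "com.dremio.jdbc.Driver",
        "user": user,
        "password": password,
    }

opts = dremio_jdbc_options("dremio-host", "alice", "secret")

# With a live Dremio cluster and its JDBC jar on the Spark driver's
# classpath, a virtual dataset could then be pulled into Spark and
# written back out as Parquet, e.g.:
#
#   df = (spark.read.format("jdbc")
#         .options(**opts)
#         .option("dbtable", "space.my_virtual_dataset")
#         .load())
#   df.write.parquet("s3a://bucket/exports/my_virtual_dataset")
```

Note that a plain JDBC read funnels rows through a single connection unless partitioning options are set, which is part of why a dedicated read/write connector, as Xavier suggests, would be preferable.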