Thanks Bryan for the answer.

Sent from my iPhone

> On May 15, 2018, at 19:06, Bryan Cutler <cutl...@gmail.com> wrote:
> 
> Hi Xavier,
> 
> Regarding Arrow usage in Spark, using the Arrow format to transfer data
> between Python and Java has been the focus so far because this area stood to
> benefit the most.  It's possible that the scope of Arrow usage could broaden
> in the future, but that still needs to be discussed.
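> 
> For example (a minimal sketch, not quoted from the Spark docs; the DataFrame
> here is made up), the Arrow path shows up in Spark 2.3 when converting a
> PySpark DataFrame to pandas:
> 
>     # Enable Arrow-based columnar transfer between the JVM and Python
>     spark.conf.set("spark.sql.execution.arrow.enabled", "true")
> 
>     df = spark.range(1 << 20)   # a toy DataFrame
>     pdf = df.toPandas()         # data crosses JVM -> Python as Arrow batches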
> 
> Bryan
> 
>> On Mon, May 14, 2018 at 9:55 AM, Pierce Lamb <richard.pierce.l...@gmail.com> 
>> wrote:
>> Hi Xavier,
>> 
>> Along the lines of connecting to multiple data sources and replacing ETL
>> tools, you may want to check out Confluent's blog on building a real-time
>> streaming ETL pipeline on Kafka, as well as SnappyData's blog on real-time
>> streaming ETL with SnappyData, where Spark is central to connecting to
>> multiple data sources, executing SQL on streams, etc. These should provide
>> nice comparisons to your ideas about Dremio + Spark as ETL tools.
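>> 
>> As a rough sketch of the Spark side of such a pipeline (the broker address
>> and topic name below are made up, and the spark-sql-kafka package is
>> assumed), Structured Streaming can read from Kafka and run SQL on the
>> stream:
>> 
>>     # Read a Kafka topic as an unbounded streaming DataFrame
>>     events = (spark.readStream
>>         .format("kafka")
>>         .option("kafka.bootstrap.servers", "broker:9092")
>>         .option("subscribe", "events")
>>         .load())
>> 
>>     # Kafka values arrive as binary; cast before transforming or sinking
>>     query = (events.selectExpr("CAST(value AS STRING) AS value")
>>         .writeStream
>>         .format("console")
>>         .start())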
>> 
>> Disclaimer: I am a SnappyData employee
>> 
>> Hope this helps,
>> 
>> Pierce
>> 
>>> On Mon, May 14, 2018 at 2:24 AM, xmehaut <xavier.meh...@gmail.com> wrote:
>>> Hi Michaël,
>>> 
>>> I'm not an expert on Dremio; I'm just trying to evaluate the potential of
>>> this technology, what impact it could have on Spark, how the two could
>>> work together, and how Spark could use Arrow even further internally
>>> alongside its existing algorithms.
>>> 
>>> Dremio already has a fairly rich API set enabling access to, for instance,
>>> metadata and SQL queries, or even creating virtual datasets
>>> programmatically. They also have a lot of predefined functions, and I
>>> imagine there will be more and more functions in the future, e.g. machine
>>> learning functions like the ones we find in Azure SQL Server, which lets
>>> you mix SQL and ML functions.  Access to Dremio is made through JDBC, and
>>> we could imagine accessing virtual datasets through Spark and dynamically
>>> creating new datasets from the API, connected to Parquet files stored
>>> dynamically by Spark on HDFS, Azure Data Lake, or S3... Of course, a
>>> tighter integration between the two would be better, with a Spark
>>> read/write connector to Dremio :)
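>>> 
>>> Something like this, for instance (just a sketch on my side; the host,
>>> port, driver class, and dataset name are assumptions to be checked against
>>> Dremio's JDBC docs):
>>> 
>>>     # Read a Dremio virtual dataset into Spark over JDBC
>>>     df = (spark.read.format("jdbc")
>>>         .option("url", "jdbc:dremio:direct=dremio-host:31010")
>>>         .option("driver", "com.dremio.jdbc.Driver")
>>>         .option("dbtable", "myspace.virtual_dataset")
>>>         .load())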
>>> 
>>> Regards,
>>> Xavier
>>> 
>>> 
>>> 
>>> --
>>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>> 
>> 
> 
