Hi Xavier,

Along the lines of connecting to multiple data sources and replacing ETL
tools, you may want to check out Confluent's blog on building a real-time
streaming ETL pipeline on Kafka
<https://www.confluent.io/blog/building-real-time-streaming-etl-pipeline-20-minutes/>
as well as SnappyData's blog on Real-Time Streaming ETL with SnappyData
<http://www.snappydata.io/blog/real-time-streaming-etl-with-snappydata>,
where Spark is central to connecting to multiple data sources, executing
SQL on streams, etc. These should provide nice comparisons with your ideas
about Dremio + Spark as ETL tools.
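
To make the "SQL on streams" part concrete, here is a minimal sketch of
that pattern in Spark Structured Streaming (the topic, servers, and paths
below are placeholders, not taken from either blog):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("StreamingEtlSketch").getOrCreate()

    // Ingest a stream of raw events from Kafka
    // (bootstrap servers and topic are placeholders).
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(value AS STRING) AS payload", "timestamp")

    // Register the stream as a view so plain SQL can run against it.
    raw.createOrReplaceTempView("events")
    val cleaned = spark.sql(
      "SELECT payload, timestamp FROM events WHERE payload IS NOT NULL")

    // Continuously land the transformed stream as Parquet
    // (paths are placeholders; any HDFS/S3/ADLS URI works).
    cleaned.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/events")
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .start()
      .awaitTermination()

The same job can subscribe to several sources at once and join or union
them before the sink, which is where it starts to resemble a classic ETL
tool.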

Disclaimer: I am a SnappyData employee

Hope this helps,

Pierce

On Mon, May 14, 2018 at 2:24 AM, xmehaut <xavier.meh...@gmail.com> wrote:

> Hi Michaël,
>
> I'm not a Dremio expert; I'm just trying to evaluate the potential of
> this technology, what impact it could have on Spark, how the two can
> work together, and how Spark could make even further use of Arrow
> internally in its existing algorithms.
>
> Dremio already has a fairly rich API set that gives access to, for
> instance, metadata and SQL queries, and even lets you create virtual
> datasets programmatically. It also offers a lot of predefined
> functions, and I imagine there will be more and more functions in the
> future, e.g. machine learning functions like the ones found in Azure
> SQL Server, which let you mix SQL and ML functions. Access to Dremio
> is made through JDBC, and we can imagine accessing virtual datasets
> through Spark and dynamically creating new datasets from the API,
> connected to Parquet files stored dynamically by Spark on HDFS, Azure
> Data Lake, or S3... Of course, a tighter integration between the two
> would be better, with a Spark read/write connector to Dremio :)
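
Until such a connector exists, a plain JDBC read from Spark should get you
most of the way there. A rough sketch, assuming an existing SparkSession
named spark -- the URL, driver class, and dataset name are placeholders,
so double-check them against Dremio's JDBC docs:

    // Read a Dremio virtual dataset into Spark over JDBC
    // (connection details are placeholders).
    val virtualDataset = spark.read
      .format("jdbc")
      .option("url", "jdbc:dremio:direct=dremio-host:31010")
      .option("driver", "com.dremio.jdbc.Driver")
      .option("dbtable", "myspace.my_virtual_dataset")
      .option("user", "dremio_user")
      .option("password", "dremio_password")
      .load()

    // ...and write it back out as Parquet wherever Spark can write.
    virtualDataset.write.mode("overwrite").parquet("s3a://my-bucket/derived/")

Dremio's JDBC jar would also need to be on the Spark classpath (e.g. via
--jars).
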
>
> regards
> xavier
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
