Hi Jim,
Thanks for the description of the real-world use case. I like your idea of
letting Drill do the grunt work, then letting the ML/AI workload focus on that
aspect of the problem.
Charles, just brainstorming a bit, I think the easiest way to start is to
create a simple, stand-alone server
Jim,
I really like this use case. As a data scientist myself, I see the big value
of Drill as being able to rapidly get raw data ready for machine learning.
It would be great if we could do this!
> On Jan 30, 2019, at 08:43, Jim Scott wrote:
>
> Paul,
>
> Your example is exactly the same
Paul,
Your example is exactly the one I discussed with some people on the
RAPIDS.ai project: using Drill as a tool to gather (query) all the data
needed for a representative data set for an ML/AI workload, then feeding
the resultset directly into GPU memory. RAPIDS.ai is based on Arrow
Hi Aman,
Thanks for sending. I looked through the slides and really liked the
presentation.
@Paul, how would a Drill-to-Arrow bridge work exactly? Would it require
serialization/deserialization of Drill objects?
—C
> On Jan 30, 2019, at 02:16, Paul Rogers wrote:
>
> Hi Aman,
>
> Thanks
Hi Aman,
Thanks for sharing the update. Glad to hear things are still percolating.
I think Drill is an underappreciated treasure for doing queries in the complex
systems that folks seem to be building today. The ability to read multiple data
sources is something that maybe only Spark can do as
Hi Charles,
You may have seen the talk that was given on the Drill Developer Day [1] by
Karthik and me ... look for the slides on 'Drill-Arrow Integration', which
describe two high-level options and what the integration might entail.
Option 1 corresponds to what you and Paul are discussing in this th
Hi Charles,
I didn't see anything on this on the public mailing list. Haven't seen any
commits related to it either. My guess is that this kind of interface is not
important for the kind of data warehouse use cases that MapR is probably still
trying to capture.
I followed the Arrow mailing lists
Hey Paul,
I’m curious what, if anything, ever came of this thread. IMHO, you’re on
to something here. We could get the benefit of Arrow—specifically the
interoperability with other big data tools—without the pain of having to
completely re-work Drill. This seems like a real win-win to me
Hi Ted,
We may be confusing two very different ideas. The first is a Drill-to-Arrow
adapter on Drill's periphery; this is the "crude-but-effective" integration
suggestion. On the periphery, we are not changing existing code; we're just
building an adapter to read Arrow data into Drill, or convert
Inline.
On Mon, Aug 20, 2018 at 9:20 AM Paul Rogers
wrote:
> ...
> By contrast, migrating Drill internals to Arrow has always been seen as
> the bulk of the cost; costs which the "crude-but-effective" suggestion
> seeks to avoid. Some of the full-integration costs include:
>
> * Reworking Drill
Hi Ted,
The "crude-but-effective" integration suggestion allows Drill to participate in
an Arrow pipeline with minimal work.
By contrast, migrating Drill internals to Arrow has always been seen as the
bulk of the cost; costs which the "crude-but-effective" suggestion seeks to
avoid. Some of th
Hi Charles,
Regarding UDFs and Arrow: if Arrow is used just as an interface format (as
outlined in the original post), then Drill's internals continue to use Drill
value vectors, and UDFs are unchanged.
If Arrow is adopted internally in Drill, then vast amounts of runtime code must
change (see
This makes it sound like allocation is the important difference. If so,
converting Drill might be easier than was thought.
On Sat, Aug 18, 2018, 16:44 Paul Rogers wrote:
> Hi All,
>
> Charles recently suggested why Arrow integration could be helpful. (See
> quote below.) W
Hi Paul,
This is a very interesting approach. I really like the concept in that it
sounds like we could prove the value of the Arrow integration without “major
surgery” to Drill. If it proves to be valuable we could proceed with deeper
integration, or if we determine that it is not necessary,
Hi All,
Charles recently suggested why Arrow integration could be helpful. (See quote
below.) When we've looked at reworking Drill's internals to use Arrow, we
found the project to be costly with little direct benefit in terms of
performance or stability. But Charles points out that the real