Thanks, Omar. As it turns out, Parquet is not the way to go: it looks like it's geared more toward data warehousing, whereas I need to persist streaming data, and from what I can gather I would need the overhead of Spark or Hive to accomplish that with Parquet (i.e., appending to a growing Parquet file).
*However*, it looks like Apache Kudu is exactly what we need. And not only does Camel already provide a Kudu component, but, as coincidence would have it, it looks like you co-authored it. Awesome! Moreover, Kudu takes just a Map as input, and not an Avro-formatted message or whatever, as with Parquet. So migrating this Kafka->Mongo route to Kafka->Kudu is almost trivial (I've pasted a rough sketch at the bottom of this mail). Anyway, time to bump my Camel version up to 3.0.1 and give Kudu a whirl...

Thanks again.

> On February 12, 2020 at 4:33 AM Omar Al-Safi <o...@oalsafi.com> wrote:
>
> Hi Ron,
>
> By reading some introduction in Apache Drill, I'd say the file component
> would be more suitable to write parquet files.
> In regards to parquet and Camel, we don't have an example for it, but the
> way I see it, you are heading in the right direction by creating a
> processor to convert the data to parquet format.
> However, we do have an open feature request
> <https://issues.apache.org/jira/browse/CAMEL-13573> to add a parquet data
> format; we would love to see some contributions to add this to Camel :) .
>
> Regards,
> Omar
>
> On Tue, Feb 11, 2020 at 11:37 PM Ron Cecchini <roncecch...@comcast.net>
> wrote:
>
> > Hi, all. I'm just looking for quick guidance or confirmation that I'm
> > going in the right direction here:
> >
> > - There's a small Kotlin service that uses Camel to read from Kafka and
> >   write to Mongo.
> > - I need to replace Mongo with Apache Drill and write Parquet files to
> >   the file system. (I know nothing about Parquet, but I know a little
> >   bit about Drill.)
> > - This service isn't used to do any queries; it's just for persisting
> >   data. So, given that, and the fact that Drill is just a query engine,
> >   I really can't use the "Drill" component for anything.
> > - But there is that "HDFS" component that I think I can use? Or maybe
> >   the "File" component is better here?
> >
> > So my thinking is that I just need to:
> >
> > 1. write a Processor to transform the JSON data into Parquet
> >    (and keep in mind that I know nothing about Parquet...)
> > 2. use the HDFS (or File) component to write it to a file
> >    (I think there's some Parquet setup to do (?) outside the scope of
> >    this service, but that's another matter...)
> >
> > Seems pretty straightforward. Does that sound reasonable?
> >
> > Are there any Camel examples I can look at? The Google machine seems to
> > not find anything related to Camel and Parquet...
> >
> > Thank you so much!
> >
> > Ron
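P.S. In case it helps anyone searching the archives later, here is a minimal sketch of what the migrated route might look like in our Kotlin service. The topic name, broker, Kudu master address, table name, and column layout are all placeholder assumptions, and the endpoint URIs are my reading of the camel-kafka and camel-kudu component docs, so double-check everything against 3.0.1 before taking it at face value.

import com.fasterxml.jackson.databind.ObjectMapper
import org.apache.camel.builder.RouteBuilder

class KafkaToKuduRoute : RouteBuilder() {

    private val mapper = ObjectMapper()

    override fun configure() {
        // Consume the same JSON messages the old Kafka->Mongo route read
        // (topic name and broker address are placeholders).
        from("kafka:sensor-data?brokers=localhost:9092")
            .process { exchange ->
                // The Kudu producer's insert operation expects the message
                // body to be a Map of column name -> value, so deserialize
                // the JSON payload into a plain Map before handing it off.
                val json = exchange.getIn().getBody(String::class.java)
                @Suppress("UNCHECKED_CAST")
                val row = mapper.readValue(json, Map::class.java) as Map<String, Any>
                exchange.getIn().body = row
            }
            // Insert each row into an existing Kudu table; the Map's keys
            // need to line up with the table's column names.
            .to("kudu:localhost:7051/sensor_data?operation=insert")
    }
}

The one gotcha I anticipate is that the table has to already exist in Kudu with a matching schema; the component appears to expose a create-table operation as well, but I haven't tried that yet.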