Thanks, Omar. As it turns out, Parquet is not the way to go: it looks like it's geared more toward data warehousing, whereas I need to persist streaming data, and from what I can gather I would need the overhead of Spark or Hive to accomplish that with Parquet (i.e., appending to a growing Parquet file).
*However*, it looks like Apache Kudu is exactly what we need. And not only does Camel already provide a Kudu component, but, as coincidence would have it, it looks like you co-authored it. Awesome! Moreover, Kudu takes just a Map as input, and not an Avro-formatted message or whatever, as with Parquet. So migrating this Kafka->Mongo route to Kafka->Kudu is almost trivial (I've pasted a rough sketch at the bottom of this mail). Anyway, time to bump my Camel version up to 3.0.1 and give Kudu a whirl...

Thanks again.

> On February 12, 2020 at 4:33 AM Omar Al-Safi <o...@oalsafi.com> wrote:
>
> Hi Ron,
>
> By reading some introduction in Apache Drill, I'd say the file component
> would be more suitable to write parquet files.
> In regards to parquet and Camel, we don't have an example for it, but the
> way I see it, you are heading in the right direction by creating a
> processor to convert the data to parquet format.
> However, we do have an open feature request
> <https://issues.apache.org/jira/browse/CAMEL-13573> to add a parquet data
> format; we would love to see some contributions to add this to Camel :) .
>
> Regards,
> Omar
>
> On Tue, Feb 11, 2020 at 11:37 PM Ron Cecchini <roncecch...@comcast.net>
> wrote:
>
> > Hi, all. I'm just looking for quick guidance or confirmation that I'm
> > going in the right direction here:
> >
> > - There's a small Kotlin service that uses Camel to read from Kafka and
> >   write to Mongo.
> > - I need to replace Mongo with Apache Drill and write Parquet files to
> >   the file system. (I know nothing about Parquet, but I know a little
> >   bit about Drill.)
> > - This service isn't used to do any queries; it's just for persisting
> >   data. So, given that, and the fact that Drill is just a query engine,
> >   I really can't use the "Drill" component for anything.
> > - But there is that "HDFS" component that I think I can use? Or maybe
> >   the "File" component is better here?
> >
> > So my thinking is that I just need to:
> >
> > 1. write a Processor to transform the JSON data into Parquet
> >    (and keep in mind that I know nothing about Parquet...)
> > 2. use the HDFS (or File) component to write it to a file
> >    (I think there's some Parquet setup to do (?) outside the scope of
> >    this service, but that's another matter...)
> >
> > Seems pretty straightforward. Does that sound reasonable?
> >
> > Are there any Camel examples I can look at? The Google machine seems to
> > not find anything related to Camel and Parquet...
> >
> > Thank you so much!
> >
> > Ron
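P.S. In case it helps anyone searching the archives later, here is a minimal sketch of what the migrated route might look like in our Kotlin service. The topic name, broker, Kudu master address, table name, and column layout are all placeholder assumptions, and the endpoint URIs are my reading of the camel-kafka and camel-kudu component docs, so double-check everything against 3.0.1 before taking it at face value.

import com.fasterxml.jackson.databind.ObjectMapper
import org.apache.camel.builder.RouteBuilder

class KafkaToKuduRoute : RouteBuilder() {

    private val mapper = ObjectMapper()

    override fun configure() {
        // Consume the same JSON messages the old Kafka->Mongo route read
        // (topic name and broker address are placeholders).
        from("kafka:sensor-data?brokers=localhost:9092")
            .process { exchange ->
                // The Kudu producer's insert operation expects the message
                // body to be a Map of column name -> value, so deserialize
                // the JSON payload into a plain Map before handing it off.
                val json = exchange.getIn().getBody(String::class.java)
                @Suppress("UNCHECKED_CAST")
                val row = mapper.readValue(json, Map::class.java) as Map<String, Any>
                exchange.getIn().body = row
            }
            // Insert each row into an existing Kudu table; the Map's keys
            // need to line up with the table's column names.
            .to("kudu:localhost:7051/sensor_data?operation=insert")
    }
}

The one gotcha I anticipate is that the table has to already exist in Kudu with a matching schema; the component appears to expose a create-table operation as well, but I haven't tried that yet.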