Re: Apache Pinot Sink

2021-01-25 Thread Yupeng Fu
Hi Mats and Jakob, +1 to what Till said about non-deterministic behavior. Also, I suggest you look only at Pinot's offline segment creation from Flink. Pinot provides a built-in lambda architecture and has real-time and offline segments per table (architecture diagram

Re: Apache Pinot Sink

2021-01-25 Thread Till Rohrmann
Hi Mats and Jakob, In the general case, I don't think that elements from upstream Flink tasks always arrive at the same subtask of the sink. One problem is that user computations can be non-deterministic. Moreover, a rebalance operation can distribute the events of a task A among several

Re: Apache Pinot Sink

2021-01-25 Thread Poerschke, Mats
Hi all, We want to give you a short update on the Pinot Sink since we started implementing a PoC. As described earlier, we aim to use batch-uploading of segments to Pinot in combination with caching elements in the Flink sink. Our current implementation works like this: Besides the pinot

Re: Apache Pinot Sink

2021-01-06 Thread Venkata Sanath Muppalla
+1 As Yupeng mentioned, we at Uber are also looking into the Pinot Sink. It would be great to collaborate on this proposal. Thanks, Sanath On Wed, Jan 6, 2021 at 9:23 AM Yupeng Fu wrote: > Hi Mats, > > Glad to see this interest! We at Uber are also working on a Pinot sink (for > BATCH

Re: Apache Pinot Sink

2021-01-06 Thread Yupeng Fu
Hi Mats, Glad to see this interest! We at Uber are also working on a Pinot sink (for BATCH execution), with some help from the Pinot community on abstracting Pinot interfaces for segment writes and catalog retrieval. Perhaps we can collaborate on this proposal/POC. Cheers, Yupeng On Wed, Jan

Re: Apache Pinot Sink

2021-01-06 Thread Aljoscha Krettek
That's good to hear. I wasn't sure because the explanation focused a lot on checkpoints and their details, while with the new Sink interface, implementers don't need to be concerned with those. And in fact, when the Sink is used in BATCH execution mode there will be no checkpoints. Other

Re: Apache Pinot Sink

2021-01-06 Thread Poerschke, Mats
Yes, we will use the latest sink interface. Best, Mats > On 6. Jan 2021, at 11:05, Aljoscha Krettek wrote: > > It's great to see interest in this. Were you planning to use the new Sink > interface that we recently introduced? [1] > > Best, > Aljoscha > > [1] https://s.apache.org/FLIP-143 >

Re: Apache Pinot Sink

2021-01-06 Thread Aljoscha Krettek
It's great to see interest in this. Were you planning to use the new Sink interface that we recently introduced? [1] Best, Aljoscha [1] https://s.apache.org/FLIP-143 On 2021/01/05 12:21, Poerschke, Mats wrote: Hi all, we want to contribute a sink connector for Apache Pinot. The following

Re: Apache Pinot Sink

2021-01-05 Thread Poerschke, Mats
Just as a short addition: We plan to contribute the sink to Apache Bahir. Best regards Mats Pörschke > On 5. Jan 2021, at 13:21, Poerschke, Mats > wrote: > > Hi all, > > we want to contribute a sink connector for Apache Pinot. The following > briefly describes the planned control flow.

Apache Pinot Sink

2021-01-05 Thread Poerschke, Mats
Hi all, we want to contribute a sink connector for Apache Pinot. The following briefly describes the planned control flow. Please feel free to comment on any of its aspects. Background Apache Pinot is a large-scale real-time data ingestion engine working on data segments internally. The