Hei,
1) We populate state from this CSV data and run business logic based on
that state plus events coming from other, unrelated streams.
2) We are using a low-level ProcessFunction to process this future
hybrid source.

Regardless of the aforementioned points, please note that the main
challenge is to combine, in a HybridSource, a CSV file and a Kafka topic
that return different data types, so I am not sure how my answers relate
to the original problem, to be honest.

Regards,
Oscar
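
P.S. The "shared type" approach under discussion can be sketched roughly as
follows: both the CSV source and the Kafka source are made to emit one common
POJO, so HybridSource only ever sees a single data type. This is a minimal,
hypothetical sketch, not the actual job: the class `UserEvent`, the 3-column
CSV layout, and the protobuf field names are all assumptions, and the protobuf
decode is stubbed because the generated class is not available here.

```java
public class UnifiedEventSketch {

    // Hypothetical common event type that both sources would emit.
    static final class UserEvent {
        final String userId;
        final String action;
        final long timestampMillis;

        UserEvent(String userId, String action, long timestampMillis) {
            this.userId = userId;
            this.action = action;
            this.timestampMillis = timestampMillis;
        }
    }

    // What the CSV reader (e.g. a Flink SimpleStreamFormat reader) would do
    // per line: parse the assumed 3 columns into the common type.
    static UserEvent fromCsvLine(String line) {
        String[] f = line.split(",", -1);
        if (f.length != 3) {
            throw new IllegalArgumentException("expected 3 columns: " + line);
        }
        return new UserEvent(f[0].trim(), f[1].trim(), Long.parseLong(f[2].trim()));
    }

    // What the Kafka value deserializer would do: decode the protobuf
    // payload and map it to the same common type. Stubbed here; in the
    // real job this would be something like:
    //   EventProto p = EventProto.parseFrom(bytes);   // hypothetical class
    //   return new UserEvent(p.getUserId(), p.getAction(), p.getTimestamp());
    static UserEvent fromProtoFields(String userId, String action, long ts) {
        return new UserEvent(userId, action, ts);
    }

    public static void main(String[] args) {
        UserEvent fromCsv = fromCsvLine("u42,login,1688486400000");
        UserEvent fromKafka = fromProtoFields("u42", "login", 1688486400000L);
        System.out.println(fromCsv.userId + " / " + fromKafka.userId);
    }
}
```

With both sides emitting the same POJO, the usual
HybridSource.builder(fileSource).addSource(kafkaSource).build() wiring
type-checks, because the mapping happens inside each source's deserializer
(the FileSource's StreamFormat and the KafkaSource's DeserializationSchema)
rather than downstream in the job graph.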

On Tue, 4 Jul 2023 at 20:53, Alexander Fedulov <alexander.fedu...@gmail.com>
wrote:

> @Oscar
> 1. How do you plan to use that CSV data? Is it needed for lookup from the
> "main" stream?
> 2. Which API are you using? DataStream/SQL/Table or low level
> ProcessFunction?
>
> Best,
> Alex
>
>
> On Tue, 4 Jul 2023 at 11:14, Oscar Perez via user <user@flink.apache.org>
> wrote:
>
>> ok, but is it? As I said, both sources have different data types. In the
>> example here:
>>
>>
>> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/hybridsource/
>>
>> In that example both sources return String, but in our case one source
>> would return a protobuf event while the other would return a POJO. How
>> can we make the two sources share the same data type so that we can
>> successfully use a hybrid source?
>>
>> Regards,
>> Oscar
>>
>> On Tue, 4 Jul 2023 at 12:04, Alexey Novakov <ale...@ververica.com> wrote:
>>
>>> Hi Oscar,
>>>
>>> You could use connected streams and put your file into a special Kafka
>>> topic before starting such a job:
>>> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/datastream/operators/overview/#connect
>>> But this may require more work, and the event ordering in the connected
>>> streams (which is shuffled) is probably not what you are looking for.
>>>
>>> I think HybridSource is the right solution.
>>>
>>> Best regards,
>>> Alexey
>>>
>>> On Mon, Jul 3, 2023 at 3:44 PM Oscar Perez via user <
>>> user@flink.apache.org> wrote:
>>>
>>>> Hei, We want to bootstrap some data from a CSV file before reading
>>>> from a Kafka topic that has a retention period of 7 days.
>>>>
>>>> We believe the best tool for that would be the HybridSource but the
>>>> problem we are facing is that both datasources are of different nature. The
>>>> KafkaSource returns a protobuf event while the CSV is a POJO with just 3
>>>> fields.
>>>>
>>>> We could hack the KafkaSource implementation and do the mapping from
>>>> protobuf to the CSV POJO in the value deserializer, but that seems
>>>> rather hackish. Is there a more elegant way to unify the data types
>>>> from both sources when using HybridSource?
>>>>
>>>> thanks
>>>> Oscar
>>>>
>>>