Hi Boyuan,

Do you know if it’s possible to do something similar to this with a single 
topic, essentially treat records with the same keys as their own distinct 
pipelines. The challenge I’m encountering for splitting things downstream ends 
up being related to watermarking at the partition-level (via a WatermarkPolicy) 
and I essentially need to track watermarking or treat records with a particular 
key the same/independently.

I’d assumed that would need to be done prior to reading from Kafka, which is 
where the SDF would come in.

> On Feb 1, 2021, at 12:48 PM, Boyuan Zhang <boyu...@google.com> wrote:
> 
> 
> Hi Rion,
> 
> It sounds like ReadFromKafkaDoFn could be one of the solutions. It takes 
> KafkaSourceDescritpor(basically it's a topic + partition) as input and emit 
> KafkaRecords. Then your pipeline can look like:
> testPipeline
>   .apply(your source that generates KafkaSourceDescriptor)
>   .apply(ParDo.of(ReadFromKafkaDoFn))
>   .apply(other parts)
> 
>> On Mon, Feb 1, 2021 at 8:06 AM Rion Williams <rionmons...@gmail.com> wrote:
>> Hey all,
>> 
>> I'm currently in a situation where I have a single Kafka topic with data 
>> across multiple partitions and covers data from multiple sources. I'm trying 
>> to see if there's a way that I'd be able to accomplish reading from these 
>> different sources as different pipelines and if a Splittable DoFn can do 
>> this.
>> 
>> Basically - what I'd like to do is for a given key on a record, treat this 
>> as a separate pipeline from Kafka:
>> testPipeline
>>     .apply(
>>         /*
>>             Apply some function here to tell Kafka how to describe how to 
>> split up
>>             the sources that I want to read from
>>          */
>>     )
>>     .apply("Ready from Kafka", KafkaIO.read(...))
>>     .apply("Remaining Pipeline Omitted for Brevity"
>> Is it possible to do this? I'm trying to avoid a major architectural change 
>> that would require multiple separate topics by source, however if I can 
>> guarantee that a given key (and it's associated watermark) are treated 
>> separately, that would be ideal.
>> 
>> Any advice or recommendations for a strategy that might work would be 
>> helpful!
>> 
>> Thanks,
>> 
>> Rion

Reply via email to