Are you saying that it should process all messages from topic 1, then
topic 2, then topic 3, then topic 4?

Or that they need to be processed exactly at the same time?

On Mon, Mar 20, 2017 at 10:05 PM, Manasa Danda <manasada...@gmail.com>
wrote:

> Hi,
>
> I am Manasa, currently working on a project that requires processing data
> from multiple topics at the same time. I am looking for advice on how to
> approach this problem. Below is the use case.
>
>
> We have 4 topics, with data coming in at a different rate in each topic,
> but the messages in each topic share a common unique identifier
> (attributionId). I need to process all the events in the 4 topics with the
> same attributionId at the same time. We are currently using Spark Streaming
> for processing.
>
> Here are the steps in the current logic:
>
> 1. Read and filter data in topic 1
> 2. Read and filter data in topic 2
> 3. Read and filter data in topic 3
> 4. Read and filter data in topic 4
> 5. Union the DStreams from steps 1-4, which are executed in parallel
> 6. Process the unified DStream
>
> However, because the data arrives at different rates (topic 1 generates
> 1000 times more data than topic 2), the associated events for a given
> attributionId do not arrive in the same batch window.
>
> Any ideas on how this can be implemented would help.
>
> Thank you!!
>
> -Manasa
>
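
A minimal sketch of one possible approach, assuming it is acceptable to hold events in memory until all four topics have reported: keep per-attributionId state across batches and release a group only once every topic has contributed, instead of relying on a single batch window. This is a plain-Python simulation of the idea behind Spark Streaming's stateful operators (`updateStateByKey`, `mapWithState`), not Spark code; the topic names, the `AttributionBuffer` class and the `max_batches` timeout are hypothetical.

```python
from collections import defaultdict

# Hypothetical topic names; the thread only says "topic 1" to "topic 4".
TOPICS = ("topic1", "topic2", "topic3", "topic4")


class AttributionBuffer:
    """Buffers events per attributionId across micro-batches (hypothetical).

    A group is released only once every topic in `topics` has contributed
    at least one event for that attributionId. Groups that stay incomplete
    for more than `max_batches` batches are evicted so that state cannot
    grow without bound.
    """

    def __init__(self, topics=TOPICS, max_batches=10):
        self.topics = frozenset(topics)
        self.max_batches = max_batches
        # attributionId -> {topic: [events]}
        self.state = defaultdict(lambda: defaultdict(list))
        # attributionId -> number of batches the group has stayed incomplete
        self.age = {}

    def process_batch(self, batch):
        """Consume one micro-batch of (topic, attribution_id, event) tuples.

        Returns (complete, expired): groups that now have events from every
        topic, and attributionIds dropped because they timed out.
        """
        for topic, attribution_id, event in batch:
            self.state[attribution_id][topic].append(event)
            self.age.setdefault(attribution_id, 0)

        complete, expired = {}, []
        for attribution_id in list(self.state):
            groups = self.state[attribution_id]
            if self.topics.issubset(groups.keys()):
                # All topics seen: hand the whole group over for processing.
                complete[attribution_id] = {t: list(e) for t, e in groups.items()}
                del self.state[attribution_id]
                del self.age[attribution_id]
            else:
                # Still waiting on at least one topic; age the group.
                self.age[attribution_id] += 1
                if self.age[attribution_id] > self.max_batches:
                    expired.append(attribution_id)
                    del self.state[attribution_id]
                    del self.age[attribution_id]
        return complete, expired


if __name__ == "__main__":
    buf = AttributionBuffer(max_batches=2)
    # Batch 1: only the high-volume topic has data for a1 and a2.
    print(buf.process_batch([("topic1", "a1", "click"), ("topic1", "a2", "click")]))
    # Batch 2: the slower topics catch up for a1, so a1 is released.
    print(buf.process_batch([("topic2", "a1", "view"),
                             ("topic3", "a1", "x"),
                             ("topic4", "a1", "y")]))
    # Batch 3: a2 never completed and is evicted after the timeout.
    print(buf.process_batch([]))
```

In an actual Spark Streaming job the same logic would live in the state-update function passed to `mapWithState` (or `updateStateByKey`), keyed by attributionId, with checkpointing enabled; `StateSpec.timeout` plays a role similar to `max_batches` here. Some timeout seems necessary, since with topic 1 producing far more data than topic 2, some keys may never see all four topics.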
