Hi,
I am using Spark Streaming to process messages from Kafka for real-time
analytics, and I am trying to fine-tune my streaming process. Currently my Spark
Streaming job reads a batch of messages from a Kafka topic and processes the
messages one at a time. I have set Spark Streaming properties to increase the
parallelism of the tasks that process each individual message.
The problem is that processing a single message from a batch still takes a long
time because of how my workflow is structured. What I would like to implement
is a way for the other messages in that batch to be processed in parallel with
the first message. This would reduce the overall processing time: some messages
take a long time and others do not, so the fast ones would no longer have to
wait for a slow message to finish.
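To make the difference concrete, here is a minimal plain-Python sketch (not Spark code) of the two behaviors. `process_message` and the per-message `cost_seconds` delays are hypothetical stand-ins for the real workflow, and the thread pool is only one way to illustrate processing the messages of a batch concurrently; it is not a claim about how Spark Streaming schedules tasks.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def process_message(msg):
    # Hypothetical stand-in for the per-message workflow; the sleep simulates
    # messages that take different amounts of time to process.
    time.sleep(msg["cost_seconds"])
    return msg["id"]


def process_sequentially(batch):
    # Current behavior: each message waits for the previous one to finish.
    return [process_message(m) for m in batch]


def process_concurrently(batch, max_workers=4):
    # Desired behavior: all messages in the batch are submitted at once,
    # so one slow message no longer delays the others.
    # pool.map returns results in the same order as the input batch.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_message, batch))


if __name__ == "__main__":
    # Made-up batch: one slow message followed by three fast ones.
    batch = [{"id": i, "cost_seconds": c} for i, c in enumerate([0.5, 0.1, 0.1, 0.1])]

    start = time.perf_counter()
    process_sequentially(batch)
    print(f"sequential: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    process_concurrently(batch)
    print(f"concurrent: {time.perf_counter() - start:.2f}s")
```

With these made-up costs, the sequential run takes roughly the sum of all delays, while the concurrent run takes roughly as long as the slowest single message.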
Is this kind of implementation possible with Spark Streaming?
If not, do I need to use another tool alongside Spark Streaming to achieve
this kind of processing? What options do I have?
Thanks in advance,
Udbhav Agarwal