Hi

After a lot of reading and building a POC, we are still unsure whether
Spark Streaming can handle our use case:

* We have an inbound stream of sensor data for millions of devices (which
have unique identifiers).
* We need to perform aggregation of this stream at a per-device level.
The aggregation will read data that has already been processed (and
persisted) in previous batches.
* Key point: when we process data for a particular device, we need to
ensure that no other process is processing data for that same device.
This is because the outcome of our processing affects the downstream
processing for that device. Effectively we need a distributed lock per
device.
* In addition, each device's events need to be processed in the order
in which they occurred.

Essentially we can't have two batches for the same device being processed
at the same time.
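To make the requirement concrete, here is a minimal plain-Python sketch (not Spark code) of the per-device semantics we are after; the device IDs, timestamps, and the simple sum aggregation are just placeholders for illustration:

```python
from collections import defaultdict

def process_batch(events, state):
    """Group a batch of (device_id, timestamp, value) events by device,
    then fold each device's events into its persisted state in
    event-time order. Exactly one worker handles a given device at a
    time -- this is the serialization guarantee we need from Spark."""
    by_device = defaultdict(list)
    for device_id, ts, value in events:
        by_device[device_id].append((ts, value))
    for device_id, device_events in by_device.items():
        # All events for one device are applied here sequentially,
        # sorted by timestamp, so ordering is preserved per device.
        for ts, value in sorted(device_events):
            state[device_id] = state.get(device_id, 0) + value
    return state

state = {}
events = [("dev-1", 2, 10), ("dev-2", 1, 5), ("dev-1", 1, 3)]
process_batch(events, state)
# state == {"dev-1": 13, "dev-2": 5}
```

The question, in other words, is whether Spark Streaming can give us this "one device, one worker, in order" guarantee across batches without us bolting on an external distributed lock.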

Can Spark handle our use case?

Any advice appreciated.

Regards
John 
