Spark Streaming consumes and processes data in parallel, so the order of
the output depends not only on the order of the input but also on the
time each task takes to process its data. Operations at the Spark level,
such as repartitions, sorts and shuffles, will also affect ordering, so
the best approach is to rely on the schema in Cassandra to ensure the
ordering expected by the application.
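To illustrate relying on the Cassandra schema rather than on write order,
here is a small pure-Scala simulation. It is not the Cassandra or
spark-cassandra-connector API. It assumes a hypothetical table keyed as
PRIMARY KEY ((sensorId), arrivalTs), with sensorId as the partition key and
arrivalTs as a clustering column; the SensorReading fields are also
hypothetical. Because rows inside a partition are kept sorted by the
clustering column, a read returns them in timestamp order even when
parallel tasks write them out of order.

```scala
import scala.collection.immutable.TreeMap
import scala.util.Random

// Hypothetical record type standing in for the application's sensor case class.
final case class SensorReading(sensorId: String, arrivalTs: Long, value: Double)

object ClusteringOrderSketch {
  // Simulates a table with PRIMARY KEY ((sensorId), arrivalTs):
  // sensorId is the partition key, arrivalTs the clustering column.
  // Each partition is a TreeMap, so its rows stay sorted by arrivalTs.
  type Table = Map[String, TreeMap[Long, SensorReading]]

  // An upsert: a row with the same (sensorId, arrivalTs) replaces the old one.
  def write(table: Table, r: SensorReading): Table = {
    val partition = table.getOrElse(r.sensorId, TreeMap.empty[Long, SensorReading])
    table.updated(r.sensorId, partition.updated(r.arrivalTs, r))
  }

  // Reading a partition returns rows in clustering order (ascending arrivalTs).
  def read(table: Table, sensorId: String): Seq[SensorReading] =
    table.getOrElse(sensorId, TreeMap.empty[Long, SensorReading]).values.toSeq

  def main(args: Array[String]): Unit = {
    val inKafkaOrder = (1L to 5L).map(ts => SensorReading("s1", ts, ts * 1.5))
    // Parallel tasks finish in an unpredictable order; model that with a shuffle.
    val asWritten = new Random(42).shuffle(inKafkaOrder)
    val table = asWritten.foldLeft(Map.empty: Table)((acc, r) => write(acc, r))
    // Prints 1,2,3,4,5 regardless of the order the writes arrived in.
    println(read(table, "s1").map(_.arrivalTs).mkString(","))
  }
}
```

In this sketch, as in Cassandra, two readings with the same primary key
overwrite each other, so the clustering column needs to be unique per
sensor (for example an arrival timestamp with enough resolution, or the
Kafka offset).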

What schema are you using on the Cassandra side? And how is the data
going to be queried? That last question should drive the required
ordering.

-kr, Gerard.

On Mon, Nov 30, 2015 at 12:37 PM, Prateek . <prat...@aricent.com> wrote:

> Hi,
>
> I have a time-critical Spark application that takes sensor data
> from a Kafka stream, stores it in a case class, applies transformations
> and then stores it in a Cassandra schema. The data needs to be stored in
> the schema in FIFO order.
>
> The order is maintained in the Kafka queue, but I am observing
> out-of-order data in the Cassandra schema. Does Spark Streaming provide
> any functionality to retain order? Or do we need to implement some
> sorting based on the timestamp of arrival?
>
> Regards,
>
> Prateek
