Option A: If you can get all the messages in a session into the same Spark partition, you can use mapPartitions (on the underlying RDD) to process each partition as a whole. That lets you control the order in which messages are processed within the partition. This works if messages are posted to Kafka in order and Kafka guarantees in-order delivery, which it does only within a single topic partition.
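A minimal sketch of the kind of function Option A would pass to mapPartitions, in plain Python so it runs without Spark. The Message class and process_partition name are illustrative, not Spark or Kafka APIs; the assumption is that the iterator for one partition yields messages in Kafka delivery order.

```python
# Sketch of a per-partition processing function for Option A.
# Within one partition, iteration order matches Kafka delivery order,
# so per-session ordering can be enforced with simple offset tracking.

from dataclasses import dataclass

@dataclass
class Message:
    session_id: str
    offset: int
    payload: str

def process_partition(messages):
    """Consume one partition's messages in arrival order,
    tracking the last offset seen for each session."""
    last_offset = {}          # session_id -> last processed offset
    results = []
    for msg in messages:      # iteration order == Kafka delivery order
        prev = last_offset.get(msg.session_id, -1)
        assert msg.offset > prev, "out-of-order message in partition"
        last_offset[msg.session_id] = msg.offset
        results.append((msg.session_id, msg.payload))
    return results

# In a real job this would be invoked as:
#   processed = df.rdd.mapPartitions(process_partition)
partition = [Message("s1", 0, "login"), Message("s1", 1, "click"),
             Message("s2", 0, "login")]
print(process_partition(partition))
```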
Option B: If the messages can arrive out of order but carry a timestamp, you can use window operations to sort messages within a window. You still need to make sure that messages from the same session land in the same Spark partition. This adds latency, though, because a window's messages aren't processed until the watermark has passed the end of that window.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/In-order-processing-using-spark-streaming-tp28457p28646.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
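The window-and-watermark idea behind Option B can be sketched in plain Python (no Spark). The window length, lateness bound, and function name are all assumptions for illustration; a real job would use Spark's own windowing and withWatermark instead. Events are buffered per window and a window is only emitted, sorted by event time, once the watermark (max event time seen minus the allowed lateness) moves past the window's end.

```python
# Toy illustration of Option B: buffer out-of-order events per window,
# emit each window sorted by event time once the watermark passes it.
# Events arriving after their window has closed are dropped here, as
# real watermarking would also discard too-late data.

from collections import defaultdict

WINDOW = 10     # window length in seconds (assumed)
LATENESS = 5    # allowed lateness, i.e. watermark delay (assumed)

def windowed_sort(events):
    """events: iterable of (event_time, payload), possibly out of order.
    Returns [(window_start, [payloads sorted by event_time])], one entry
    per window, emitted once the watermark passes the window's end."""
    buffers = defaultdict(list)   # window_start -> [(time, payload)]
    max_time = float("-inf")
    emitted = []
    for t, payload in events:
        max_time = max(max_time, t)
        watermark = max_time - LATENESS
        buffers[(t // WINDOW) * WINDOW].append((t, payload))
        # close every buffered window whose end is behind the watermark
        for start in sorted(w for w in buffers if w + WINDOW <= watermark):
            batch = sorted(buffers.pop(start))
            emitted.append((start, [p for _, p in batch]))
    return emitted
```

Note the latency trade-off the post describes: the window starting at 0 is only emitted once an event with time >= 15 advances the watermark past 10.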