One way to do this that is currently possible is described here: http://mail-archives.apache.org/mod_mbox/spark-user/201407.mbox/%3CCAMwrk0=b38dewysliwyc6hmze8tty8innbw6ixatnd1ue2-...@mail.gmail.com%3E
On Wed, Jul 16, 2014 at 1:16 AM, Gerard Maas <gerard.m...@gmail.com> wrote:
> Hi Sargun,
>
> There have been a few discussions on the list recently about this topic. The
> short answer is that this is not supported at the moment.
> This is a particularly good thread, as it discusses the current state and
> limitations:
>
> http://apache-spark-developers-list.1001551.n3.nabble.com/brainsotrming-Generalization-of-DStream-a-ContinuousRDD-td7349.html
>
> -kr, Gerard.
>
> On Wed, Jul 16, 2014 at 9:56 AM, Sargun Dhillon <sar...@sargun.me> wrote:
>
>> Does anyone here have a way to do Spark Streaming with external timing
>> for windows? Right now, it relies on the wall clock of the driver to
>> determine the amount of time that each batch read lasts.
>>
>> We have Kafka and HDFS ingress into our Spark Streaming pipeline,
>> where the events are annotated with the timestamps at which they
>> happened (in real time). We would like to key our windows on those
>> timestamps, as opposed to driver time.
>>
>> Does anyone have any ideas on how to do this?
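The workaround discussed in the thread amounts to assigning each record to a window by its embedded event timestamp, rather than relying on the driver's wall clock. Here is a minimal sketch of that bucketing in plain Python; the helper names are illustrative, not Spark API, and in practice this logic would run inside something like a DStream `transform`/`map` followed by a key-based grouping:

```python
# Hedged sketch: event-time windowing, assuming each record carries its own
# timestamp (e.g. the Kafka/HDFS event annotation mentioned above).
# Not Spark API -- just the core bucketing idea.

def window_start(ts_ms, window_ms):
    """Map an event timestamp (epoch millis) to the start of its fixed window."""
    return ts_ms - (ts_ms % window_ms)

def bucket_by_event_time(events, window_ms):
    """Group (timestamp_ms, payload) pairs by event-time window."""
    windows = {}
    for ts, payload in events:
        windows.setdefault(window_start(ts, window_ms), []).append(payload)
    return windows

# Example: 60-second windows. "a" and "b" land in the [60s, 120s) window,
# while "c" starts the next one -- regardless of when they actually arrived.
events = [(61_000, "a"), (119_000, "b"), (121_000, "c")]
print(bucket_by_event_time(events, 60_000))
```

In a real pipeline one would also need to decide how long to keep a window open for late-arriving events, which is exactly the limitation the linked dev-list thread is about.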