Minor correction: I think you want iterator.grouped(10) for non-overlapping mini-batches.

On Dec 11, 2014 1:37 PM, "Matei Zaharia" <matei.zaha...@gmail.com> wrote:
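For reference, a minimal sketch of that approach, assuming `sc` is an existing SparkContext (the batch size of 10 and the data are arbitrary):

```scala
// Hypothetical example: non-overlapping mini-batches within each
// partition, using Iterator.grouped() inside mapPartitions().
val rdd = sc.parallelize(1 to 100, numSlices = 4)

val miniBatches = rdd.mapPartitions { iter =>
  // grouped(10) yields an Iterator[Seq[Int]] of batches of up to 10
  // elements; the last batch in a partition may be smaller.
  iter.grouped(10)
}
```

Batches never span partitions, so with uneven partition sizes you can get several short batches (one per partition tail).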
> You can just do mapPartitions on the whole RDD, and then call sliding()
> on the iterator in each one to get a sliding window. One problem is that
> you will not be able to slide "forward" into the next partition at
> partition boundaries. If this matters to you, you need to do something more
> complicated to get those, such as the repartitioning you described (where
> you map each record to the partition it should be in).
>
> Matei
>
> > On Dec 11, 2014, at 10:16 AM, ll <duy.huynh....@gmail.com> wrote:
> >
> > any advice/comment on this would be much appreciated.
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/what-is-the-best-way-to-implement-mini-batches-tp20264p20635.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
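A minimal sketch of the sliding-window variant, again assuming an existing SparkContext `sc` (window size 3 is arbitrary):

```scala
// Hypothetical example: overlapping windows via Iterator.sliding()
// inside mapPartitions(). As noted above, windows cannot slide across
// partition boundaries.
val rdd = sc.parallelize(1 to 20, numSlices = 2)

val windows = rdd.mapPartitions { iter =>
  // sliding(3) yields an Iterator[Seq[Int]] of windows of size 3,
  // advancing one element at a time (step defaults to 1).
  iter.sliding(3)
}
```

To get the windows that straddle boundaries as well, you would need the repartitioning approach Matei mentions, duplicating or routing boundary records so each partition holds every element its windows need.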