Minor correction: I think you want iterator.grouped(10) for
non-overlapping mini-batches.
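
To make the difference concrete, here is a minimal sketch using plain Scala iterators, with no Spark dependency. The object name MiniBatchSketch, the helper names batches and windows, and the two sample "partitions" are made up for illustration. In Spark, the same iterator calls would presumably go inside mapPartitions, e.g. rdd.mapPartitions(_.grouped(10)).

```scala
object MiniBatchSketch {
  // Non-overlapping mini-batches: every element lands in exactly one batch.
  // The last batch may be shorter than `size`.
  def batches[T](it: Iterator[T], size: Int): List[Seq[T]] =
    it.grouped(size).map(_.toList).toList

  // Overlapping windows: each window advances by one element.
  def windows[T](it: Iterator[T], size: Int): List[Seq[T]] =
    it.sliding(size).map(_.toList).toList

  def main(args: Array[String]): Unit = {
    // Two hypothetical partitions standing in for the partitions of an RDD.
    val partitions = Seq(Seq(1, 2, 3, 4, 5), Seq(6, 7, 8))
    partitions.foreach { p =>
      println(s"grouped(2): ${batches(p.iterator, 2)}")
      println(s"sliding(2): ${windows(p.iterator, 2)}")
    }
  }
}
```

Because each partition is processed on its own, neither call ever produces a batch or window that spans elements 5 and 6 in this example, which is the partition-boundary limitation described in the quoted message below.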
On Dec 11, 2014 1:37 PM, "Matei Zaharia" <matei.zaha...@gmail.com> wrote:

> You can just do mapPartitions on the whole RDD, and then call sliding()
> on the iterator in each one to get a sliding window. One problem is that
> you will not be able to slide "forward" into the next partition at
> partition boundaries. If this matters to you, you need to do something more
> complicated to get those, such as the repartition that you mentioned (where you
> map each record to the partition it should be in).
>
> Matei
>
> > On Dec 11, 2014, at 10:16 AM, ll <duy.huynh....@gmail.com> wrote:
> >
> > any advice/comment on this would be much appreciated.
> >
> >
> >
> > --
> > View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/what-is-the-best-way-to-implement-mini-batches-tp20264p20635.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
> >
