Thanks for the reply. As far as I understand, mini-batch training is not yet supported in the ML (DataFrame-based) library. As for MLlib mini-batch SGD, I could not find any PySpark API for it.
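For reference, this is the mini-batch SGD idea from the linked docs, sketched in plain Python rather than through any Spark API (the function name and hyperparameters below are my own choices, not from Spark): each update step uses the gradient of the loss over a small random batch instead of the full dataset, so only one batch needs to be in memory at a time.

```python
import random

def minibatch_sgd(data, lr=0.05, epochs=200, batch_size=4):
    """Fit y ~ w * x by mini-batch SGD on squared error.

    data: list of (x, y) pairs. Only one mini-batch is
    touched per update, which is the memory-saving point.
    """
    w = 0.0
    rng = random.Random(0)  # fixed seed for reproducibility
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # gradient of mean squared error over this mini-batch only
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w

# toy data generated from y = 2x; w should converge near 2.0
data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]]
w = minibatch_sgd(data)
```

Spark's MLlib implements the same scheme distributed over an RDD (the `miniBatchFraction` parameter in its optimizers plays the role of `batch_size` here), but as noted above I could not find a PySpark entry point for the optimizer itself.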
On Wed, Aug 23, 2017 at 2:59 PM, Suzen, Mehmet <su...@acm.org> wrote:
> It depends on what model you would like to train, but models requiring
> optimisation could use SGD with mini batches. See:
> https://spark.apache.org/docs/latest/mllib-optimization.html#stochastic-gradient-descent-sgd
>
> On 23 August 2017 at 14:27, Sea aj <saj3...@gmail.com> wrote:
>> Hi,
>>
>> I am trying to feed a huge dataframe to an ML algorithm in Spark, but it
>> crashes due to a shortage of memory.
>>
>> Is there a way to train the model on a subset of the data in multiple
>> steps?
>>
>> Thanks