Thanks for the reply. As far as I understand, mini-batch training is not yet supported in the ML (DataFrame-based) library. As for MLlib mini-batch SGD, I could not find any PySpark API for it.
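For reference, this is the mini-batch SGD idea from the linked docs, sketched in plain Python rather than through any Spark API (the function name and hyperparameters below are my own choices, not from Spark): each update step uses the gradient of the loss over a small random batch instead of the full dataset, so only one batch needs to be in memory at a time.

```python
import random

def minibatch_sgd(data, lr=0.05, epochs=200, batch_size=4):
    """Fit y ~ w * x by mini-batch SGD on squared error.

    data: list of (x, y) pairs. Only one mini-batch is
    touched per update, which is the memory-saving point.
    """
    w = 0.0
    rng = random.Random(0)  # fixed seed for reproducibility
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # gradient of mean squared error over this mini-batch only
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w

# toy data generated from y = 2x; w should converge near 2.0
data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]]
w = minibatch_sgd(data)
```

Spark's MLlib implements the same scheme distributed over an RDD (the `miniBatchFraction` parameter in its optimizers plays the role of `batch_size` here), but as noted above I could not find a PySpark entry point for the optimizer itself.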
On Wed, Aug 23, 2017 at 2:59 PM, Suzen, Mehmet <su...@acm.org> wrote:
> It depends on what model you would like to train, but models requiring
> optimisation could use SGD with mini batches. See:
> https://spark.apache.org/docs/latest/mllib-optimization.html#stochastic-gradient-descent-sgd
>
> On 23 August 2017 at 14:27, Sea aj <saj3...@gmail.com> wrote:
>> Hi,
>>
>> I am trying to feed a huge dataframe to an ML algorithm in Spark, but it
>> crashes due to a shortage of memory.
>>
>> Is there a way to train the model on a subset of the data in multiple
>> steps?
>>
>> Thanks