Re: Running ALS on comparatively large RDD

Nick Pentreath Fri, 11 Mar 2016 00:44:03 -0800

Hmmm, something else is going on there. What data source are you reading
from? How much driver and executor memory have you provided to Spark?
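
For reference, the preprocessing described in the original message below
(giving each distinct user and item an auto-incremented integer, then
building rating triples) can be sketched in plain Python. This is only an
illustration under stated assumptions, not the poster's actual code: the
function names and the sample data are hypothetical, and ordinary dicts
stand in for what would be RDD operations such as distinct() followed by
zipWithIndex() on Spark.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

RawRating = Tuple[Hashable, Hashable, float]  # (raw_user, raw_item, rating)
EncodedRating = Tuple[int, int, float]        # (user_idx, item_idx, rating)


def build_index(values: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Give each distinct value a 0-based consecutive integer,
    in first-seen order (hypothetical helper, similar in spirit
    to distinct() + zipWithIndex())."""
    index: Dict[Hashable, int] = {}
    for value in values:
        if value not in index:
            index[value] = len(index)
    return index


def encode_ratings(
    raw: List[RawRating],
) -> Tuple[List[EncodedRating], Dict[Hashable, int], Dict[Hashable, int]]:
    """Replace raw user and item IDs with integer indices."""
    user_index = build_index(user for user, _, _ in raw)
    item_index = build_index(item for _, item, _ in raw)
    encoded = [
        (user_index[user], item_index[item], float(rating))
        for user, item, rating in raw
    ]
    return encoded, user_index, item_index


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the thread.
    sample = [
        ("alice", "book-1", 5.0),
        ("bob", "book-2", 3.0),
        ("alice", "book-2", 4.0),
    ]
    encoded, users, items = encode_ratings(sample)
    print(encoded)
```

With MLlib's RDD-based ALS, the encoded user and product IDs would need to
fit in a 32-bit integer, so it is worth checking that the largest index
stays below 2**31 before building the ratings RDD.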




On Fri, 11 Mar 2016 at 09:21 Deepak Gopalakrishnan <dgk...@gmail.com> wrote:

> 1. I have about 1 million users and a few thousand products, with around
> a million ratings in total.
> 2. Spark 1.6 on Amazon EMR
>
> On Fri, Mar 11, 2016 at 12:46 PM, Nick Pentreath
> <nick.pentre...@gmail.com> wrote:
>
>> Could you provide more details about:
>> 1. Data set size (# ratings, # users and # products)
>> 2. Spark cluster set up and version
>>
>> Thanks
>>
>> On Fri, 11 Mar 2016 at 05:53 Deepak Gopalakrishnan <dgk...@gmail.com>
>> wrote:
>>
>>> Hello All,
>>>
>>> I've been running Spark's ALS on a dataset of users and rated items. I
>>> first encode my users as integers using an auto-increment function (just
>>> like zipWithIndex), and I do the same for my items. I then create an RDD
>>> of the ratings and feed it to ALS.
>>>
>>> My issue is that the ALS algorithm never completes. Attached is a
>>> screenshot of the stages window.
>>>
>>> Any help will be greatly appreciated.
>>>
>>> --
>>> Regards,
>>> *Deepak Gopalakrishnan*
>>> *Mobile*:+918891509774
>>> *Skype* : deepakgk87
>>> http://myexps.blogspot.com
>>>
>>>
>>
>>
>
>
