Re: What is the most efficient and scalable way to get all the recommendation results from an ALS model?

2016-03-20 Hiroyuki Yamada
…th 8GB RAM each. So far I am only using a small data set: about 5 users and 5000 products, with only about 10 ratings.

Thanks.

On Sat, Mar 19, 2016 at 7:58 PM, Hiroyuki Yamada wrote:
> Hi,
>
> I'm testing Collaborative Filtering with MLlib.
> Making a model by ALS.tra…

What is the most efficient and scalable way to get all the recommendation results from an ALS model?

2016-03-19 Hiroyuki Yamada
Hi,

I'm testing Collaborative Filtering with MLlib. Building a model with ALS.trainImplicit (or train) seems scalable as far as I have tested, but I'm wondering how to get all the recommendation results efficiently. The predictAll method can get all the results, but it needs the whole user-product…
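One way to avoid feeding every user-product pair to predictAll is to work from the factor matrices directly. In ALS the predicted score for a (user, product) pair is the dot product of their latent factor vectors, so top-N recommendations for every user can be computed by scoring each user against all products while keeping only the N best. The plain-Python sketch below illustrates that idea on small in-memory dictionaries; the helper name `top_n_for_all_users` and the toy factors are hypothetical. In PySpark the vectors come from the model's `userFeatures()` and `productFeatures()` RDDs, and MLlib's `MatrixFactorizationModel` also offers `recommendProductsForUsers(num)` (check that your Spark version has it).

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Predicted ALS score: inner product of two latent factor vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_n_for_all_users(
    user_factors: Dict[int, Sequence[float]],
    product_factors: Dict[int, Sequence[float]],
    n: int,
) -> Dict[int, List[Tuple[int, float]]]:
    """Hypothetical helper: the n highest-scoring (product, score) pairs per user."""
    result: Dict[int, List[Tuple[int, float]]] = {}
    for user, uf in user_factors.items():
        # Lazily score this user against every product; nlargest keeps only
        # n candidates at a time instead of materializing all scores.
        scores = ((product, dot(uf, pf)) for product, pf in product_factors.items())
        result[user] = heapq.nlargest(n, scores, key=lambda ps: ps[1])
    return result


# Toy factors (rank 2), made up purely for illustration.
user_factors = {1: [1.0, 0.0], 2: [0.0, 1.0]}
product_factors = {10: [0.9, 0.1], 20: [0.2, 0.8], 30: [0.5, 0.5]}
print(top_n_for_all_users(user_factors, product_factors, n=2))
# {1: [(10, 0.9), (30, 0.5)], 2: [(20, 0.8), (30, 0.5)]}
```

On a cluster, assuming the product factors fit in executor memory, a common design is to broadcast them and run the same per-user loop inside a map over `userFeatures()`, so no full user-product pair RDD is ever built.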

spark-submit with cluster deploy mode fails with ClassNotFoundException (jars are not passed around properly?)

2016-03-11 Hiroyuki Yamada
Hi,

I am trying to use spark-submit in cluster deploy mode on a single node, but I keep getting a ClassNotFoundException, as shown below (in this case, snakeyaml.jar is not found by the Spark cluster).

===
16/03/12 14:19:12 INFO Remoting: Starting remoting
16/03/12 14:19:12 INFO Remoting: R…
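For context, the Spark "Submitting Applications" documentation says that in cluster deploy mode the driver is launched inside the cluster, that the application jar URL must be globally visible there (for example an hdfs:// path, or a file:// path present on all nodes), and that extra dependencies can be passed with `--jars` as a comma-separated list. The sketch below only assembles such a command line in Python; the helper name `build_spark_submit`, the master URL, the class name, and all jar paths are hypothetical placeholders, and this is not a confirmed fix for the error above.

```python
import shlex
from typing import List, Sequence


def build_spark_submit(master: str, main_class: str, app_jar: str,
                       extra_jars: Sequence[str]) -> List[str]:
    """Hypothetical helper: assemble a spark-submit argument list for cluster mode."""
    cmd = ["spark-submit",
           "--master", master,
           "--deploy-mode", "cluster",
           "--class", main_class]
    if extra_jars:
        # --jars takes a single comma-separated list of jar paths/URLs.
        cmd += ["--jars", ",".join(extra_jars)]
    # The application jar goes after all spark-submit options.
    cmd.append(app_jar)
    return cmd


cmd = build_spark_submit(
    master="spark://localhost:7077",                # hypothetical standalone master
    main_class="com.example.Main",                  # hypothetical main class
    app_jar="file:///opt/app/app.jar",              # hypothetical; must exist on every node
    extra_jars=["file:///opt/libs/snakeyaml.jar"],  # hypothetical dependency path
)
print(shlex.join(cmd))
```

The resulting list could then be executed with `subprocess.run(cmd, check=True)`; building it as a list rather than a single string avoids shell-quoting problems with paths.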

Re: Which is the more appropriate form of ratings?

2016-02-25 Hiroyuki Yamada
…3:26 Sabarish Sasidharan
> wrote:
>
>> I believe the ALS algo expects the ratings to be aggregated (A). I don't
>> see why you have to use decimals for rating.
>>
>> Regards
>> Sab
>>
>> On Thu, Feb 25, 2016 at 4:50 PM, Hiroyuki Yamada
>…

Which is the more appropriate form of ratings?

2016-02-25 Hiroyuki Yamada
Hello,

I just started working on CF in MLlib. I am using trainImplicit because I only have implicit ratings, such as page views. I am wondering which form of ratings is more appropriate. Let's assume that the view count is treated as a rating, and that user 1 views page 1 three times and page 2 twice, and so…
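A reply in this thread suggests that ALS expects ratings aggregated per (user, product) pair. A minimal plain-Python sketch of that aggregation, collapsing raw page-view events into one (user, page, view_count) triple per pair, is shown below; the helper name `aggregate_views` is hypothetical, and the events mirror the example in the question. In PySpark the equivalent step would typically be a `reduceByKey` over `((user, page), 1)` pairs before building `Rating` objects for trainImplicit.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def aggregate_views(events: Iterable[Tuple[int, int]]) -> List[Tuple[int, int, float]]:
    """Hypothetical helper: collapse (user, page) view events into (user, page, count)."""
    counts = Counter(events)  # one entry per distinct (user, page) pair
    # Sorted only to make the output deterministic; ALS does not need any order.
    return sorted((user, page, float(n)) for (user, page), n in counts.items())


# Example from the question: user 1 views page 1 three times and page 2 twice.
events = [(1, 1), (1, 2), (1, 1), (1, 1), (1, 2)]
print(aggregate_views(events))  # [(1, 1, 3.0), (1, 2, 2.0)]
```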

Re: What is the point of the alpha value in Collaborative Filtering in MLlib?

2016-02-25 Hiroyuki Yamada
…to see what gives the
> best result.
>
> I think that generally sparser input needs higher alpha, and maybe
> someone tells me that really alpha should be a function of the
> sparsity, but I've never seen that done.
>
> On Thu, Feb 25, 2016 at 6:33 AM, Hiroyuki Yam…

Re: What is the point of the alpha value in Collaborative Filtering in MLlib?

2016-02-24 Hiroyuki Yamada
Hi,

I've been doing a POC of CF in MLlib. In my environment all ratings are implicit, so I am trying to use the trainImplicit method (in Python). The trainImplicit method takes alpha as one of its arguments to specify the confidence in the ratings, as described in <http://spark.apache.org/d…
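The implicit-feedback approach that the MLlib documentation cites (Hu, Koren and Volinsky, "Collaborative Filtering for Implicit Feedback Datasets") turns each observation r into a binary preference p (1 if r > 0, else 0) and a confidence c = 1 + alpha * r. The plain-Python sketch below shows only that mapping, assuming non-negative counts such as page views; the helper name is hypothetical and this is not Spark's internal code. It makes the role of alpha visible: unobserved pairs always get confidence 1, while alpha controls how much more weight observed interactions receive.

```python
from typing import Tuple


def preference_and_confidence(r: float, alpha: float) -> Tuple[int, float]:
    """Hypothetical helper for the Hu/Koren/Volinsky mapping of an implicit rating.

    r: non-negative observation strength, e.g. a view count (assumption).
    Returns (p, c): p = 1 if r > 0 else 0, and c = 1 + alpha * r.
    """
    if r < 0:
        raise ValueError("this sketch assumes non-negative implicit ratings")
    p = 1 if r > 0 else 0
    c = 1.0 + alpha * r
    return p, c


for alpha in (1.0, 10.0):
    # Larger alpha widens the confidence gap between observed and unobserved pairs.
    print(alpha, [preference_and_confidence(r, alpha) for r in (0, 1, 3)])
# 1.0 [(0, 1.0), (1, 2.0), (1, 4.0)]
# 10.0 [(0, 1.0), (1, 11.0), (1, 31.0)]
```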