Thanks Sabarish and Nick.
Would you happen to have some code snippets that you can share?
Best,
Ayman

On Jun 17, 2015, at 10:35 PM, Sabarish Sasidharan <sabarish.sasidha...@manthan.com> wrote:
> Nick is right. I too have implemented it this way and it works just fine. In
> my case, there can be even more products. You simply broadcast blocks of
> products to userFeatures.mapPartitions() and BLAS multiply in there to get
> recommendations. In my case 10K products form one block. Note that you would
> then have to union your recommendations. And if there are lots of product
> blocks, you might also want to checkpoint once every few times.
>
> Regards
> Sab
>
> On Thu, Jun 18, 2015 at 10:43 AM, Nick Pentreath <nick.pentre...@gmail.com>
> wrote:
> One issue is that you broadcast the product vectors and then do a dot product
> one-by-one with the user vector.
>
> You should try forming a matrix of the item vectors and doing the dot product
> as a matrix-vector multiply, which will make things a lot faster.
>
> Another optimisation that is available in 1.4 is a recommendProducts method
> that blockifies the factors to make use of level 3 BLAS (i.e. matrix-matrix
> multiply). I am not sure if this is available in the Python API yet.
>
> But you can do a version yourself by using mapPartitions over user factors,
> blocking the factors into sub-matrices and doing a matrix multiply with the
> item factor matrix to get scores on a block-by-block basis.
>
> Also, as Ilya says, more parallelism can help. I don't think it's so necessary
> to do LSH with 30,000 items.
>
> On Thu, Jun 18, 2015 at 6:01 AM, Ganelin, Ilya <ilya.gane...@capitalone.com>
> wrote:
>
> I actually talk about this exact thing in a blog post here:
> http://blog.cloudera.com/blog/2015/05/working-with-apache-spark-or-how-i-learned-to-stop-worrying-and-love-the-shuffle/
> Keep in mind, you're actually doing a ton of math. Even with proper caching
> and use of broadcast variables this will take a while, depending on the size
> of your cluster.
> To get real results you may want to look into locality-sensitive hashing to
> limit your search space, and definitely look into spinning up multiple threads
> to process your product features in parallel to increase resource utilization
> on the cluster.
>
> Thank you,
> Ilya Ganelin
>
> -----Original Message-----
> From: afarahat [ayman.fara...@yahoo.com]
> Sent: Wednesday, June 17, 2015 11:16 PM Eastern Standard Time
> To: user@spark.apache.org
> Subject: Matrix Multiplication and mllib.recommendation
>
> Hello;
> I am trying to get predictions after running the ALS model.
> The model works fine. In the prediction/recommendation step, I have about
> 30,000 products and 90 million users.
> When I try predictAll, it fails.
> I have been trying to formulate the problem as a matrix multiplication where
> I first get the product features, broadcast them and then do a dot product.
> It's still very slow. Any reason why?
> Here is some sample code:
>
> import numpy
> from pyspark.mllib.recommendation import MatrixFactorizationModel
>
> def doMultiply(x):
>     # Dot the user vector x with each broadcast product vector, one by one.
>     a = []
>     mylen = len(pf.value)
>     for i in range(mylen):
>         myprod = numpy.dot(x, pf.value[i][1])
>         a.append(myprod)
>     return a
>
> myModel = MatrixFactorizationModel.load(sc, "FlurryModelPath")
> # I need to select which products to broadcast but let's try all
> m1 = myModel.productFeatures().sample(False, 0.001)
> pf = sc.broadcast(m1.collect())
> uf = myModel.userFeatures()
> f1 = uf.map(lambda x: (x[0], doMultiply(x[1])))
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Matrix-Multiplication-and-mllib-recommendation-tp23384.html
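
On the request for code snippets: nobody in the thread posted one, so the
following is only a rough sketch of the approach Nick and Sabarish describe,
not their actual code. It stacks the user vectors of a partition into a matrix
and scores them against one block of product vectors with a single
matrix-matrix multiply, then keeps the best few products per user. The names
score_partition, top_k and user_block are made up for illustration, and the
Spark wiring is shown only in comments because it assumes an existing
SparkContext (sc) and a loaded model (myModel) as in the original post.

```python
from itertools import islice

import numpy as np


def score_partition(user_rows, product_ids, product_matrix,
                    top_k=10, user_block=4096):
    """Score one partition of users against one block of products.

    user_rows      -- iterable of (user_id, feature_vector) pairs, the shape
                      of records in myModel.userFeatures()
    product_ids    -- 1-D array of product ids, length n_products
    product_matrix -- 2-D array of product factors, shape (n_products, rank)
    Yields (user_id, [(product_id, score), ...]), best score first.
    top_k and user_block are illustrative parameters, not Spark settings.
    """
    it = iter(user_rows)
    k = min(top_k, product_matrix.shape[0])
    while True:
        # Take a bounded chunk of users so the score matrix stays small.
        rows = list(islice(it, user_block))
        if not rows:
            break
        user_matrix = np.array([vec for _, vec in rows], dtype=np.float64)
        # One matrix-matrix multiply replaces n_users * n_products dot calls.
        scores = user_matrix.dot(product_matrix.T)  # (n_users, n_products)
        # argpartition picks the k best columns per row without a full sort.
        top = np.argpartition(-scores, k - 1, axis=1)[:, :k]
        for i, (uid, _) in enumerate(rows):
            best = sorted(top[i], key=lambda j: -scores[i, j])
            yield uid, [(int(product_ids[j]), float(scores[i, j]))
                        for j in best]


# Possible Spark wiring (assumes sc and myModel exist; untested sketch):
#   pf = myModel.productFeatures().collect()
#   ids = np.array([p[0] for p in pf])
#   mat = np.array([p[1] for p in pf], dtype=np.float64)
#   b_ids, b_mat = sc.broadcast(ids), sc.broadcast(mat)
#   recs = myModel.userFeatures().mapPartitions(
#       lambda part: score_partition(part, b_ids.value, b_mat.value))
```

With 30,000 products the whole product matrix should fit in a single broadcast
block. With many more products, as in Sabarish's case, the same function could
be run once per product block and the per-user results merged afterwards, which
is the "union your recommendations" step he mentions.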