Actually talk about this exact thing in a blog post here http://blog.cloudera.com/blog/2015/05/working-with-apache-spark-or-how-i-learned-to-stop-worrying-and-love-the-shuffle/. Keep in mind, you're actually doing a ton of math. Even with proper caching and use of broadcast variables this will take a while defending on the size of your cluster. To get real results you may want to look into locality sensitive hashing to limit your search space and definitely look into spinning up multiple threads to process your product features in parallel to increase resource utilization on the cluster.
Thank you, Ilya Ganelin -----Original Message----- From: afarahat [ayman.fara...@yahoo.com<mailto:ayman.fara...@yahoo.com>] Sent: Wednesday, June 17, 2015 11:16 PM Eastern Standard Time To: user@spark.apache.org Subject: Matrix Multiplication and mllib.recommendation Hello; I am trying to get predictions after running the ALS model. The model works fine. In the prediction/recommendation , I have about 30 ,000 products and 90 Millions users. When i try the predict all it fails. I have been trying to formulate the problem as a Matrix multiplication where I first get the product features, broadcast them and then do a dot product. Its still very slow. Any reason why here is a sample code def doMultiply(x): a = [] #multiply by mylen = len(pf.value) for i in range(mylen) : myprod = numpy.dot(x,pf.value[i][1]) a.append(myprod) return a myModel = MatrixFactorizationModel.load(sc, "FlurryModelPath") #I need to select which products to broadcast but lets try all m1 = myModel.productFeatures().sample(False, 0.001) pf = sc.broadcast(m1.collect()) uf = myModel.userFeatures() f1 = uf.map(lambda x : (x[0], doMultiply(x[1]))) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Matrix-Multiplication-and-mllib-recommendation-tp23384.html Sent from the Apache Spark User List mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org ________________________________________________________ The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.