I actually talk about this exact thing in a blog post here:
http://blog.cloudera.com/blog/2015/05/working-with-apache-spark-or-how-i-learned-to-stop-worrying-and-love-the-shuffle/

Keep in mind, you're actually doing a ton of math: scoring 90 million users
against 30,000 products is about 2.7 trillion dot products. Even with proper
caching and use of broadcast variables, this will take a while depending on the
size of your cluster. To get real results you may want to look into
locality-sensitive hashing to limit your search space, and definitely look into
spinning up multiple threads to process your product features in parallel to
increase resource utilization on the cluster.
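
To make the locality-sensitive hashing idea concrete, here is a minimal sketch using random-hyperplane signatures in plain NumPy. Everything in it is an assumption for illustration: the rank, the number of hyperplanes, the random stand-in features, and the helper name lsh_signatures are all made up, and this flavor of LSH groups vectors by angle (cosine similarity), so it only approximates a ranking by raw dot product.

```python
import numpy as np

def lsh_signatures(vectors, planes):
    """Return one integer bucket id per row, built from the sign of each projection."""
    bits = (vectors @ planes.T) > 0                       # shape (n_rows, n_planes)
    weights = 2 ** np.arange(planes.shape[0], dtype=np.int64)
    return bits.astype(np.int64) @ weights                # pack the sign bits into an int

rng = np.random.default_rng(0)
rank, n_planes = 10, 8                                    # hypothetical sizes
planes = rng.standard_normal((n_planes, rank))            # random hyperplanes
product_features = rng.standard_normal((1000, rank))      # stand-in for real product factors
user_vector = rng.standard_normal(rank)                   # stand-in for one user's factors

product_buckets = lsh_signatures(product_features, planes)
user_bucket = lsh_signatures(user_vector[None, :], planes)[0]

# Only score the products that landed in the same bucket as the user.
candidates = np.flatnonzero(product_buckets == user_bucket)
scores = product_features[candidates] @ user_vector
```

With 8 hyperplanes there are at most 256 buckets, so each user is scored against a small fraction of the products instead of all of them. Fewer hyperplanes give larger candidate sets and better recall; more give smaller sets and less work.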



Thank you,
Ilya Ganelin



-----Original Message-----
From: afarahat [ayman.fara...@yahoo.com<mailto:ayman.fara...@yahoo.com>]
Sent: Wednesday, June 17, 2015 11:16 PM Eastern Standard Time
To: user@spark.apache.org
Subject: Matrix Multiplication and mllib.recommendation


Hello,
I am trying to get predictions after running the ALS model.
The model works fine. In the prediction/recommendation step, I have about
30,000 products and 90 million users.
When I try predictAll, it fails.
I have been trying to formulate the problem as a matrix multiplication where
I first get the product features, broadcast them, and then do a dot product.
It's still very slow. Any reason why?
Here is some sample code:

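import numpy
from pyspark.mllib.recommendation import MatrixFactorizationModel

# sc is assumed to be an existing SparkContext.

def doMultiply(x):
    # Dot the user's feature vector x with every broadcast product vector.
    # pf.value is a list of (product_id, feature_vector) pairs.
    a = []
    mylen = len(pf.value)
    for i in range(mylen):
        myprod = numpy.dot(x, pf.value[i][1])
        a.append(myprod)
    return a


myModel = MatrixFactorizationModel.load(sc, "FlurryModelPath")
# I need to select which products to broadcast; for now, take a 0.1% sample
m1 = myModel.productFeatures().sample(False, 0.001)
pf = sc.broadcast(m1.collect())
uf = myModel.userFeatures()
# For each user, emit (user_id, [score for each sampled product])
f1 = uf.map(lambda x: (x[0], doMultiply(x[1])))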
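
One possible way to speed up the per-user step is to stack the product vectors into a single matrix once, so that each user costs one matrix-vector product instead of a Python loop of separate dot products. The sketch below is plain NumPy with made-up data: product_rows is a hypothetical stand-in for pf.value, and score_user is an illustrative helper, not part of MLlib. In the Spark job, the assumption is that the matrix and the ID list would be broadcast once and score_user called inside the map.

```python
import numpy as np

def score_user(user_vec, product_matrix):
    """Score one user against all products with a single matrix-vector product."""
    return product_matrix.dot(user_vec)

# Hypothetical stand-in for pf.value: a list of (product_id, feature_vector) pairs.
product_rows = [(101, [1.0, 0.0]), (102, [0.5, 2.0]), (103, [-1.0, 1.0])]

# Build these once, outside the per-user function.
product_ids = [pid for pid, _ in product_rows]
product_matrix = np.array([vec for _, vec in product_rows])   # shape (n_products, rank)

user_vec = np.array([2.0, 1.0])
scores = score_user(user_vec, product_matrix)                 # one score per product

# Keep only the best k products so the output stays small.
k = 2
top = np.argsort(scores)[::-1][:k]
top_products = [(product_ids[i], float(scores[i])) for i in top]
# top_products is [(102, 3.0), (101, 2.0)]
```

Keeping only the top k per user matters at this scale, because a full list of scores for every user and product pair would be far larger than the input features themselves.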




