Re: Matrix Multiplication and mllib.recommendation

Ayman Farahat Thu, 18 Jun 2015 08:21:03 -0700

Thanks, Sabarish and Nick.
Would you happen to have some code snippets that you can share?
Best,
Ayman
On Jun 17, 2015, at 10:35 PM, Sabarish Sasidharan 
<[email protected]> wrote:


> Nick is right. I too have implemented it this way and it works just fine. In
> my case, there can be even more products. You simply broadcast blocks of
> products to userFeatures.mapPartitions() and BLAS multiply in there to get
> recommendations. In my case 10K products form one block. Note that you would
> then have to union your recommendations. And if there are lots of product
> blocks, you might also want to checkpoint once every few blocks.
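A minimal local sketch of the block-and-merge idea described above, using NumPy only. The helper names (`top_k_for_block`, `merge_top_k`) and the toy data are hypothetical and not from the thread; keeping only the top k scores per block is also an assumption about what "recommendations" means here.

```python
import heapq
import numpy as np

def top_k_for_block(user_vec, block_ids, block_matrix, k):
    """Top-k (score, product_id) pairs for one user against one product block."""
    scores = block_matrix.dot(user_vec)      # one matrix-vector product per block
    top = np.argsort(-scores)[:k]            # indices of the k highest scores
    return [(float(scores[i]), block_ids[i]) for i in top]

def merge_top_k(a, b, k):
    """Combine two partial top-k lists, keeping the k highest scores."""
    return heapq.nlargest(k, a + b)

# Toy data (hypothetical): 4 products with rank-2 factors, split into blocks of 2.
product_ids = [1, 2, 3, 4]
product_matrix = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 3.0]])
user_vec = np.array([1.0, 1.0])
block_size, k = 2, 2

best = []
for start in range(0, len(product_ids), block_size):
    ids = product_ids[start:start + block_size]
    block = product_matrix[start:start + block_size]
    best = merge_top_k(best, top_k_for_block(user_vec, ids, block, k), k)
```

On a cluster, each block would presumably be broadcast in turn and the per-block results keyed by user ID, then combined (for example with a union followed by a per-key merge like `merge_top_k`). That wiring is one possible reading of the description, not a confirmed implementation.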
> 
> Regards
> Sab
> 
> On Thu, Jun 18, 2015 at 10:43 AM, Nick Pentreath <[email protected]> 
> wrote:
> One issue is that you broadcast the product vectors and then do a dot product 
> one-by-one with the user vector.
> 
> You should try forming a matrix of the item vectors and doing the dot product 
> as a matrix-vector multiply which will make things a lot faster.
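A minimal sketch of the matrix-vector form suggested above, assuming the product features have been collected as `(product_id, vector)` pairs as in the original code. The function name `score_all_items` and the toy values are hypothetical.

```python
import numpy as np

def score_all_items(user_vec, item_matrix):
    """Return one score per item: (n_items x rank) matrix times (rank,) vector."""
    return item_matrix.dot(np.asarray(user_vec, dtype=float))

# Hypothetical stand-in for the collected (product_id, features) pairs.
product_features = [(10, [1.0, 0.0]), (20, [0.5, 0.5]), (30, [0.0, 2.0])]
item_ids = [pid for pid, _ in product_features]

# Stack the vectors once, before broadcasting, rather than once per user.
item_matrix = np.array([vec for _, vec in product_features], dtype=float)

# A single call replaces the per-product loop of dot products.
scores = score_all_items([2.0, 1.0], item_matrix)
```

In the original code this would roughly mean broadcasting `item_matrix` (together with `item_ids`) in place of the collected list and calling the function inside `uf.map`. `scores[i]` corresponds to `item_ids[i]`.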
> 
> Another optimisation that is available in 1.4 is a recommendProducts method
> that blockifies the factors to make use of level 3 BLAS (i.e. matrix-matrix
> multiply). I am not sure if this is available in the Python API yet.
> 
> But you can do a version yourself by using mapPartitions over user factors, 
> blocking the factors into sub-matrices and doing matrix multiply with item 
> factor matrix to get scores on a block-by-block basis.
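A minimal sketch of the do-it-yourself blocking described above, written as a function that could be passed to `mapPartitions`. The names `score_partition`, `_score_block` and `block_size` are hypothetical, and the code assumes Python 3 and rows of the form `(user_id, features)`.

```python
import numpy as np

def _score_block(ids, vecs, item_matrix):
    """Score a block of users against all items with one matrix-matrix multiply."""
    user_block = np.array(vecs, dtype=float)   # shape (block, rank)
    scores = user_block.dot(item_matrix.T)     # shape (block, n_items)
    return zip(ids, scores)

def score_partition(rows, item_matrix, block_size=1024):
    """Yield (user_id, scores) for an iterator of (user_id, features) rows."""
    ids, vecs = [], []
    for user_id, features in rows:
        ids.append(user_id)
        vecs.append(features)
        if len(ids) == block_size:             # block is full: multiply and reset
            yield from _score_block(ids, vecs, item_matrix)
            ids, vecs = [], []
    if ids:                                    # leftover users in a partial block
        yield from _score_block(ids, vecs, item_matrix)

# Toy data (hypothetical): 3 items and 3 users with rank-2 factors.
item_matrix = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
rows = iter([(1, [1.0, 2.0]), (2, [3.0, 4.0]), (3, [5.0, 6.0])])
out = list(score_partition(rows, item_matrix, block_size=2))
```

With Spark, this could be applied as `uf.mapPartitions(lambda rows: score_partition(rows, item_bc.value))`, where `item_bc` is an assumed broadcast of the stacked item matrix. A suitable `block_size` depends on available memory and would need tuning.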
> 
> Also as Ilya says more parallelism can help. I don't think it's so necessary 
> to do LSH with 30,000 items.
> 
> 
> 
> On Thu, Jun 18, 2015 at 6:01 AM, Ganelin, Ilya <[email protected]> 
> wrote:
> 
> I actually talk about this exact thing in a blog post here:
> http://blog.cloudera.com/blog/2015/05/working-with-apache-spark-or-how-i-learned-to-stop-worrying-and-love-the-shuffle/
> Keep in mind, you're actually doing a ton of math. Even with proper caching
> and use of broadcast variables this will take a while, depending on the size
> of your cluster. To get real results you may want to look into locality
> sensitive hashing to limit your search space, and definitely look into
> spinning up multiple threads to process your product features in parallel to
> increase resource utilization on the cluster.
> 
> 
> 
> Thank you,
> Ilya Ganelin
> 
> 
> 
> -----Original Message-----
> From: afarahat [[email protected]]
> Sent: Wednesday, June 17, 2015 11:16 PM Eastern Standard Time
> To: [email protected]
> Subject: Matrix Multiplication and mllib.recommendation
> 
> Hello,
> I am trying to get predictions after running the ALS model.
> The model works fine. In the prediction/recommendation step, I have about
> 30,000 products and 90 million users.
> When I try predictAll, it fails.
> I have been trying to formulate the problem as a matrix multiplication where
> I first get the product features, broadcast them, and then do a dot product.
> It's still very slow. Any reason why?
> Here is some sample code:
> 
> import numpy
> from pyspark.mllib.recommendation import MatrixFactorizationModel
>
> def doMultiply(x):
>     # Score one user vector x against every broadcast product vector,
>     # one dot product at a time.
>     a = []
>     for i in range(len(pf.value)):
>         a.append(numpy.dot(x, pf.value[i][1]))
>     return a
>
>
> # sc is assumed to be an existing SparkContext.
> myModel = MatrixFactorizationModel.load(sc, "FlurryModelPath")
> # I need to select which products to broadcast, but let's try all
> m1 = myModel.productFeatures().sample(False, 0.001)
> pf = sc.broadcast(m1.collect())
> uf = myModel.userFeatures()
> f1 = uf.map(lambda x: (x[0], doMultiply(x[1])))
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> -- 
> 
> Architect - Big Data
> Ph: +91 99805 99458
> 
> Manthan Systems | Company of the year - Analytics (2014 Frost and Sullivan 
> India ICT)