Nick is right. I have implemented it this way too, and it works just fine. In
my case there can be even more products. You simply broadcast blocks of
products, run userFeatures.mapPartitions() against each block, and do a BLAS
multiply in there to get recommendations. In my case 10K products form one
block. Note that you then have to union the recommendations from each block.
And if there are lots of product blocks, you might also want to checkpoint
once every few blocks.
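
A rough sketch of the block-wise flow described above, runnable locally with
plain NumPy. The helper names (make_blocks, score_block, merge_top_k), the toy
data, and the top-k merge step are illustrative assumptions, not MLlib API;
the loop over blocks only simulates what the cluster would do.

```python
import heapq
import numpy as np

def make_blocks(product_features, block_size):
    """Split (product_id, vector) pairs into (ids, matrix) blocks."""
    blocks = []
    for start in range(0, len(product_features), block_size):
        chunk = product_features[start:start + block_size]
        ids = np.array([pid for pid, _ in chunk])
        matrix = np.array([vec for _, vec in chunk], dtype=np.float64)
        blocks.append((ids, matrix))
    return blocks

def score_block(user_iter, ids, matrix, k):
    """Per-partition body: top-k (score, product_id) candidates from ONE block."""
    users = list(user_iter)
    if not users:
        return
    user_matrix = np.array([vec for _, vec in users], dtype=np.float64)
    # Single matrix-matrix multiply: (users x rank) . (rank x block products)
    scores = user_matrix.dot(matrix.T)
    top = np.argsort(-scores, axis=1)[:, :k]
    for row, (uid, _) in enumerate(users):
        yield uid, [(float(scores[row, j]), int(ids[j])) for j in top[row]]

def merge_top_k(a, b, k):
    """Combine two candidate lists for the same user, keeping the best k."""
    return heapq.nlargest(k, a + b)

# Toy data standing in for productFeatures().collect() and one user partition.
products = [(10, [1.0, 0.0]), (20, [0.0, 2.0]), (30, [1.0, 1.0]), (40, [2.0, 2.0])]
users = [(1, [3.0, 4.0]), (2, [2.0, 0.5])]
k = 2

merged = {}
for ids, matrix in make_blocks(products, block_size=2):
    # On a cluster each block would be broadcast and scored via mapPartitions.
    for uid, candidates in score_block(iter(users), ids, matrix, k):
        merged[uid] = merge_top_k(merged.get(uid, []), candidates, k)
```

On a cluster, I would expect each block to go through sc.broadcast, the
per-block results to be combined with union, and something like
reduceByKey(lambda a, b: merge_top_k(a, b, k)) to produce the final list per
user. Treat that wiring as an assumption to verify against your Spark version.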

Regards
Sab

On Thu, Jun 18, 2015 at 10:43 AM, Nick Pentreath <nick.pentre...@gmail.com>
wrote:

> One issue is that you broadcast the product vectors and then do a dot
> product one-by-one with the user vector.
>
> You should try forming a matrix of the item vectors and doing the dot
> product as a matrix-vector multiply which will make things a lot faster.
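
A minimal NumPy sketch of the matrix-vector idea above. The toy
product_features list stands in for pf.value from the original post, and
score_user is a hypothetical helper name, not part of MLlib.

```python
import numpy as np

def score_user(user_vec, item_matrix):
    """Score every item for one user with a single matrix-vector multiply.

    item_matrix has shape (n_items, rank); user_vec has shape (rank,).
    """
    return item_matrix.dot(np.asarray(user_vec, dtype=np.float64))

# Toy stand-in for pf.value: a list of (product_id, feature_vector) pairs.
product_features = [(10, [1.0, 0.0]), (20, [0.0, 2.0]), (30, [1.0, 1.0])]

# Build the matrix ONCE (ideally before broadcasting), not once per user.
item_ids = np.array([pid for pid, _ in product_features])
item_matrix = np.array([vec for _, vec in product_features], dtype=np.float64)

scores = score_user([3.0, 4.0], item_matrix)  # one score per product
```

The main point is that stacking the vectors happens once, so each user costs
one call into NumPy instead of a Python loop over every product.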
>
> Another optimisation that is available in 1.4 is a recommendProducts
> method that blockifies the factors to make use of level 3 BLAS (i.e.
> matrix-matrix multiply). I am not sure if this is available in the Python
> API yet.
>
> But you can do a version yourself by using mapPartitions over user
> factors, blocking the factors into sub-matrices and doing matrix multiply
> with item factor matrix to get scores on a block-by-block basis.
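
A sketch of what such a do-it-yourself version might look like as the function
passed to mapPartitions. The name score_partition, the block_size default, and
the top-k selection are assumptions made for illustration; only the blocking
and the matrix multiply come from the suggestion above.

```python
from itertools import islice
import numpy as np

def score_partition(user_iter, item_ids, item_matrix, k=2, block_size=1024):
    """Yield (user_id, [(item_id, score), ...]) for each user in a partition.

    user_iter   : iterator of (user_id, feature_vector) pairs
    item_ids    : array of shape (n_items,)
    item_matrix : array of shape (n_items, rank)
    """
    user_iter = iter(user_iter)
    while True:
        # Pull the next block of users out of the partition iterator.
        block = list(islice(user_iter, block_size))
        if not block:
            return
        user_ids = [uid for uid, _ in block]
        user_matrix = np.array([vec for _, vec in block], dtype=np.float64)
        # One matrix-matrix multiply per block: (users x rank) . (rank x items)
        scores = user_matrix.dot(item_matrix.T)
        # Column indices of the k highest scores per row, best first.
        top = np.argsort(-scores, axis=1)[:, :k]
        for row, uid in enumerate(user_ids):
            yield uid, [(int(item_ids[j]), float(scores[row, j])) for j in top[row]]

# Local check with toy data; block_size=2 forces two blocks for three users.
item_ids = np.array([10, 20, 30])
item_matrix = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
users = [(1, [3.0, 4.0]), (2, [2.0, 0.5]), (3, [0.0, 1.0])]
result = dict(score_partition(iter(users), item_ids, item_matrix, k=2, block_size=2))
```

In Spark this would presumably be wired up as
uf.mapPartitions(lambda it: score_partition(it, ids_bc.value, mat_bc.value)),
where ids_bc and mat_bc are hypothetical broadcast variables holding the item
ids and the stacked item matrix.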
>
> Also as Ilya says more parallelism can help. I don't think it's so
> necessary to do LSH with 30,000 items.
>
>
> On Thu, Jun 18, 2015 at 6:01 AM, Ganelin, Ilya <
> ilya.gane...@capitalone.com> wrote:
>
>> I actually talk about this exact thing in a blog post here:
>> http://blog.cloudera.com/blog/2015/05/working-with-apache-spark-or-how-i-learned-to-stop-worrying-and-love-the-shuffle/.
>> Keep in mind, you're actually doing a ton of math. Even with proper caching
>> and use of broadcast variables this will take a while, depending on the size
>> of your cluster. To get real results you may want to look into locality
>> sensitive hashing to limit your search space, and definitely look into
>> spinning up multiple threads to process your product features in parallel
>> to increase resource utilization on the cluster.
>>
>>
>>
>> Thank you,
>> Ilya Ganelin
>>
>>
>>
>> -----Original Message-----
>> *From: *afarahat [ayman.fara...@yahoo.com]
>> *Sent: *Wednesday, June 17, 2015 11:16 PM Eastern Standard Time
>> *To: *user@spark.apache.org
>> *Subject: *Matrix Multiplication and mllib.recommendation
>>
>> Hello,
>> I am trying to get predictions after running the ALS model.
>> The model works fine. In the prediction/recommendation step, I have about
>> 30,000 products and 90 million users.
>> When I try predictAll, it fails.
>> I have been trying to formulate the problem as a matrix multiplication
>> where I first get the product features, broadcast them, and then do a dot
>> product.
>> It's still very slow. Any reason why?
>> Here is some sample code:
>>
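>> import numpy
>>
>> def doMultiply(x):
>>     # Dot the user vector x with each broadcast product vector in turn.
>>     # pf.value is a list of (product_id, feature_vector) pairs.
>>     a = []
>>     for i in range(len(pf.value)):
>>         myprod = numpy.dot(x, pf.value[i][1])
>>         a.append(myprod)
>>     return a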
>>
>>
>> from pyspark.mllib.recommendation import MatrixFactorizationModel
>>
>> myModel = MatrixFactorizationModel.load(sc, "FlurryModelPath")
>> # I need to select which products to broadcast; for now take a 0.1% sample
>> m1 = myModel.productFeatures().sample(False, 0.001)
>> pf = sc.broadcast(m1.collect())
>> uf = myModel.userFeatures()
>> f1 = uf.map(lambda x: (x[0], doMultiply(x[1])))
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Matrix-Multiplication-and-mllib-recommendation-tp23384.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>>
>
>


-- 

Architect - Big Data
Ph: +91 99805 99458

Manthan Systems | *Company of the year - Analytics (2014 Frost and Sullivan
India ICT)*